Implementing DDSketch with Redis and a LUA Script for Fast Quantiles
Load testing, like any performance monitoring system, requires ingesting large streams of time series data, covering several concurrent metrics. You might be simultaneously tracking database query response times, authentication response times, and response times across multiple API endpoints, for example.Â
For most load-testing metrics, quantiles paint the clearest pictures, with P95 and P99 being standard for almost all performance monitoring platforms.
You could average response times. But a reading of, say, 150ms doesn’t tell you much since it’s likely pulled down by a handful of outliers.
Minimum and maximum values would chart best/worst experiences, while divulging nothing about archetypal experiences.
If you know that 95% of your users get an application response in less than 50ms, you have some idea of both outlier and normative values.
The problem is that calculating exact quantiles for several different metrics across hundreds of thousands or more data points requires an unjustifiable amount of computing power. Worse, it adds delays that keep you further from real-time visibility into your users' experiences.
We set out to solve this problem in order to provide accurate and (near) real-time statistics/feedback. Our approach to this problem is based on two components:
- Fast quantile calculating using the DDSketch algorithm.
- Redis, with Lua scripts for data ingest, short-term storage, and computation
It feels novel, broadly applicable, fast, and precise enough to be worth sharing (but we’d also love to hear how you've approached the same challenge).
A Horizontally Scalable Architecture
The requirements for our load test metrics engine are:
- The data ingest and storage must be fast.
- The statistic computation must be incremental and mergeable (we don’t need to process the entire data set each time, and mathematically, the system can scale horizontally).
- The engine must be isolated and self-contained, so concurrent tests on our platform do not affect each other or drag down the system.
Combined, they ensure quick, accurate, and reliable results with virtually no limits on test size or parameters. To accomplish that, we created a fully mergeable, horizontally scalable system.
For the purposes of this post, we will refer to Redis instances running our aggregation LUAÂ script as a "Metrics Redis." During tests, our Virtual Users (VUs) are sending data to one or more Metrics Redis instances. The number of Metrics Redis instances is determined by the number of VUs in a load test. As test sizes and ingest requirements grow, we just expand horizontally. If needed, we can even stack multiple layers of Metrics Redis instances feeding into each other, because mathematically, this algorithm is fully mergeable.
Eventually, an aggregator polls the Metrics Redis instances to pull in the data and merge all of the quantiles before sending them to durable storage.
Implementing DDSketch with Redis and LUA script
Quantile estimation algorithms have been around for a long time. Q-digest goes back about two decades and the P2 algorithm is nearly twice as old as that. Both were originally designed for limited memory and CPU resources and tend to be less accurate than newer sketch-based alternatives like GK, HDR, or t-digest.
Quantile “sketches” significantly reduce the footprint of the data with comparatively small impacts on accuracy and different algorithms take different approaches to error guarantees, range boundaries and merge operations. Adrian Colyer wrote a great article comparing the attributes, sketch sizes, and merge times of popular quantile algorithms.
Based on our requirements and research, DDSketch was the fastest fully mergeable algorithm. Choosing the algorithm was just the beginning, though. We still had to figure out how to best implement it.
We wanted to store the entire procedure on Redis so everything happens atomically and there’s minimal back and forth during ingest. That meant we’re using a LUA script. Our script runs directly on the Metrics Redis instances and executes the DDSketch calculations. In addition to calculating the quantiles, this script also handles keeping track of the count and total of each metric, so we can easily calculate the average and RPS. When a Virtual User (VU) publishes a metric, our ingest Redis function is called.
With DDSketch, the gamma value determines the accuracy of the quantiles. Lower values are more accurate, but take up more memory. A good starting point for gamma is between 1.02 and 1.05.
An aggregator runs a repeating job that connects to each Metrics Redis, reads in the metric data and sketch, merges everything, and calculates our desired statistics. Our aggregator is a Node.js service written in TypeScript. The code below illustrates how to read and merge statistics and sketches from multiple Redis Instances, and how to calculate a quantile from a sketch.
Finally, we save the end results in durable storage and send a WebSocket event to update the results in real-time.
This system provides a great experience for our users and will scale to meet their demands. It won’t be bogged down by more Virtual Users, longer tests, or more metrics.
We’d love for you to try our implementation and see everything in action. Write a new test with Multiple (AI working in the background makes it super easy) and watch the values start streaming in, making their way from a Metrics Redis, up to a JavaScript aggregator, and onto your test results page.