Implementing DDSketch with Redis and a Lua Script for Fast Quantiles

Yev Spektor
February 8, 2024

Load testing, like any performance monitoring system, requires ingesting large streams of time series data, covering several concurrent metrics. You might be simultaneously tracking database query response times, authentication response times, and response times across multiple API endpoints, for example. 

For most load-testing metrics, quantiles paint the clearest picture, with P95 and P99 being standard across almost all performance monitoring platforms.

You could average response times. But a reading of, say, 150ms doesn’t tell you much, since it’s likely skewed by a handful of slow outliers.

Minimum and maximum values would chart the best and worst experiences, while revealing nothing about the typical experience.

If you know that 95% of your users get an application response in less than 50ms, you have some idea of both outlier and normative values.
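To make that contrast concrete, here is a small hypothetical sample (the numbers are ours, not from a real test) showing how a few slow outliers inflate the mean while the P95 still reflects what most users saw:

```typescript
// Hypothetical sample: 95 fast requests and 5 slow outliers.
const latenciesMs: number[] = [...Array(95).fill(20), ...Array(5).fill(2000)];

// The mean is dragged far above what most users experienced.
const mean = latenciesMs.reduce((sum, v) => sum + v, 0) / latenciesMs.length;

// The P95 still reflects the typical experience.
const sorted = [...latenciesMs].sort((a, b) => a - b);
const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];

console.log(mean); // 119
console.log(p95); // 20
```

Ninety-five percent of these requests finished in 20ms, yet the mean reports 119ms.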

The problem is that calculating exact quantiles for several different metrics across hundreds of thousands or more data points requires an unjustifiable amount of computing power. Worse, it adds delays that keep you further from real-time visibility into your users' experiences.

We set out to solve this problem in order to provide accurate, near real-time statistics. Our approach is based on two components:

  1. Fast quantile calculation using the DDSketch algorithm.
  2. Redis, with Lua scripts for data ingest, short-term storage, and computation.

It feels novel, broadly applicable, fast, and precise enough to be worth sharing (but we’d also love to hear how you've approached the same challenge).

A Horizontally Scalable Architecture

The requirements for our load test metrics engine are:

  1. The data ingest and storage must be fast.
  2. The statistic computation must be incremental and mergeable (we don’t need to process the entire data set each time, and mathematically, the system can scale horizontally).
  3. The engine must be isolated and self-contained, so concurrent tests on our platform do not affect each other or drag down the system.

Combined, they ensure quick, accurate, and reliable results with virtually no limits on test size or parameters. To accomplish that, we created a fully mergeable, horizontally scalable system.

For the purposes of this post, we will refer to Redis instances running our aggregation Lua script as a "Metrics Redis." During tests, our Virtual Users (VUs) send data to one or more Metrics Redis instances. The number of Metrics Redis instances is determined by the number of VUs in a load test. As test sizes and ingest requirements grow, we just expand horizontally. If needed, we can even stack multiple layers of Metrics Redis instances feeding into each other, because mathematically, the algorithm is fully mergeable.

Eventually, an aggregator polls the Metrics Redis instances to pull in the data and merge all of the quantiles before sending them to durable storage.

Implementing DDSketch with Redis and a Lua Script

Quantile estimation algorithms have been around for a long time. Q-digest goes back about two decades, and the P² algorithm is nearly twice as old as that. Both were originally designed for limited memory and CPU resources and tend to be less accurate than newer sketch-based alternatives like GK, HDR, or t-digest.

Quantile “sketches” significantly reduce the footprint of the data with comparatively small impacts on accuracy, and different algorithms take different approaches to error guarantees, range boundaries, and merge operations. Adrian Colyer wrote a great article comparing the attributes, sketch sizes, and merge times of popular quantile algorithms.

Based on our requirements and research, DDSketch was the fastest fully mergeable algorithm. Choosing the algorithm was just the beginning, though. We still had to figure out how to best implement it.

We wanted to store the entire procedure on Redis so everything happens atomically and there’s minimal back and forth during ingest. That meant using a Lua script. Our script runs directly on the Metrics Redis instances and executes the DDSketch calculations. In addition to calculating the quantiles, the script also keeps track of the count and total of each metric, so we can easily calculate the average and requests per second (RPS). When a Virtual User publishes a metric, our ingest Redis function is called.

With DDSketch, the gamma value determines the accuracy of the quantiles. Lower values are more accurate, but take up more memory. A good starting point for gamma is between 1.02 and 1.05.
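As a rough sketch of what gamma does (our own illustration, not code from the ingest script), a value is mapped to a logarithmic bucket, and reconstructing a value from its bucket is accurate to within a relative error of roughly gamma minus one:

```typescript
// Illustrative only: how a gamma of 1.02 maps a value to a bucket.
const QUANTILE_GAMMA = 1.02;

// The same bucket-index formula the ingest script uses.
const bucketFor = (value: number): number =>
  Math.floor(Math.log(value) / Math.log(QUANTILE_GAMMA));

const value = 150; // e.g., a 150ms response time
const bucket = bucketFor(value);

// Reconstructing the value from its bucket index lands within ~2% of
// the original for gamma = 1.02.
const estimate = QUANTILE_GAMMA ** bucket;
const relativeError = Math.abs(estimate - value) / value;
```

A smaller gamma means finer buckets, so the sketch holds more of them: that is the memory-for-accuracy trade-off.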

local metric_key = KEYS[1]
local time_key = KEYS[2]
local value = tonumber(ARGV[1])

redis.call('sadd', 'metrics', metric_key)
redis.call('incr', 'times:' .. time_key)
redis.call('incrby', 'count:' .. metric_key .. ':' .. time_key, 1)
redis.call('incrbyfloat', 'total:' .. metric_key .. ':' .. time_key, value)
redis.call('incrby', 'count:' .. metric_key, 1)
redis.call('incrbyfloat', 'total:' .. metric_key, value)

local quantile = math.floor(math.log(value) / math.log(QUANTILE_GAMMA))

-- quantile == quantile checks for NaN
if quantile == quantile then
  redis.call('hincrby', 'quantiles:' .. metric_key, quantile, 1)
end

An aggregator runs a repeating job that connects to each Metrics Redis, reads in the metric data and sketch, merges everything, and calculates our desired statistics. Our aggregator is a Node.js service written in TypeScript. The code below illustrates how to read and merge statistics and sketches from multiple Redis instances, and how to calculate a quantile from a sketch.

// Read and sum a numeric metric from multiple Redis instances
export async function readMultiRedisNumericSum(
 redisClients: IORedis[],
 key: string,
): Promise<number> {
 const values = await Promise.all(
    redisClients.map(async (redis) => Number((await redis.get(key)) || '0')),
  );

 return values.reduce((acc, cur) => acc + cur, 0);
}
// Read and merge quantile sketches from multiple Redis instances
export async function readMultiRedisQuantileSketch(
 redisClients: IORedis[],
 key: string,
): Promise<Record<string, number>> {
 const sketches = await Promise.all(
    redisClients.map((redis) => redis.hgetall(key)),
  );

 return sketches
   .flatMap((obj) => Object.entries(obj))
   .reduce((sketch, [k, v]) => {
     sketch[k] = Number(v) + (sketch[k] || 0);
     return sketch;
    }, {} as Record<string, number>);
}

// Calculate the quantile from a sketch
export const calculateQuantileFromSketch = (
 sketch: Record<string, number>,
 quantile: number,
): number => {
 const sortedBuckets = Object.keys(sketch).sort((a, b) => +a - +b);
 const totalCount = Object.values(sketch).reduce((acc, cur) => acc + cur, 0);

 const thresholdCount = totalCount * quantile;
 let cumulativeCount = 0;

 for (const bucket of sortedBuckets) {
    cumulativeCount += sketch[bucket];
    if (cumulativeCount >= thresholdCount) {
      return QUANTILE_GAMMA ** +bucket;
    }
  }

 throw new Error('Quantile calculation failed');
};
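The mergeability that makes this architecture scale is just bucket-wise addition. As a standalone sanity check (in-memory objects stand in for live Redis instances here; the sample bucket indices and counts are made up), merging two sketches looks like this:

```typescript
// Standalone illustration: merging quantile sketches is bucket-wise
// addition of counts, which is what lets the system scale horizontally.
type Sketch = Record<string, number>;

const mergeSketches = (sketches: Sketch[]): Sketch =>
  sketches
    .flatMap((obj) => Object.entries(obj))
    .reduce((merged, [k, v]) => {
      merged[k] = Number(v) + (merged[k] || 0);
      return merged;
    }, {} as Sketch);

// Two Metrics Redis instances each observed part of the traffic.
const sketchA: Sketch = { '200': 50, '210': 30 };
const sketchB: Sketch = { '200': 10, '220': 10 };

const merged = mergeSketches([sketchA, sketchB]);
console.log(merged); // { '200': 60, '210': 30, '220': 10 }
```

Because addition is associative, it doesn't matter whether sketches are merged in one layer or across several stacked layers of Metrics Redis instances; the result is the same.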
High-level diagram of our metrics engine

Finally, we save the end results in durable storage and send a WebSocket event to update the results in real-time.

Metrics and quantiles during a MongoDB load test

This system provides a great experience for our users and will scale to meet their demands. It won’t be bogged down by more Virtual Users, longer tests, or more metrics.

We’d love for you to try our implementation and see everything in action. Write a new test with Multiple (AI working in the background makes it super easy) and watch the values start streaming in, making their way from a Metrics Redis, up to a JavaScript aggregator, and onto your test results page.
