How We Cut Pipeline Latency by 85% With Adaptive Buffering
A static buffer is a guess. An adaptive one is a feedback loop. Here's how we replaced ours and what the p99 graph did next.
The default in most stream processors is a fixed buffer: hold 256KB, then flush. It’s simple, predictable, and wrong for every workload that isn’t the one the default was tuned for.
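For contrast, here’s roughly what the fixed scheme looks like. This is a minimal Go sketch with the 256KB threshold from above and an illustrative timer; the names and the timer value are ours, not any particular processor’s defaults.

```go
package buffer

import "time"

const (
	fixedBytes    = 256 << 10              // 256 KB, the classic default
	fixedInterval = 100 * time.Millisecond // illustrative flush timer
)

// fixedBuffer flushes on whichever trips first: a byte threshold or a
// timer. Both constants are chosen before the first event arrives.
type fixedBuffer struct {
	buf       []byte
	lastFlush time.Time
	flush     func([]byte)
}

func newFixedBuffer(flush func([]byte)) *fixedBuffer {
	return &fixedBuffer{lastFlush: time.Now(), flush: flush}
}

func (b *fixedBuffer) add(event []byte) {
	b.buf = append(b.buf, event...)
	// Real code would also fire the timer from a background ticker and
	// copy the slice before reuse; both are omitted to keep this short.
	if len(b.buf) >= fixedBytes || time.Since(b.lastFlush) >= fixedInterval {
		b.flush(b.buf)
		b.buf = b.buf[:0]
		b.lastFlush = time.Now()
	}
}
```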
“A constant in your pipeline is a guess about the future. The future doesn’t care.”
The diagnosis
Our p99 was sawtoothing between 80ms and 500ms. The cause turned out to be trivial in hindsight:
- Under low traffic, the buffer never filled, so events waited the full flush timer before going out.
- Under high traffic, the buffer overflowed and we queued in the kernel.
- In the transition between regimes, both pathologies fought each other, which is where the sawtooth came from.
A fixed buffer is two bad regimes glued together.
The fix
The flush threshold is now a function of the rolling 5-second event rate. The controller has three modes (sketched in code after the list):
- Cold path (under 1K events/s): flush every 5ms regardless of fill. We prioritize freshness over batching here.
- Warm path (1K–50K events/s): linear interpolation between the time-based and size-based flush. There’s no fixed setting here; the mode itself is the gradient between the other two.
- Hot path (50K+ events/s): let it fill to 512KB before flushing. The syscall amortization actually helps at this rate.
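Here’s a sketch of that controller in Go. The mode boundaries, the 5ms cold interval, and the 512KB hot threshold are from above; the hot interval ceiling, the cold byte ceiling, and the exact choice of what interpolates are our placeholders, since the post doesn’t pin them down.

```go
package buffer

import "time"

const (
	coldRate = 1_000.0  // events/s: below this, timer-dominated
	hotRate  = 50_000.0 // events/s: above this, size-dominated

	coldInterval = 5 * time.Millisecond   // cold: flush this often, regardless of fill
	hotInterval  = 200 * time.Millisecond // placeholder: timer is only a backstop when hot

	coldBytes = 4 << 20   // placeholder: effectively unreachable at cold rates
	hotBytes  = 512 << 10 // hot: let it fill to 512 KB
)

// flushPolicy maps the rolling 5s event rate to a (max wait, byte
// threshold) pair; the buffer flushes on whichever trips first. Cold
// rates are timer-dominated, hot rates size-dominated, and the warm
// band slides linearly between the two, so there is no step change.
func flushPolicy(eventsPerSec float64) (maxWait time.Duration, flushBytes int) {
	switch {
	case eventsPerSec < coldRate:
		return coldInterval, coldBytes
	case eventsPerSec >= hotRate:
		return hotInterval, hotBytes
	default:
		// Position in the warm band: 0 at the cold edge, 1 at the hot edge.
		t := (eventsPerSec - coldRate) / (hotRate - coldRate)
		maxWait = coldInterval + time.Duration(t*float64(hotInterval-coldInterval))
		flushBytes = coldBytes + int(t*float64(hotBytes-coldBytes))
		return maxWait, flushBytes
	}
}
```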
We also added a “panic flush” — if the queue depth crosses 80% we drop the size threshold immediately. Better to do an undersized syscall than to spike a producer’s tail.
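The panic flush layers on top of whatever the controller chose. The 80% trip point is from above; `queueDepth` and `queueCap` stand in for whatever occupancy metric your transport exposes.

```go
// effectiveBytes applies the panic flush: once the downstream queue
// passes 80% occupancy, the size threshold collapses so the very next
// event forces a flush, undersized syscall and all.
func effectiveBytes(flushBytes, queueDepth, queueCap int) int {
	if queueDepth*10 >= queueCap*8 { // at or past 80% full
		return 1
	}
	return flushBytes
}
```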
The graph
The tail percentiles came down hard:
- p99: 240ms → 36ms
- p99.9: 1.4s → 110ms
- mean: barely moved (within margin of error)
That last bullet is the right shape — we didn’t get faster, we got less variable. Means are noisy. Tails tell the truth.