Smart Batching and Idle Strategy

Stop batching messages in your application. Aeron already does it for you — better, and without adding latency.

The core principle is one line: send as soon as it arrives, don’t wait. Publish each message the moment it arrives. The C media driver coalesces whatever is already in the buffer into fewer syscalls via sendmmsg(). You get the throughput benefit of batching without paying the latency tax of waiting to fill a batch.

The knob that actually matters for the latency-vs-CPU tradeoff is the idle strategy — not your batch size.

Smart batching vs application-level batching

Aeron’s smart batching is fundamentally different from application-level batching:

  • Don’t try to aggregate multiple messages to fit into a single MTU — there are diminishing returns.
  • The C media driver handles batching automatically at the transport level via sendmmsg().
  • Application code should just publish messages as they arrive.

[Figure: Response time vs load. A typical system climbs steeply while Aeron's smart batching keeps the curve flat far longer.]

The transport naturally coalesces messages that arrive close together into fewer syscalls. No application logic required, and crucially, no added wait time.
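The coalescing behavior can be illustrated with a small model. This is only a sketch under stated assumptions: it does not call Aeron or sendmmsg(), the function name `smart_batch_send` and the fixed `send_cost` are hypothetical, and a real driver has more moving parts. It shows a sender that never waits for a batch to fill and simply takes whatever has already arrived each time it sends.

```python
from collections import deque

def smart_batch_send(arrivals, send_cost, max_batch=64):
    """Model a sender that never waits: each send takes whatever is queued.

    arrivals:  sorted arrival times of messages (any time unit)
    send_cost: time one send call occupies the sender (assumed fixed)
    Returns a list of (send_start_time, batch_size).
    """
    pending = deque(arrivals)
    now = 0.0
    batches = []
    while pending:
        # An idle sender jumps to the next arrival; it never holds a message back.
        now = max(now, pending[0])
        batch = 0
        # Take everything that has already arrived, up to the cap.
        while pending and pending[0] <= now and batch < max_batch:
            pending.popleft()
            batch += 1
        batches.append((now, batch))
        now += send_cost
    return batches

# Light load: arrivals are spaced wider than one send, so each message goes alone.
light = smart_batch_send([0, 10, 20, 30], send_cost=2)
# Heavy load: arrivals outpace sends, so batches form without any application logic.
heavy = smart_batch_send([0, 1, 2, 3, 4, 5, 6, 7], send_cost=4)

print([size for _, size in light])
print([size for _, size in heavy])
```

In this model the batch size is a consequence of load, not a setting: under light load every message leaves immediately, and batches only appear when messages queue up behind a send that is already in progress.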

Larger batches quickly stop helping. The throughput gain curve shows classic diminishing returns.

[Figure: Throughput gain vs batch size. Each increase in batch size adds less than the last, flattening to near zero by batch 16–20.]

  • Going from batch=1 to batch=2 gives a big improvement.
  • Going from batch=8 to batch=16 gives almost nothing.

This is why hand-rolled batching loses. The throughput upside flattens quickly, but every message you hold back to grow the batch adds directly to p50 and p99. You pay a guaranteed latency cost for a marginal — eventually negligible — throughput gain.
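A rough worked example makes the tradeoff concrete. The cost model below is an assumption for illustration, not a measurement: each send pays one fixed per-call overhead plus a per-message cost, both set to 1.0 time unit, and messages arrive one time unit apart. The function names are hypothetical.

```python
def throughput(batch, per_call_overhead=1.0, per_message_cost=1.0):
    """Messages per time unit when `batch` messages share one call's fixed overhead."""
    return batch / (per_call_overhead + batch * per_message_cost)

def added_wait(batch, arrival_interval=1.0):
    """Extra time the first message waits while the application fills the batch."""
    return (batch - 1) * arrival_interval

def relative_gain(smaller, larger):
    """Fractional throughput improvement from growing the batch."""
    return throughput(larger) / throughput(smaller) - 1.0

for small, large in [(1, 2), (2, 4), (4, 8), (8, 16)]:
    print(
        f"batch {small:>2} -> {large:>2}: "
        f"throughput +{relative_gain(small, large):.1%}, "
        f"first-message wait {added_wait(small):.0f} -> {added_wait(large):.0f}"
    )
```

Under these assumed costs, doubling from 1 to 2 improves throughput by about a third, while doubling from 8 to 16 adds under 6%. The wait imposed on the first message, by contrast, keeps growing linearly with the batch size.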

The idle strategy controls how often a thread polls versus yields the CPU. That choice is your true latency-vs-CPU lever.

| Strategy | Behavior | Use case |
| --- | --- | --- |
| BusySpin / NoOp | Never yields CPU, polls continuously | Lowest latency, highest CPU cost |
| Sleeping / Yield | Yields CPU between polls | Lower CPU usage, higher latency |

How to read this against your latency budget:

  • BusySpin / NoOp drives p50 and p99 to their floor by never sleeping through an arriving message — at the cost of a fully pinned core. Reserve it for dedicated, isolated cores where the CPU is yours to burn.
  • Sleeping / Yield trades latency for headroom. The thread gives the CPU back between polls, so a message can land mid-sleep and wait — inflating both p50 and especially p99. Use it when CPU is shared or contended and your tail budget can absorb the jitter.
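The effect on p50 and p99 can be approximated with a simple polling model. This is a sketch, not the implementation of any Aeron idle strategy: the poller is assumed to check at fixed intervals, busy-spinning is approximated by a 50 ns interval and sleeping by a 100 µs interval (both assumed values), and `pickup_delays` and `percentile` are hypothetical helpers.

```python
import math

def pickup_delays(arrivals_ns, poll_interval_ns):
    """Delay between each message arriving and the next poll that sees it.

    Models a poller that checks at times 0, interval, 2 * interval, ...
    """
    delays = []
    for t in arrivals_ns:
        next_poll = -(-t // poll_interval_ns) * poll_interval_ns  # integer ceiling
        delays.append(next_poll - t)
    return delays

def percentile(values, pct):
    """Nearest-rank percentile; enough for a sketch."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical workload: 1000 messages spaced 7919 ns apart, a spacing chosen
# so that arrivals do not line up with either poll interval.
arrivals_ns = [i * 7919 for i in range(1, 1001)]

# Assumed intervals, not measurements.
spin = pickup_delays(arrivals_ns, poll_interval_ns=50)
sleep = pickup_delays(arrivals_ns, poll_interval_ns=100_000)

for name, delays in [("busy-spin", spin), ("sleeping", sleep)]:
    print(f"{name:>9}: p50={percentile(delays, 50)} ns, p99={percentile(delays, 99)} ns")
```

In this model the pickup delay is bounded by the poll interval, so the spinning poller stays within tens of nanoseconds while the sleeping poller's median and tail both scale with how long it sleeps. Real strategies add scheduler wake-up jitter on top, which this sketch does not model.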

For the internals of how the media driver coalesces sends and how each idle strategy is implemented, see The Aeron Files.