Smart Batching and Idle Strategy

Stop batching messages in your application. Aeron already does it for you, better and without adding latency.

The core principle is one line: send immediately, don’t wait. Publish each message the moment it arrives. The C media driver coalesces what’s already in the buffer into fewer syscalls via sendmmsg(). You get the throughput benefit of batching without paying the latency tax of waiting to fill a batch.

The knob that actually matters for the latency-vs-CPU trade-off is the idle strategy, not your batch size.

Smart batching vs application-level batching

Aeron’s smart batching is fundamentally different from application-level batching:

Don’t try to aggregate multiple messages to fill a single MTU; the returns diminish quickly.
The C media driver handles batching automatically at the transport level via sendmmsg().
Application code should just publish messages as they arrive.

Response time vs load: a typical system climbs steeply while Aeron's smart batching keeps the curve flat far longer

The transport naturally coalesces messages that arrive close together into fewer syscalls. No application logic required, and crucially, no added wait time.
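A toy model makes the "drain what's there, never wait" rule concrete. This is not Aeron's driver code: `smart_batch_send`, `send_batch` and `max_batch` are hypothetical names, and the `send_batch` callback stands in for a single vectored syscall such as sendmmsg(). A lone message goes out on its own straight away; messages that arrived together leave in one call.

```python
from collections import deque


def smart_batch_send(queue, send_batch, max_batch=16):
    """Drain whatever is already queued (up to max_batch) and send it in one call.

    Never waits for more messages: if only one message is queued, it is sent
    alone, immediately. Returns how many messages were sent (0 if none).
    """
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    if batch:
        send_batch(batch)  # stand-in for one sendmmsg()-style syscall
    return len(batch)


sent_calls = []
q = deque(["a"])
smart_batch_send(q, sent_calls.append)  # lone message: sent at once
q.extend(["b", "c", "d"])               # a burst arrives together
smart_batch_send(q, sent_calls.append)  # burst: coalesced into one call
print(sent_calls)  # [['a'], ['b', 'c', 'd']]
```

Latency never depends on `max_batch` here; it only caps how much a single call can carry.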

Diminishing returns on batch size

Larger batches stop helping quickly. The throughput gain follows a classic diminishing-returns curve.

Throughput gain vs batch size: each increase in batch size adds less than the last, flattening to near zero by batch 16–20

Going from batch=1 to batch=2 gives a big improvement.
Going from batch=8 to batch=16 gives almost nothing.
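One simple way to see why the curve flattens, assuming each send pays a fixed per-syscall cost shared by the whole batch plus a per-message cost that batching cannot remove. The costs below are hypothetical (a 2:1 ratio in arbitrary units), not measurements of Aeron; only the shape of the result matters.

```python
def throughput(batch, syscall_cost=1.0, per_msg_cost=0.5):
    """Messages per unit time under a simple amortization model.

    Each send pays syscall_cost once for the whole batch, plus per_msg_cost
    for every message in it. Both costs are hypothetical, in arbitrary units.
    """
    return batch / (syscall_cost + batch * per_msg_cost)


for small, large in [(1, 2), (2, 4), (4, 8), (8, 16)]:
    gain = throughput(large) / throughput(small) - 1
    print(f"batch {small:>2} -> {large:>2}: +{gain:.0%}")
# batch  1 ->  2: +50%
# batch  2 ->  4: +33%
# batch  4 ->  8: +20%
# batch  8 -> 16: +11%
```

Doubling the batch only halves the amortized syscall share, which is already small, while the per-message cost stays put. Throughput approaches 1 / per_msg_cost (2.0 here) but never reaches it.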

This is why hand-rolled batching loses. The throughput upside flattens quickly, but every message you hold back to grow the batch adds directly to p50 and p99. You pay a guaranteed latency cost for a marginal, and eventually negligible, throughput gain.
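The latency side of that trade can be put in numbers. This sketch assumes messages arrive exactly 5 µs apart (a hypothetical rate) and that a hand-rolled batcher flushes only when it holds `batch_size` messages, with no timeout. `batching_delays` and `percentile` are illustrative helpers, not part of any library; `batch_size=1` is publish-immediately.

```python
import math


def batching_delays(batch_size, interval_us, n_messages=10_000):
    """Extra delay (us) each message spends waiting for its batch to fill.

    Assumes messages arrive exactly interval_us apart and the batch is flushed
    only once it holds batch_size messages. The k-th message of a batch
    (k = 0..batch_size-1) waits for the remaining batch_size-1-k arrivals.
    """
    return [(batch_size - 1 - i % batch_size) * interval_us for i in range(n_messages)]


def percentile(values, p):
    """Nearest-rank percentile (p in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


for n in (1, 2, 8, 16):
    d = batching_delays(n, interval_us=5)
    print(f"batch={n:>2}: p50 +{percentile(d, 50)} us, p99 +{percentile(d, 99)} us")
# batch= 1: p50 +0 us, p99 +0 us
# batch= 2: p50 +0 us, p99 +5 us
# batch= 8: p50 +15 us, p99 +35 us
# batch=16: p50 +35 us, p99 +75 us
```

The added delay grows linearly with batch size (the first message of each batch waits (batch_size - 1) × interval), while the throughput gain from each doubling keeps shrinking.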

The idle strategy is your real knob

The idle strategy decides what a polling thread does when a duty cycle finds no work: poll again immediately, yield the CPU, or sleep. That choice is your true latency-vs-CPU lever.

Strategy	Behavior	Trade-off
BusySpin / NoOp	Never gives up the CPU; polls again immediately	Lowest latency, highest CPU cost
Sleeping / Yield	Gives up the CPU (yields or sleeps) after a poll finds no work	Lower CPU usage, higher latency
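A minimal duty-cycle sketch shows where the idle strategy sits. Its shape is loosely modeled on Agrona's Java `IdleStrategy.idle(int workCount)` (Agrona ships `BusySpinIdleStrategy`, `NoOpIdleStrategy`, `YieldingIdleStrategy` and `SleepingIdleStrategy`), but the Python classes and `run_duty_cycle` below are hypothetical stand-ins, not Aeron code. The loop polls, reports how much work it did, and the strategy acts only when that count is zero. Python has no spin-wait hint, so `BusySpin` here behaves like NoOp.

```python
import os
import time

# os.sched_yield is Unix-only; fall back to sleep(0) elsewhere.
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


class BusySpin:
    """Never gives up the CPU: returns at once so the loop polls again."""
    def idle(self, work_count):
        pass


class Yielding:
    """Offers the rest of the time slice to other threads after an empty poll."""
    def idle(self, work_count):
        if work_count == 0:
            _yield_cpu()


class Sleeping:
    """Sleeps for a fixed period after an empty poll."""
    def __init__(self, period_s=0.001):
        self.period_s = period_s

    def idle(self, work_count):
        if work_count == 0:
            time.sleep(self.period_s)


def run_duty_cycle(poll, idle_strategy, cycles):
    """Poll `cycles` times, handing each cycle's work count to the idle strategy."""
    total_work = 0
    for _ in range(cycles):
        work = poll()              # e.g. number of messages handled this cycle
        total_work += work
        idle_strategy.idle(work)   # strategies back off only when work == 0
    return total_work


events = iter([0, 3, 0, 1])
print(run_duty_cycle(lambda: next(events), Sleeping(0.0005), cycles=4))  # 4
```

Swapping the strategy object is the whole latency-vs-CPU decision; the polling logic itself does not change.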

How to read this against your latency budget:

BusySpin / NoOp drives p50 and p99 to their floor because the thread never sleeps through an arriving message. The price is a fully pinned core, so reserve it for dedicated, isolated cores where the CPU is yours to burn.
Sleeping / Yield trades latency for CPU headroom. The thread gives the CPU back after an empty poll, so a message can arrive while the thread is descheduled or asleep and sit until the next poll. That inflates p50 and, especially, p99. Use it when CPU is shared or contended and your tail budget can absorb the jitter.
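The mid-sleep penalty can be estimated with a simple model, assuming the idle thread sleeps a fixed period after every empty poll and a message is equally likely to arrive at any whole-microsecond offset within that sleep. The 100 µs and 1000 µs periods are hypothetical, and `pickup_delays` and `percentile` are illustrative helpers. Under busy spinning the same pickup delay is roughly one poll iteration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def pickup_delays(sleep_period_us):
    """Extra wait (us) for a message arriving at each whole-us offset into one sleep.

    A message landing `offset` us after the thread fell asleep is not seen
    until the sleep ends, so it waits sleep_period_us - offset.
    """
    return [sleep_period_us - offset for offset in range(sleep_period_us)]


for period in (100, 1000):
    d = pickup_delays(period)
    print(f"sleep {period} us: p50 +{percentile(d, 50)} us, p99 +{percentile(d, 99)} us")
# sleep 100 us: p50 +50 us, p99 +99 us
# sleep 1000 us: p50 +500 us, p99 +990 us
```

A sleep lasts at least the requested period, so treat these figures as lower bounds for messages that land mid-sleep; messages that arrive while the thread is already polling are picked up at once.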

For the internals of how the media driver coalesces sends and how each idle strategy is implemented, see The Aeron Files.