Step-by-Step Tuning Methodology

Don’t tune everything at once. The fastest path to a well-tuned Aeron Transport is incremental: start from sensible defaults, add load, watch which metric degrades first, then turn the one knob that addresses it.

This page is the workflow. For the why behind each knob (the internals of windows, terms, and NAKs), see The Aeron Files.

  1. Start with sensible defaults.
  2. Validate your message size.
  3. Load test incrementally.
  4. Tune based on symptoms.
  5. Apply the L3 cache sizing rule.

Begin with stock settings. Resist the urge to pre-optimize.

  • 128 KB initial window size.
  • Default term buffer size.
  • OS / ENA driver send and receive buffers sized to match Aeron's buffers.

A mismatch between Aeron’s buffers and the underlying OS or ENA driver buffers is a common source of silent throughput loss. Align them before you change anything else.
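The alignment check described here can be sketched as a small pre-flight function. This is a minimal illustration, not part of Aeron: the `BufferSettings` type and `find_buffer_mismatches` helper are hypothetical names, and the values are assumed to have been collected by hand (for example from Linux's `net.core.rmem_max` and `net.core.wmem_max` sysctls and from your media driver configuration).

```python
from dataclasses import dataclass
from typing import List

KIB = 1024


@dataclass(frozen=True)
class BufferSettings:
    """Hypothetical snapshot of buffer sizes, all in bytes."""
    initial_window: int    # Aeron initial window size
    aeron_so_rcvbuf: int   # socket receive buffer Aeron asks for
    aeron_so_sndbuf: int   # socket send buffer Aeron asks for
    os_rmem_max: int       # OS cap on receive buffers
    os_wmem_max: int       # OS cap on send buffers


def find_buffer_mismatches(s: BufferSettings) -> List[str]:
    """Return a human-readable list of mismatches; empty means aligned."""
    problems = []
    if s.aeron_so_rcvbuf > s.os_rmem_max:
        problems.append("receive buffer request exceeds the OS limit")
    if s.aeron_so_sndbuf > s.os_wmem_max:
        problems.append("send buffer request exceeds the OS limit")
    if s.initial_window > s.aeron_so_rcvbuf:
        problems.append("initial window is larger than the receive buffer")
    return problems
```

An empty result means the three sizes are consistent with each other. Any entry in the list is worth resolving before the first load test, because an OS that caps a buffer request may do so without reporting an error.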

Keep each message smaller than the MTU (typically 1500 bytes).

You do not need application-level batching. Aeron Transport handles smart batching for you — adding your own batching layer on top usually hurts more than it helps.
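The message-size check from this step can be sketched as a scan over a sample of your encoded message sizes. The helper name `oversized_messages` is hypothetical, the 1500-byte default is the typical MTU quoted above, and `overhead_bytes` is an assumption you should set to whatever protocol header overhead applies in your deployment.

```python
from typing import Iterable, List

TYPICAL_MTU_BYTES = 1500  # typical value quoted in this guide; check your network


def oversized_messages(
    sizes: Iterable[int],
    mtu_bytes: int = TYPICAL_MTU_BYTES,
    overhead_bytes: int = 0,  # assumed header overhead; 0 means "not accounted for"
) -> List[int]:
    """Return the message sizes that are NOT strictly smaller than the MTU
    once the assumed overhead is added."""
    if mtu_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("mtu_bytes must be positive and overhead_bytes non-negative")
    return [size for size in sizes if size + overhead_bytes >= mtu_bytes]
```

Running this over sizes captured from a realistic workload shows whether any message type needs to be slimmed down before load testing begins.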

Start with a small load and keep increasing it.

Monitor as you ramp until you see the first sign of stress:

  • End-to-end latency climbing.
  • p99 latency climbing.
  • NAKs (negative acknowledgments) appearing.

The metric that degrades first tells you which knob to reach for next. That’s the whole point of ramping slowly — it isolates the bottleneck.
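The ramp can be sketched as a loop that stops at the first load step where any watched metric degrades. This is a minimal sketch under stated assumptions: `measure` is a hypothetical callback into your own load generator, the metric keys (`p50_us`, `p99_us`, `naks`) are illustrative names, and the 25% tolerance is an arbitrary threshold to adjust for your latency budget.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Metrics = Dict[str, float]


def degraded_metrics(baseline: Metrics, current: Metrics, tolerance: float) -> List[str]:
    """Names of metrics that are worse than the baseline."""
    degraded = []
    # Any new NAKs count as a sign of stress, regardless of tolerance.
    if current.get("naks", 0) > baseline.get("naks", 0):
        degraded.append("naks")
    for name in ("p50_us", "p99_us"):
        if current[name] > baseline[name] * tolerance:
            degraded.append(name)
    return degraded


def ramp(
    load_steps: Iterable[int],
    measure: Callable[[int], Metrics],
    tolerance: float = 1.25,
) -> Optional[Tuple[int, List[str]]]:
    """Increase load step by step; return (load, degraded metric names) at the
    first step showing stress, or None if every step stayed healthy.
    The first step is used as the baseline."""
    baseline: Optional[Metrics] = None
    for load in load_steps:
        current = measure(load)
        if baseline is None:
            baseline = current
            continue
        degraded = degraded_metrics(baseline, current, tolerance)
        if degraded:
            return load, degraded
    return None
```

Smaller steps give a cleaner signal: if several metrics degrade in the same step, the step was probably too large to isolate the bottleneck.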

Match the symptom to the action. Change one thing, then re-test.

Symptom               | Action
p50 latency increases | Tune initial window size and send/recv buffers
p99 latency increases | Tune NAK delay and term buffer size

Read this table as a diagnostic. A rising p50 points at steady-state flow control — the window and OS buffers. A rising p99 points at tail events — recovery behavior (NAK delay) and how much in-flight data a term buffer holds. Throughput follows: once p50 and p99 are stable under load, push the load higher and repeat.
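One way to keep the "change one thing, then re-test" discipline is to encode the symptom-to-knob mapping and hand out a single untried knob at a time. The sketch below is illustrative only: the `next_knob` helper is hypothetical, and the order of knobs within each symptom simply follows the order in which this guide lists them, not a recommendation from Aeron.

```python
from typing import Dict, Iterable, Optional, Tuple

# Symptom -> knobs, taken from the symptom/action mapping in this guide.
KNOBS_BY_SYMPTOM: Dict[str, Tuple[str, ...]] = {
    "p50": ("initial window size", "send/recv buffers"),
    "p99": ("NAK delay", "term buffer size"),
}


def next_knob(symptom: str, already_tried: Iterable[str]) -> Optional[str]:
    """Return the next single knob to change for a symptom, or None when
    every knob listed for that symptom has already been tried."""
    if symptom not in KNOBS_BY_SYMPTOM:
        raise KeyError(f"unknown symptom: {symptom!r}")
    tried = set(already_tried)
    for knob in KNOBS_BY_SYMPTOM[symptom]:
        if knob not in tried:
            return knob
    return None
```

Returning one knob per call makes it harder to change two settings between test runs, which would hide which change actually moved the metric.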

This is the critical guardrail: keep the term buffer size at or below 1/3 of your L3 cache size.

This ensures the active term plus other working-set data all fit in L3, avoiding DRAM spills on the hot path. DRAM spills are exactly what wrecks p99.

Worked example: if your L3 is 36 MB, keep term buffers ≤ 12 MB. The remaining cache leaves room for other hot data — connection state, application objects — to stay resident.
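The arithmetic behind the worked example can be sketched as follows. The helper names are hypothetical. The rounding step rests on an assumption worth verifying against your Aeron version: that term buffer lengths must be a power of two, in which case the largest usable size under a 12 MB cap is 8 MB rather than 12 MB.

```python
MIB = 1024 * 1024


def term_buffer_cap_bytes(l3_cache_bytes: int) -> int:
    """Upper bound from the 1/3 rule: term buffer <= L3 / 3."""
    if l3_cache_bytes <= 0:
        raise ValueError("l3_cache_bytes must be positive")
    return l3_cache_bytes // 3


def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of two that does not exceed n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n.bit_length() - 1)


l3 = 36 * MIB
cap = term_buffer_cap_bytes(l3)              # 12 MiB, as in the worked example
usable = largest_power_of_two_at_most(cap)   # 8 MiB if powers of two are required
print(cap // MIB, usable // MIB)
```

If your deployment does not constrain term lengths to powers of two, the cap from `term_buffer_cap_bytes` is the number to use directly.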

The methodology is deliberately incremental. Start with defaults, add load, observe which metric degrades first, then tune the corresponding parameter — and never violate the 1/3 L3 rule while you do it.

For deeper mechanics of any individual knob, see The Aeron Files.