
Parameter Reference

A reference for why each transport parameter affects throughput, p50, and p99 — and how to size it. Read this alongside the tuning overview.

| Parameter | Throughput | p50 (median) | p99 (tail) | How to set it |
| --- | --- | --- | --- | --- |
| Initial Window Size | Larger window keeps more data in-flight → better on high-BDP links | Slight negative if excessively large (more queuing/jitter); well-sized has minimal effect | Large windows worsen the tail under congestion; size close to BDP is optimal | Tune near BDP; don’t starve the pipe (too small) or create deep queues (too large) |
| Term Buffer Size | Larger terms allow more data per rotation → better for bursty traffic | Very large terms hurt p50 via cache misses | Oversized terms that don’t fit in cache increase variance | Smallest term covering max message size + expected burst, while fitting in cache |
| OS Socket Buffers (SO_SNDBUF/SO_RCVBUF) | A few MB matches BDP and prevents drops | Oversized buffers add queuing; 2–4 MB keeps p50 acceptable | Very large = deep queues; too small = drops + retransmits | Start at 2–4 MB; adjust on observed loss/queuing |
| NIC Ring Buffers (e.g. AWS ENA) | Matching ring sizes to bandwidth enables high throughput | Correctly sized keeps p50 low; excessively deep adds µs–ms queuing | Large queues amplify tail under bursts; undersized drop packets | Tune together with OS socket buffers to avoid double-buffered queue bloat |
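The sizing rules for the window, term, and socket buffers come down to a little arithmetic, sketched below in Java. This is a minimal illustration, not Aeron's own API: the class `TransportSizing`, its method names, and the `MIN_TERM_LENGTH` constant are hypothetical, and the sketch assumes term lengths must be powers of two with a 64 KiB floor (check this against your Aeron version). The socket helper uses only the standard `DatagramChannel` options to read back what the OS actually granted.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.DatagramChannel;

/** Hypothetical sizing helpers; names and constants are illustrative, not part of Aeron. */
final class TransportSizing {
    /** Assumed lower bound for a term length (64 KiB); verify for your Aeron version. */
    static final long MIN_TERM_LENGTH = 64 * 1024;

    /** Bandwidth-delay product in bytes: (bits per second / 8) * RTT in seconds. */
    static long bdpBytes(long bitsPerSecond, long rttMicros) {
        return Math.multiplyExact(bitsPerSecond / 8, rttMicros) / 1_000_000;
    }

    /** Smallest power of two that is >= value (returns 1 for values <= 1). */
    static long nextPowerOfTwo(long value) {
        return value <= 1 ? 1 : Long.highestOneBit(value - 1) << 1;
    }

    /** Smallest power-of-two term covering the largest message plus the expected burst. */
    static long termLength(long maxMessageBytes, long expectedBurstBytes) {
        long needed = Math.addExact(maxMessageBytes, expectedBurstBytes);
        return Math.max(MIN_TERM_LENGTH, nextPowerOfTwo(needed));
    }

    /** Requests a UDP receive buffer and returns the size the OS reports back. */
    static int grantedReceiveBuffer(int requestedBytes) throws IOException {
        try (DatagramChannel channel = DatagramChannel.open()) {
            channel.setOption(StandardSocketOptions.SO_RCVBUF, requestedBytes);
            return channel.getOption(StandardSocketOptions.SO_RCVBUF);
        }
    }
}
```

For example, a 10 Gbit/s link with a 500 µs round trip gives `TransportSizing.bdpBytes(10_000_000_000L, 500)` = 625,000 bytes, which is the neighbourhood to aim for with the initial window. The value returned by `grantedReceiveBuffer` can differ from the request: on Linux the kernel caps it at `net.core.rmem_max` and reports a doubled figure for its own bookkeeping, so compare the two before assuming a 2–4 MB setting took effect. The resulting term length still has to be checked against the cache budget separately.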

| Parameter | Throughput | p50 (median) | p99 (tail) | How to set it |
| --- | --- | --- | --- | --- |
| NAK Delay | Longer delay reduces retransmission traffic; marginally helps on lossy networks | Little impact unless loss is common | Directly affects tail: larger delay = longer gap persistence; shorter = faster recovery but more duplicates | Trade off bandwidth overhead against gap-recovery speed |

These move the tail the most — they attack variance at its source.

| Parameter | Throughput | p50 (median) | p99 (tail) | How to set it |
| --- | --- | --- | --- | --- |
| NUMA Locality (CPU near the NIC) | Removes cross-socket traffic → higher achievable throughput | Cuts memory-access and PCIe traversal cost by several µs | Avoids remote-NUMA traffic → significantly stabilizes the tail | Run IRQs, driver threads, and app threads on cores local to the NIC’s NUMA node |
| Term Buffer Fits in L3 Cache | Cache-resident working sets reduce stalls | Reduces average access latency | Markedly reduces variance from cache misses + DRAM contention | Profile active term footprint vs. effective L3; adjust term length or shard streams |
| Kernel Bypass | Removes syscall overhead + kernel queuing | Eliminates context switches → median to low-µs | Reduced jitter makes latency deterministic | Best for dedicated, isolated cores; requires more operational effort |
| Thread Pinning / Core Isolation | Prevents run-queue contention → higher max sustainable throughput | Reduces context switching + cache thrash | One of the strongest levers: avoids noisy-neighbor and scheduler jitter | Isolate cores and pin Aeron’s agent threads (see the how-to) |
| JVM Prewarm (JIT warmup) | Minimal steady-state effect; only the cold ramp-up is slower | Moderate: a warmed JIT keeps the median on compiled fast paths | One of the strongest tail levers: cold code + first-touch allocation cause severe spikes | Drive representative traffic through all hot paths before going live |
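The JVM prewarm advice can be sketched as a small loop that drives a hot path repeatedly before live traffic is admitted. This is an illustrative sketch only: the class `Prewarm` and its `run` method are hypothetical, and the iteration count is an assumption to tune per workload, since JIT compilation thresholds depend on the JVM and its flags.

```java
import java.util.function.LongConsumer;

/** Hypothetical warmup helper; the iteration count is an assumption to tune per workload. */
final class Prewarm {
    /**
     * Invokes the hot path once per iteration, passing the iteration index,
     * and returns the elapsed wall-clock time in nanoseconds.
     */
    static long run(LongConsumer hotPath, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            hotPath.accept(i);
        }
        return System.nanoTime() - start;
    }
}
```

A caller would pass the same handler that live traffic uses, fed with representative synthetic messages, for example `Prewarm.run(seq -> handler.onMessage(sample(seq)), 20_000)`, where `handler` and `sample` are placeholders for application code. Warmup traffic should also touch the same buffers and message sizes as production, so that first-touch allocation happens before the system goes live rather than on the first real message.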