Parameter Reference

A reference for how each transport parameter affects throughput, median latency (p50), and tail latency (p99), and how to size it. Read this alongside the tuning overview.

Buffer & window sizing

Parameter	Throughput	p50 (median)	p99 (tail)	How to set it
Initial Window Size	Larger window keeps more data in flight → better on links with a high bandwidth-delay product (BDP)	Slightly worse if excessively large (more queuing and jitter); minimal effect when well sized	Large windows worsen the tail under congestion; a size close to the BDP is optimal	Tune near the BDP; don’t starve the pipe (too small) or create deep queues (too large)
Term Buffer Size	Larger terms allow more data per rotation → better for bursty traffic	Very large terms hurt p50 via cache misses	Oversized terms that don’t fit in cache increase variance	Smallest term covering max message size + expected burst, while fitting in cache
OS Socket Buffers (`SO_SNDBUF`/`SO_RCVBUF`)	A few MB typically covers the BDP and prevents drops	Oversized buffers add queuing; 2–4 MB keeps p50 acceptable	Very large buffers mean deep queues; too small means drops and retransmits	Start at 2–4 MB; adjust based on observed loss and queuing
NIC Ring Buffers (e.g. AWS ENA)	Matching ring sizes to bandwidth enables high throughput	Correctly sized rings keep p50 low; excessively deep rings add µs–ms of queuing	Large queues amplify the tail under bursts; undersized rings drop packets	Tune together with OS socket buffers to avoid double-buffered queue bloat
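
The sizing rules in the table above reduce to a little arithmetic. The following is a minimal sketch, not a definitive procedure: it assumes the bandwidth-delay product (BDP) is simply link bandwidth times round-trip time, and that term lengths must be a power of two with a 64 KiB floor (an assumption about the transport's constraints; check your version). The function names and the example figures are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product in bytes: the data needed in flight to fill the link."""
    return round(bandwidth_bits_per_sec * rtt_seconds / 8)


def smallest_term_length(max_message_bytes: int, burst_bytes: int,
                         min_term_bytes: int = 64 * 1024) -> int:
    """Smallest power-of-two length covering max message size plus expected burst.

    The power-of-two rule and the 64 KiB floor are assumptions about the
    transport's term-length constraints; adjust them to match your setup.
    """
    needed = max_message_bytes + burst_bytes
    length = min_term_bytes
    while length < needed:
        length *= 2
    return length


if __name__ == "__main__":
    # Hypothetical link: 10 Gbit/s with a 1 ms round trip.
    window = bdp_bytes(10e9, 0.001)
    # Hypothetical traffic: 8 KiB max message, 300 KiB expected burst.
    term = smallest_term_length(8 * 1024, 300 * 1024)
    print(f"window near BDP: {window} bytes")
    print(f"term length:     {term} bytes")
```

With these hypothetical figures the sketch suggests a window of about 1.25 MB and a 512 KiB term. Whether that term still fits in cache depends on how many streams are active, so treat the result as a starting point to validate by measurement.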

Recovery & loss

Parameter	Throughput	p50 (median)	p99 (tail)	How to set it
NAK Delay	Longer delay reduces retransmission traffic; marginally helps on lossy networks	Little impact unless loss is common	Directly affects tail: larger delay = longer gap persistence; shorter = faster recovery but more duplicates	Trade off bandwidth overhead against gap-recovery speed
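
The tail effect of NAK delay can be made concrete with a simplified model. This sketch assumes a single loss event: the receiver waits the NAK delay after detecting a gap, then the NAK and the retransmission together cost one round trip. It ignores any sender-side retransmit delay and repeated loss, and the function name and figures are hypothetical.

```python
def gap_recovery_seconds(nak_delay_seconds: float, rtt_seconds: float) -> float:
    """Approximate time a gap persists after detection, for a single loss.

    Simplified model (an assumption, not a transport guarantee):
    wait nak_delay, then one round trip for the NAK and the retransmission.
    """
    return nak_delay_seconds + rtt_seconds


if __name__ == "__main__":
    rtt = 0.0002  # hypothetical 200 microsecond round trip
    for nak_delay in (0.0, 0.001, 0.01):
        recovery = gap_recovery_seconds(nak_delay, rtt)
        print(f"NAK delay {nak_delay * 1e3:5.1f} ms -> gap persists ~{recovery * 1e3:.1f} ms")
```

Under this model, once the NAK delay exceeds the round-trip time it dominates how long a gap persists, which is why the setting shows up mainly in p99 rather than p50.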

Hardware-level levers

These move the tail the most — they attack variance at its source.

Parameter	Throughput	p50 (median)	p99 (tail)	How to set it
NUMA Locality (CPU near the NIC)	Removes cross-socket traffic → higher achievable throughput	Cuts memory-access and PCIe traversal cost by several µs	Avoids remote-NUMA traffic → significantly stabilizes the tail	Run IRQs, driver threads, and app threads on cores local to the NIC’s NUMA node
Term Buffer Fits in L3 Cache	Cache-resident working sets reduce stalls	Reduces average access latency	Markedly reduces variance from cache misses + DRAM contention	Profile active term footprint vs. effective L3; adjust term length or shard streams
Kernel Bypass	Removes syscall overhead + kernel queuing	Eliminates context switches → median can drop to the low-µs range	Reduced jitter makes latency more predictable	Best for dedicated, isolated cores; requires more operational effort
Thread Pinning / Core Isolation	Prevents run-queue contention → higher max sustainable throughput	Reduces context switching + cache thrash	One of the strongest levers: avoids noisy-neighbor and scheduler jitter	Isolate cores and pin Aeron’s agent threads (see the how-to)
JVM Prewarm (JIT warmup)	Minimal steady-state effect; only the cold ramp-up is slower	Moderate — a warmed JIT keeps the median on compiled fast paths	One of the strongest tail levers — cold code + first-touch allocation cause severe spikes	Drive representative traffic through all hot paths before going live
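
As a starting point for the NUMA locality and pinning rows above, the sketch below looks up which cores are local to a NIC on Linux. It assumes the standard sysfs layout (`/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist`). The helper names are hypothetical, and pinning the current process with `os.sched_setaffinity` is only an illustration, not how Aeron's agent threads are pinned.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist string such as '0-3,8-11' into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def nic_local_cpus(iface: str) -> list[int]:
    """Return the CPUs on the NIC's NUMA node (Linux sysfs layout assumed)."""
    node = int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())
    if node < 0:
        # -1 means the kernel reports no NUMA affinity for this device,
        # so fall back to the CPUs this process may already run on.
        return sorted(os.sched_getaffinity(0))
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    return parse_cpulist(cpulist)


def pin_process_to_nic(iface: str) -> list[int]:
    """Restrict the current process to the NIC-local CPUs and return them."""
    cpus = nic_local_cpus(iface)
    os.sched_setaffinity(0, cpus)
    return cpus
```

For example, `pin_process_to_nic("eth0")` (with `eth0` standing in for your interface name) would restrict the calling process to the NIC's node. Virtual interfaces may not expose `device/numa_node` at all, in which case the lookup raises `FileNotFoundError`; IRQ affinity and core isolation still need to be configured separately.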