Scaling Aeron Cluster
How you scale an Aeron-backed matching engine depends on how mature your exchange is. There are two models, and the right one changes as you grow. Start simple. Split when you must.
This page is about the scaling topology, not the wire-level mechanics. For Aeron internals (clustering, the archive, log replication), see The Aeron Files.
Early stage: single cluster, multiple pairs
A young exchange runs everything in one place. One matching engine, one Aeron Cluster, every trading pair inside it.
This buys you two real advantages.
Pros:
- Strong consistency and cross-pair logic (e.g., margin calculations across pairs)
- Operational simplicity — one cluster to manage
The costs show up later, under load.
Cons:
- Scaling ceiling — all pairs share the same throughput budget
- Blast radius — a bug in one pair’s logic can take down all pairs
The throughput ceiling is the killer. Every pair competes for the same single-cluster throughput budget, so your busiest pair and your quietest pair share one fate. A hot BTC/USDT book eats the headroom that ETH/USDT needs, and there is no way to give one pair more capacity without giving it to all of them.
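To make the shared-budget point concrete, here is a minimal sketch of the arithmetic. The function name and the message rates are hypothetical and chosen only for illustration; they are not Aeron benchmarks or measurements from any exchange.

```python
def remaining_headroom(budget_msgs_per_sec, load_by_pair):
    """Headroom left in one shared cluster.

    There is a single budget, so every pair sees the same remaining number.
    """
    return budget_msgs_per_sec - sum(load_by_pair.values())


# Hypothetical figures, for illustration only.
shared_budget = 100_000
load = {"BTC/USDT": 85_000, "ETH/USDT": 10_000}

headroom = remaining_headroom(shared_budget, load)
print(headroom)  # 5000 messages/sec left for every pair combined
```

In this model ETH/USDT uses a tenth of the budget, yet a burst of more than 5,000 messages per second on either book saturates the cluster for both.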
Mature stage: multiple clusters per machine
A grown-up exchange splits each pair into its own Aeron Cluster.
Now each pair scales on its own terms.
Pros:
- Horizontal scalability by pair
- Failure isolation — one pair’s issue doesn’t affect others
- Independent tuning and versioning per pair
The win for tail latency is concrete: a noisy pair can no longer push another pair’s p99 around, because they no longer share a throughput budget. You also get to tune each cluster independently — a high-volume pair and a thin one can carry different settings instead of one compromise for all.
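One way to picture per-pair tuning and versioning is a small registry that maps each pair to its own cluster and settings. This is a sketch under assumptions: `ClusterSettings`, its fields, and the values are hypothetical and are not Aeron configuration properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSettings:
    cluster_id: int             # hypothetical identifier for the pair's cluster
    engine_version: str         # matching-engine build deployed to that cluster
    snapshot_interval_sec: int  # hypothetical tuning knob, not an Aeron property


# Each pair carries its own settings instead of one compromise for all.
CLUSTERS = {
    "BTC/USDT": ClusterSettings(cluster_id=1, engine_version="2.4.0", snapshot_interval_sec=60),
    "ETH/USDT": ClusterSettings(cluster_id=2, engine_version="2.3.1", snapshot_interval_sec=600),
}


def cluster_for(pair):
    """Return the settings of the cluster that owns the given pair."""
    try:
        return CLUSTERS[pair]
    except KeyError:
        raise ValueError(f"no cluster configured for pair {pair!r}") from None
```

A gateway could call `cluster_for("BTC/USDT")` to decide where an order goes, and an engine upgrade could roll out to one entry at a time.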
But isolation has a price.
Cons:
- Cross-pair coordination pain (need external mechanism for cross-pair margin, etc.)
- Operational complexity — N clusters to manage
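The external mechanism for cross-pair margin could take many forms. The sketch below assumes one of the simplest: a hypothetical `MarginAggregator` service that receives the latest required margin per account from each pair's cluster and sums them. The class, its methods, and the integer amounts are all illustrative assumptions, not part of Aeron.

```python
class MarginAggregator:
    """Hypothetical service that sits outside the per-pair clusters."""

    def __init__(self):
        # account id -> {pair: latest required margin reported by that pair's cluster}
        self._required = {}

    def on_update(self, account, pair, required_margin):
        """Record the newest figure from one cluster, replacing the previous one."""
        self._required.setdefault(account, {})[pair] = required_margin

    def total_required(self, account):
        """Sum the latest known requirement across all pairs for an account."""
        return sum(self._required.get(account, {}).values())

    def is_under_margined(self, account, collateral):
        return self.total_required(account) > collateral


agg = MarginAggregator()
agg.on_update("acct-1", "BTC/USDT", 700)
agg.on_update("acct-1", "ETH/USDT", 400)
print(agg.is_under_margined("acct-1", collateral=1_000))  # True: 1100 required > 1000
```

Because each cluster reports on its own schedule, a total computed this way reflects the latest update received from each pair rather than a single consistent snapshot. That gap is the consistency you give up relative to the single-cluster model.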
The tradeoff in one line
So the decision rule is simple. Run a single cluster while consistency and simplicity matter more than capacity. Split into a cluster per pair when you hit the throughput ceiling or need failure isolation, and budget for the external coordination layer before you do.