
OMS / Matching Engine Best Practices

Building an OMS or matching engine on Aeron Cluster is less about the framework and more about the operational patterns around it: how you deploy, upgrade, snapshot, and recover without ever telling the market “we’re closed.” This section collects those patterns.

  • Rolling upgrades of cluster member nodes — upgrading a 3/5-node Aeron Cluster one member at a time, without stopping the world.
  • Deterministic state machines — keeping the replicated service replayable (no wall clocks, no randomness, no external I/O in the business logic).
  • Snapshot discipline — when to snapshot, sizing the recovery window, testing restores.
  • Session and duty-cycle design — backpressure handling on ingress/egress.
  • Failover drills — leader loss, follower loss, AZ loss, and what the runbook says for each.
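
To make the determinism rule concrete, here is a minimal sketch of a single matching step written so that any replica replaying the same log reaches the same state. It is not Aeron API. The class `DeterministicBook` and its handlers `onAsk` and `onBuy` are hypothetical, and the book is reduced to one side (asks only) with price-time priority. The two rules it illustrates are that the timestamp arrives with the logged message instead of being read from `System.currentTimeMillis()`, and that trade IDs come from a counter held in replicated state instead of `UUID.randomUUID()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified one-sided book used only to illustrate determinism rules.
public final class DeterministicBook {
    // Resting asks as {orderId, price, qty}, sorted by price, then by arrival order.
    private final List<long[]> asks = new ArrayList<>();
    // Trade IDs are derived from replicated state, never from randomness.
    private long nextTradeId = 1;

    public void onAsk(long orderId, long price, long qty) {
        int i = 0;
        // Skip every resting ask at an equal or better price so equal prices stay FIFO.
        while (i < asks.size() && asks.get(i)[1] <= price) {
            i++;
        }
        asks.add(i, new long[] {orderId, price, qty});
    }

    // 'timestamp' is supplied with the logged message; the service never reads the wall clock.
    public List<String> onBuy(long timestamp, long limitPrice, long qty) {
        List<String> fills = new ArrayList<>();
        while (qty > 0 && !asks.isEmpty() && asks.get(0)[1] <= limitPrice) {
            long[] best = asks.get(0);
            long fillQty = Math.min(qty, best[2]);
            fills.add("trade=" + nextTradeId++ + " ask=" + best[0] + " qty=" + fillQty
                + " px=" + best[1] + " ts=" + timestamp);
            qty -= fillQty;
            best[2] -= fillQty;
            if (best[2] == 0) {
                asks.remove(0); // fully filled: drop from the book
            }
        }
        return fills;
    }

    public static void main(String[] args) {
        // Feed the same input sequence into two fresh replicas, as a log replay would.
        DeterministicBook leader = new DeterministicBook();
        DeterministicBook replay = new DeterministicBook();
        for (DeterministicBook book : List.of(leader, replay)) {
            book.onAsk(1, 101, 5);
            book.onAsk(2, 100, 3);
        }
        List<String> a = leader.onBuy(1_000L, 101, 6);
        List<String> b = replay.onBuy(1_000L, 101, 6);
        System.out.println(a);
        System.out.println(a.equals(b)); // identical fills on both replicas
    }
}
```

Because every input, including time, comes from the replicated log, replaying the same sequence yields byte-identical fills. Swapping in a wall-clock read or a random ID in either handler would break that property the first time a follower or a restored snapshot replays the log.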

Related foundations elsewhere on this site: performance tuning, operations & resilience.