
Aeron Cluster and Raft Consensus

Aeron Cluster is a fault-tolerant cluster for stateful, event-driven applications. It gives you in-memory state at microsecond latency, with a Raft-replicated log underneath so you never lose a committed event. This page covers the design principles, the two RPCs that make Raft tick, the improvements Aeron layers on top, and the five safety guarantees that make the whole thing trustworthy enough for a matching engine.

For the byte-level internals (log layout, term files, archive recordings), see The Aeron Files. This page is the operator's mental model, not the wire format.

Aeron Cluster is built on Aeron Transport, not TCP. That single choice drives the rest of the design.

  • Very high throughput and very low latency — built on Aeron Transport, not TCP.
  • State held entirely in memory — no database on the hot path.
  • Input events written to persistent storage — via Aeron Archive, for node recovery.
  • Replication to other nodes and sites — Raft-based consensus for consistency.
  • Multiple services on a single cluster — share the consensus infrastructure across services.

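The layers stack like this, from top to bottom:

  • Clustered services: your application logic, holding its state in memory.
  • Consensus Module: Aeron's Raft implementation, which sequences and replicates input.
  • Aeron Archive: persistent storage for the replicated log.
  • Aeron Transport: UDP messaging between nodes.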

All application state lives in memory; the persistent log in Aeron Archive exists for recovery. That is what enables microsecond-level processing: while your service handles an event there are no disk reads, no database queries, and no calls to external systems. The result shows up directly in your latency profile: p50 and p99 stay low because producing a response never waits on a database or a remote service.
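A minimal sketch of this split, assuming a toy account-balance service. `Event`, `BalanceService`, and `recover` are hypothetical names, not Aeron Cluster API, and a Python list stands in for the Archive recording. Handling an event touches only the in-memory dict; the log is read only when a node restarts and replays it from the start.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical input event: credit (positive) or debit (negative) an account."""
    account: str
    amount: int


@dataclass
class BalanceService:
    """Toy clustered service: all state is a plain in-memory dict."""
    balances: dict = field(default_factory=dict)

    def on_event(self, event: Event) -> int:
        # Hot path: pure in-memory work, no disk, database, or remote calls.
        new_balance = self.balances.get(event.account, 0) + event.amount
        self.balances[event.account] = new_balance
        return new_balance


def recover(log: list) -> BalanceService:
    """Rebuild state after a restart by replaying the persisted input log in order."""
    service = BalanceService()
    for event in log:
        service.on_event(event)
    return service


# Stand-in for the archived log: every input event, in log order.
archived_log = [Event("alice", 100), Event("bob", 50), Event("alice", -30)]

live = BalanceService()
for e in archived_log:
    live.on_event(e)

restored = recover(archived_log)
print(restored.balances)  # {'alice': 70, 'bob': 50}
```

Because replay goes through the same `on_event` path, a recovered node ends up with the state it had before the restart, provided event handling is deterministic.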

Raft consensus uses only two RPCs. That is the whole protocol surface.

  • RequestVote: invoked by candidates to gather votes during leader election.
    • When a node hears nothing from a leader before its election timeout expires, it transitions to candidate state, increments its term, votes for itself, and sends RequestVote to all other nodes.
    • Other nodes grant their vote if they have not already voted for another candidate in that term and the candidate's log is at least as up-to-date as theirs.
  • AppendEntries: invoked by the leader for two purposes:
    • Log replication: sending new log entries to followers.
    • Heartbeat: empty AppendEntries to maintain leadership (prevents followers from starting elections).
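The vote-granting rule above, sketched as a follower-side handler. This is textbook Raft in plain Python, not code from Aeron's Consensus Module; `RequestVote`, `Node`, and their fields are hypothetical. A real node would also persist `current_term` and `voted_for` before replying and step down to follower on seeing a higher term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestVote:
    term: int            # candidate's current term
    candidate_id: int
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry


@dataclass
class Node:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    last_log_index: int = 0
    last_log_term: int = 0

    def log_is_up_to_date(self, req: RequestVote) -> bool:
        # Compare the term of the last entry first; only equal terms fall back to length.
        if req.last_log_term != self.last_log_term:
            return req.last_log_term > self.last_log_term
        return req.last_log_index >= self.last_log_index

    def handle_request_vote(self, req: RequestVote) -> bool:
        if req.term < self.current_term:
            return False  # stale candidate from an older term
        if req.term > self.current_term:
            # A newer term: adopt it, which clears any vote cast in the old term.
            self.current_term = req.term
            self.voted_for = None
        if self.voted_for not in (None, req.candidate_id):
            return False  # already voted for another candidate in this term
        if not self.log_is_up_to_date(req):
            return False  # candidate could be missing committed entries
        self.voted_for = req.candidate_id
        return True


voter = Node(node_id=2, current_term=3, last_log_index=10, last_log_term=3)
print(voter.handle_request_vote(RequestVote(4, 1, 10, 3)))  # True: first vote in term 4
print(voter.handle_request_vote(RequestVote(4, 3, 12, 3)))  # False: already voted for node 1
```

"At least as up-to-date" means a log whose last entry has a higher term wins outright; only when the last terms are equal does the longer log win.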

Why only two? The simplicity is intentional. Fewer message types means fewer edge cases, fewer bugs, and easier formal verification. Everything in Raft is either “who should be leader?” (RequestVote) or “here’s what the leader says” (AppendEntries).

Aeron Cluster implements Raft using three building blocks, with several improvements over the design in the original Raft paper.

Built on:

  • Aeron Transport — for inter-node communication (UDP, not TCP).
  • Aeron Archive — for persistent log storage.
  • Consensus Module — Aeron’s Raft implementation.

1. A canvass phase before elections. Before a node becomes a candidate and starts a formal election, it first canvasses the other nodes to check whether an election would succeed. This avoids unnecessary elections, which disrupt the cluster because a new, higher term forces the current leader to step down. A node won't start an election that the canvass says it would lose.
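The canvass idea resembles the pre-vote extension to Raft: a would-be candidate polls its peers before touching its term. Aeron's Consensus Module has its own canvass messages; the sketch below is a generic pre-vote check, not that implementation, and `LogPosition` and `would_win_election` are hypothetical names. Because it changes no terms or votes, a failed canvass leaves the current leader undisturbed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPosition:
    term: int   # term of the node's last log entry
    index: int  # index of the node's last log entry

    def at_least_as_up_to_date_as(self, other: "LogPosition") -> bool:
        # Tuple comparison: higher term wins; equal terms compare index.
        return (self.term, self.index) >= (other.term, other.index)


def would_win_election(me: LogPosition,
                       peers: list[tuple[LogPosition, bool]]) -> bool:
    """Hypothetical canvass: poll peers without changing any term or vote.

    Each peer is (its last log position, whether it still hears from a leader).
    """
    cluster_size = len(peers) + 1
    supporters = 1  # the would-be candidate counts itself
    for peer_log, peer_hears_leader in peers:
        # A peer that still hears from a healthy leader declines to back an election.
        if not peer_hears_leader and me.at_least_as_up_to_date_as(peer_log):
            supporters += 1
    return supporters > cluster_size // 2


me = LogPosition(term=5, index=100)
peers = [
    (LogPosition(5, 90), False),
    (LogPosition(5, 120), False),   # longer log: would not vote for us
    (LogPosition(4, 300), False),   # older last term: would vote for us
    (LogPosition(5, 100), True),    # still hears the leader: declines
]
print(would_win_election(me, peers))  # True: 3 of 5 nodes support
```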

2. Parallel replication between nodes. The leader sends AppendEntries to all followers simultaneously, not one after another. Commit latency is therefore set by the slowest follower needed to complete a majority, not by the sum of round-trips to every follower, so a single slow node does not hold up commits.
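The arithmetic behind this claim, as a rough model: each follower has a fixed round-trip time, and the leader's own write is ignored. The function names and the microsecond figures are illustrative assumptions, not Aeron measurements.

```python
def commit_latency_parallel(follower_rtts_us: list[float]) -> float:
    """All followers contacted at once: the entry commits when the fastest
    followers that complete a majority (together with the leader) have acked."""
    acks_needed = (len(follower_rtts_us) + 1) // 2  # leader already counts as one
    if acks_needed == 0:
        return 0.0  # single-node cluster: the leader alone is a majority
    return sorted(follower_rtts_us)[acks_needed - 1]


def commit_latency_sequential(follower_rtts_us: list[float]) -> float:
    """Contrast case: contact followers one after another, in list order,
    until a majority has acked, so the round-trips add up."""
    acks_needed = (len(follower_rtts_us) + 1) // 2
    return sum(follower_rtts_us[:acks_needed])


# Hypothetical round-trip times (microseconds) for four followers of a 5-node cluster.
rtts = [40, 25, 5000, 30]  # one follower is very slow
print(commit_latency_parallel(rtts))    # 30: the second-fastest ack completes 3 of 5
print(commit_latency_sequential(rtts))  # 65: 40 + 25
```

The slow follower never enters the parallel result, because a majority is complete before its ack arrives.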

3. Natural batching during replication. When several log entries are already waiting, they go out together in a single AppendEntries message. This reduces network round-trips and improves throughput without an artificial batching delay: the leader never waits for more entries to fill a batch, so a quiet cluster sends each entry immediately while a busy one batches on its own.
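A sketch of what "natural" batching means, assuming a simple queue of pending entries and a size cap. `drain_batch` and `max_batch_bytes` are hypothetical names, not Aeron settings. The sender takes whatever is already queued and never waits for more to arrive.

```python
from collections import deque


def drain_batch(pending: deque, max_batch_bytes: int) -> list[bytes]:
    """Take the entries already queued, up to a size cap, without waiting."""
    batch: list[bytes] = []
    size = 0
    while pending:
        entry = pending[0]
        # Always take at least one entry, even an oversized one, so the queue cannot stall.
        if batch and size + len(entry) > max_batch_bytes:
            break  # batch is full; the rest go in the next message
        batch.append(pending.popleft())
        size += len(entry)
    return batch


queue = deque([b"a" * 100, b"b" * 200, b"c" * 300])
print([len(e) for e in drain_batch(queue, max_batch_bytes=350)])  # [100, 200]
print([len(e) for e in drain_batch(queue, max_batch_bytes=350)])  # [300]
print(drain_batch(queue, max_batch_bytes=350))                    # []
```

Under light load the queue rarely holds more than one entry, so each message carries one; under heavy load entries pile up while the previous send is in flight, and batches grow without any timer.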

Raft's correctness rests on five guarantees. Together they ensure that nodes never disagree about committed entries, so the cluster's state can never diverge.

Election Safety. At most one leader can be elected in a given term.

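This prevents split-brain: you can never have two nodes both believing they are the leader for the same term number. It is enforced by requiring a majority vote and letting each node vote at most once per term. Any two majorities share at least one node, and that node cannot vote for two different candidates in the same term.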
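Why a majority plus one vote per term is enough: two leaders in the same term would need two disjoint majorities, and those cannot exist. A brute-force check over small clusters makes the overlap concrete; `quorums_always_overlap` is a hypothetical helper written for this illustration.

```python
from itertools import combinations


def quorums_always_overlap(cluster_size: int, quorum_size: int) -> bool:
    """True if every pair of vote sets with at least quorum_size nodes shares a node."""
    nodes = range(cluster_size)
    groups = [set(c) for k in range(quorum_size, cluster_size + 1)
              for c in combinations(nodes, k)]
    return all(a & b for a in groups for b in groups)


for n in (3, 5, 7):
    print(n, quorums_always_overlap(n, n // 2 + 1))  # True for each size
print(quorums_always_overlap(4, 2))  # False: {0, 1} and {2, 3} could elect two leaders
```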

Leader Append-Only. A leader never overwrites or deletes existing entries in its log.

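The leader only appends new entries; it never modifies history. This makes the leader's log a monotonically growing sequence whose existing entries never change, which is critical for consistency. Followers are different: a follower may hold uncommitted entries from a deposed leader, and those are overwritten to match the current leader's log.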

Log Matching. If two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index.

This is the induction property that makes Raft work: if two nodes agree on entry (index=42, term=5), they must also agree on entries 1–41. It is enforced in two ways. A leader creates at most one entry at a given index in a given term, and every AppendEntries carries the index and term of the entry just before the new ones; a follower rejects the message if its own log has no matching entry at that position.
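The follower side of that check can be sketched as below, following the textbook Raft rules with 1-based indices. `Entry` and `append_entries` are hypothetical names, not Aeron API. Only followers ever truncate: this is how a follower's stale, uncommitted entries get replaced, while the leader's own log stays append-only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    term: int
    command: str


def append_entries(log: list[Entry], prev_index: int, prev_term: int,
                   entries: list[Entry]) -> bool:
    """Follower side of AppendEntries (1-based; prev_index 0 means the empty prefix).

    Rejects, leaving the log untouched, unless the log has an entry at prev_index
    with prev_term. Otherwise drops any conflicting suffix and appends new entries.
    """
    if prev_index > len(log):
        return False  # follower is missing entries before this point
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False  # same index, different term: histories diverged here
    for offset, entry in enumerate(entries):
        i = prev_index + offset  # 0-based slot for this entry
        if i < len(log):
            if log[i].term == entry.term:
                continue  # already have it (for example, a retransmission)
            del log[i:]  # conflict: discard this entry and everything after it
        log.append(entry)
    return True


# Entry 3 came from a leader that was deposed before it committed.
follower = [Entry(1, "x=1"), Entry(1, "y=2"), Entry(2, "z=3")]
ok = append_entries(follower, prev_index=2, prev_term=1, entries=[Entry(3, "z=9")])
print(ok, [(e.term, e.command) for e in follower])
# True [(1, 'x=1'), (1, 'y=2'), (3, 'z=9')]
print(append_entries(follower, prev_index=5, prev_term=3, entries=[]))  # False: gap
```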

Leader Completeness. If a log entry is committed in a given term, then that entry will be present in the logs for leaders of all higher-numbered terms.

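Once an entry is committed (in Raft, a leader counts an entry from its own term as committed once a majority of nodes store it, and every earlier entry commits with it), it can never be lost: every future leader must have it. It is enforced by the voting rule: a node won't vote for a candidate whose log is less up-to-date than its own, comparing the term of the last entry first and then log length. A winning candidate needs votes from a majority, and at least one of those voters holds every committed entry.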
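The leader-side commit rule can be sketched as follows, after the original Raft paper: an entry becomes committed by counting replicas only if it belongs to the leader's current term, and earlier entries commit along with it. `advance_commit_index` and its parameters are hypothetical names for this illustration.

```python
def advance_commit_index(log_terms: list[int], follower_match: list[int],
                         current_term: int, commit_index: int) -> int:
    """Leader-side commit rule (1-based indices).

    log_terms[i - 1] is the term of the leader's entry i; follower_match holds the
    highest index each follower is known to store. The leader stores its whole log,
    so it counts as one replica for every index up to len(log_terms).
    """
    cluster_size = len(follower_match) + 1
    for n in range(len(log_terms), commit_index, -1):
        replicas = 1 + sum(1 for m in follower_match if m >= n)
        # Only current-term entries are committed by counting replicas;
        # older entries commit indirectly as part of the prefix.
        if replicas > cluster_size // 2 and log_terms[n - 1] == current_term:
            return n
    return commit_index


terms = [1, 2, 2, 4]  # entry 4 was written by the current leader, in term 4
print(advance_commit_index(terms, [3, 3, 1, 4], current_term=4, commit_index=1))  # 1
print(advance_commit_index(terms, [4, 4, 1, 3], current_term=4, commit_index=1))  # 4
```

In the first call, entry 3 (term 2) already sits on four of five nodes yet is not committed; it commits only once entry 4 from the current term reaches a majority. Counting replicas of old-term entries directly could let a later leader overwrite an entry that looked committed.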

State Machine Safety. If a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.

This is the ultimate consistency guarantee: all nodes apply the same sequence of operations to their state machines. Combined with deterministic execution, every node converges to identical state.
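A sketch of the apply side, assuming a toy state machine that sums deltas per key; `Replica` and `apply_committed` are hypothetical names. Each replica applies entries strictly in log order and never beyond the commit index, so two replicas that apply the same log, even in different-sized steps, end in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Applies committed log entries to an in-memory state machine, in order."""
    state: dict = field(default_factory=dict)
    last_applied: int = 0  # 1-based index of the last entry applied

    def apply_committed(self, log: list[tuple[str, int]], commit_index: int) -> None:
        # Apply strictly in log order and never past the commit index:
        # an uncommitted entry could still be replaced by a new leader.
        while self.last_applied < commit_index:
            key, delta = log[self.last_applied]
            # Deterministic transition: the result depends only on state and entry.
            self.state[key] = self.state.get(key, 0) + delta
            self.last_applied += 1


log = [("bid", 5), ("ask", 3), ("bid", -2), ("ask", 1)]
a, b = Replica(), Replica()
a.apply_committed(log, commit_index=4)  # one node catches up in a single pass
b.apply_committed(log, commit_index=2)  # another applies in two steps
b.apply_committed(log, commit_index=4)
print(a.state == b.state, a.state)  # True {'bid': 3, 'ask': 4}
```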

Guarantee              What it ensures
Election Safety        One leader per term
Leader Append-Only     Leader never rewrites history
Log Matching           Agreement on one entry = agreement on all prior entries
Leader Completeness    Committed entries survive leader changes
State Machine Safety   All nodes apply the same operations in the same order