
Leader Placement and Graceful Step-Down

In a Raft cluster the leader carries every write, so where the leader sits determines latency. When nodes span availability zones, a leader in a remote AZ adds a cross-AZ round trip to every committed operation. This page compares how Aeron Cluster and SOFAJRaft handle leader assignment, and the operational pattern for steering Aeron’s leader back to the low-latency zone after a failover.

For the byte-level election internals — terms, canvass, vote records — see The Aeron Files. This page is the operational view: what leader placement control each implementation offers, and what to do when it isn’t enough.

In the lowest-latency configuration, the leader sits in the same AZ as the workload that drives it. Consensus traffic and the hot path stay within one zone, so p50 and p99 stay tight.

After a leader failure, Raft elects a new leader from the surviving members. That election is decided by the protocol — timeouts and votes — not by zone. The new leader can land in a different AZ, and from that point consensus crosses an AZ boundary on every order.
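The cost of a misplaced leader can be made concrete with a small model. This Java sketch is a simplification under stated assumptions. It assumes a three-member cluster with two members in the hot-path AZ (az-a) and one in az-b, and a client in az-a. It uses illustrative round-trip times of 100 microseconds within an AZ and 1,000 microseconds across AZs, which are placeholders and not measurements. Commit latency is modeled as the client-to-leader round trip plus the round trip to the slowest follower the leader needs to reach a majority. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CommitLatencyModel {
    // Illustrative round-trip times in microseconds: assumptions, not measurements.
    static final long INTRA_AZ_RTT_US = 100;
    static final long CROSS_AZ_RTT_US = 1_000;

    static long rttUs(String azA, String azB) {
        return azA.equals(azB) ? INTRA_AZ_RTT_US : CROSS_AZ_RTT_US;
    }

    /**
     * Simplified model: client-to-leader round trip, plus the round trip to the slowest
     * follower the leader needs for a majority (the leader counts as one vote itself).
     */
    static long commitLatencyUs(String clientAz, int leaderIndex, List<String> memberAzs) {
        String leaderAz = memberAzs.get(leaderIndex);
        List<Long> followerRtts = new ArrayList<>();
        for (int i = 0; i < memberAzs.size(); i++) {
            if (i != leaderIndex) {
                followerRtts.add(rttUs(leaderAz, memberAzs.get(i)));
            }
        }
        Collections.sort(followerRtts);
        int acksNeeded = memberAzs.size() / 2; // majority is size/2 + 1, minus the leader itself
        long quorumRttUs = acksNeeded == 0 ? 0 : followerRtts.get(acksNeeded - 1);
        return rttUs(clientAz, leaderAz) + quorumRttUs;
    }

    public static void main(String[] args) {
        List<String> members = List.of("az-a", "az-a", "az-b");
        System.out.println("leader in az-a: " + commitLatencyUs("az-a", 0, members) + " us");
        System.out.println("leader in az-b: " + commitLatencyUs("az-a", 2, members) + " us");
    }
}
```

With these assumed numbers, the model gives 200 microseconds per commit when the leader is in az-a and 2,000 microseconds when it is in az-b. Both the client hop and the quorum hop now cross an AZ boundary.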

Leader assignment: Aeron Cluster vs SOFAJRaft


The two implementations take different positions on how much control an operator has over leadership.

| Capability | SOFAJRaft | Aeron Cluster |
|---|---|---|
| Explicit leadership transfer | Yes: Node.transferLeadershipTo(PeerId) hands leadership to a chosen peer by sending it a TimeoutNowRequest to start an immediate election | No native API to transfer leadership to a specific node |
| Election priority / preference | Yes: per-node ElectionPriority (NodeOptions.setElectionPriority); higher-priority nodes are preferred, and 0 means “never leader” | No priority mechanism; all voting members are equal candidates |
| Pin leadership to an AZ/node | Achievable via priority + transfer | Not natively supported |
| Trigger / influence an election | transferLeadershipTo triggers a targeted election | No command to trigger an election or move leadership to a chosen node |
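The SOFAJRaft priority rule in the table can be illustrated with a simplified Java model. It prefers the peer with the highest priority and never picks a peer whose priority is 0. This is not SOFAJRaft's election algorithm, which still runs real election rounds with votes and timeouts. The sketch uses no SOFAJRaft types, and the class and method names are hypothetical. It also only handles non-negative priorities.

```java
import java.util.Map;
import java.util.Optional;

final class PriorityPreference {
    /** Mirrors the "never leader" meaning of priority 0 described in the table above. */
    static final int NEVER_LEADER = 0;

    /**
     * Simplified model, not SOFAJRaft's election algorithm: prefer the peer with the highest
     * priority, never pick a priority-0 peer, and break ties by peer id so the result is stable.
     */
    static Optional<String> preferredLeader(Map<String, Integer> priorityByPeer) {
        String best = null;
        int bestPriority = NEVER_LEADER;
        for (Map.Entry<String, Integer> entry : priorityByPeer.entrySet()) {
            int priority = entry.getValue();
            if (priority < 0) {
                throw new IllegalArgumentException("this model only handles priorities >= 0");
            }
            if (priority == NEVER_LEADER) {
                continue; // never eligible to lead
            }
            // When priorities tie, bestPriority > 0, so best is already non-null here.
            boolean better = priority > bestPriority
                    || (priority == bestPriority && entry.getKey().compareTo(best) < 0);
            if (better) {
                best = entry.getKey();
                bestPriority = priority;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Hypothetical: two hot-path-AZ peers ranked above a remote peer that must never lead.
        Map<String, Integer> priorities = Map.of("az-a-1", 100, "az-a-2", 90, "az-b-1", 0);
        System.out.println(preferredLeader(priorities).orElse("none"));
    }
}
```

In a real SOFAJRaft deployment, you would express this ranking through NodeOptions.setElectionPriority on each node. For an immediate move, you would call Node.transferLeadershipTo(PeerId), as listed in the table. Aeron Cluster has no equivalent input, which is the gap the rest of this page works around.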

Because Aeron has no leader-designation API, the intuitive fix is unreliable: kill the badly placed leader and hope the right node wins. The next election can hand leadership to another remote node. The ClusterTool commands (is-leader, list-members, suspend/resume, shutdown) let you check leadership and membership, suspend and resume processing, or trigger an orderly shutdown. None of them requests an election toward a specific node.
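A back-of-the-envelope model shows why killing the leader is a gamble. This Java sketch assumes a five-member cluster, with two members in az-a, two in az-b, and one in az-c, and the current leader in az-b. It also assumes every surviving member is equally likely to win the next election. Real Aeron elections depend on log position and election timing, so treat the result as an illustration rather than a prediction. The names are hypothetical.

```java
import java.util.List;

final class KillAndHopeOdds {
    /**
     * Fraction of surviving members located in targetAz after the leader at leaderIndex is killed.
     * Under the simplifying assumption that each survivor is equally likely to win,
     * this is the probability that the new leader lands in targetAz.
     */
    static double probabilityLeaderLandsIn(String targetAz, int leaderIndex, List<String> memberAzs) {
        int survivors = 0;
        int survivorsInTarget = 0;
        for (int i = 0; i < memberAzs.size(); i++) {
            if (i == leaderIndex) {
                continue; // the killed leader cannot win this election
            }
            survivors++;
            if (memberAzs.get(i).equals(targetAz)) {
                survivorsInTarget++;
            }
        }
        return survivors == 0 ? 0.0 : (double) survivorsInTarget / survivors;
    }

    public static void main(String[] args) {
        List<String> members = List.of("az-a", "az-a", "az-b", "az-b", "az-c");
        // Leader is member 2 (az-b); the survivors are az-a, az-a, az-b, az-c.
        System.out.println(probabilityLeaderLandsIn("az-a", 2, members));
    }
}
```

Under these assumptions a single kill is a coin flip (0.5). The remaining az-b member or the az-c member is just as likely to take over as a hot-path node.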

Without a native transfer API, the most dependable approach is validate, then re-elect. First pre-check the desired target node, then step the current leader down to force a fresh election. This is more dependable than force-killing a node, but the election outcome is still not guaranteed to land on the target.

Before changing anything, confirm that the intended node is healthy and fully caught up on the log. A target that is behind on replication is likely to lose, because Raft voters reject a candidate whose log is less up-to-date than their own. If it does win, it may stall while it catches up. Validating first is what makes the re-election predictable enough to be useful.
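The pre-check can be expressed as a small gate. This Java sketch is hypothetical throughout. MemberStatus stands in for whatever your own monitoring reports per member (health, AZ, log position). maxLagBytes is an operator-chosen threshold, not an Aeron setting. The sketch treats log positions as byte offsets, as Aeron does, and collects every reason the target is not ready instead of stopping at the first one.

```java
import java.util.ArrayList;
import java.util.List;

final class TargetPrecheck {
    /** Hypothetical per-member snapshot; populate it from your own monitoring. */
    record MemberStatus(int memberId, String az, boolean healthy, long logPosition) {}

    /** Returns every reason the target is not ready; an empty list means the pre-check passed. */
    static List<String> problems(MemberStatus target, String hotPathAz,
                                 long leaderLogPosition, long maxLagBytes) {
        List<String> problems = new ArrayList<>();
        if (!target.healthy()) {
            problems.add("member " + target.memberId() + " is unhealthy");
        }
        if (!target.az().equals(hotPathAz)) {
            problems.add("member " + target.memberId() + " is in " + target.az() + ", not " + hotPathAz);
        }
        long lagBytes = leaderLogPosition - target.logPosition();
        if (lagBytes > maxLagBytes) {
            problems.add("member " + target.memberId() + " is " + lagBytes + " bytes behind the leader");
        }
        return problems;
    }

    public static void main(String[] args) {
        MemberStatus target = new MemberStatus(1, "az-a", true, 4_096);
        List<String> issues = problems(target, "az-a", 8_192, 0);
        System.out.println(issues.isEmpty() ? "ready" : String.join("; ", issues));
    }
}
```

Returning the full list of problems, not a bare boolean, lets a runbook log exactly why a step-down was refused.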

Gracefully step down the current leader to trigger a new election. Combined with the pre-check, this biases the outcome toward the validated node — far more reliable than force-killing and hoping, but still not deterministic the way SOFAJRaft’s transferLeadershipTo is.
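Raft's voting rule shows why the pre-check helps but cannot guarantee the outcome. A member votes only for a candidate whose log is at least as up-to-date as its own, and a candidate needs votes from a majority of the full membership. The Java sketch below is a simplified model, not Aeron's election code. It compares log positions only (real Raft and Aeron also compare terms), treats the stepped-down leader as not voting, and uses hypothetical names.

```java
final class ElectionEligibility {
    /**
     * Simplified Raft check: can `candidate` collect votes from a majority of ALL members,
     * given that a running member votes yes only if the candidate's log position >= its own?
     * Terms are ignored (assumption: every member is on the same term).
     */
    static boolean canWin(int candidate, long[] logPositions, boolean[] running) {
        if (!running[candidate]) {
            return false;
        }
        int majority = logPositions.length / 2 + 1;
        int votes = 0;
        for (int i = 0; i < logPositions.length; i++) {
            if (running[i] && logPositions[candidate] >= logPositions[i]) {
                votes++; // includes the candidate's vote for itself
            }
        }
        return votes >= majority;
    }

    public static void main(String[] args) {
        // Hypothetical 3-member cluster: member 0 (the old leader) has stepped down.
        long[] positions = {8_192, 8_192, 4_096};
        boolean[] running = {false, true, true};
        System.out.println("member 1 can win: " + canWin(1, positions, running));
        System.out.println("member 2 can win: " + canWin(2, positions, running));
    }
}
```

In this model, the caught-up member 1 is eligible and the lagging member 2 is not, which is what validation buys you. In a larger cluster several validated members can all be eligible, so which one wins still comes down to the election itself.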

Verify that the new leader landed in the intended AZ (is-leader / list-members). If it didn't, re-run the pre-check and step down again. Once the leader is back in the hot-path zone, consensus and the workload share one AZ again, and p50/p99 return to their single-AZ baseline.
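The verify-and-repeat step can be wrapped in a bounded loop so that an automated runbook never keeps stepping leaders down indefinitely. This Java sketch is hypothetical. ClusterOps is an interface you would implement against your own tooling, for example by running ClusterTool is-leader or list-members and interpreting the result. stepDownLeader stands for whatever graceful step-down you use, including waiting for the new election to finish. The pre-check described earlier should also run before each step-down; it is omitted here to keep the loop short.

```java
import java.util.Map;

final class RebalanceLoop {
    /** Hypothetical hooks into your own tooling; none of these are Aeron APIs. */
    interface ClusterOps {
        int leaderMemberId();
        String azOf(int memberId);
        void stepDownLeader(); // gracefully step down the leader and block until a new leader is elected
    }

    /**
     * Steps the leader down until it lands in hotPathAz or maxStepDowns is exhausted.
     * Returns the number of step-downs performed, or -1 if the leader never landed in hotPathAz.
     */
    static int steerLeader(ClusterOps ops, String hotPathAz, int maxStepDowns) {
        for (int stepDowns = 0; ; stepDowns++) {
            if (ops.azOf(ops.leaderMemberId()).equals(hotPathAz)) {
                return stepDowns; // verified: the leader is back in the hot-path AZ
            }
            if (stepDowns == maxStepDowns) {
                return -1; // give up and escalate to a human
            }
            ops.stepDownLeader();
        }
    }

    /** Fake cluster for illustration: each step-down hands leadership to the next id in a fixed sequence. */
    static final class ScriptedCluster implements ClusterOps {
        private final int[] leaderSequence;
        private final Map<Integer, String> azByMember;
        private int current = 0;

        ScriptedCluster(int[] leaderSequence, Map<Integer, String> azByMember) {
            this.leaderSequence = leaderSequence;
            this.azByMember = azByMember;
        }

        public int leaderMemberId() { return leaderSequence[current]; }
        public String azOf(int memberId) { return azByMember.get(memberId); }
        public void stepDownLeader() { current = Math.min(current + 1, leaderSequence.length - 1); }
    }

    public static void main(String[] args) {
        Map<Integer, String> azs = Map.of(0, "az-a", 1, "az-a", 2, "az-b", 3, "az-b", 4, "az-c");
        // Leader starts on member 2 (az-b); the first re-election picks member 3 (az-b), the second member 0 (az-a).
        ScriptedCluster cluster = new ScriptedCluster(new int[] {2, 3, 0}, azs);
        System.out.println("step-downs needed: " + steerLeader(cluster, "az-a", 3));
    }
}
```

Capping the number of step-downs matters because each one is a short unavailability window. After a few failed attempts, a human should look at why elections keep landing in the wrong AZ.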