Archive Durability and Replay
Aeron Archive gives you two dials: how durable a recording is, and how you replay it. Get both right and you decouple producers from consumers, recover nodes fast, and keep p99 low. This page covers sync levels for durability and the three ways to consume recorded data.
For the internals of how recordings, the catalog, and replay sessions actually work, see The Aeron Files. Here we focus on the operational knobs.
Sync levels control durability
Durability is configurable. You trade write performance for crash safety by choosing a file sync level for the archive.
| Sync Level | Behavior | Durability | Performance |
|---|---|---|---|
| Level 0 (default) | Writes to page cache only | Lowest — data lost if OS crashes | Fastest |
| Level 1 | Syncs data blocks to disk, but not metadata | Medium — data survives OS crash, but recording metadata may be inconsistent | Moderate |
| Level 2 | Syncs both data blocks and metadata to disk | Highest — full crash consistency | Slowest |
Two notes that bite people in production:
- The default is Level 0 — page cache only. Aeron Archive is NOT durable by default against OS or power failures.
- Catalog durability can be controlled separately, but set it to the same value as the recording sync level. Mismatched settings invite inconsistency.
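To make the three levels concrete, here is a minimal sketch of what each one asks of the operating system, using only `java.nio`. It is not Aeron's recording writer: `SyncLevelSketch` and `writeWithSyncLevel` are hypothetical names for illustration. It assumes that Level 1 corresponds to `FileChannel.force(false)` and Level 2 to `FileChannel.force(true)`. In a real deployment the levels are normally set through the archive's configuration (the `aeron.archive.file.sync.level` and `aeron.archive.catalog.file.sync.level` properties; verify the names against your Aeron version).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SyncLevelSketch {
    /** Writes the buffer, then flushes according to a sync level of 0, 1 or 2. */
    static void writeWithSyncLevel(FileChannel channel, ByteBuffer data, int syncLevel)
            throws IOException {
        while (data.hasRemaining()) {
            channel.write(data); // lands in the OS page cache
        }
        if (syncLevel == 1) {
            channel.force(false); // flush file data, skip metadata
        } else if (syncLevel >= 2) {
            channel.force(true); // flush file data and metadata
        }
        // Level 0: no flush, so the OS decides when the data reaches disk.
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".rec");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            for (int level = 0; level <= 2; level++) {
                byte[] bytes = ("level-" + level + "\n").getBytes(StandardCharsets.US_ASCII);
                writeWithSyncLevel(channel, ByteBuffer.wrap(bytes), level);
            }
        }
        System.out.println("bytes written: " + Files.size(file));
        Files.delete(file);
    }
}
```

All three writes produce the same file contents. What changes is how much of it is guaranteed to be on disk when the call returns, which is where the performance cost comes from.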
The spectrum is simple:
| Fastest | | Most durable |
|---|---|---|
| Level 0 (page cache) | Level 1 (data sync) | Level 2 (full sync) |

Three ways to consume recorded data
Once data is recorded, there are three progressively more sophisticated ways to read it back.
1. Replay
Replay is simple playback of historical data.
- Recordings can be replayed at a later point in time.
- Useful for recovery, auditing, and backtesting.
2. Live tailing
Live tailing replays a stream as it is being recorded.
- It decouples producer and consumer — the producer doesn’t need to know about the consumer.
- It provides a buffer for slower subscribers. If a consumer falls behind, it reads from the recording instead of applying back-pressure to the producer.
That back-pressure point matters for tail latency: a slow downstream consumer reading from the archive can’t stall the producer, so it can’t drag the producer’s p99 with it.
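The decoupling can be illustrated with a toy model. This is a single-threaded sketch in plain Java, not the Aeron API: `Recording` and `TailingConsumer` are hypothetical stand-ins for a recording that is still growing and for a subscriber that tails it from its own position. The point is that the producer's append never depends on how far the consumer has read.

```java
import java.util.ArrayList;
import java.util.List;

public final class LiveTailSketch {
    /** Append-only stand-in for a recording that is still being written. */
    static final class Recording {
        private final List<String> messages = new ArrayList<>();

        void append(String message) { messages.add(message); }
        int recordedPosition() { return messages.size(); }
        String read(int position) { return messages.get(position); }
    }

    /** Consumer that tails the recording from its own position, at its own pace. */
    static final class TailingConsumer {
        private final Recording recording;
        private final List<String> received = new ArrayList<>();
        private int position;

        TailingConsumer(Recording recording, int startPosition) {
            this.recording = recording;
            this.position = startPosition;
        }

        /** Reads up to {@code limit} messages and returns how many were read. */
        int poll(int limit) {
            int count = 0;
            while (count < limit && position < recording.recordedPosition()) {
                received.add(recording.read(position++));
                count++;
            }
            return count;
        }

        int lag() { return recording.recordedPosition() - position; }
        List<String> received() { return received; }
    }

    public static void main(String[] args) {
        Recording recording = new Recording();
        TailingConsumer slow = new TailingConsumer(recording, 0);
        for (int i = 0; i < 10; i++) {
            recording.append("msg-" + i); // the producer never waits for the consumer
            if (i % 3 == 0) {
                slow.poll(1); // the consumer only gets to poll on every third message
            }
        }
        System.out.println("lag after burst: " + slow.lag());
        while (slow.poll(4) > 0) {
            // drain the backlog from the recording
        }
        System.out.println("lag after catch-up: " + slow.lag());
    }
}
```

The consumer's lag grows during the burst and returns to zero afterwards, while the producer's loop runs at the same pace either way.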
3. Replay merge (most sophisticated)
Replay merge combines a replay of the recording with the live stream in a single subscription, so a late-joining consumer can catch up on history and then continue live.
The flow:
- Begin consumption from the recorded stream, catching up on historical data.
- Once the subscriber has caught up, seamlessly cut over to the live stream.
- No gap, no duplicate messages during the transition.
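The flow above can be sketched as a small state machine. This is a toy model in plain Java and deliberately does not use Aeron's own `ReplayMerge` client class. `MergingSubscriber`, `simulate`, and the use of sequence numbers as positions are assumptions made for illustration. The subscriber reads from the recording until the next message it needs is exactly the head of the live buffer, then switches to live.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public final class ReplayMergeSketch {
    enum State { REPLAY, LIVE }

    static final class MergingSubscriber {
        private final List<Long> recording; // shared, append-only recorded sequence numbers
        private final Deque<Long> liveBuffer = new ArrayDeque<>();
        private final List<Long> delivered = new ArrayList<>();
        private State state = State.REPLAY;
        private boolean liveJoined;
        private long nextSeq; // next sequence number the application expects

        MergingSubscriber(List<Long> recording) { this.recording = recording; }

        void joinLive() { liveJoined = true; }

        /** Live messages are only buffered once the live stream has been joined. */
        void onLive(long seq) {
            if (liveJoined) {
                liveBuffer.add(seq);
            }
        }

        int poll(int limit) {
            int count = 0;
            while (count < limit) {
                if (state == State.REPLAY) {
                    Long liveHead = liveBuffer.peek();
                    if (liveHead != null && liveHead == nextSeq) {
                        state = State.LIVE; // caught up: cut over without a gap or duplicate
                        continue;
                    }
                    if (nextSeq >= recording.size()) {
                        break; // nothing more recorded yet
                    }
                    delivered.add(recording.get((int) nextSeq));
                } else {
                    Long next = liveBuffer.poll();
                    if (next == null) {
                        break; // nothing more on the live stream yet
                    }
                    delivered.add(next);
                }
                nextSeq++;
                count++;
            }
            return count;
        }

        State state() { return state; }
        List<Long> delivered() { return delivered; }
    }

    /** Publishes {@code total} messages; the subscriber starts late and joins live after {@code joinAt}. */
    static MergingSubscriber simulate(int total, int joinAt) {
        List<Long> recording = new ArrayList<>();
        MergingSubscriber subscriber = new MergingSubscriber(recording);
        for (long seq = 0; seq < total; seq++) {
            recording.add(seq);      // the archive records every message
            subscriber.onLive(seq);  // the live stream only reaches the subscriber after the join
            if (seq == joinAt) {
                subscriber.joinLive();
            }
            if (seq >= total / 4) {
                subscriber.poll(3);  // late start, then consuming faster than the producer
            }
        }
        while (subscriber.poll(3) > 0) {
            // drain whatever is left
        }
        return subscriber;
    }

    public static void main(String[] args) {
        MergingSubscriber subscriber = simulate(20, 9);
        System.out.println(subscriber.state() + " " + subscriber.delivered());
    }
}
```

In this model the cutover condition is what rules out gaps and duplicates: the subscriber only leaves the replay once the live buffer starts at exactly the next expected message. A production implementation also has to bound how much live data is buffered while the replay catches up, which this sketch ignores.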