@kjnilsson kjnilsson commented Jan 2, 2025

PR for visibility of v3 work.

See: https://github.com/rabbitmq/ra/milestone/7

Ra v3 Pull Request Description

Overview

Ra v3 is a major release introducing log compaction - a fundamentally new approach to log management that allows state machines to retain specific log entries ("live indexes") beyond the snapshot boundary. This enables significant optimisations for workloads where keeping certain historical entries is more efficient than including them in snapshots.

✨ New Features

Sparse Log Compaction System

  • ra_machine:live_indexes/1 callback - New optional callback allowing state machines to specify which log indexes should be retained beyond the snapshot boundary
  • Multi-phase compaction strategy:
    • Phase 1: Immediate deletion of whole segments with no live entries after snapshot
    • Phase 3: Segment consolidation - merging adjacent segments when live data fits into fewer segments
  • Automatic major compaction strategies - Configurable triggers for major compactions based on number of minor compactions
  • New ra:trigger_compaction/1 API - Manually trigger log compaction on a Ra server
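A minimal sketch of the idea behind the callback and Phase 1, not the Ra implementation. It assumes the callback returns a plain list of indexes; the module name, the map-based machine state and the `{First, Last}` segment ranges are hypothetical.

```erlang
-module(live_idx_sketch).
-export([live_indexes/1, deletable_segments/3]).

%% Hypothetical machine state: Key => index of the log entry that
%% holds the key's latest value.
-type state() :: #{term() => pos_integer()}.

%% Sketch of the optional callback: report every index the machine
%% still needs. ASSUMPTION: a sorted list of indexes is acceptable.
-spec live_indexes(state()) -> [pos_integer()].
live_indexes(State) ->
    lists:usort(maps:values(State)).

%% Phase 1 sketch: a segment covering {First, Last} can be deleted
%% outright when it lies at or below the snapshot index and contains
%% no live index.
-spec deletable_segments([{pos_integer(), pos_integer()}],
                         pos_integer(), [pos_integer()]) ->
    [{pos_integer(), pos_integer()}].
deletable_segments(Segments, SnapIdx, Live) ->
    [Seg || {First, Last} = Seg <- Segments,
            Last =< SnapIdx,
            not lists:any(fun(I) -> I >= First andalso I =< Last end,
                          Live)].
```

With segments `{1,10}`, `{11,20}`, `{21,30}`, a snapshot at index 20 and a single live index 15, only `{1,10}` qualifies for immediate deletion.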

New Modules

  • ra_kv - Key-value store that keeps values on disk and reads them on demand
  • ra_seq - Efficient sparse sequence data structure for tracking non-contiguous index ranges
  • ra_log_segments - Complete rewrite of segment management (replaces ra_log_reader) with compaction support
  • ra_log_snapshot_state - Tracks snapshot state including live indexes for WAL coordination
  • ra_worker - Background worker process for each Ra server handling snapshots, checkpoints, and compaction
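To illustrate what a sparse sequence buys over a `{First, Last}` range, here is a small stand-alone sketch. The module, function names and representation (a list of integers and inclusive `{From, To}` ranges) are assumptions for illustration and are not taken from ra_seq.

```erlang
-module(sparse_seq_sketch).
-export([from_list/1, expand/1, count/1]).

%% Compress a list of indexes into ascending single indexes and
%% inclusive {From, To} ranges.
from_list(Idxs) ->
    compress(lists:usort(Idxs), []).

compress([], Acc) ->
    lists:reverse(Acc);
compress([I | Rest], [{From, To} | Acc]) when I =:= To + 1 ->
    %% extend the current range
    compress(Rest, [{From, I} | Acc]);
compress([I | Rest], [Prev | Acc]) when is_integer(Prev), I =:= Prev + 1 ->
    %% two adjacent single indexes become a range
    compress(Rest, [{Prev, I} | Acc]);
compress([I | Rest], Acc) ->
    compress(Rest, [I | Acc]).

%% Expand back into the full list of indexes.
expand(Seq) ->
    lists:flatmap(fun({From, To}) -> lists:seq(From, To);
                     (I) -> [I]
                  end, Seq).

%% Number of indexes represented.
count(Seq) ->
    lists:sum([case E of
                   {From, To} -> To - From + 1;
                   _ -> 1
               end || E <- Seq]).
```

A mostly contiguous log collapses to a handful of ranges, while isolated live entries cost one element each.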

Consistent Aux Commands

  • ra:consistent_aux/3 - New API for executing aux commands after consensus has been achieved (similar to consistent_query but for aux commands)

Absolute Machine Timers

  • Extended timer effect to support absolute timestamps: {timer, Name, Duration, {abs, true}}
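A hedged sketch of a machine command returning the extended effect. The tuple shape is the one listed above; everything else is an assumption: that the third element carries an absolute timestamp in milliseconds when `{abs, true}` is set, which clock that timestamp is measured against (the PR text does not say), and the `lease` command itself.

```erlang
-module(abs_timer_sketch).
-export([lease/3]).

%% Hypothetical command handler: record a lease and ask for a timer
%% that fires at DeadlineMs. Returns {State, Reply, Effects} in the
%% usual machine apply shape.
%% ASSUMPTION: DeadlineMs is an absolute timestamp on whatever clock
%% the timer effect expects; the caller supplies it explicitly.
lease(Key, DeadlineMs, State) when is_integer(DeadlineMs), is_map(State) ->
    Effects = [{timer, {expire, Key}, DeadlineMs, {abs, true}}],
    {State#{Key => DeadlineMs}, ok, Effects}.
```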

WAL Sparse Write Support

  • WAL now supports sparse/non-contiguous writes with new write/7 API accepting PrevIndex parameter
  • Memory tables (ra_mt) refactored to track sparse sequences instead of contiguous ranges

Read Entries API

  • New ra_server_proc:read_entries/4 for reading specific log entries by index (replaces log_fold)

🔧 Improvements

Snapshotting Enhancements

  • Snapshots now store live indexes alongside state data
  • Snapshot write duration now logged for better observability
  • Snapshots with a "pre" phase are now restartable after interruption
  • Improved handling of interrupted snapshot replication with sparse entries

Performance Optimizations

  • Optimized snapshotting by moving ra_machine:live_indexes call to appropriate phase
  • Avoid complete segment re-initialization when releasing resources
  • Delete overwritten segment files when detected

Robustness Improvements

  • Detect sparse log on init and reset to last contiguous index
  • Remove dangling symlinks on log init
  • Reset log after aborted snapshot installation
  • Handle missing segments event after follower recovery
  • Improved suspension/unsuspension handling for peers

⚠️ Breaking Changes

Remote Query API Changes

  • ra:leader_query/2,3 now only accepts MFA tuples {Module, Function, Args} - anonymous functions are no longer supported
  • ra:consistent_query/2,3 now only accepts MFA tuples {Module, Function, Args} - anonymous functions are no longer supported
  • ra:local_query/2,3 remains unchanged and still supports both anonymous functions and MFA tuples
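A migration sketch for callers that currently pass anonymous functions. It assumes the machine state is appended as the last argument when the MFA tuple is applied; `run_locally/2` is a hypothetical stand-in so the example runs without a Ra cluster.

```erlang
-module(mfa_query_sketch).
-export([count_above/2, query_mfa/1, run_locally/2]).

%% The query as an exported, named function.
%% ASSUMPTION: the machine state arrives as the final argument.
count_above(Threshold, State) when is_map(State) ->
    maps:size(maps:filter(fun(_K, V) -> V > Threshold end, State)).

%% Before (no longer accepted by leader_query / consistent_query):
%%   fun(State) -> count_above(10, State) end
%% After: an MFA tuple carrying the extra arguments.
query_mfa(Threshold) ->
    {?MODULE, count_above, [Threshold]}.

%% Local stand-in for evaluating the tuple against a state.
run_locally({M, F, A}, State) ->
    erlang:apply(M, F, A ++ [State]).
```

Against a running server the tuple would be passed as the query argument, e.g. `ra:leader_query(ServerId, mfa_query_sketch:query_mfa(10))`.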

Removed Deprecated APIs

The following deprecated functions have been removed from the ra module:

  • ra:start_server/1 → use ra:start_server/2
  • ra:restart_server/1 → use ra:restart_server/2
  • ra:stop_server/1 → use ra:stop_server/2
  • ra:start_cluster/1 → use ra:start_cluster/2
  • ra:overview/0 → use ra:overview/1
  • ra:register_external_log_reader/1 → removed entirely (external readers no longer supported)

WAL Configuration Changes

  • wal_write_strategy option removed - The default and o_sync write strategies are no longer configurable; the WAL now uses a single optimized strategy

Module Removals

  • ra_log_reader - Replaced by ra_log_segments with enhanced compaction support

API Changes

  • ra_server_proc:log_fold/4 → replaced by ra_server_proc:read_entries/4
  • Snapshot APIs now require machine module and state parameters:
    • ra_snapshot:begin_snapshot/4 → ra_snapshot:begin_snapshot/5
    • ra_snapshot:complete_snapshot/3 → ra_snapshot:complete_snapshot/4
  • Snapshot accept_chunk signature changed from 4 to 3 parameters
  • ra_snapshot:handle_down/3 → ra_snapshot:handle_error/3

Internal Structure Changes

  • Memory table (ra_mt) now tracks ra_seq:state() instead of {First, Last} ranges
  • Segment references structure changed to support sparse indexes
  • WAL batch writer record uses smallest_live_idx and seq instead of snap_idx and range
  • Ra server config now requires parent parameter (supervisor pid)

📚 Documentation

  • New internal design document: docs/internals/COMPACTION.md describing the compaction architecture

🧪 Testing

  • New test suites: ra_kv_SUITE, ra_log_segments_SUITE, ra_seq_SUITE
  • New stress-testing harness: ra_kv_harness for testing Ra KV under various failure conditions
  • Removed: ra_log_props_SUITE (property-based tests)
  • Significant expansion of existing test coverage

Copilot AI left a comment

Pull Request Overview

This PR brings test suites in line with the v3 work by updating snapshot and log APIs, renaming backends, and removing obsolete tests.

  • Rename ra_log_reader to ra_log_segments in unit tests
  • Refactor ra_snapshot_SUITE to use the new bg_work pattern and updated pending/complete_snapshot signatures
  • Update WAL and memory tests to expect lists of indexes instead of tuple ranges, and remove the legacy property-based suite

Reviewed Changes

Copilot reviewed 52 out of 55 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test/unit_SUITE.erl | Replace ra_log_reader with ra_log_segments |
| test/ra_snapshot_SUITE.erl | Migrate snapshot tests to new begin_snapshot/pending API |
| test/ra_log_wal_SUITE.erl | Adapt await_written to expect lists of indexes |
| test/ra_log_memory.erl | Change install_snapshot/4 to match new live-indexes signature |
| test/ra_machine_ets_SUITE.erl | Remove Common Test include (ct.hrl) |
| test/ra_log_props_SUITE.erl | Delete entire deprecated property-based suite |

Comments suppressed due to low confidence (2)

test/ra_log_memory.erl:174

  • The new install_snapshot/4 clause returns {ok, State, []} but the expected return is {State, List}. Remove the ok atom so it matches the spec and callers ({State, []}).
    {ra_log_memory_state(), list()}.

test/ra_machine_ets_SUITE.erl:12

  • The Common Test macros (e.g. ct:pal) are used elsewhere in this file. Removing the ct.hrl include may cause undefined macro errors. Re-add the Common Test include if CT macros are still used.
-export([

@kjnilsson kjnilsson force-pushed the v3 branch 3 times, most recently from 611ab22 to e368de3 July 4, 2025 15:11
@kjnilsson kjnilsson force-pushed the v3 branch 3 times, most recently from 5f67ef4 to c525487 July 22, 2025 08:12
@kjnilsson kjnilsson force-pushed the v3 branch 2 times, most recently from 0873afb to c54fd86 January 15, 2026 12:26

Hopefully this will work better on Windows.

The WAL can lose writer tracking data, so if we have unwritten
items in the mem table these need to be resent after log
init.

When resending entries to the WAL after a failure or restart, the
previous index must be correctly tracked and passed to each write
call. This is essential for sparse writes where entries are not
contiguous.

Changes:
- Consolidate resend_pending and resend_from0 into a single code path
  that threads PrevIdx through the fold
- Update wal_rewrite to accept and pass PrevIdx to ra_log_wal:write
- Fix write_sparse to properly update pending sequence with limit/append
- Add guard to ra_seq:limit for non-integer ceiling index
- Add tests for sparse write resend and set_last_index with pending
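The PrevIdx threading can be sketched as a single fold. This is an illustration only: `WriteFun` is a hypothetical stand-in for the WAL write call, and indexes are assumed to be ascending integers.

```erlang
-module(resend_sketch).
-export([resend/3]).

%% Resend a sparse, ascending list of indexes, handing each write the
%% index that precedes it. WriteFun is a hypothetical
%% fun(PrevIdx, Idx) -> ok. Returns the last index written, or
%% PrevIdx0 when there was nothing to resend.
resend(Idxs, PrevIdx0, WriteFun) ->
    lists:foldl(fun(Idx, PrevIdx) when Idx > PrevIdx ->
                        ok = WriteFun(PrevIdx, Idx),
                        Idx
                end, PrevIdx0, Idxs).
```

Because the accumulator is the previous index, a gap such as 5 to 9 is passed to the write call explicitly instead of being inferred from contiguity.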

Extend the release_cursor effect to support a 4-tuple variant with
options: {release_cursor, Index, MacState, Opts}

Opts can specify conditions that must be met before the snapshot is
taken:

- {written, Idx}: wait until last_written >= Idx
- no_snapshot_sends: wait until no peers are receiving snapshots

This prevents snapshots from being taken before all referenced log
entries have been durably written to disk, which is important when
the machine state contains "live indexes" (references to entries
still in the WAL).

The pending release cursor is stashed in the server state and
re-evaluated when:
- Written events are processed (evaluate_quorum, handle_follower)
- Commit index advances (evaluate_commit_index_follower)
- Snapshot sends complete (install_snapshot_result, handle_down)
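A rough sketch of how such conditions might be built and evaluated. It assumes Opts is a list of the two condition forms described above; the `last_written` and `snapshot_sends` keys are hypothetical stand-ins for the server state.

```erlang
-module(release_cursor_sketch).
-export([effect/3, conditions_met/2]).

%% Build the 4-tuple effect. ASSUMPTION: Opts is a list of conditions.
effect(Index, MacState, LastLiveIdx) ->
    {release_cursor, Index, MacState,
     [{written, LastLiveIdx}, no_snapshot_sends]}.

%% Evaluate the stashed conditions against a hypothetical view of the
%% server state. The snapshot may proceed only when all of them hold.
conditions_met(Conds, #{last_written := LastWritten,
                        snapshot_sends := Sends}) ->
    lists:all(fun({written, Idx}) -> LastWritten >= Idx;
                 (no_snapshot_sends) -> Sends =:= 0
              end, Conds).
```

Re-running `conditions_met/2` on each written event, commit index advance or completed snapshot send mirrors the re-evaluation points listed above.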

Introduce ra_li module implementing a leaky integrator for smoothed
rate estimation with configurable decay time. The integrator handles
irregular update intervals and backwards timestamps gracefully.

Track commit rate in ra_server_proc on every tick_timeout across all
states (leader, follower, candidate, pre_vote, await_condition). The
receive_snapshot state resets the integrator to avoid counting snapshot
index jumps as throughput.

Two rate metrics are exposed via ra:member_overview/1:
- commit_rate: smoothed rate using 6x tick_timeout decay (~30s default)
- commit_rate_last: point-in-time rate from the last tick interval

The commit rate state is stored as a tuple {ra_li:state(), LastCommitIndex,
LastIntervalRate} to minimize record fields.
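For readers unfamiliar with the technique, a leaky integrator over irregular intervals can be sketched as below. This is not the ra_li code: the module name, state layout and the choice to skip non-advancing timestamps are assumptions.

```erlang
-module(li_sketch).
-export([new/1, update/3, rate/1]).

%% State: {DecayMs, LastTimestampMs | undefined, SmoothedRatePerSec}
new(DecayMs) when DecayMs > 0 ->
    {DecayMs, undefined, 0.0}.

%% Count is the number of events since the previous update.
update(_Count, NowMs, {Decay, undefined, Rate}) ->
    %% the first sample only establishes the time baseline
    {Decay, NowMs, Rate};
update(_Count, NowMs, {_, LastMs, _} = State) when NowMs =< LastMs ->
    %% backwards or repeated timestamp: skip the sample
    State;
update(Count, NowMs, {Decay, LastMs, Rate}) ->
    Dt = NowMs - LastMs,
    Instant = Count * 1000 / Dt,
    %% the weight of the new sample grows with the elapsed time, so
    %% irregular intervals are blended consistently
    Alpha = 1.0 - math:exp(-Dt / Decay),
    {Decay, NowMs, Rate + Alpha * (Instant - Rate)}.

rate({_, _, Rate}) ->
    Rate.
```

Resetting to `new/1` after a snapshot install would correspond to the receive_snapshot behaviour described above.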

Readers following a symlink during compaction could see stale data
if the symlink was created before the .compacting file was renamed
to the target. Reverse the order so the compacted data is in place
before any symlink points to it.

Update recovery logic to use .compacting absence as the commit point.
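The ordering can be sketched with plain file operations. This is an illustration under assumptions, not the ra_log_segments code: paths are strings, links live in the same directory as the target, and the temporary `.link` suffix is hypothetical.

```erlang
-module(compact_commit_sketch).
-export([commit/3, recover/1]).

%% Write compacted data to Target ++ ".compacting", rename it over
%% Target, and only then repoint the link names at it.
commit(Target, Data, Links) ->
    Tmp = Target ++ ".compacting",
    ok = file:write_file(Tmp, Data),
    ok = file:rename(Tmp, Target),              % commit point
    lists:foreach(
      fun(Link) ->
              TmpLink = Link ++ ".link",
              _ = file:delete(TmpLink),
              %% a relative target works because the link sits in the
              %% same directory as Target
              ok = file:make_symlink(filename:basename(Target), TmpLink),
              %% the rename swaps the link into place in one step
              ok = file:rename(TmpLink, Link)
      end, Links).

%% Recovery: a leftover .compacting file means the rename never
%% happened, so the partial output is discarded.
recover(Target) ->
    Tmp = Target ++ ".compacting",
    case filelib:is_regular(Tmp) of
        true ->
            ok = file:delete(Tmp),
            aborted;
        false ->
            committed_or_idle
    end.
```

Since every link is created after the rename, a reader that follows one always lands on fully compacted data.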

This is temporary for now to see if we can find
the bug.

Else we may end up with pending entries that will never
be written and will later trigger a resend that will fail.

As they may be queried later if using release cursor
conditions.

- Remove snapshot_next_event field, store deferred event in
  snapshot_phase as {awaiting_pending, EventType, Rpc}
- Add ra_log:has_pending/1 for O(1) pending check
- Simplify abort_receive to reset log based on snapshot_has_live_indexes
  rather than checking individual phases

@kjnilsson kjnilsson changed the title from "Ra v3 - REVIEW BUT DONT MERGE QUITE YET" to "Ra v3" Jan 30, 2026
@kjnilsson kjnilsson merged commit 1802af5 into main Jan 30, 2026
7 checks passed