Commit be3bfa3
feat: Deterministic commit race resolution (MIP-03) (#152)
* feat: Deterministic commit race resolution (MIP-03)
* docs: Update changelogs for race resolution support
* test: Improve MLS SQLite storage coverage
* fix(mdk-core): redact group id in error and simplify rollback cleanup
* fix(mdk-core): fail commit processing if epoch snapshot creation fails
* fix(mdk-core): tighten assertions in race chain rollback test
* chore(mdk-memory-storage): remove speculative comments about lru crate
* fix(mdk-memory-storage): redact sensitive MLS storage fields in Debug
* docs: add Debug bound to MdkCallback in plan documentation
* fix(mdk-core): remove unused variables in race chain test
* Fix race conditions in memory storage and improve test determinism
- Improve thread safety in MdkMemoryStorage group operations by holding locks during existence checks.
- Add Debug implementations for snapshot types to redact sensitive information.
- Refactor commit race tests to ensure deterministic timestamps and rename test_commit_race_chain_rollback to test_commit_race_simple_rollback.
- Remove implemented plan document.
* Ensure correct commit processing and rollback support
- Attach epoch info to the ProcessMessageWrongEpoch error only for Commit messages.
- Add epoch snapshot creation before merging pending commits in the OwnCommitPending path to ensure consistent rollback support (MIP-03).
* Prevent GroupId leakage in test logs
* Add PR link to CHANGELOG
* Replace SQLite savepoints with persistent application-level snapshots
- Replace SQLite savepoint mechanism with group_state_snapshots table
- Snapshots now persist across restarts (survive connection drops)
- Each snapshot is group-scoped (no interference between groups)
- Add test verifying group membership is preserved after rollback
- Fix clippy type_complexity lint in snapshot restoration
The previous savepoint-based approach had issues:
1. Savepoints don't survive connection restarts
2. Nested savepoints can interfere across groups
3. No persistence for crash recovery
The new approach stores snapshots in a dedicated table with:
- snapshot_name: unique identifier
- group_id: scopes snapshot to specific group
- table_name: which table the row belongs to
- row_key/row_data: serialized row content
Co-Authored-By: Claude Opus 4.5 <[email protected]>
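The sketch below shows how group-scoped rows in such a table can be captured and restored. It is a minimal std-only illustration. An in-memory `LiveTables` map stands in for SQLite. `SnapshotRow`, `snapshot_group`, and `restore_group` are hypothetical names. Only the column names come from the description above. Restoring touches only the snapshotted group's rows, so other groups are unaffected.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory mirror of one `group_state_snapshots` row.
#[derive(Clone, Debug, PartialEq)]
struct SnapshotRow {
    snapshot_name: String,
    group_id: Vec<u8>,
    table_name: String,
    row_key: Vec<u8>,
    row_data: Vec<u8>,
}

/// Stand-in for live tables: table_name -> (group_id, row_key) -> row_data.
type LiveTables = HashMap<String, HashMap<(Vec<u8>, Vec<u8>), Vec<u8>>>;

/// Copy every live row that belongs to `group_id` into snapshot rows.
fn snapshot_group(live: &LiveTables, name: &str, group_id: &[u8]) -> Vec<SnapshotRow> {
    let mut rows = Vec::new();
    for (table, entries) in live {
        for ((gid, key), data) in entries {
            if gid.as_slice() == group_id {
                rows.push(SnapshotRow {
                    snapshot_name: name.to_string(),
                    group_id: gid.clone(),
                    table_name: table.clone(),
                    row_key: key.clone(),
                    row_data: data.clone(),
                });
            }
        }
    }
    rows
}

/// Replace only `group_id`'s live rows with the named snapshot's rows.
fn restore_group(live: &mut LiveTables, snapshot: &[SnapshotRow], name: &str, group_id: &[u8]) {
    // Drop the group's current rows from every table; other groups stay intact.
    for entries in live.values_mut() {
        entries.retain(|(gid, _), _| gid.as_slice() != group_id);
    }
    // Re-insert the rows captured for this snapshot and group.
    for row in snapshot.iter().filter(|r| r.snapshot_name == name && r.group_id == group_id) {
        live.entry(row.table_name.clone())
            .or_default()
            .insert((row.group_id.clone(), row.row_key.clone()), row.row_data.clone());
    }
}
```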
* Address CodeRabbit review feedback for PR #152
Security fixes:
- Remove snapshot name (contains GroupId) from error messages
- Redact EpochSnapshotManager inner state in Debug impl
- Remove error details from snapshot/rollback logging
Atomicity improvements:
- Wrap SQLite snapshot_group_state in transaction
- Wrap SQLite restore_group_from_snapshot in transaction
CHANGELOG fixes:
- Correct method names to match trait signatures
- Remove duplicate section headers
- Fix entry placement
Other:
- Add Clone and Eq derives to MdkStorageError
Co-Authored-By: Claude Opus 4.5 <[email protected]>
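The redacting Debug pattern mentioned above can be sketched as follows. `GroupSnapshot` is a hypothetical type and not the crate's actual snapshot struct. Fields that could leak a GroupId print as a fixed placeholder. Row data prints only its length.

```rust
use std::fmt;

/// Hypothetical snapshot record; the name and group id can reveal a GroupId.
struct GroupSnapshot {
    snapshot_name: String,
    group_id: Vec<u8>,
    row_data: Vec<u8>,
    created_at: u64,
}

impl fmt::Debug for GroupSnapshot {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("GroupSnapshot")
            // Never print identifiers or key material, only placeholders.
            .field("snapshot_name", &"[REDACTED]")
            .field("group_id", &"[REDACTED]")
            // Show the size of the serialized data, not its contents.
            .field("row_data", &format_args!("[{} bytes]", self.row_data.len()))
            .field("created_at", &self.created_at)
            .finish()
    }
}
```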
* Fix formatting to match CI rustfmt
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add comprehensive tests for epoch snapshot and error handling
- Add 20 tests for EpochSnapshotManager in epoch_snapshots.rs covering:
  - MIP-03 ordering rules (timestamp priority, event ID tiebreaker)
  - Snapshot creation, rollback, and release operations
  - Debug implementations
- Add 10 tests for error variants in error.rs covering:
  - ProcessMessageWrongEpochWithInfo error formatting
  - OwnCommitPending error
  - Storage error conversion from MdkStorageError
- Add 7 snapshot tests to mdk-sqlite-storage covering:
  - Group snapshot and rollback operations
  - Snapshot release without rollback
  - Exporter secrets rollback
  - Group isolation between snapshots
  - Multiple sequential snapshots
Co-Authored-By: Claude Opus 4.5 <[email protected]>
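The sketch below illustrates the ordering these tests cover. It assumes the MIP-03 rule is "earliest timestamp wins; on a tie, the lexicographically smallest event ID wins". The MIP-03 text itself is normative. `CommitCandidate`, `mip03_order`, `is_preferred`, and `pick_winner` are hypothetical names and not the mdk-core API.

```rust
use std::cmp::Ordering;

/// Hypothetical metadata for a commit competing for the same epoch.
#[derive(Clone, Debug)]
struct CommitCandidate {
    created_at: u64,  // event timestamp in seconds
    event_id: String, // hex-encoded event id
}

/// Assumed MIP-03 order: earlier timestamp first, then smaller event id.
fn mip03_order(a: &CommitCandidate, b: &CommitCandidate) -> Ordering {
    a.created_at
        .cmp(&b.created_at)
        .then_with(|| a.event_id.cmp(&b.event_id))
}

/// True when `incoming` should replace the commit that was already applied.
fn is_preferred(incoming: &CommitCandidate, applied: &CommitCandidate) -> bool {
    mip03_order(incoming, applied) == Ordering::Less
}

/// Deterministic winner among all candidates seen for one epoch.
fn pick_winner(candidates: &[CommitCandidate]) -> Option<&CommitCandidate> {
    candidates.iter().min_by(|a, b| mip03_order(a, b))
}
```

The order is total and depends only on event data. As a result, every client picks the same winner regardless of the order in which the commits arrived.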
* Fix snapshot atomicity docs to match implementation
The documentation incorrectly stated that snapshot operations were "not
atomic" and acquired "multiple independent locks sequentially". In reality,
both create_snapshot() and restore_snapshot() use a single global RwLock
on self.inner, making them atomic.
Updated docs in four locations:
- lib.rs module-level doc
- lib.rs MdkMemoryStorage struct doc
- snapshot.rs module-level doc
- snapshot.rs MemoryStorageSnapshot struct doc
Co-Authored-By: Claude Opus 4.5 <[email protected]>
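The following sketch shows why one lock around all state makes snapshot and restore atomic. It is a minimal illustration, and `Inner` and `Storage` are hypothetical stand-ins for the memory backend. A single read guard is held while cloning, so no writer can interleave between tables. A single write guard replaces every table at once.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical inner state: every table lives behind the same lock.
#[derive(Clone, Default)]
struct Inner {
    groups: HashMap<String, u64>,      // group id -> epoch
    messages: HashMap<String, String>, // message id -> content
}

struct Storage {
    inner: RwLock<Inner>,
}

impl Storage {
    /// One read guard for the whole clone gives a consistent cross-table view.
    fn create_snapshot(&self) -> Inner {
        self.inner.read().expect("lock poisoned").clone()
    }

    /// One write guard swaps all tables in a single step.
    fn restore_snapshot(&self, snapshot: Inner) {
        *self.inner.write().expect("lock poisoned") = snapshot;
    }
}
```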
* Add epoch-aware message tracking and rollback retry support
Implements epoch tracking for messages and processed_messages to enable
proper commit race resolution:
- Add epoch column to messages and processed_messages tables
- Store epoch at decryption time for proper invalidation targeting
- Add MessageState::EpochInvalidated for messages from rolled-back commits
- Enrich RollbackInfo callback with invalidated_messages and messages_needing_refetch
- Add find_failed_messages_for_retry() to find messages that failed to decrypt
  with the wrong keys but can succeed once rollback applies the correct commit
- Consolidate V002 migration into V001 (nothing deployed yet)
The key insight: after rollback, two categories of messages exist:
1. Invalidated messages (processed with wrong commit's keys) - lost forever
2. Failed messages (encrypted with correct keys, couldn't decrypt) - retryable
Co-Authored-By: Claude Opus 4.5 <[email protected]>
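The two categories above can be separated by comparing each message's stored epoch with the rollback epoch. This is a sketch only. It assumes that messages recorded at an epoch later than the snapshot epoch were handled under the losing commit's keys. `StoredMessage`, `RollbackOutcome`, and `apply_rollback` are hypothetical. The `EpochInvalidated` state mirrors the name used above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum MessageState {
    Processed,
    Failed,
    EpochInvalidated,
}

/// Hypothetical stored message with the epoch recorded at decryption time.
#[derive(Clone, Debug)]
struct StoredMessage {
    id: u32,
    epoch: u64,
    state: MessageState,
}

/// Hypothetical summary, loosely analogous to what a rollback callback reports.
#[derive(Debug, Default, PartialEq)]
struct RollbackOutcome {
    invalidated: Vec<u32>,
    needs_retry: Vec<u32>,
}

/// `snapshot_epoch` is the epoch the group is rolled back to.
fn apply_rollback(messages: &mut [StoredMessage], snapshot_epoch: u64) -> RollbackOutcome {
    let mut out = RollbackOutcome::default();
    for m in messages.iter_mut().filter(|m| m.epoch > snapshot_epoch) {
        match m.state {
            // Decrypted with the losing commit's keys: mark as invalidated.
            MessageState::Processed => {
                m.state = MessageState::EpochInvalidated;
                out.invalidated.push(m.id);
            }
            // Likely encrypted with the winning commit's keys: retry later.
            MessageState::Failed => out.needs_retry.push(m.id),
            MessageState::EpochInvalidated => {}
        }
    }
    out
}
```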
* Add SnapshotCreationFailed error variant for better error handling
Addresses PR review feedback to add a proper error variant instead of
using the generic Error::Message for snapshot creation failures.
- Add Error::SnapshotCreationFailed(String) variant
- Update process_commit_message_for_group to use new variant
- Include underlying error details in log message and error
- Add test for new error variant
Co-Authored-By: Claude Opus 4.5 <[email protected]>
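A dedicated variant lets callers distinguish a snapshot failure from other errors. It also lets commit processing stop before merging. The sketch below is hypothetical: this `Error` enum is a two-variant stand-in, and `process_commit` takes closures in place of the real snapshot and merge steps.

```rust
use std::fmt;

/// Hypothetical subset of an error enum with a dedicated snapshot variant.
#[derive(Debug, PartialEq)]
enum Error {
    SnapshotCreationFailed(String),
    Message(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::SnapshotCreationFailed(e) => write!(f, "snapshot creation failed: {e}"),
            Error::Message(m) => write!(f, "{m}"),
        }
    }
}

impl std::error::Error for Error {}

/// Abort before merging if no snapshot exists, since rollback would be impossible.
fn process_commit<S, M>(snapshot: S, merge: M) -> Result<u64, Error>
where
    S: FnOnce() -> Result<(), String>,
    M: FnOnce() -> u64,
{
    // Keep the underlying error text inside the dedicated variant.
    snapshot().map_err(Error::SnapshotCreationFailed)?;
    Ok(merge())
}
```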
* Fix group-scoped snapshot isolation in memory storage
Previously, memory storage snapshots captured ALL data and rollback
restored ALL data, so rolling back Group A also affected Group B.
This implements proper group-scoped snapshots matching SQLite behavior:
- Add GroupScopedSnapshot struct for per-group snapshot isolation
- Implement create_group_scoped_snapshot() with JSON-serialized MLS keys
- Implement restore_group_scoped_snapshot() for group-only restoration
- Add snapshot existence check in SQLite to prevent silent data deletion
- Add isolation tests verifying rollback doesn't affect other groups
- Fix clippy collapsible-if warnings in messages.rs
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Remove implementation comment from SQLite snapshot code
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Enhance commit race tests with member and message verification
Address PR review feedback to verify more data after rollback:
- Add member verification to test_commit_race_simple_better_commit_wins:
  - Verify original members (Alice, Bob, Carol) are still present
  - Verify the correct new member is added based on which commit won
  - Verify total member count is exactly 4
- Add new test_message_invalidation_during_rollback:
  - Verify messages sent at rolled-back epochs are invalidated
  - Confirm RollbackInfo contains the invalidated message IDs
- Add get_rollback_infos() helper to TestCallback for full access
  to RollbackInfo including invalidated_messages
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add FK constraint to group_state_snapshots with CASCADE delete
- Adds FOREIGN KEY (group_id) REFERENCES groups(mls_group_id) ON DELETE CASCADE
to ensure snapshots are cleaned up when a group is deleted
- Updates restore_group_from_snapshot to read snapshot data into memory
BEFORE deleting the group (to avoid CASCADE deleting the data)
- Preserves other snapshots during rollback by re-inserting them after
the CASCADE deletion and group restoration
- Fixes clippy warnings (type_complexity, needless_borrow)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Remove redundant index and fix restore order for FK constraints
- Remove idx_group_state_snapshots_lookup index since SQLite
automatically indexes primary keys, and (snapshot_name, group_id)
is a prefix of the PK
- Fix restore_group_from_snapshot to insert the groups row first
before group_relays and group_exporter_secrets (which have FK
constraints referencing groups)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
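The restore ordering described in the two entries above can be simulated in memory. This is a sketch only. `Db`, `delete_group`, and this `restore_group_from_snapshot` are hypothetical stand-ins for the SQLite tables and method. The design assumes the snapshot consumed by the rollback is not re-inserted.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `groups` plus a snapshots table whose
/// `group_id` has ON DELETE CASCADE semantics.
#[derive(Default)]
struct Db {
    groups: HashMap<String, u64>,          // group_id -> epoch
    snapshots: Vec<(String, String, u64)>, // (snapshot_name, group_id, saved epoch)
}

impl Db {
    /// Deleting a group cascades to every snapshot that references it.
    fn delete_group(&mut self, group_id: &str) {
        self.groups.remove(group_id);
        self.snapshots.retain(|(_, gid, _)| gid != group_id);
    }

    fn restore_group_from_snapshot(&mut self, name: &str, group_id: &str) -> Result<(), String> {
        // 1. Read the snapshot into memory BEFORE the cascade can delete it.
        let saved_epoch = self
            .snapshots
            .iter()
            .find(|(n, g, _)| n == name && g == group_id)
            .map(|(_, _, e)| *e)
            .ok_or("snapshot not found")?;
        // Keep this group's other snapshots so they survive the cascade.
        let others: Vec<_> = self
            .snapshots
            .iter()
            .filter(|(n, g, _)| g == group_id && n != name)
            .cloned()
            .collect();
        // 2. Delete the group; the cascade removes its snapshots too.
        self.delete_group(group_id);
        // 3. Insert the parent row first so FK-dependent rows can follow.
        self.groups.insert(group_id.to_string(), saved_epoch);
        // 4. Re-insert the preserved snapshots.
        self.snapshots.extend(others);
        Ok(())
    }
}
```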
* Refactor snapshot_group_state into smaller helper methods
Break down the 290-line method into 7 focused helper methods:
- snapshot_openmls_group_data()
- snapshot_openmls_proposals()
- snapshot_openmls_own_leaf_nodes()
- snapshot_openmls_epoch_key_pairs()
- snapshot_groups_table()
- snapshot_group_relays()
- snapshot_group_exporter_secrets()
The main method is now ~65 lines and clearly shows the 7 tables
being snapshotted. Each helper is self-contained and easier to
understand and maintain.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add snapshot TTL and hydration support for epoch snapshots
This commit adds time-based cleanup and startup hydration for epoch
snapshots to prevent unbounded storage growth and ensure proper state
recovery after app restart.
Changes:
- Add snapshot_ttl_seconds config (default 1 week) to MdkConfig
- Add list_group_snapshots() and prune_expired_snapshots() to storage trait
- Add created_at timestamp to GroupScopedSnapshot
- Implement TTL methods in both memory and SQLite storage backends
- Add hydration logic to EpochSnapshotManager with ensure_hydrated()
- Prune expired snapshots on startup in MDKBuilder::build()
- Update is_better_candidate() to accept storage parameter for hydration
- Add comprehensive test coverage (14 new tests)
The hydration system:
- Only activates for persistent storage backends (SQLite)
- Parses snapshot names to reconstruct EpochSnapshot metadata
- Marks groups as hydrated to prevent redundant loads
- Handles missing timestamp gracefully (uses 0 placeholder)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
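TTL pruning reduces to a cutoff comparison on `created_at`. The sketch below is minimal and assumes timestamps in Unix seconds. `SnapshotMeta`, `prune_expired`, and the constant name are hypothetical. The one-week default is taken from the description above.

```rust
/// Assumed one-week default, in seconds.
const DEFAULT_SNAPSHOT_TTL_SECONDS: u64 = 7 * 24 * 60 * 60;

/// Hypothetical snapshot metadata with a creation timestamp.
#[derive(Clone, Debug, PartialEq)]
struct SnapshotMeta {
    name: String,
    created_at: u64,
}

/// Drop snapshots older than `ttl` seconds and return how many were removed.
fn prune_expired(snapshots: &mut Vec<SnapshotMeta>, now: u64, ttl: u64) -> usize {
    let before = snapshots.len();
    // saturating_sub avoids underflow when `now` is smaller than the TTL.
    let cutoff = now.saturating_sub(ttl);
    snapshots.retain(|s| s.created_at >= cutoff);
    before - snapshots.len()
}
```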
* Formatting
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
1 parent 88f99d9
31 files changed, +6885 / -918 lines
File tree:
- Plans
- crates
  - mdk-core
    - src
  - mdk-memory-storage
    - src
      - mls_storage
  - mdk-sqlite-storage
    - migrations
    - src
      - mls_storage
  - mdk-storage-traits
    - src
      - messages
  - mdk-uniffi/src