Skip to content

Add test for non-atomic vote persistence in handleRequestVoteRequest#1242

Open
Qian-Cheng-nju wants to merge 1 commit intosofastack:masterfrom
specula-org:test/vote-persistence-bug
Open

Add test for non-atomic vote persistence in handleRequestVoteRequest#1242
Qian-Cheng-nju wants to merge 1 commit intosofastack:masterfrom
specula-org:test/vote-persistence-bug

Conversation

@Qian-Cheng-nju
Copy link

@Qian-Cheng-nju Qian-Cheng-nju commented Feb 16, 2026

Reproduction test for the issue described in #1241.

Summary by CodeRabbit

  • Tests
    • Added tests validating vote persistence across normal restarts and simulated I/O failures, including a scenario showing a double-vote after an interrupted persist.
    • Added test utilities to inject disk I/O failure points, inspect on-disk meta state after failures, and construct vote requests to reproduce and verify these behaviors.

@sofastack-cla
Copy link

sofastack-cla bot commented Feb 16, 2026

Hi @Qian-Cheng-nju, welcome to SOFAStack community, Please sign Contributor License Agreement!

After you signed CLA, we will automatically sync the status of this pull request in 3 minutes.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 16, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Walkthrough

Adds a new test class that simulates disk I/O failure during vote persistence (setVotedFor) by injecting an instrumented RaftMetaStorage, and includes tests reproducing normal-restart and crash-induced double-vote scenarios within the raft test cluster.

Changes

Cohort / File(s) Summary
Vote persistence test & storage injection
jraft-core/src/test/java/com/alipay/sofa/jraft/core/VotePersistenceBugTest.java
New test file adding VotePersistenceBugTest with two tests: testVoteRejectedAfterNormalRestart and testDoubleVoteAfterSetVotedForIOFailure. Introduces DiskFailureMetaStorage (wraps/delegates to LocalRaftMetaStorage, can make meta dir read-only to induce I/O failure) and DiskFailureServiceFactory (injects the instrumented storage into test cluster). Adds helper buildVoteRequest(PeerId, PeerId, long). Key areas to review: correctness of the storage wrapper delegation and failure simulation, proper injection into TestJRaftServiceFactory/TestCluster, and test assertions around vote persistence and restart behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 I nudged a file where the bytes like to hide,
Made the disk go sleepy and watched a vote collide.
Two hearts desired one single term,
I wiggled my nose and raised the alarm.
Hooray — bugs exposed, then gently untied! 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a test for non-atomic vote persistence in handleRequestVoteRequest, which matches the changeset that introduces VotePersistenceBugTest with related test infrastructure.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into master

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@jraft-core/src/test/java/com/alipay/sofa/jraft/core/VotePersistenceBugTest.java`:
- Around line 235-241: The test VotePersistenceBugTest currently asserts the
buggy double-vote behavior by checking resp2.getGranted() with assertTrue;
update the test to mark this as a known-to-be-broken assertion and flip it to
assertFalse once issue `#1241` (atomic vote persistence) is fixed. Specifically,
locate the block using NodeImpl node2, method
handleRequestVoteRequest(buildVoteRequest(...)) producing resp2, and replace the
assertTrue(resp2.getGranted()) with assertFalse(resp2.getGranted()) or wrap it
with a clear FIXME comment (e.g., "// FIXME: flip to assertFalse after `#1241` is
fixed") so future maintainers know to change the assertion when the bug is
resolved. Ensure the message text referencing candidate1/candidate2 remains
informative after the flip.
🧹 Nitpick comments (3)
jraft-core/src/test/java/com/alipay/sofa/jraft/core/VotePersistenceBugTest.java (3)

47-67: LGTM on class setup and Javadoc.

The class-level Javadoc clearly explains the two-phase persistence problem being tested. The @Before setup is consistent with other JRaft tests.

Minor: the testName rule (line 57-58) is declared but never referenced in either test method. Consider removing it if not needed.

,


155-158: getMetaFilePath is unused — consider removing.

This helper is defined but never called in either test. It appears to have been superseded by the CrashPointMetaStorage.metaFilePath approach for reading on-disk state. Removing it avoids confusion.

🧹 Remove unused method
-    private String getMetaFilePath(final PeerId peer) {
-        return this.dataPath + File.separator + peer.getEndpoint().toString().replace(':', '_') + File.separator
-               + "meta" + File.separator + "raft_meta";
-    }

69-78: Teardown robustness: consider adding a try-finally around stopAll.

If c.stopAll() throws for one cluster, the remaining clusters in CLUSTERS won't be stopped, and FileUtils.deleteDirectory / NodeManager.clear() won't execute. A try-finally or try-catch inside the loop would improve test isolation.

Wrap cluster shutdown in try-catch
     `@After`
     public void teardown() throws Exception {
         if (!TestCluster.CLUSTERS.isEmpty()) {
             for (final TestCluster c : TestCluster.CLUSTERS.removeAll()) {
-                c.stopAll();
+                try {
+                    c.stopAll();
+                } catch (final Exception e) {
+                    // log and continue to ensure remaining cleanup executes
+                }
             }
         }
-        FileUtils.deleteDirectory(new File(this.dataPath));
-        NodeManager.getInstance().clear();
+        try {
+            FileUtils.deleteDirectory(new File(this.dataPath));
+        } finally {
+            NodeManager.getInstance().clear();
+        }
     }

@sofastack-cla sofastack-cla bot added cla:yes and removed cla:no labels Feb 16, 2026
@Qian-Cheng-nju Qian-Cheng-nju force-pushed the test/vote-persistence-bug branch from dbc4bf3 to 0b9a695 Compare February 16, 2026 12:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant