2 of 3: corruption fix: Implement atomic read-modify-write for JSON files #5544

KJ7LNW · 2025-07-10T02:09:09Z

Dependencies

When dependent pull requests have been merged, Github will show more correct line number count

Depends on 1 of 3: memfix: implement and use safeReadJson to reduce memory usage #5332
Closes:
- Bug: Race condition when cancelling task can corrupt conversation history #5333
- User Input Lost When Clicking “Reject” or “Send” in AI Workflow #4926
Blocks:
- 3 of 3: scalability fix: refactor task history persistence to use file-based storage #3785
- feat: Implement task history scanner and recovery tools #5546

Note to Reviewer

This is a PR series, so the line numbers shown by Github are exaggerated. The commit series clearly marks where each PR begins using lines that say NOTICE: PR ____ STARTS HERE. See below for the annotated diffstat.

The commits tell a clean story, it will be easier to understand what is happening here by looking at each commit individually under "Commits" than by looking at all of the files that were changed.

The changes were kept minimal with detailed human inspection focusing on transactional safety without changing the original behavior.

This is probably the best place to start your review:

modifyConversation( async modifyFn(clineMessages, apiHistory) )

This function and the other two modify* functions perform an atomic mutation to synchronize Task.* structures with the file on disk.

Use "Hide Whitespace"

Due to significant indentation changes from moving logic into atomic read-modify-write transaction callbacks, using the "hide whitespace" option in the GitHub diff viewer will help focus on the logical changes.

Actual diff excluding tests:

]$ git diff -w --stat  github/stream-json-reads..github/json-read-modify-write -- ':(exclude)**/__tests__/**'  ':(exclude)**/i18n/**'  
// actual transactional implementation:
 src/utils/safeWriteJson.ts                |  69 +++++-

// transactional modify* functions replace the save/overwrite* functions,
// plus hooks for existing use into those functions:
 src/core/task/Task.ts                     | 300 ++++++++++++++++++---------

// moved to rules/ because it is needed by Code, architect, issue fixer, and possibly other modes:
 .roo/rules-code/use-safeWriteJson.md      |   6 -
 .roo/rules/use-safeWriteJson.md           |  11 +

// small changes to hook in the changes above 
 src/core/checkpoints/index.ts             |   8 +-
 src/core/sliding-window/index.ts          |   2 +-
 src/core/task-persistence/apiMessages.ts  |  14 --
 src/core/task-persistence/index.ts        |   3 +-
 src/core/task-persistence/taskMessages.ts |  42 ----
 src/core/webview/webviewMessageHandler.ts | 105 +++++-----
 10 files changed, 345 insertions(+), 215 deletions(-)

Key Changes

This PR enhances JSON file handling by introducing atomic read-modify-write transactions, which prevents race conditions and improves data integrity when modifying files concurrently.

safeWriteJson.ts is enhanced to support atomic read-modify-write transactions by accepting a readModifyFn parameter.
The Task.ts class is refactored to introduce modifyConversation, modifyApiConversationHistory, and modifyClineMessages methods, which use the new atomic pattern to safely update conversation history files.
All Task.ts methods that mutate the conversation state, including say, ask, recursivelyMakeClineRequests, updateApiReqMsg, resumeTaskFromHistory, condenseContext, and attemptApiRequest, are refactored to use these atomic transaction methods.
Redundant, non-atomic persistence functions like saveApiMessages and saveTaskMessages are removed, centralizing all history mutations within the new atomic methods.
The test suites for ClineProvider and Task are updated to properly mock and test the new transactional behavior.

Important

Enhances JSON file handling with atomic read-modify-write transactions, refactors Task.ts for atomic operations, and updates tests for new functionality.

Behavior:
- Introduces atomic read-modify-write transactions in safeWriteJson.ts and safeReadJson.ts to prevent race conditions and improve data integrity.
- Refactors Task.ts to use new atomic methods for conversation history updates, replacing non-atomic persistence functions.
- Updates modifyConversation, modifyApiConversationHistory, and modifyClineMessages methods in Task.ts.
Tests:
- Updates test files to mock and test new transactional behavior, including safeReadJson.spec.ts and safeWriteJson.test.ts.
- Adds tests for atomic transactions and error handling in safeWriteJson.test.ts.
Misc:
- Removes redundant functions like saveApiMessages and saveTaskMessages from task-persistence.
- Updates MdmService.ts and related tests to use safeReadJson for configuration loading.

^{This description was created by}^{for 38eba3ab658d9df6b0a3b9a5b00ec290e41e87ba. You can customize this summary. It will automatically update as commits are pushed.}

KJ7LNW · 2025-07-11T21:58:46Z

rebased on main, ready to review.

Implement safeReadJson function to complement the existing safeWriteJson functionality: - Uses stream-json for efficient processing of large JSON files - Supports both full object reading and selective path extraction - Provides file locking to prevent concurrent access - Includes comprehensive error handling - Adds complete test coverage - Passthrough all exceptions This enables efficient and safe JSON reading operations throughout the codebase. Signed-off-by: Eric Wheeler <[email protected]>

Replace manual file reading and JSON parsing with the safer safeReadJson utility across multiple files in the codebase. This change: - Provides atomic file access with proper locking to prevent race conditions - Streams file contents efficiently for better memory usage - Improves error handling consistency - Reduces code duplication Fixes: #5331 Signed-off-by: Eric Wheeler <[email protected]>

Updated test files to properly mock and use safeReadJson/safeWriteJson: - Added proper imports for safeReadJson from safeWriteJson module - Updated mock implementations to mock both functions correctly - Replaced direct fs operations with calls to safe functions - Updated assertions to match the new behavior This fixes all failing tests after the conversion to safeReadJson. Signed-off-by: Eric Wheeler <[email protected]>

#5544

- Modify safeWriteJson to accept readModifyFn parameter - Allow default values when file doesn't exist with readModifyFn - Ensure modifiable types (objects/arrays) for default values - Add tests for object and array default values, and to use the new read-modify-write pattern - Return the structure that was written to the file - Moved use-safeWriteJson.md to rules/ because it is needed by both code, architect, issue-fixer, and possibly other modes Signed-off-by: Eric Wheeler <[email protected]> safe-write: update test

These are simple modifications with clean refactors that do not yet introduce any optimization, making them easy to review. Replace direct file operations with safer atomic transactions by: - Introducing modifyApiConversationHistory and modifyClineMessages methods - Removing separate save functions in favor of callback-based modifications - Ensuring file operations are performed atomically to prevent race conditions - Maintaining consistent state between memory and persistent storage Signed-off-by: Eric Wheeler <[email protected]>

Replaced the obsolete `overwriteClineMessages` and `overwriteApiConversationHistory` methods with the atomic `modifyConversation` equivalent. This refactoring addresses potential race conditions and ensures that updates to the UI messages and API history are performed as a single, transactional operation. - All message modification logic, including index lookups and derived value calculations, now occurs within the atomic callback to guarantee data consistency. - The change preserves the existing helper function structure while adapting it to the new transactional approach. - Add modifyConversation method to Task class for atomic updates Signed-off-by: Eric Wheeler <[email protected]>

The updateApiReqMsg function was modifying the clineMessages array directly and relying on a subsequent, separate call to modifyClineMessages to persist the change. This was not atomic. This change refactors updateApiReqMsg to be an async function that encapsulates the entire transaction of updating the API request message and saving it to disk within a single modifyClineMessages call. The separate, redundant calls to modifyClineMessages have been removed. Signed-off-by: Eric Wheeler <[email protected]>

Refactors the `recursivelyMakeClineRequests` method to ensure message updates are transactional. This change replaces a direct mutation of the `clineMessages` array followed by a separate save operation with a single call to `modifyConversation`. By doing so, it guarantees that the modification of the `api_req_started` message and the addition of the user's content to the conversation history occur as a single, atomic operation, preventing potential data inconsistencies. Signed-off-by: Eric Wheeler <[email protected]>

Refactors the say() method in Task.ts to ensure message updates are atomic. The mutation logic for the last message is moved inside the modifyClineMessages() callback. This makes the find, update, and save operations a single atomic transaction, preventing race conditions and ensuring data integrity. Signed-off-by: Eric Wheeler <[email protected]>

The `ask()` method was directly mutating the `lastMessage` object before calling `modifyClineMessages`, which is not an atomic operation. This refactor moves the mutation logic inside the `modifyClineMessages` callback to ensure that finding, updating, and saving the message occurs as a single, transactional operation. Signed-off-by: Eric Wheeler <[email protected]>

The resumeTaskFromHistory method was refactored to ensure that updates to both the cline message history and the API conversation history are fully atomic. Previously, the method would read the histories, modify them in-memory, and then call the respective modify functions with the already-modified data. This approach did not guarantee atomicity. The new implementation moves the modification logic directly inside the callbacks for `modifyClineMessages` and `modifyApiConversationHistory`. This ensures that the entire read-modify-write cycle for each history is performed as a single, uninterruptible transaction, preventing potential race conditions or partial state saves. This change also involved: - Adjusting variable scopes to support the new callback structure. - Removing the now-unused `getSavedClineMessages` helper method as part of the refactor. Signed-off-by: Eric Wheeler <[email protected]>

The `readTaskMessages` function and its corresponding file have been removed. This function is now redundant because all access to the cline message history is handled by the atomic `modifyClineMessages` method. This method reads the message file within its `safeWriteJson` transaction, performs the modification, and writes the result. A separate, non-transactional read function is therefore unnecessary and has been removed to simplify the codebase. Signed-off-by: Eric Wheeler <[email protected]>

Moves the conversation summarization logic into the critical section of `modifyApiConversationHistory`. This refactoring ensures that the process of reading the existing history, summarizing it, and writing the condensed version back to disk is a single, atomic transaction. Previously, the summary was generated outside the critical section, creating a potential race condition where the history could have changed between summarization and writing. Additionally, any side effects, such as `say()` notifications, are now performed only after the atomic write operation has successfully completed, preventing deadlocks and ensuring a cleaner separation of concerns. Signed-off-by: Eric Wheeler <[email protected]>

Refactors `attemptApiRequest` to ensure the conversation history truncation is a fully atomic operation. Previously, the truncation logic was performed and then conditionally saved, which introduced a race condition and relied on an unsafe object reference comparison. This change moves the call to `truncateConversationIfNeeded` inside the `modifyApiConversationHistory` callback. This guarantees that reading the history, truncating it, and writing it back to storage happens as a single, indivisible transaction. Calls to `say` were also moved outside of the critical section to prevent side-effects during the atomic update. Additionally, the `TruncateResponse` type is now exported from the sliding-window module to satisfy type checking in `Task.ts`. Signed-off-by: Eric Wheeler <[email protected]>

Removes the final `modifyClineMessages` call from `abortTask`. This operation was redundant because all `clineMessages` mutations are already wrapped in atomic transactions, ensuring the on-disk state is always synchronized. This change simplifies the abort logic and avoids a superfluous file write. Signed-off-by: Eric Wheeler <[email protected]>

- Add vitest imports and mock safeWriteJson to support atomic transaction testing. This ensures file operations in tests are properly isolated and can be verified without actual filesystem interactions. - Add fs/promises.access mock to prevent errors when checking file existence. - Satisfy requirement that clineMessages must be defined for say/ask() to be valid Signed-off-by: Eric Wheeler <[email protected]>

Replace separate overwriteClineMessages and overwriteApiConversationHistory methods with a single modifyConversation method that ensures atomic updates to the conversation history. This change improves test reliability by ensuring conversation state is updated transactionally and simplifies test assertions by directly checking the resulting state. Signed-off-by: Eric Wheeler <[email protected]>

KJ7LNW · 2025-07-18T21:01:09Z

rebased on v3.23.14

daniel-lxs

Hey @KJ7LNW I took a look at your implementation and it looks solid, I left a couple of minor points that might be worth taking a look at.

Let me know what you think.

daniel-lxs · 2025-07-21T16:30:30Z

src/core/task/Task.ts

+			if (modifiedMessages === undefined) {
+				throw new Error("modifyConversation: modifiedMessages is undefined after inner transaction")
+			}
+


The modifyConversation method attempts to acquire locks on two different files through nested calls to modifyClineMessages and modifyApiConversationHistory. This pattern could lead to:

Deadlocks if another process tries to acquire these locks in reverse order

Data inconsistency if one modification succeeds but the other fails

Is this intentional? The proper-lockfile documentation recommends against holding multiple locks simultaneously. Consider either:

Using a single lock file for both operations

Implementing a two-phase commit pattern

Documenting why this approach is safe in your specific use case

Deadlocks if another process tries to acquire these locks in reverse order

I am aware of that, this is why are there is a function to guarantee ordering.

FWIW, I have been running this pull request on my desk in my personal build for a week or two and have never had a deadlock. Additionally, I have not had any task corruption since running this PR series.

Data inconsistency if one modification succeeds but the other fails

I gave a specific issues serious consideration during development (because multi-file would makes the code very complicated and increases the risk of error) and decided on the compromise that multi-file transactions guarantee the following:

both files are locked, in order, before proceeding

the transform function completes atomically: if it aborts early, then neither file is written and both locks are released in a proper "stack-order"

the only possibility for corruption is if the actual file write fails, which is unlikely unless the user runs out of disk space or something but then all bets are off

no matter how you slice it, this is 1000x better than the current implementation.

src/utils/safeWriteJson.ts

daniel-lxs · 2025-07-21T16:31:04Z

src/utils/safeReadJson.ts

The error handling strategy differs between safeReadJson and safeWriteJson:

safeReadJson: Throws all errors including ENOENT

safeWriteJson: Returns undefined for ENOENT when using readModifyFn

This inconsistency could confuse API consumers. Consider standardizing the approach:

// Option 1: Always throw try { const data = await safeReadJson(path); } catch (error) { if (error.code === 'ENOENT') { // Handle missing file } } // Option 2: Return undefined for ENOENT const data = await safeReadJson(path) ?? defaultValue;

Which pattern better serves your use cases?

probably both should return undefined; it is simpler to implement and more intuitive

daniel-lxs · 2025-07-21T16:31:18Z

src/utils/safeWriteJson.ts

+			}
+
+			// Call the modify function with the current data or default
+			const modifiedData = await readModifyFn(dataToModify)


The readModifyFn pattern seems to load the entire JSON into memory before modification. For large conversation histories or task data, this could cause:

Out-of-memory errors

Performance degradation

GC pressure

Have you considered the maximum expected file sizes? For large files, you might want to:

Implement streaming modifications for specific use cases

Add file size checks with warnings

Document memory requirements

Consider pagination for conversation history

I considered that too, but the implementation becomes too complicated for a first pass.

daniel-lxs · 2025-07-21T16:31:40Z

src/utils/__tests__/safeWriteJson.test.ts

While the test suite is comprehensive for single-process scenarios, it lacks tests for concurrent access - which is the primary purpose of these utilities. Consider adding tests that:

Simulate multiple processes trying to write simultaneously

Test lock acquisition/release under race conditions

Verify data integrity with concurrent read-modify-write operations

Test behavior when locks become stale

Example approach:

test('should handle concurrent writes safely', async () => { const promises = Array(5).fill(0).map((_, i) => safeWriteJson(filePath, {}, async (data) => { data[`process${i}`] = true; await delay(Math.random() * 100); return data; }) ); await Promise.all(promises); const result = await safeReadJson(filePath); expect(Object.keys(result)).toHaveLength(5); });

that seems reasonable

KJ7LNW · 2025-07-21T21:15:10Z

@daniel-lxs I do not have the bandwidth to work on this right now so feel free to push anything that you would like to address and I can review the change if you would like.

daniel-lxs · 2025-08-18T17:27:27Z

Thanks for the contribution. At this stage, the impact of these changes isn’t clear and we’re focusing on higher-priority items. Closing for now, but we can reconsider in the future if priorities shift.

KJ7LNW requested review from cte, jr and mrubens as code owners July 10, 2025 02:09

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Jul 10, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Jul 10, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Jul 10, 2025

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request labels Jul 10, 2025

hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jul 10, 2025

KJ7LNW marked this pull request as draft July 10, 2025 02:18

KJ7LNW force-pushed the json-read-modify-write branch 2 times, most recently from 29518f6 to 045470b Compare July 10, 2025 03:01

KJ7LNW marked this pull request as ready for review July 10, 2025 03:38

KJ7LNW force-pushed the json-read-modify-write branch from 045470b to 7faf1ae Compare July 10, 2025 04:14

daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Jul 10, 2025

hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Jul 10, 2025

This was referenced Jul 10, 2025

Task can't be opened #722

Closed

User Input Lost When Clicking “Reject” or “Send” in AI Workflow #4926

Closed

KJ7LNW self-assigned this Jul 11, 2025

KJ7LNW marked this pull request as draft July 11, 2025 19:29

KJ7LNW force-pushed the json-read-modify-write branch 2 times, most recently from 3b74d35 to 38eba3a Compare July 11, 2025 21:57

KJ7LNW marked this pull request as ready for review July 11, 2025 21:58

Eric Wheeler added 18 commits July 18, 2025 13:47

NOTICE: PR 5544 STARTS HERE

b3101e1

#5544

KJ7LNW force-pushed the json-read-modify-write branch from ce491d3 to fb35d23 Compare July 18, 2025 20:48

daniel-lxs reviewed Jul 21, 2025

View reviewed changes

daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Jul 21, 2025

hannesrudolph added PR - Changes Requested and removed PR - Needs Preliminary Review labels Jul 21, 2025

daniel-lxs mentioned this pull request Jul 21, 2025

fix: prevent "Current ask promise was ignored" error when cancelling tasks #5957

Closed

daniel-lxs closed this Aug 18, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 18, 2025

github-project-automation bot moved this from PR [Changes Requested] to Done in Roo Code Roadmap Aug 18, 2025

2 of 3: corruption fix: Implement atomic read-modify-write for JSON files #5544

2 of 3: corruption fix: Implement atomic read-modify-write for JSON files #5544

Uh oh!

Conversation

KJ7LNW commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependencies

Note to Reviewer

Use "Hide Whitespace"

Actual diff excluding tests:

Key Changes

Uh oh!

KJ7LNW commented Jul 11, 2025

Uh oh!

KJ7LNW commented Jul 18, 2025

Uh oh!

daniel-lxs left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KJ7LNW commented Jul 21, 2025

Uh oh!

daniel-lxs commented Aug 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

KJ7LNW commented Jul 10, 2025 •

edited

Loading