Skip to content

Conversation

@KJ7LNW
Copy link
Contributor

@KJ7LNW KJ7LNW commented Jun 16, 2025

Context

Closes #4311

This PR reintroduces the safeWriteJson utility from PR #3772 with critical improvements to fix the race condition identified in #4468 that caused the original implementation to be reverted in #4471.

The safeWriteJson utility addresses issue #722 by preventing task data corruption through atomic writes, proper file locking, and error handling. This ensures task files remain consistent even during crashes or concurrent access, making tasks reliably accessible.

Problem

The original implementation of safeWriteJson in PR #3772 had a race condition where it would fail with ENOENT errors during lock acquisition when a parent directory was just created. This happened because:

  1. The saveApiMessages function would call getTaskDirectoryPath to create the task directory with { recursive: true }
  2. Immediately after, it would call safeWriteJson with the file path
  3. The proper-lockfile library would perform an lstat operation on the target file path during lock acquisition
  4. This would fail with ENOENT when the directory creation hadn't fully synchronized with the filesystem

Solution

This PR includes the fix from commit 3880518e4e128020327c7df74af4ee401e1400c6 which ensures directories exist and are fully synchronized before lock acquisition by:

  1. Creating directories with fs.mkdir({ recursive: true }) within safeWriteJson itself
  2. Verifying access to created directories before attempting lock acquisition
  3. Setting realpath: false in lock options to handle non-existent files

These changes prevent the race condition that was causing issues in #4468 and ensure reliable operation even on slower file systems or under high load conditions.

Testing

Added comprehensive tests for directory creation capabilities including:

  • Creating files in non-existent directories
  • Handling multi-level directory creation
  • Testing error conditions
  • Confirming correct behavior with existing directories

All tests pass successfully.

Fixes: #4468
See-also: #4471, #3772, #722


Important

Reintroduces safeWriteJson utility for atomic JSON writes, fixing race conditions and integrating it across the codebase with comprehensive tests.

  • Behavior:
    • Reintroduces safeWriteJson utility to handle JSON file writes atomically, fixing race conditions from previous implementation.
    • Ensures directories exist before writing and uses file locking to prevent concurrent write issues.
  • Integration:
    • Replaces fs.writeFile with safeWriteJson in modelCache.ts, modelEndpointCache.ts, and importExport.ts.
    • Updates FileContextTracker.ts, apiMessages.ts, and taskMessages.ts to use safeWriteJson.
    • Modifies webviewMessageHandler.ts to create mcp.json using safeWriteJson.
    • Updates cache-manager.ts and McpHub.ts to use safeWriteJson for cache and server config writes.
  • Testing:
    • Adds tests in safeWriteJson.test.ts to cover directory creation, error handling, and rollback scenarios.
    • Mocks safeWriteJson in McpHub.test.ts to verify integration without actual file writes.

This description was created by Ellipsis for 64403b74ae71c6275a2dae76ec1df32e64964d90. You can customize this summary. It will automatically update as commits are pushed.

@KJ7LNW KJ7LNW requested review from cte, jr and mrubens as code owners June 16, 2025 00:46
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 16, 2025
@KJ7LNW KJ7LNW self-assigned this Jun 16, 2025
@KJ7LNW KJ7LNW moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jun 16, 2025
@dosubot dosubot bot added the bug Something isn't working label Jun 16, 2025
@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 16, 2025

@daniel-lxs, here is my rework that fixes #3772 with regard to #4468

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 16, 2025

@cte, do you understand this test error? I am not able to reproduce it on my desk. A full clean followed by pnpm test is always successful, but CI is triggering the following:

 roo-cline:test: roo-cline:bundle: [extension] Cleaning dist directory: /home/runner/work/Roo-Code/Roo-Code/src/dist
roo-cline:test: roo-cline:bundle: Error: ENOTEMPTY: directory not empty, rmdir '/home/runner/work/Roo-Code/Roo-Code/src/dist'
roo-cline:test: roo-cline:bundle:     at Object.rmdirSync (node:fs:1215:11)
roo-cline:test: roo-cline:bundle:     at _rmdirSync (node:internal/fs/rimraf:262:21)
roo-cline:test: roo-cline:bundle:     at rimrafSync (node:internal/fs/rimraf:195:7)
roo-cline:test: roo-cline:bundle:     at Module.rmSync (node:fs:1264:10)
roo-cline:test: roo-cline:bundle:     at main (file:///home/runner/work/Roo-Code/Roo-Code/src/esbuild.mjs:39:6)
roo-cline:test: roo-cline:bundle:     at file:///home/runner/work/Roo-Code/Roo-Code/src/esbuild.mjs:130:1
roo-cline:test: roo-cline:bundle:     at ModuleJob.run (node:internal/modules/esm/module_job:263:25)
roo-cline:test: roo-cline:bundle:     at async ModuleLoader.import (node:internal/modules/esm/loader:540:24)
roo-cline:test: roo-cline:bundle:     at async asyncRunEntryPointWithESMLoader (node:internal/modules/run_main:117:5) {
roo-cline:test: roo-cline:bundle:   errno: -39,
roo-cline:test: roo-cline:bundle:   code: 'ENOTEMPTY',
roo-cline:test: roo-cline:bundle:   syscall: 'rmdir',
roo-cline:test: roo-cline:bundle:   path: '/home/runner/work/Roo-Code/Roo-Code/src/dist'
roo-cline:test: roo-cline:bundle: }
roo-cline:test: roo-cline:bundle:  ELIFECYCLECommand failed with exit code 1.

according to Gemini-2.5:

The ENOTEMPTY error is caused by a race condition in the CI environment. The pnpm test command, executed by the platform-unit-test job, uses turbo to run tasks for multiple packages in parallel.

The build script, src/esbuild.mjs, is being executed concurrently by different packages (e.g., roo-cline and @roo-code/types). This script is not safe for concurrent execution because it cleans and writes to the same shared directory, src/dist.

The error occurs when one process attempts to delete the src/dist directory while another process has already started writing new build artifacts back into it. This race condition doesn't typically occur in a local environment where turbo might run the tasks serially due to caching or different resource availability.

To resolve this, you should either modify src/esbuild.mjs to allow each package to build to a unique output directory or configure turbo.json to force the conflicting bundle tasks to run serially.

I am not sure what to do about that, and fixing the CI race is out of scope for this PR, so FYI because others will probably hit this too.

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 16, 2025

@daniel-lxs: Ignoring the unrelated CI test failure that I asked @cte about above, this is ready for review:

# pnpm test
image

@daniel-lxs
Copy link
Member

Hey @KJ7LNW, the implementation looks good, I threw everything I had at it and it seems to be working.
I also ran all the integration tests and couldn't see any issues.

I tried solving the conflicts but I couldn't write to this branch, can you take a look?

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Jun 17, 2025
@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 17, 2025

Hey @KJ7LNW, the implementation looks good, I threw everything I had at it and it seems to be working. I also ran all the integration tests and couldn't see any issues.

I tried solving the conflicts but I couldn't write to this branch, can you take a look?

sure I can do that

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 17, 2025

Hey @KJ7LNW, the implementation looks good, I threw everything I had at it and it seems to be working. I also ran all the integration tests and couldn't see any issues.

I tried solving the conflicts but I couldn't write to this branch, can you take a look?

how far did you get in your fixes? I am having a very difficult time converting jest to vitest. any words of wisdom there?

@daniel-lxs
Copy link
Member

daniel-lxs commented Jun 17, 2025

@KJ7LNW
I solved all the conflicts, but I don't have them anymore.
I was thinking of sharing a patch but it was quite big.

I honestly thought it would take you a few minutes at most, my bad.

To fix them I just rebased and modified the actual conflicts (your new tests) to use vitest, had a bit of help from Opus 4.

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 17, 2025

I gave you push access and sent an invite for your user.

do you still have the task available to tell the model to redo the changes ?

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 17, 2025

the tests that are in conflict are probably fine but the most recent release removed jest completely, and I am having trouble with converting https://github.com/RooCodeInc/Roo-Code/blob/104b645e3c29ce329923c6aac01f0d6c383112e9/src/utils/__tests__/safeWriteJson.test.ts to use vitest

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 17, 2025

@daniel-lxs standby , @cte something something that might help ...

@KJ7LNW KJ7LNW force-pushed the use-safe-write-json-for-all-files branch from 104b645 to f2b457f Compare June 17, 2025 21:52
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 17, 2025
@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 22, 2025

I can confirm this issue still occurs.

I wasn't able to find any formatting options for Stringer in the stream-json package. The only approaches I found involved using the stringify method instead.

@KJ7LNW, do you have any other ideas?

To reduce memory pressure and increase performance, stream-json writes the structure directly to disk without first passing through a large string buffer. This is particularly important for very large conversation objects.

I think last time this change also wrote unformatted JSON (you could see it when adding a MCP from the marketplace) - is that behavior consistent with the current behavior now?

@mrubens, to answer your question: Currently in Roo, this behavior varies throughout the code base: for example, existing large objects in tasks/<uuid>/*.json are not formatted, however, smaller objects like task_metadata.json are formatted.

# pwd
~/.config/Code/User/globalStorage/rooveterinaryinc.roo-cline/tasks
# ls -l 37b0417d-65c3-4a48-8c9c-77d7ee556f3c/
total 188
-rw-------. 1 ewheeler ewheeler 95890 Jun 21 12:00 api_conversation_history.json
drwx------. 3 ewheeler ewheeler  4096 Jun 21 11:57 checkpoints
-rw-------. 1 ewheeler ewheeler  1438 Jun 21 11:58 task_metadata.json
-rw-------. 1 ewheeler ewheeler 82283 Jun 21 11:59 ui_messages.json

Stream-json does not appear to support pretty-printing. Here are some options, in order of increasing complexity:

  1. If none of Roo's JSON files are intended to be user-serviceable, drop support for pretty printing and merge as-is.
  2. Support pretty printing, but fall back to string-buffers for that purpose
  3. Write a custom hook for disassembler.pipe(stringer).pipe(fileWriteStream) to support pretty-printing (and retain disabled pretty-printing for structures that are known to be large for better performance and reduced memory pressure).

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 23, 2025

This PR likely solves at least the following issues:

#722: Task can't be opened
#4603: Entire chat history disappears after canceling a task.
#4542: Parts of a conversation are deleted when starting a new task.
#4482: Historical tasks are deleted on a fresh start.
#3391: Tasks completely vanish from history and logs.
#2540: Disappearing conversation history on remote (WSL2) servers.
#2397: Deleting a message from history causes data loss.
#2115: roo code automatically removes chat history

Here is a test build for anyone who wishes to try this PR:

If you try it out please report back.

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 23, 2025

@mrubens this is a summary of all .json files that I could find in the project, and the MCP configuration in .roo/ is the only one that appears to be user serviceable. The .roo/ MCP configuration file is small, should we revert to the pretty-printed version and leave everything else transactional with safeWriteJson ?

critical for consistency

  • Task Files

    • <taskDir>/api_conversation_history.json - API message history
    • <taskDir>/ui_messages.json - UI message history
    • <taskDir>/task_metadata.json - Task context information
  • Cache Files

    • <cachePath.fsPath> - Code index file hashes
    • <cacheDir>/<filename> - Model cache data
    • <cacheDir>/<filename> - Model endpoint cache

less critical

  • Export Files
    • <uri.fsPath> - Exported provider profiles and settings

Non-critical

  • Config Files
    • <globalSettingsPath>/mcp_settings.json - Global MCP settings
      - <workspaceFolder>/.roo/mcp.json - Workspace MCP server config

Unless you can think of other user-serviceable files, I think the last (bold) one is the only one we should pretty-print.

let me know what you think

@daniel-lxs
Copy link
Member

should we revert to the pretty-printed version and leave everything else transactional with safeWriteJson ?

That sounds like a good idea!

@mrubens
Copy link
Collaborator

mrubens commented Jun 24, 2025

Sounds good!

Support user-serviceable .roo/mcp.json file with pretty-printing

- Remove safeWriteJson in favor of direct JSON.stringify to pretty-print
  user serviceable MCP configuration files

Cc: @mrubens

Signed-off-by: Eric Wheeler <[email protected]>
@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 24, 2025

Sounds good!

Applied in 0f0432c

Was that the last issue to solve before merging?

@KJ7LNW KJ7LNW moved this from PR [Changes Requested] to PR [Needs Review] in Roo Code Roadmap Jun 24, 2025
@mrubens mrubens merged commit 8455909 into RooCodeInc:main Jun 25, 2025
13 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 25, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Jun 25, 2025
Alorse pushed a commit to Alorse/Roo-Code that referenced this pull request Jun 27, 2025
hannesrudolph pushed a commit that referenced this pull request Jul 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer PR - Needs Review size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

The conversation records cannot be saved bug: API conversation history file truncated to empty array during task resume

4 participants