-
Notifications
You must be signed in to change notification settings - Fork 2.6k
2 of 3: corruption fix: Implement atomic read-modify-write for JSON files #5544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
29518f6 to
045470b
Compare
045470b to
7faf1ae
Compare
3b74d35 to
38eba3a
Compare
|
rebased on main, ready to review. |
Implement safeReadJson function to complement the existing safeWriteJson functionality: - Uses stream-json for efficient processing of large JSON files - Supports both full object reading and selective path extraction - Provides file locking to prevent concurrent access - Includes comprehensive error handling - Adds complete test coverage - Passthrough all exceptions This enables efficient and safe JSON reading operations throughout the codebase. Signed-off-by: Eric Wheeler <[email protected]>
Replace manual file reading and JSON parsing with the safer safeReadJson utility across multiple files in the codebase. This change: - Provides atomic file access with proper locking to prevent race conditions - Streams file contents efficiently for better memory usage - Improves error handling consistency - Reduces code duplication Fixes: #5331 Signed-off-by: Eric Wheeler <[email protected]>
Updated test files to properly mock and use safeReadJson/safeWriteJson: - Added proper imports for safeReadJson from safeWriteJson module - Updated mock implementations to mock both functions correctly - Replaced direct fs operations with calls to safe functions - Updated assertions to match the new behavior This fixes all failing tests after the conversion to safeReadJson. Signed-off-by: Eric Wheeler <[email protected]>
- Modify safeWriteJson to accept readModifyFn parameter - Allow default values when file doesn't exist with readModifyFn - Ensure modifiable types (objects/arrays) for default values - Add tests for object and array default values, and to use the new read-modify-write pattern - Return the structure that was written to the file - Moved use-safeWriteJson.md to rules/ because it is needed by both code, architect, issue-fixer, and possibly other modes Signed-off-by: Eric Wheeler <[email protected]> safe-write: update test
These are simple modifications with clean refactors that do not yet introduce any optimization, making them easy to review. Replace direct file operations with safer atomic transactions by: - Introducing modifyApiConversationHistory and modifyClineMessages methods - Removing separate save functions in favor of callback-based modifications - Ensuring file operations are performed atomically to prevent race conditions - Maintaining consistent state between memory and persistent storage Signed-off-by: Eric Wheeler <[email protected]>
Replaced the obsolete `overwriteClineMessages` and `overwriteApiConversationHistory` methods with the atomic `modifyConversation` equivalent. This refactoring addresses potential race conditions and ensures that updates to the UI messages and API history are performed as a single, transactional operation. - All message modification logic, including index lookups and derived value calculations, now occurs within the atomic callback to guarantee data consistency. - The change preserves the existing helper function structure while adapting it to the new transactional approach. - Add modifyConversation method to Task class for atomic updates Signed-off-by: Eric Wheeler <[email protected]>
The updateApiReqMsg function was modifying the clineMessages array directly and relying on a subsequent, separate call to modifyClineMessages to persist the change. This was not atomic. This change refactors updateApiReqMsg to be an async function that encapsulates the entire transaction of updating the API request message and saving it to disk within a single modifyClineMessages call. The separate, redundant calls to modifyClineMessages have been removed. Signed-off-by: Eric Wheeler <[email protected]>
Refactors the `recursivelyMakeClineRequests` method to ensure message updates are transactional. This change replaces a direct mutation of the `clineMessages` array followed by a separate save operation with a single call to `modifyConversation`. By doing so, it guarantees that the modification of the `api_req_started` message and the addition of the user's content to the conversation history occur as a single, atomic operation, preventing potential data inconsistencies. Signed-off-by: Eric Wheeler <[email protected]>
Refactors the say() method in Task.ts to ensure message updates are atomic. The mutation logic for the last message is moved inside the modifyClineMessages() callback. This makes the find, update, and save operations a single atomic transaction, preventing race conditions and ensuring data integrity. Signed-off-by: Eric Wheeler <[email protected]>
The `ask()` method was directly mutating the `lastMessage` object before calling `modifyClineMessages`, which is not an atomic operation. This refactor moves the mutation logic inside the `modifyClineMessages` callback to ensure that finding, updating, and saving the message occurs as a single, transactional operation. Signed-off-by: Eric Wheeler <[email protected]>
The resumeTaskFromHistory method was refactored to ensure that updates to both the cline message history and the API conversation history are fully atomic. Previously, the method would read the histories, modify them in-memory, and then call the respective modify functions with the already-modified data. This approach did not guarantee atomicity. The new implementation moves the modification logic directly inside the callbacks for `modifyClineMessages` and `modifyApiConversationHistory`. This ensures that the entire read-modify-write cycle for each history is performed as a single, uninterruptible transaction, preventing potential race conditions or partial state saves. This change also involved: - Adjusting variable scopes to support the new callback structure. - Removing the now-unused `getSavedClineMessages` helper method as part of the refactor. Signed-off-by: Eric Wheeler <[email protected]>
The `readTaskMessages` function and its corresponding file have been removed. This function is now redundant because all access to the cline message history is handled by the atomic `modifyClineMessages` method. This method reads the message file within its `safeWriteJson` transaction, performs the modification, and writes the result. A separate, non-transactional read function is therefore unnecessary and has been removed to simplify the codebase. Signed-off-by: Eric Wheeler <[email protected]>
Moves the conversation summarization logic into the critical section of `modifyApiConversationHistory`. This refactoring ensures that the process of reading the existing history, summarizing it, and writing the condensed version back to disk is a single, atomic transaction. Previously, the summary was generated outside the critical section, creating a potential race condition where the history could have changed between summarization and writing. Additionally, any side effects, such as `say()` notifications, are now performed only after the atomic write operation has successfully completed, preventing deadlocks and ensuring a cleaner separation of concerns. Signed-off-by: Eric Wheeler <[email protected]>
Refactors `attemptApiRequest` to ensure the conversation history truncation is a fully atomic operation. Previously, the truncation logic was performed and then conditionally saved, which introduced a race condition and relied on an unsafe object reference comparison. This change moves the call to `truncateConversationIfNeeded` inside the `modifyApiConversationHistory` callback. This guarantees that reading the history, truncating it, and writing it back to storage happens as a single, indivisible transaction. Calls to `say` were also moved outside of the critical section to prevent side-effects during the atomic update. Additionally, the `TruncateResponse` type is now exported from the sliding-window module to satisfy type checking in `Task.ts`. Signed-off-by: Eric Wheeler <[email protected]>
Removes the final `modifyClineMessages` call from `abortTask`. This operation was redundant because all `clineMessages` mutations are already wrapped in atomic transactions, ensuring the on-disk state is always synchronized. This change simplifies the abort logic and avoids a superfluous file write. Signed-off-by: Eric Wheeler <[email protected]>
- Add vitest imports and mock safeWriteJson to support atomic transaction testing. This ensures file operations in tests are properly isolated and can be verified without actual filesystem interactions. - Add fs/promises.access mock to prevent errors when checking file existence. - Satisfy requirement that clineMessages must be defined for say/ask() to be valid Signed-off-by: Eric Wheeler <[email protected]>
Replace separate overwriteClineMessages and overwriteApiConversationHistory methods with a single modifyConversation method that ensures atomic updates to the conversation history. This change improves test reliability by ensuring conversation state is updated transactionally and simplifies test assertions by directly checking the resulting state. Signed-off-by: Eric Wheeler <[email protected]>
ce491d3 to
fb35d23
Compare
|
rebased on v3.23.14 |
daniel-lxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @KJ7LNW I took a look at your implementation and it looks solid, I left a couple of minor points that might be worth taking a look at.
Let me know what you think.
| if (modifiedMessages === undefined) { | ||
| throw new Error("modifyConversation: modifiedMessages is undefined after inner transaction") | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modifyConversation method attempts to acquire locks on two different files through nested calls to modifyClineMessages and modifyApiConversationHistory. This pattern could lead to:
- Deadlocks if another process tries to acquire these locks in reverse order
- Data inconsistency if one modification succeeds but the other fails
Is this intentional? The proper-lockfile documentation recommends against holding multiple locks simultaneously. Consider either:
- Using a single lock file for both operations
- Implementing a two-phase commit pattern
- Documenting why this approach is safe in your specific use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deadlocks if another process tries to acquire these locks in reverse order
I am aware of that, this is why are there is a function to guarantee ordering.
FWIW, I have been running this pull request on my desk in my personal build for a week or two and have never had a deadlock. Additionally, I have not had any task corruption since running this PR series.
Data inconsistency if one modification succeeds but the other fails
I gave a specific issues serious consideration during development (because multi-file would makes the code very complicated and increases the risk of error) and decided on the compromise that multi-file transactions guarantee the following:
- both files are locked, in order, before proceeding
- the transform function completes atomically: if it aborts early, then neither file is written and both locks are released in a proper "stack-order"
- the only possibility for corruption is if the actual file write fails, which is unlikely unless the user runs out of disk space or something but then all bets are off
no matter how you slice it, this is 1000x better than the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling strategy differs between safeReadJson and safeWriteJson:
safeReadJson: Throws all errors including ENOENTsafeWriteJson: Returnsundefinedfor ENOENT when usingreadModifyFn
This inconsistency could confuse API consumers. Consider standardizing the approach:
// Option 1: Always throw
try {
const data = await safeReadJson(path);
} catch (error) {
if (error.code === 'ENOENT') {
// Handle missing file
}
}
// Option 2: Return undefined for ENOENT
const data = await safeReadJson(path) ?? defaultValue;Which pattern better serves your use cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably both should return undefined; it is simpler to implement and more intuitive
| } | ||
|
|
||
| // Call the modify function with the current data or default | ||
| const modifiedData = await readModifyFn(dataToModify) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The readModifyFn pattern seems to load the entire JSON into memory before modification. For large conversation histories or task data, this could cause:
- Out-of-memory errors
- Performance degradation
- GC pressure
Have you considered the maximum expected file sizes? For large files, you might want to:
- Implement streaming modifications for specific use cases
- Add file size checks with warnings
- Document memory requirements
- Consider pagination for conversation history
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that too, but the implementation becomes too complicated for a first pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the test suite is comprehensive for single-process scenarios, it lacks tests for concurrent access - which is the primary purpose of these utilities. Consider adding tests that:
- Simulate multiple processes trying to write simultaneously
- Test lock acquisition/release under race conditions
- Verify data integrity with concurrent read-modify-write operations
- Test behavior when locks become stale
Example approach:
test('should handle concurrent writes safely', async () => {
const promises = Array(5).fill(0).map((_, i) =>
safeWriteJson(filePath, {}, async (data) => {
data[`process${i}`] = true;
await delay(Math.random() * 100);
return data;
})
);
await Promise.all(promises);
const result = await safeReadJson(filePath);
expect(Object.keys(result)).toHaveLength(5);
});There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seems reasonable
|
@daniel-lxs I do not have the bandwidth to work on this right now so feel free to push anything that you would like to address and I can review the change if you would like. |
|
Thanks for the contribution. At this stage, the impact of these changes isn’t clear and we’re focusing on higher-priority items. Closing for now, but we can reconsider in the future if priorities shift. |
Dependencies
When dependent pull requests have been merged, Github will show more correct line number count
Note to Reviewer
This is a PR series, so the line numbers shown by Github are exaggerated. The commit series clearly marks where each PR begins using lines that say
NOTICE: PR ____ STARTS HERE. See below for the annotated diffstat.The commits tell a clean story, it will be easier to understand what is happening here by looking at each commit individually under "Commits" than by looking at all of the files that were changed.
The changes were kept minimal with detailed human inspection focusing on transactional safety without changing the original behavior.
This is probably the best place to start your review:
modifyConversation( async modifyFn(clineMessages, apiHistory) )This function and the other two
modify*functions perform an atomic mutation to synchronizeTask.*structures with the file on disk.Use "Hide Whitespace"
Due to significant indentation changes from moving logic into atomic read-modify-write transaction callbacks, using the "hide whitespace" option in the GitHub diff viewer will help focus on the logical changes.
Actual diff excluding tests:
Key Changes
This PR enhances JSON file handling by introducing atomic read-modify-write transactions, which prevents race conditions and improves data integrity when modifying files concurrently.
safeWriteJson.tsis enhanced to support atomic read-modify-write transactions by accepting areadModifyFnparameter.Task.tsclass is refactored to introducemodifyConversation,modifyApiConversationHistory, andmodifyClineMessagesmethods, which use the new atomic pattern to safely update conversation history files.Task.tsmethods that mutate the conversation state, includingsay,ask,recursivelyMakeClineRequests,updateApiReqMsg,resumeTaskFromHistory,condenseContext, andattemptApiRequest, are refactored to use these atomic transaction methods.saveApiMessagesandsaveTaskMessagesare removed, centralizing all history mutations within the new atomic methods.ClineProviderandTaskare updated to properly mock and test the new transactional behavior.Important
Enhances JSON file handling with atomic read-modify-write transactions, refactors
Task.tsfor atomic operations, and updates tests for new functionality.safeWriteJson.tsandsafeReadJson.tsto prevent race conditions and improve data integrity.Task.tsto use new atomic methods for conversation history updates, replacing non-atomic persistence functions.modifyConversation,modifyApiConversationHistory, andmodifyClineMessagesmethods inTask.ts.safeReadJson.spec.tsandsafeWriteJson.test.ts.safeWriteJson.test.ts.saveApiMessagesandsaveTaskMessagesfromtask-persistence.MdmService.tsand related tests to usesafeReadJsonfor configuration loading.This description was created by
for 38eba3ab658d9df6b0a3b9a5b00ec290e41e87ba. You can customize this summary. It will automatically update as commits are pushed.