@roomote
Contributor

@roomote roomote bot commented Oct 16, 2025

Related GitHub Issue

Closes: #8690

Roo Code Task Context (Optional)

This PR was created with assistance from Roo Code.

Description

This PR addresses the storage bloat issue where read_file tool contents were being unnecessarily persisted in ui_messages.json, causing excessive storage usage and potential UI failures (gray screen).

Key implementation details:

  • Added a new sanitizeMessagesForUIStorage() function that strips large file contents from messages before saving
  • The sanitization preserves essential metadata (paths, line ranges) while removing the actual file content
  • Implemented backward compatibility via purgeFileContentsFromMessages() to clean up existing bloated messages during rehydration
  • Extracted named constants (CONTENT_TRUNCATION_LENGTH, STRIPPED_CONTENT_MARKER) for the truncation threshold and placeholder text, for better maintainability
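The following is a minimal TypeScript sketch of how sanitizeMessagesForUIStorage() might work, not the PR's actual code. It assumes tool messages carry a JSON payload in `text` with `tool: "readFile"`, an optional `content` string, and an optional `batchFiles` array. `UIMessage`, `BatchFileEntry`, and `ReadFilePayload` are simplified stand-ins for the real ClineMessage/ClineSayTool types, and `sanitizeReadFileText` is a hypothetical helper. The 100-character threshold comes from the review comments and the marker text comes from the test procedure; neither is read from the real source.

```typescript
// Simplified stand-ins for the real ClineMessage / ClineSayTool types (assumption).
interface UIMessage {
	ts: number
	type: "ask" | "say"
	text?: string
}

interface BatchFileEntry {
	path?: string
	content?: string
}

interface ReadFilePayload {
	tool: string
	path?: string
	content?: string
	batchFiles?: BatchFileEntry[]
}

// Values taken from the review thread and test procedure (assumption).
const CONTENT_TRUNCATION_LENGTH = 100
const STRIPPED_CONTENT_MARKER = "[content stripped for storage]"

// Hypothetical helper: sanitize one message's text if it is a readFile payload.
function sanitizeReadFileText(text: string): string {
	let payload: unknown
	try {
		payload = JSON.parse(text)
	} catch {
		return text // Not JSON, so not a tool payload: keep as-is.
	}
	if (typeof payload !== "object" || payload === null || (payload as ReadFilePayload).tool !== "readFile") {
		return text
	}
	const sanitized: ReadFilePayload = { ...(payload as ReadFilePayload) }
	// Single-file read: keep short content, replace long content with the marker.
	if (typeof sanitized.content === "string" && sanitized.content.length > CONTENT_TRUNCATION_LENGTH) {
		sanitized.content = STRIPPED_CONTENT_MARKER
	}
	// Batch read: drop every entry's content, keep the path and other metadata.
	if (Array.isArray(sanitized.batchFiles)) {
		sanitized.batchFiles = sanitized.batchFiles.map((file) => {
			const copy = { ...file }
			delete copy.content
			return copy
		})
	}
	return JSON.stringify(sanitized)
}

function sanitizeMessagesForUIStorage(messages: UIMessage[]): UIMessage[] {
	// Builds a new array; messages with text get a shallow copy,
	// so the caller's message objects are never mutated.
	return messages.map((message) =>
		message.text ? { ...message, text: sanitizeReadFileText(message.text) } : message,
	)
}
```

Ordinary chat text that is not JSON passes through unchanged. The batch branch mirrors the behavior discussed in the review, where batch entries lose their content regardless of size.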

The solution is non-breaking because:

  • API conversation history remains untouched (used for actual rehydration)
  • UI still displays file paths and metadata
  • Existing tasks with bloated messages are automatically cleaned up

Test Procedure

Testing performed:

  1. Added comprehensive unit tests in src/core/task-persistence/__tests__/sanitizeMessages.spec.ts covering:
    • Single file read sanitization
    • Batch file read sanitization
    • Non-file messages preservation
    • Edge cases (empty content, missing fields)
  2. All tests passing (11 test cases)
  3. Ran existing task-persistence tests - no regressions
  4. Ran full Task module tests - all passing

To verify the fix:

  1. Create a task that uses read_file on large files
  2. Check ui_messages.json - file contents should show [content stripped for storage] instead of full content
  3. Rehydrate the task - should work normally (api_conversation_history.json has full content)
  4. Load an old task with bloated messages - should be automatically cleaned up
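For step 2, a small check can be run against the parsed contents of ui_messages.json instead of inspecting it by eye. This is a hypothetical helper, not part of the PR. It assumes the same payload shape as above (`tool: "readFile"`, `content`, `batchFiles[].content`) and reports the `ts` of every read_file message that still stores more than `maxLength` characters of content. If those assumptions hold, it should return an empty array once the fix is in place.

```typescript
// Minimal stand-in for a stored ui_messages.json entry (assumption).
interface StoredMessage {
	ts: number
	text?: string
}

// Hypothetical verification helper: returns the `ts` of each readFile message
// whose stored content (single or batch) is still longer than `maxLength`.
function findUnsanitizedReadFileMessages(messages: StoredMessage[], maxLength = 100): number[] {
	const offenders: number[] = []
	for (const message of messages) {
		if (!message.text) continue
		let payload: any
		try {
			payload = JSON.parse(message.text)
		} catch {
			continue // Plain chat text, not a tool payload.
		}
		if (!payload || payload.tool !== "readFile") continue
		// Collect the single-file content plus any batch entry contents.
		const batch: unknown[] = Array.isArray(payload.batchFiles)
			? payload.batchFiles.map((file: any) => file?.content)
			: []
		const contents: unknown[] = [payload.content, ...batch]
		if (contents.some((c) => typeof c === "string" && c.length > maxLength)) {
			offenders.push(message.ts)
		}
	}
	return offenders
}
```

For example, it could be called as `findUnsanitizedReadFileMessages(JSON.parse(await fs.readFile(uiMessagesPath, "utf8")))`, where `uiMessagesPath` is a placeholder for the task's ui_messages.json path.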

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Not applicable - this is a backend storage optimization with no UI changes.

Documentation Updates

  • No documentation updates are required.

The fix is transparent to users and developers - it's an internal optimization that doesn't change any public APIs or user-facing behavior.

Additional Notes

This fix significantly reduces storage usage for tasks that read large files. In the reported issue, a ui_messages.json file that was 183MB can now be reduced to just a few KB while maintaining full functionality.

Get in Touch

Available via GitHub for any questions about this PR.


Important

Adds message sanitization to prevent large file contents from being stored in ui_messages.json, reducing storage bloat and ensuring backward compatibility.

  • Behavior:
    • Adds sanitizeMessagesForUIStorage() in sanitizeMessages.ts to strip large file contents from readFile tool messages, preserving metadata.
    • Introduces purgeFileContentsFromMessages() for backward compatibility, cleaning up existing messages during rehydration.
    • Updates readTaskMessages() and saveTaskMessages() in taskMessages.ts to use new sanitization functions.
  • Tests:
    • Adds sanitizeMessages.spec.ts with tests for single and batch file reads, non-file messages, and edge cases.
  • Constants:
    • Defines CONTENT_TRUNCATION_LENGTH and STRIPPED_CONTENT_MARKER in sanitizeMessages.ts for content handling.

This description was created by Ellipsis for d8c2a0f.

- Add sanitization function to strip large file contents from messages
- Sanitize messages before saving to ui_messages.json
- Add backward compatibility to purge contents from existing messages
- Add comprehensive tests for sanitization logic
- Extract constants for better maintainability

Fixes #8690
@roomote roomote bot requested review from cte, jr and mrubens as code owners October 16, 2025 23:09
@dosubot dosubot bot added the size:L (100-499 lines changed, ignoring generated files) and bug labels Oct 16, 2025
@roomote
Contributor Author

roomote bot commented Oct 16, 2025

Review Summary

I've reviewed the pull request and identified the following issues that should be addressed:

Issues to Address

  • Remove unused import of ClineSayTool in sanitizeMessages.ts
  • Fix inconsistent sanitization behavior between single and batch file reads (single files preserve content ≤100 chars, batch files always strip content)

Overall Assessment

The PR successfully addresses the core issue of preventing read_file contents from bloating ui_messages.json, and the implementation has comprehensive test coverage. However, a few minor improvements are needed: removing an unused import and making the sanitization logic consistent between single and batch reads.

@@ -0,0 +1,86 @@
import type { ClineMessage } from "@roo-code/types"
import type { ClineSayTool } from "../../shared/ExtensionMessage"
Contributor Author

This import is unused. The ClineSayTool type is never referenced in this file.

Comment on lines +60 to +71
// Handle batch file reads
if (sanitized.batchFiles && Array.isArray(sanitized.batchFiles)) {
	sanitized.batchFiles = sanitized.batchFiles.map((file: any) => {
		const sanitizedFile = { ...file }
		// Remove the actual file content, keep only metadata
		// Add type checking for content field
		if ("content" in sanitizedFile && typeof sanitizedFile.content === "string") {
			delete sanitizedFile.content
		}
		return sanitizedFile
	})
}
Contributor Author

Inconsistent sanitization between single and batch file reads. Single file reads preserve content if ≤100 characters (line 55), but batch file reads always strip all content regardless of size (lines 66-67). This means the same small file would be saved when read individually but stripped when read as part of a batch, leading to inconsistent UI behavior. Consider applying the same length-based threshold to batch files or documenting this intentional difference.
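One way to resolve this inconsistency, sketched under the same assumptions as the snippet above (payload fields `content` and `batchFiles[].content`), is to route every content field through a single hypothetical `stripIfLarge` helper so that single and batch reads follow the same 100-character rule. The constant values are taken from this thread and the PR's test procedure; `sanitizeReadFilePayload`, `FileEntry`, and `ReadFilePayload` are illustrative names, not the PR's.

```typescript
// Values assumed from the review thread and the PR's test procedure.
const CONTENT_TRUNCATION_LENGTH = 100
const STRIPPED_CONTENT_MARKER = "[content stripped for storage]"

interface FileEntry {
	path?: string
	content?: unknown
}

interface ReadFilePayload {
	tool: string
	content?: unknown
	batchFiles?: FileEntry[]
}

// Hypothetical shared rule: long strings become the marker, everything else is kept.
function stripIfLarge(content: unknown): unknown {
	return typeof content === "string" && content.length > CONTENT_TRUNCATION_LENGTH
		? STRIPPED_CONTENT_MARKER
		: content
}

// Applies the same rule to the single-file field and to every batch entry.
function sanitizeReadFilePayload(payload: ReadFilePayload): ReadFilePayload {
	const sanitized: ReadFilePayload = { ...payload }
	if ("content" in sanitized) {
		sanitized.content = stripIfLarge(sanitized.content)
	}
	if (Array.isArray(sanitized.batchFiles)) {
		sanitized.batchFiles = sanitized.batchFiles.map((file) =>
			"content" in file ? { ...file, content: stripIfLarge(file.content) } : { ...file },
		)
	}
	return sanitized
}
```

With this approach, a small file is stored identically whether it was read alone or in a batch. Long batch entries keep a `content` field holding the marker instead of losing the field, which matches how single reads are already stored.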

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Oct 16, 2025

 if (fileExists) {
-	return JSON.parse(await fs.readFile(filePath, "utf8"))
+	const messages = JSON.parse(await fs.readFile(filePath, "utf8"))
Contributor

Wrap JSON.parse in try/catch to handle corrupt or malformed ui_messages.json files and log the error to prevent crashes.

This comment was generated because the change violated code review rule irule_PTI8rjtnhwrWq6jS.
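A sketch of the suggested guard follows. It assumes the file contents have already been read as a string. `parseTaskMessages` is a hypothetical name. The function logs the problem and falls back to an empty list when the JSON is malformed or the top-level value is not an array. Falling back to `[]` is an assumption: the review only asks that the error be caught and logged instead of crashing.

```typescript
// Hypothetical helper: parse ui_messages.json text defensively.
// On malformed JSON or a non-array top-level value, log and return [].
function parseTaskMessages<T = unknown>(raw: string, filePath: string): T[] {
	try {
		const parsed: unknown = JSON.parse(raw)
		if (!Array.isArray(parsed)) {
			console.error(`Expected an array in ${filePath}, got ${typeof parsed}; ignoring contents.`)
			return []
		}
		return parsed as T[]
	} catch (error) {
		console.error(`Failed to parse ${filePath}:`, error)
		return []
	}
}
```

Inside readTaskMessages(), the `fileExists` branch could then call `parseTaskMessages(await fs.readFile(filePath, "utf8"), filePath)` and pass the result on to purgeFileContentsFromMessages().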

@mrubens mrubens closed this Oct 17, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Oct 17, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 17, 2025

Labels

  • bug: Something isn't working
  • Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
  • size:L: This PR changes 100-499 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[BUG] read_file contents persisted in ui_messages.json cause bloat/gray screen

4 participants