Description
What specific problem does this solve?
When large language models work on tasks that require re-reading files multiple times, the read_file results accumulate in the conversation history (apiConversationHistory). This causes several problems, described below.
Who is affected: All users working with Roo on tasks that involve multiple file reads, especially those working with large codebases or long-running tasks.
When this happens:
- During iterative development where the AI needs to reference the same files repeatedly
- When debugging issues that require checking file contents multiple times
- During refactoring tasks that involve reading and modifying the same files
Current behavior: Each time a file is read, the complete file content is added to the conversation history, even if the same file was read recently. This leads to:
- Rapid consumption of the context window (e.g., reading a 500-line file 5 times uses 2,500 lines of context)
- Potential confusion when the AI sees multiple versions of the same file
- Reduced effectiveness as important context gets pushed out by redundant file reads
Current behavior
Currently, every read_file operation adds its complete result to the conversation history, regardless of whether that exact file was read before. There is no deduplication mechanism in place.
Proposed solution
Implement a deduplication mechanism that keeps only the most recent read of each file in the conversation history. When the experimental feature READ_FILE_DEDUPLICATION is enabled, the system will automatically remove older reads of the same file whenever a new read occurs.
Key aspects:
- Deduplication happens immediately after each file read
- Only the most recent read of each file is preserved
- The feature is opt-in via experimental settings
- Works with both single and multi-file read operations
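The core rule, keeping only the most recent read of each file, amounts to a reverse scan with a set of already-seen paths. The sketch below is a generic illustration only. The `FileRead` shape and the `keepMostRecentReads` name are hypothetical and are not part of the Roo codebase.

```typescript
// Hypothetical record of one read_file result; `order` is its position in the history.
interface FileRead {
  path: string
  order: number
}

// Keep only the most recent read of each path, preserving chronological order.
// Walking newest-to-oldest means the first time a path is seen is its latest read.
function keepMostRecentReads(reads: FileRead[]): FileRead[] {
  const seen = new Set<string>()
  const kept: FileRead[] = []
  for (let i = reads.length - 1; i >= 0; i--) {
    const read = reads[i]
    if (seen.has(read.path)) continue // an older read of a file we already kept
    seen.add(read.path)
    kept.push(read)
  }
  return kept.reverse() // restore chronological order
}

const history: FileRead[] = [
  { path: "src/a.ts", order: 0 },
  { path: "src/b.ts", order: 1 },
  { path: "src/a.ts", order: 2 },
]
console.log(keepMostRecentReads(history).map((r) => `${r.path}@${r.order}`))
// prints [ 'src/b.ts@1', 'src/a.ts@2' ]
```

The scan is a single pass over the reads, which matches the O(n) estimate given under Performance Impact.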
Impact
Who benefits: All Roo users, especially those working on:
- Large codebases where files are frequently re-read
- Long-running tasks that reference the same files multiple times
- Iterative development workflows
- Debugging sessions that require checking file state repeatedly
How it helps:
- Increased context efficiency: Reduces redundant information in conversation history
- Extended conversation longevity: Tasks can run longer before hitting context limits
- Improved accuracy: AI sees only the most recent file state, reducing confusion
- Better performance: Smaller context means faster processing
Technical Context
Based on codebase analysis:
- The `apiConversationHistory` stores all tool results, including file reads
- File reads are stored as text blocks with the format: `[read_file for 'path'] Result:`
- The experimental feature system is already in place
- Tests exist that expect a `deduplicateReadFileHistory` method, but it is not implemented
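A helper that recognizes the `[read_file for 'path'] Result:` header might look like the following. The regular expression and the `parseReadFileHeader` name are assumptions made for illustration. Check the exact header text against what `pushToolResult` actually writes. This helper returns only the first quoted path, so multi-file reads would need their paths taken from the result body instead.

```typescript
// Matches a tool-result header such as: [read_file for 'src/app.ts'] Result:
// Assumption: the header starts the text block and the first path is single-quoted.
const READ_FILE_HEADER = /^\[read_file for '([^']*)'[^\]]*\] Result:/

// Returns the first quoted path from a read_file header, or undefined for any other text.
function parseReadFileHeader(text: string): string | undefined {
  const match = READ_FILE_HEADER.exec(text)
  return match?.[1]
}

console.log(parseReadFileHeader("[read_file for 'src/app.ts'] Result:")) // src/app.ts
console.log(parseReadFileHeader("[write_to_file for 'src/app.ts'] Result:")) // undefined
```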
Implementation approach:
- Add `READ_FILE_DEDUPLICATION` to experimental features
- Implement the `deduplicateReadFileHistory` method in the `Task` class
- Call deduplication after successful file reads in `readFileTool`
🔍 Comprehensive Issue Scoping
Root Cause / Implementation Target
When large language models re-read files multiple times during a conversation, the read_file results accumulate in the apiConversationHistory, causing excessive context consumption and potential confusion. The system currently lacks a mechanism to deduplicate these redundant file reads.
Affected Components
- Primary Files:
  - `src/shared/experiments.ts`: Add new experimental feature flag
  - `src/core/task/Task.ts` (lines ~330-350): Add `deduplicateReadFileHistory` method
  - `src/core/tools/readFileTool.ts` (lines ~610-615): Integrate deduplication call
- Secondary Impact:
  - Test files that mock or use the `Task` class
  - Any tools that rely on conversation history structure
  - Experimental settings UI
Current Implementation Analysis
The system uses apiConversationHistory to store all API messages. When readFileTool executes, it adds results using pushToolResult, which creates text blocks in user messages with format: [read_file ...] Result: followed by XML-structured file content. These accumulate without any deduplication.
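A multi-file read places several files in one result, so deduplication needs the set of paths inside the XML payload and cannot rely on the header alone. The sketch below assumes each file appears as `<file><path>src/a.ts</path>...</file>`. That tag layout and the `extractReadFilePaths` name are assumptions and should be verified against the real `readFileTool` output.

```typescript
// Assumption: each file in a read_file result is wrapped as <file><path>PATH</path>...</file>.
function extractReadFilePaths(xml: string): string[] {
  const pathTag = /<file>\s*<path>([^<]+)<\/path>/g // fresh regex, so lastIndex starts at 0
  const paths = new Set<string>() // a Set drops a path listed twice in one result
  let match: RegExpExecArray | null
  while ((match = pathTag.exec(xml)) !== null) {
    paths.add(match[1].trim())
  }
  return Array.from(paths) // insertion order = order of appearance
}

const payload = `<files>
<file><path>src/a.ts</path><content lines="1-3">...</content></file>
<file><path>src/b.ts</path><content lines="1-9">...</content></file>
</files>`
console.log(extractReadFilePaths(payload)) // [ 'src/a.ts', 'src/b.ts' ]
```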
Proposed Implementation
Step 1: Add experimental feature flag
- File: `src/shared/experiments.ts`
- Changes: Add `READ_FILE_DEDUPLICATION` to `EXPERIMENT_IDS` and `experimentConfigsMap`
- Rationale: Allows safe opt-in testing without affecting all users
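Following the `POWER_STEERING` pattern, the flag could be registered roughly as shown below. This is a standalone sketch of the additions only, and the existing experiment entries are left out. The id string `"readFileDeduplication"` is hypothetical, and `isExperimentEnabled` is a hypothetical approximation of whatever lookup `experiments.ts` actually exports.

```typescript
// Sketch of the additions to src/shared/experiments.ts; existing entries omitted.
const EXPERIMENT_IDS = {
  READ_FILE_DEDUPLICATION: "readFileDeduplication", // hypothetical id string
} as const

type ExperimentKey = keyof typeof EXPERIMENT_IDS
type ExperimentId = (typeof EXPERIMENT_IDS)[ExperimentKey]

interface ExperimentConfig {
  enabled: boolean // default used when the user has not toggled the setting
}

// Opt-in: off by default so existing users are unaffected.
const experimentConfigsMap: Record<ExperimentKey, ExperimentConfig> = {
  READ_FILE_DEDUPLICATION: { enabled: false },
}

// Hypothetical lookup: the user's setting wins, otherwise fall back to the default.
function isExperimentEnabled(
  userSettings: Partial<Record<ExperimentId, boolean>>,
  key: ExperimentKey,
): boolean {
  return userSettings[EXPERIMENT_IDS[key]] ?? experimentConfigsMap[key].enabled
}

console.log(isExperimentEnabled({}, "READ_FILE_DEDUPLICATION")) // false
console.log(isExperimentEnabled({ readFileDeduplication: true }, "READ_FILE_DEDUPLICATION")) // true
```

Defaulting to `enabled: false` is what makes the rollback plan a simple toggle.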
Step 2: Add deduplicateReadFileHistory method to Task class
- File: `src/core/task/Task.ts`
- Changes: Add a public method that checks the experimental flag and iterates through `apiConversationHistory` in reverse, removing older reads of files that appear multiple times
- Rationale: Keeps only the most recent read of each file while preserving message structure
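The reverse traversal in Step 2 can be sketched as a pure function over the history. It rests on several assumptions. The message and block types are simplified stand-ins for the SDK types. Each read result is taken to be one text block that starts with the header and contains `<path>` tags; if Roo stores the header and the file body as separate blocks, they would need to be paired first. A multi-file result is dropped only when every file it contains has a newer read. A message left empty gets a hypothetical placeholder block so the user/assistant alternation survives.

```typescript
// Simplified stand-ins for the message types stored in apiConversationHistory.
type TextBlock = { type: "text"; text: string }
type OtherBlock = { type: "image" | "tool_use" | "tool_result" }
type ContentBlock = TextBlock | OtherBlock
interface ApiMessage {
  role: "user" | "assistant"
  content: string | ContentBlock[]
}

// Assumed shape: "[read_file for ...] Result:" followed by XML with one <path> per file.
const READ_HEADER = /^\[read_file for [^\]]*\] Result:/

// File paths covered by a read_file result block; empty for every other block.
function readPaths(block: ContentBlock): string[] {
  if (block.type !== "text" || !READ_HEADER.test(block.text)) return []
  const paths: string[] = []
  const pathTag = /<path>([^<]+)<\/path>/g
  let m: RegExpExecArray | null
  while ((m = pathTag.exec(block.text)) !== null) paths.push(m[1].trim())
  return paths
}

// Hypothetical placeholder that keeps an emptied message non-empty.
const REMOVED: TextBlock = { type: "text", text: "(older read_file result removed)" }

// Returns a new history in which each file's content survives only in its most recent read.
function deduplicateReadFileHistory(history: ApiMessage[]): ApiMessage[] {
  const seen = new Set<string>()
  const result: ApiMessage[] = []
  // Walk newest message first, and newest block first within each message.
  for (let i = history.length - 1; i >= 0; i--) {
    const msg = history[i]
    if (msg.role !== "user" || typeof msg.content === "string") {
      result.push(msg) // assistant turns and plain-string messages are left untouched
      continue
    }
    const blocks = msg.content
    const kept: ContentBlock[] = []
    for (let j = blocks.length - 1; j >= 0; j--) {
      const paths = readPaths(blocks[j])
      // Drop a read only when every file in it already has a newer read kept.
      if (paths.length > 0 && paths.every((p) => seen.has(p))) continue
      paths.forEach((p) => seen.add(p))
      kept.push(blocks[j])
    }
    kept.reverse()
    result.push({ ...msg, content: kept.length > 0 ? kept : [REMOVED] })
  }
  return result.reverse()
}

const sample: ApiMessage[] = [
  { role: "user", content: [{ type: "text", text: "[read_file for 'a.ts'] Result:\n<file><path>a.ts</path>v1</file>" }] },
  { role: "assistant", content: "Editing a.ts" },
  { role: "user", content: [{ type: "text", text: "[read_file for 'a.ts'] Result:\n<file><path>a.ts</path>v2</file>" }] },
]
console.log(JSON.stringify(deduplicateReadFileHistory(sample)[0].content))
// prints [{"type":"text","text":"(older read_file result removed)"}]
```

Because the function does not mutate its input, a `Task` method could call it and pass the result to `overwriteApiConversationHistory`. This follows the existing message-manipulation pattern the issue points to.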
Step 3: Integrate deduplication into readFileTool
- File: `src/core/tools/readFileTool.ts`
- Changes: Call `cline.deduplicateReadFileHistory()` after successful file reads
- Rationale: Ensures deduplication happens immediately after new reads are added
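The wiring between `Task` and `readFileTool` could look roughly like this. Everything in the sketch is a stand-in: `TaskSketch`, the injected `dedupe` function, `afterReadFile`, and the `readFileDeduplication` settings key are all hypothetical. The sketch also assumes the new read result has already been appended to `apiConversationHistory` when the call runs. If tool results are only buffered until the next API request, the call has to happen after that point; otherwise the newest read is not yet visible to the deduplication pass.

```typescript
type Message = { role: "user" | "assistant"; content: unknown }
type Dedupe = (history: Message[]) => Message[]

// Minimal stand-in for the Task class; only the members this flow touches.
class TaskSketch {
  apiConversationHistory: Message[] = []
  private readonly experimentsConfig: Record<string, boolean>
  private readonly dedupe: Dedupe

  constructor(experimentsConfig: Record<string, boolean>, dedupe: Dedupe) {
    this.experimentsConfig = experimentsConfig
    this.dedupe = dedupe
  }

  // Mirrors the existing pattern of replacing the history wholesale.
  async overwriteApiConversationHistory(newHistory: Message[]): Promise<void> {
    this.apiConversationHistory = newHistory
  }

  // No-op unless the experiment is enabled; resolves to whether it ran.
  async deduplicateReadFileHistory(): Promise<boolean> {
    if (!this.experimentsConfig["readFileDeduplication"]) return false
    await this.overwriteApiConversationHistory(this.dedupe(this.apiConversationHistory))
    return true
  }
}

// Hypothetical tail of readFileTool: deduplicate only after a successful read.
async function afterReadFile(cline: TaskSketch, readSucceeded: boolean): Promise<void> {
  if (readSucceeded) {
    await cline.deduplicateReadFileHistory()
  }
}

// Toy dedupe that keeps only the last message, just to show the flow.
const task = new TaskSketch({ readFileDeduplication: true }, (h) => h.slice(-1))
task.apiConversationHistory = [{ role: "user", content: "old" }, { role: "user", content: "new" }]
afterReadFile(task, true).then(() => console.log(task.apiConversationHistory.length)) // 1
```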
Code Architecture Considerations
- Follow existing patterns for experimental features (see POWER_STEERING implementation)
- Follow existing patterns for message manipulation (see `overwriteApiConversationHistory`)
- Preserve message structure integrity
- Handle both single and multi-file read operations
- Ensure deduplication works across all file read patterns
Testing Requirements
- Unit Tests:
  - Test experimental flag enables/disables feature
  - Test basic deduplication with duplicate file reads
  - Test multi-file read handling
  - Test message structure preservation
  - Test edge cases (empty messages, malformed content)
  - Test deduplication with mixed file operations
- Integration Tests:
  - Test readFileTool integration
  - Test performance with large conversation histories
Performance Impact
- Expected performance change: Minimal (O(n) traversal of messages)
- Optimization: Stop processing once all unique files are found
- Memory impact: Reduced due to smaller conversation history
Security Considerations
- No security implications - only reorganizes existing data
- No external data exposure
- No authentication/authorization changes
Migration Strategy
Not applicable - feature is backwards compatible and handles existing conversation formats.
Rollback Plan
Feature can be disabled by toggling off the experimental setting in the UI.
Dependencies and Breaking Changes
- No external dependencies affected
- No API contract changes
- No breaking changes for users