feat: implement read_file history deduplication (#6279) #6316
- Add READ_FILE_DEDUPLICATION experimental feature flag
- Implement deduplicateReadFileHistory method in Task class
- Integrate deduplication into recursivelyMakeClineRequests flow
- Add comprehensive unit tests for deduplication logic
- Handle both inter-message and intra-message deduplication
- Preserve non-read_file content when removing duplicates
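The intra-message case, with non-read_file content preserved, can be sketched roughly as follows. This is an illustration only, not the code from the PR: the `Block` shape, the `READ_FILE_HEADER` pattern, and `dedupeWithinMessage` are hypothetical names, and the sketch assumes each read_file result is a text block starting with a single-path `[read_file for '...']` header.

```typescript
// Hypothetical content-block shape; the real history uses richer types.
type Block = { type: "text"; text: string }

// Assumed header format for a single-file read_file result.
const READ_FILE_HEADER = /^\[read_file for '([^']+)'\]/

// Within one message, keep only the last read_file block per path and
// leave every non-read_file block untouched, in its original order.
function dedupeWithinMessage(blocks: Block[]): Block[] {
  const lastIndexByPath = new Map<string, number>()
  blocks.forEach((block, index) => {
    const match = READ_FILE_HEADER.exec(block.text)
    if (match) lastIndexByPath.set(match[1], index)
  })
  return blocks.filter((block, index) => {
    const match = READ_FILE_HEADER.exec(block.text)
    return !match || lastIndexByPath.get(match[1]) === index
  })
}

const blocks: Block[] = [
  { type: "text", text: "[read_file for 'a.ts'] Result: v1" },
  { type: "text", text: "Please refactor this file." },
  { type: "text", text: "[read_file for 'a.ts'] Result: v2" },
]
const result = dedupeWithinMessage(blocks)
```

Here `result` keeps the instruction text and the second read of `a.ts`, and drops the first read.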
Pull Request Overview
This PR implements a read_file history deduplication feature to optimize context window usage by removing duplicate file reads from conversation history. The feature is controlled by an experimental flag READ_FILE_DEDUPLICATION and only keeps the most recent occurrence of each file read.
Key changes:
- Added new experimental feature flag `READ_FILE_DEDUPLICATION` with proper configuration
- Implemented deduplication logic in the `Task` class that handles both inter-message and intra-message duplicates
- Integrated deduplication into the conversation flow after messages are added to history
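The "keep only the most recent occurrence" rule across messages can be sketched as below. This is a simplified illustration under stated assumptions, not the PR's implementation: `HistoryMessage`, `readFilePath`, and `keepMostRecentReads` are hypothetical, and each message is assumed to hold at most one single-path read_file result.

```typescript
// Hypothetical, simplified message shape.
interface HistoryMessage {
  role: "user" | "assistant"
  text: string
}

// Hypothetical helper: returns the path from a "[read_file for 'path']" header, if any.
function readFilePath(text: string): string | undefined {
  const match = /\[read_file for '([^']+)'\]/.exec(text)
  return match ? match[1] : undefined
}

// Walk the history from newest to oldest and drop any read_file result
// whose path was already seen in a newer message.
function keepMostRecentReads(history: HistoryMessage[]): HistoryMessage[] {
  const seen = new Set<string>()
  const kept: HistoryMessage[] = []
  for (let i = history.length - 1; i >= 0; i--) {
    const path = readFilePath(history[i].text)
    if (path !== undefined) {
      if (seen.has(path)) continue
      seen.add(path)
    }
    kept.push(history[i])
  }
  return kept.reverse()
}

const history: HistoryMessage[] = [
  { role: "user", text: "[read_file for 'a.ts'] Result: v1" },
  { role: "assistant", text: "Editing a.ts" },
  { role: "user", text: "[read_file for 'a.ts'] Result: v2" },
]
const deduped = keepMostRecentReads(history)
```

Iterating backwards means the newest read of each path is encountered first, so no second pass is needed to decide which copy to keep.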
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/shared/experiments.ts` | Added READ_FILE_DEDUPLICATION experiment configuration |
| `packages/types/src/experiment.ts` | Added readFileDeduplication to experiment types and schema |
| `src/core/task/Task.ts` | Implemented core deduplication logic and integration into conversation flow |
| `src/core/task/__tests__/Task.spec.ts` | Added comprehensive unit tests for deduplication functionality |
| `src/shared/__tests__/experiments.spec.ts` | Added tests for the new experiment configuration |
| `webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx` | Updated test data to include new experiment flag |

    const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
    if (headerMatch && paths.length === 0) {
        paths.push(headerMatch[1])

Copilot AI, Jul 28, 2025
The regex pattern matches only a single single-quoted file path, but the test cases show multi-file reads that use comma-separated paths, like "[read_file for 'file1.ts', 'file2.ts']". This will miss file paths in multi-file scenarios.
Suggested change:

    - const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
    - if (headerMatch && paths.length === 0) {
    -     paths.push(headerMatch[1])
    + const headerMatch = text.match(/\[read_file for ['"]([^'"]+(?:,\s*[^'"]+)*)['"]\]/)
    + if (headerMatch && paths.length === 0) {
    +     const extractedPaths = headerMatch[1].split(',').map((filePath) => filePath.trim());
    +     paths.push(...extractedPaths);

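Note that the character classes in the suggested pattern exclude quotes, so it appears not to match a header in which each path is quoted separately, such as `[read_file for 'file1.ts', 'file2.ts']`. One possible alternative, sketched here as an assumption rather than as the fix adopted in the PR, is to match the header first and then collect each quoted path inside it. `extractHeaderPaths` is a hypothetical helper name, and the sketch assumes paths contain neither `]` nor their own delimiting quote.

```typescript
// Hypothetical helper: returns every quoted path in a "[read_file for ...]" header.
function extractHeaderPaths(text: string): string[] {
  const header = /\[read_file for ([^\]]+)\]/.exec(text)
  if (!header) return []

  // Group 1 is the opening quote, group 2 the path; \1 requires the same closing quote.
  const quoted = /(['"])(.*?)\1/g
  const paths: string[] = []
  let m: RegExpExecArray | null
  while ((m = quoted.exec(header[1])) !== null) {
    paths.push(m[2])
  }
  return paths
}

const single = extractHeaderPaths("[read_file for 'file1.ts'] Result:")
const multi = extractHeaderPaths("[read_file for 'file1.ts', 'file2.ts'] Result:")
const none = extractHeaderPaths("no header here")
```

With these inputs, `single` holds one path, `multi` holds both paths in order, and `none` is empty.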

    // Also handle legacy format where path might be in the header
    const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
    if (headerMatch && paths.length === 0) {
        paths.push(headerMatch[1])

Copilot AI, Jul 28, 2025
The legacy format handling should extract all file paths from the header, not just the first one. For multi-file reads like "[read_file for 'file1.ts', 'file2.ts']", this will only capture 'file1.ts'.
Suggested change:

    - // Also handle legacy format where path might be in the header
    - const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
    - if (headerMatch && paths.length === 0) {
    -     paths.push(headerMatch[1])
    + // Also handle legacy format where paths might be in the header
    + const legacyFormatRegex = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/g
    + let legacyMatch
    + while ((legacyMatch = legacyFormatRegex.exec(text)) !== null) {
    +     // Extract all file paths from the match
    +     const matchedPaths = legacyMatch[0].match(/'([^']+)'/g)?.map((p) => p.replace(/'/g, ""))
    +     if (matchedPaths) {
    +         paths.push(...matchedPaths)
    +     }

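The suggestion re-scans `legacyMatch[0]` instead of reading the capture groups directly. In JavaScript regular expressions, a capture group inside a repeated group retains only its last iteration, so the groups alone would lose the middle paths. A small demonstration, using the suggested pattern without the `g` flag and a made-up three-file header:

```typescript
const header = "[read_file for 'file1.ts', 'file2.ts', 'file3.ts']"
const legacyFormatRegex = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/

const legacyMatch = legacyFormatRegex.exec(header)

// Capture groups alone: group 1 is the first path, group 2 only the last repeat.
const fromGroups: string[] = legacyMatch ? [legacyMatch[1], legacyMatch[2]] : []

// Re-scanning the full match recovers every quoted path.
const quotedParts: string[] = legacyMatch ? legacyMatch[0].match(/'([^']+)'/g) || [] : []
const fromRescan = quotedParts.map((p) => p.replace(/'/g, ""))
```

Here `fromGroups` contains only `file1.ts` and `file3.ts`, while `fromRescan` contains all three paths.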
Related GitHub Issue
Closes: #6279
Roo Code Task Context (Optional)
No Roo Code task context for this PR
Description
This PR implements the `read_file` history deduplication feature as described in issue #6279. The implementation focuses on reducing context window usage by removing duplicate file reads from the conversation history.

Key implementation details:
- Added an experimental feature flag `READ_FILE_DEDUPLICATION` that can be enabled to activate this feature
- Implemented a `deduplicateReadFileHistory` method in the `Task` class that:
  - Deduplicates `read_file` results based on file paths
- Integrated the deduplication into `recursivelyMakeClineRequests` after messages are added to the conversation history

Design choices:
- The legacy `read_file` format (without the `files` wrapper) is also supported for backward compatibility

Test Procedure

Unit Tests:
- Tests in `src/core/task/__tests__/Task.spec.ts` covering deduplication scenarios, including single-file reads, multi-file reads, and the legacy format

Manual Testing:
- Set `READ_FILE_DEDUPLICATION=true` in your environment

Test Commands:
All tests pass successfully.
Pre-Submission Checklist
Screenshots / Videos
No UI changes in this PR
Documentation Updates
Additional Notes
This feature is behind an experimental flag and will not affect users unless explicitly enabled. The deduplication logic has been thoroughly tested to ensure it doesn't accidentally remove important context while optimizing the conversation history.
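The gating described here can be pictured with a minimal sketch. Only the flag name `READ_FILE_DEDUPLICATION` and the key `readFileDeduplication` come from this PR; `ExperimentSettings`, `isEnabled`, and `prepareHistory` are hypothetical stand-ins and do not reproduce the actual code in `src/shared/experiments.ts` or `Task.ts`.

```typescript
// Flag name and key are taken from the PR; everything else is illustrative.
const EXPERIMENT_IDS = {
  READ_FILE_DEDUPLICATION: "readFileDeduplication",
} as const

type ExperimentId = (typeof EXPERIMENT_IDS)[keyof typeof EXPERIMENT_IDS]
type ExperimentSettings = Partial<Record<ExperimentId, boolean>>

// An experiment counts as enabled only when it is explicitly set to true.
function isEnabled(settings: ExperimentSettings, id: ExperimentId): boolean {
  return settings[id] === true
}

// Hypothetical call site: apply the deduplication step only when the flag is on.
function prepareHistory<T>(history: T[], settings: ExperimentSettings, dedupe: (h: T[]) => T[]): T[] {
  return isEnabled(settings, EXPERIMENT_IDS.READ_FILE_DEDUPLICATION) ? dedupe(history) : history
}

// Placeholder dedupe step for the demonstration: drops the oldest entry.
const dropOldest = (h: string[]) => h.slice(1)
const unchanged = prepareHistory(["old read", "new read"], {}, dropOldest)
const shortened = prepareHistory(["old read", "new read"], { readFileDeduplication: true }, dropOldest)
```

With an empty settings object the history passes through untouched, which matches the stated intent that users are unaffected unless the flag is explicitly enabled.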
Get in Touch
Please contact via GitHub
Important
Implements `read_file` history deduplication in `Task.ts` with a feature flag, optimizing context usage and adding comprehensive tests.
- Implements `read_file` history deduplication in `Task.ts` to optimize context window usage.
- Adds `READ_FILE_DEDUPLICATION` feature flag in `experiment.ts` and `experiments.ts`.
- Adds `deduplicateReadFileHistory()` in `Task` class to remove duplicate file reads, keeping the most recent.
- Supports legacy `read_file` format for backward compatibility.
- Integrates deduplication into `recursivelyMakeClineRequests()` in `Task.ts`.
- Adds tests in `Task.spec.ts` for various deduplication scenarios, including single/multi-file reads and legacy format.
- Updates `experiments.spec.ts` to test the new feature flag configuration.

This description was created for 2712cba.