feat: implement read_file history deduplication (#6279) #6316

hannesrudolph · 2025-07-28T22:27:20Z

Related GitHub Issue

Closes: #6279

Roo Code Task Context (Optional)

No Roo Code task context for this PR

Description

This PR implements read_file history deduplication, as proposed in issue #6279. It reduces context-window usage by removing earlier, duplicate file reads from the conversation history.

Key implementation details:

- Added a new experimental feature flag, READ_FILE_DEDUPLICATION, that can be enabled to activate this feature
- Implemented a deduplicateReadFileHistory method in the Task class that:
  - Identifies duplicate read_file results based on file paths
  - Keeps only the most recent occurrence of each file read
  - Handles both inter-message and intra-message deduplication (multiple reads in the same message)
  - Preserves other content types (images, text) when removing duplicate reads from messages
- Integrated the deduplication logic into recursivelyMakeClineRequests, after messages are added to the conversation history
- The deduplication runs after each AI response to keep the context optimized
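The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the message shapes, the `readFilePaths` helper, the `<path>...</path>` tag format, and the stub text are all hypothetical. It scans messages, and the blocks inside each message, from newest to oldest, so the first read of a path it meets is the most recent one; older reads of the same path are dropped, while images and ordinary text pass through untouched.

```typescript
// Hypothetical, simplified message shapes (loosely modeled on Anthropic-style
// content blocks). These are NOT Roo Code's actual types.
type TextBlock = { type: "text"; text: string }
type ImageBlock = { type: "image"; data: string }
type Block = TextBlock | ImageBlock
type Message = { role: "user" | "assistant"; content: string | Block[] }

// Hypothetical stub used when every block in a message was a superseded read.
const STUB: TextBlock = { type: "text", text: "[duplicate read_file result removed]" }

// Return the file paths a read_file result block refers to ([] for anything else).
// Assumes results start with a "[read_file for '...']" header and may list paths
// in <path>...</path> tags; legacy results carry the paths only in the header.
function readFilePaths(block: Block): string[] {
	if (block.type !== "text" || !block.text.startsWith("[read_file for ")) return []
	const tagged = Array.from(block.text.matchAll(/<path>([^<]+)<\/path>/g), (m) => m[1])
	if (tagged.length > 0) return tagged
	const header = block.text.match(/^\[read_file for ([^\]]+)\]/)
	return header ? Array.from(header[1].matchAll(/'([^']+)'/g), (m) => m[1]) : []
}

// Keep only the most recent read of each path. Messages and the blocks inside
// them are both scanned newest-first, so a later read in the same message wins.
function dedupeReadFileHistorySketch(history: Message[]): Message[] {
	const seen = new Set<string>()
	const result: Message[] = []
	for (let i = history.length - 1; i >= 0; i--) {
		const msg = history[i]
		if (msg.role !== "user" || typeof msg.content === "string") {
			result.push(msg) // assistant turns and plain-string content are left alone
			continue
		}
		const blocks: Block[] = msg.content
		const kept: Block[] = []
		for (let j = blocks.length - 1; j >= 0; j--) {
			const block = blocks[j]
			const paths = readFilePaths(block)
			// Drop a read only when every file it covers is re-read later.
			if (paths.length > 0 && paths.every((p) => seen.has(p))) continue
			paths.forEach((p) => seen.add(p))
			kept.unshift(block) // images and ordinary text always survive
		}
		// Never emit an empty content array; keep the turn with a short stub instead.
		result.push({ ...msg, content: kept.length > 0 ? kept : [STUB] })
	}
	return result.reverse()
}
```

Replacing an emptied message with a stub, rather than deleting it, is one way to keep user and assistant turns alternating; whether the actual implementation handles this case the same way is not stated in the PR.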

Design choices:

- Deduplication happens after messages are added to the conversation history rather than before, so the complete context is available
- The implementation handles complex cases where multiple file reads exist within a single user message
- The legacy read_file format (without the files wrapper) is also supported for backward compatibility

Test Procedure

Unit Tests:

Added comprehensive unit tests in src/core/task/__tests__/Task.spec.ts covering:
- Basic deduplication of single file reads
- Multi-file read handling
- Legacy format support
- Intra-message deduplication (multiple reads in same message)
- Preservation of non-read_file content
- Edge cases like empty history and string content

Manual Testing:

- Enable the experiment by setting READ_FILE_DEDUPLICATION=true in your environment
- Start a conversation and ask the AI to read the same file multiple times (e.g., "read the package-lock.json file 10 times")
- Observe that the conversation history keeps only the most recent read of each file
- Verify that other content types (images, regular messages) are preserved

Test Commands:

# Run unit tests
cd src && npx vitest run core/task/__tests__/Task.spec.ts

# Run experiment tests
cd src && npx vitest run shared/__tests__/experiments.spec.ts

All tests pass successfully.

Pre-Submission Checklist

Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
Scope: My changes are focused on the linked issue (one major feature/fix per PR).
Self-Review: I have performed a thorough self-review of my code.
Testing: New and/or updated tests have been added to cover my changes (if applicable).
Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

No UI changes in this PR

Documentation Updates

No documentation updates are required.

Additional Notes

This feature is behind an experimental flag and will not affect users unless explicitly enabled. The unit tests check that deduplication removes only superseded read_file results and preserves all other content, so important context is not lost while the conversation history is trimmed.

Get in Touch

Please contact via GitHub

Important

Implements read_file history deduplication in Task.ts with a feature flag, optimizing context usage and adding comprehensive tests.

Feature:
- Introduces read_file history deduplication in Task.ts to optimize context window usage.
- Adds READ_FILE_DEDUPLICATION feature flag in experiment.ts and experiments.ts.
Implementation:
- Implements deduplicateReadFileHistory() in Task class to remove duplicate file reads, keeping the most recent.
- Handles inter-message and intra-message deduplication, preserving non-read_file content.
- Supports legacy read_file format for backward compatibility.
Integration:
- Integrates deduplication logic into recursivelyMakeClineRequests() in Task.ts.
Testing:
- Adds unit tests in Task.spec.ts for various deduplication scenarios, including single/multi-file reads and legacy format.
- Updates experiments.spec.ts to test the new feature flag configuration.
- Ensures all tests pass successfully.
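The flag gating and the "deduplicate after appending" ordering can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `EXPERIMENT_IDS`, `isExperimentEnabled`, `TaskSketch`, and the flat `Entry` history are simplified stand-ins for `src/shared/experiments.ts` and the `Task` class, not their real APIs. Only the `readFileDeduplication` key is taken from the PR.

```typescript
// Hypothetical, simplified stand-ins: Roo Code's real experiments module,
// settings schema, and Task class are more involved than this.
const EXPERIMENT_IDS = { READ_FILE_DEDUPLICATION: "readFileDeduplication" } as const
type ExperimentId = (typeof EXPERIMENT_IDS)[keyof typeof EXPERIMENT_IDS]
type Experiments = Partial<Record<ExperimentId, boolean>>

// Experiments are off unless explicitly enabled.
function isExperimentEnabled(config: Experiments, id: ExperimentId): boolean {
	return config[id] ?? false
}

// Minimal history entry: readPath is set only for read_file results.
type Entry = { readPath?: string; text: string }

class TaskSketch {
	readonly history: Entry[] = []
	private readonly experiments: Experiments

	constructor(experiments: Experiments) {
		this.experiments = experiments
	}

	// Append first, then deduplicate, mirroring the ordering described in the PR.
	addToHistory(entry: Entry): void {
		this.history.push(entry)
		if (isExperimentEnabled(this.experiments, EXPERIMENT_IDS.READ_FILE_DEDUPLICATION)) {
			this.dedupe()
		}
	}

	// Walk newest-to-oldest and drop any read of a path that has a newer read.
	// Splicing while iterating backwards does not disturb unvisited indices.
	private dedupe(): void {
		const seen = new Set<string>()
		for (let i = this.history.length - 1; i >= 0; i--) {
			const path = this.history[i].readPath
			if (path === undefined) continue
			if (seen.has(path)) this.history.splice(i, 1)
			else seen.add(path)
		}
	}
}
```

With `{ readFileDeduplication: true }`, appending two reads of `package-lock.json` around an assistant reply leaves only the reply and the newer read; with an empty config, all three entries remain.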

This description was created for commit 2712cba. You can customize this summary. It will automatically update as commits are pushed.

- Add READ_FILE_DEDUPLICATION experimental feature flag
- Implement deduplicateReadFileHistory method in Task class
- Integrate deduplication into recursivelyMakeClineRequests flow
- Add comprehensive unit tests for deduplication logic
- Handle both inter-message and intra-message deduplication
- Preserve non-read_file content when removing duplicates

Copilot

Pull Request Overview

This PR implements a read_file history deduplication feature to optimize context window usage by removing duplicate file reads from conversation history. The feature is controlled by an experimental flag READ_FILE_DEDUPLICATION and only keeps the most recent occurrence of each file read.

Key changes:

- Added new experimental feature flag READ_FILE_DEDUPLICATION with proper configuration
- Implemented deduplication logic in the Task class that handles both inter-message and intra-message duplicates
- Integrated deduplication into the conversation flow after messages are added to history

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


File	Description
`src/shared/experiments.ts`	Added READ_FILE_DEDUPLICATION experiment configuration
`packages/types/src/experiment.ts`	Added readFileDeduplication to experiment types and schema
`src/core/task/Task.ts`	Implemented core deduplication logic and integration into conversation flow
`src/core/task/__tests__/Task.spec.ts`	Added comprehensive unit tests for deduplication functionality
`src/shared/__tests__/experiments.spec.ts`	Added tests for the new experiment configuration
`webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx`	Updated test data to include new experiment flag

Copilot · 2025-07-28T22:27:52Z

src/core/task/Task.ts

+		const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
+		if (headerMatch && paths.length === 0) {
+			paths.push(headerMatch[1])


The regex pattern only matches a single quoted file path, but the test cases show multi-file reads with comma-separated paths, such as "[read_file for 'file1.ts', 'file2.ts']". This will miss file paths in multi-file scenarios.

Suggested change

-		const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
-		if (headerMatch && paths.length === 0) {
-			paths.push(headerMatch[1])
+		const headerMatch = text.match(/\[read_file for ['"]([^'"]+(?:,\s*[^'"]+)*)['"]\]/)
+		if (headerMatch && paths.length === 0) {
+			const extractedPaths = headerMatch[1].split(',').map((filePath) => filePath.trim());
+			paths.push(...extractedPaths);
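One way to evaluate this first suggestion is to run both patterns against sample headers. Assuming multi-file headers really look like the example in the comment (each path in its own single quotes), neither pattern matches them: `[^'"]+` cannot cross the closing quote after `file1.ts`, and `['"]\]` then meets a comma instead of `]`. The suggested pattern only helps when the whole comma-separated list sits inside one pair of quotes, or when double quotes are used. The header strings below are illustrative.

```typescript
// The two patterns below are copied verbatim from the review thread.
const originalPattern = /\[read_file for '([^']+)'\]/
const firstSuggestion = /\[read_file for ['"]([^'"]+(?:,\s*[^'"]+)*)['"]\]/

// Report whether each pattern matches a given header.
function check(header: string): { original: boolean; suggestion: boolean } {
	return { original: originalPattern.test(header), suggestion: firstSuggestion.test(header) }
}

// The multi-file form quoted in the comment: each path in its own quotes.
console.log(check("[read_file for 'file1.ts', 'file2.ts'] Result:"))
// → { original: false, suggestion: false }

// A single file, and a comma-separated list inside ONE pair of quotes.
console.log(check("[read_file for 'file1.ts'] Result:"))
// → { original: true, suggestion: true }
console.log(check("[read_file for 'file1.ts, file2.ts'] Result:"))
// → { original: true, suggestion: true }
```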

Copilot · 2025-07-28T22:27:53Z

src/core/task/Task.ts

+		// Also handle legacy format where path might be in the header
+		const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
+		if (headerMatch && paths.length === 0) {
+			paths.push(headerMatch[1])


The legacy format handling should extract all file paths from the header, not just the first one. For multi-file reads such as "[read_file for 'file1.ts', 'file2.ts']", the current pattern does not match at all (it expects "]" immediately after the first closing quote), so no path is captured.

Suggested change

-		// Also handle legacy format where path might be in the header
-		const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
-		if (headerMatch && paths.length === 0) {
-			paths.push(headerMatch[1])
+		// Also handle legacy format where paths might be in the header
+		const legacyFormatRegex = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/g
+		let legacyMatch
+		while ((legacyMatch = legacyFormatRegex.exec(text)) !== null) {
+			// Extract all file paths from the match
+			const matchedPaths = legacyMatch[0].match(/'([^']+)'/g)?.map((p) => p.replace(/'/g, ""))
+			if (matchedPaths) {
+				paths.push(...matchedPaths)
+			}
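Running this second suggestion on its own shows why it re-scans `legacyMatch[0]` instead of reading the capture groups: a repeated group such as `(?:, '([^']+?)')*` keeps only its last iteration, so the groups alone would yield just the first and last paths. The sketch below wraps the suggested lines in a hypothetical standalone function, `extractLegacyHeaderPaths`, with a typed `legacyMatch`; the sample headers are illustrative.

```typescript
// Copilot's second suggestion, wrapped in a standalone function for testing.
function extractLegacyHeaderPaths(text: string): string[] {
	const paths: string[] = []
	const legacyFormatRegex = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/g
	let legacyMatch: RegExpExecArray | null
	// The /g flag makes exec() advance through every header in the text.
	while ((legacyMatch = legacyFormatRegex.exec(text)) !== null) {
		// Re-scan the whole match for every quoted path.
		const matchedPaths = legacyMatch[0].match(/'([^']+)'/g)?.map((p) => p.replace(/'/g, ""))
		if (matchedPaths) {
			paths.push(...matchedPaths)
		}
	}
	return paths
}

const header = "[read_file for 'a.ts', 'b.ts', 'c.ts'] Result:"
console.log(extractLegacyHeaderPaths(header)) // → [ 'a.ts', 'b.ts', 'c.ts' ]

// The capture groups alone lose the middle path.
const m = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/.exec(header)
console.log(m?.[1], m?.[2]) // → a.ts c.ts
```

Unlike the original lines, the suggestion no longer checks `paths.length === 0`, so header paths are collected even when paths were already found elsewhere in the text; whether that matters depends on how the surrounding code uses `paths`, which is not shown here.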

Copilot AI review requested due to automatic review settings July 28, 2025 22:27

hannesrudolph requested review from cte, jr and mrubens as code owners July 28, 2025 22:27

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Jul 28, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Jul 28, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Jul 28, 2025

hannesrudolph mentioned this pull request Jul 28, 2025 in Feature Proposal: Implement read_file history deduplication to increase context quality and longevity #6279 (Closed, 8 tasks)

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jul 28, 2025

Copilot AI reviewed Jul 28, 2025


hannesrudolph closed this Jul 28, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 28, 2025

github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Jul 28, 2025


-		// Also handle legacy format where path might be in the header
-		const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
-		if (headerMatch && paths.length === 0) {
-			paths.push(headerMatch[1])
+		// Also handle legacy format where paths might be in the header
+		const legacyFormatRegex = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/g
+		let legacyMatch
+		while ((legacyMatch = legacyFormatRegex.exec(text)) !== null) {
+			// Extract all file paths from the match
+			const matchedPaths = legacyMatch[0].match(/'([^']+)'/g)?.map((p) => p.replace(/'/g, ""))
+			if (matchedPaths) {
+				paths.push(...matchedPaths)
+			}
