@hannesrudolph hannesrudolph commented Jul 28, 2025

Related GitHub Issue

Closes: #6279

Roo Code Task Context (Optional)

No Roo Code task context for this PR

Description

This PR implements the read_file history deduplication feature as described in issue #6279. The implementation focuses on reducing context window usage by removing duplicate file reads from the conversation history.

Key implementation details:

  • Added a new experimental feature flag READ_FILE_DEDUPLICATION that can be enabled to activate this feature
  • Implemented deduplicateReadFileHistory method in the Task class that:
    • Identifies duplicate read_file results based on file paths
    • Keeps only the most recent occurrence of each file read
    • Handles both inter-message and intra-message deduplication (multiple reads in the same message)
    • Preserves other content types (images, text) when removing duplicate reads from messages
  • Integrated the deduplication logic into recursivelyMakeClineRequests after messages are added to the conversation history
  • The deduplication runs after each AI response to ensure the context stays optimized
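
The "keep only the most recent read" rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Block` and `Message` shapes are simplified stand-ins, the header pattern is borrowed from the regex quoted in the review comments, and dropping a message that ends up empty is a choice made for this sketch only (a real implementation may need to preserve user/assistant alternation).

```typescript
// Hypothetical, simplified message shapes; the real Task class uses richer types.
type Block = { type: "text"; text: string } | { type: "image"; data: string }

interface Message {
	role: "user" | "assistant"
	content: string | Block[]
}

// Assumed result header, e.g. "[read_file for 'src/a.ts'] Result: ..."
const READ_HEADER = /\[read_file for '([^']+)'\]/

// Returns the file path if the block looks like a read_file result.
function pathOf(block: Block): string | undefined {
	if (block.type !== "text") return undefined
	return block.text.match(READ_HEADER)?.[1]
}

function deduplicateReadFileHistory(history: Message[]): Message[] {
	const seenPaths = new Set<string>()
	const newestFirst: Message[] = []

	// Walk newest to oldest so the first sighting of a path is its latest read.
	for (const message of history.slice().reverse()) {
		if (message.role !== "user" || typeof message.content === "string") {
			newestFirst.push(message)
			continue
		}
		const keptNewestFirst: Block[] = []
		// Reverse the blocks too, which covers several reads inside one message.
		for (const block of message.content.slice().reverse()) {
			const path = pathOf(block)
			if (path !== undefined) {
				if (seenPaths.has(path)) continue // a newer read of this path exists
				seenPaths.add(path)
			}
			keptNewestFirst.push(block) // images and other text are always kept
		}
		// Sketch-only choice: omit messages that have no blocks left.
		if (keptNewestFirst.length > 0) {
			newestFirst.push({ role: message.role, content: keptNewestFirst.reverse() })
		}
	}
	return newestFirst.reverse()
}
```

Iterating in reverse means a single pass with one `Set` handles both the inter-message and the intra-message case.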

Design choices:

  • Deduplication happens after adding to conversation history rather than before, ensuring we have the complete context
  • The implementation handles complex cases where multiple file reads exist within a single user message
  • Legacy read_file format (without the files wrapper) is also supported for backward compatibility
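
The ordering in the first design choice (append first, then deduplicate, and only when the experiment is on) can be illustrated with a small sketch. `Entry`, `appendThenDeduplicate`, and `keepLastByText` are hypothetical names for this example; the actual integration lives in recursivelyMakeClineRequests and is not reproduced here.

```typescript
// Hypothetical, simplified history entry; not the real Task types.
interface Entry {
	role: "user" | "assistant"
	text: string
}

type Dedupe = (history: Entry[]) => Entry[]

// Append the new entry first, then run the deduplication pass over the
// complete history, but only when the experiment is enabled.
function appendThenDeduplicate(
	history: Entry[],
	entry: Entry,
	experimentEnabled: boolean,
	dedupe: Dedupe,
): Entry[] {
	const updated = history.concat([entry])
	return experimentEnabled ? dedupe(updated) : updated
}

// Toy pass for illustration only: keeps the last entry with a given text.
const keepLastByText: Dedupe = (history) => {
	const texts = history.map((entry) => entry.text)
	return history.filter((entry, index) => texts.lastIndexOf(entry.text) === index)
}
```

Because the new entry is already in the list when the pass runs, the newest read is the one that survives, and with the flag off the history is returned untouched.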

Test Procedure

Unit Tests:

  • Added comprehensive unit tests in src/core/task/__tests__/Task.spec.ts covering:
    • Basic deduplication of single file reads
    • Multi-file read handling
    • Legacy format support
    • Intra-message deduplication (multiple reads in same message)
    • Preservation of non-read_file content
    • Edge cases like empty history and string content

Manual Testing:

  1. Enable the experiment by setting READ_FILE_DEDUPLICATION=true in your environment
  2. Start a conversation and ask the AI to read the same file multiple times (e.g., "read the package-lock.json file 10 times")
  3. Observe that the conversation history only keeps the most recent read of each file
  4. Verify that other content types (images, regular messages) are preserved

Test Commands:

# Run unit tests
cd src && npx vitest run core/task/__tests__/Task.spec.ts

# Run experiment tests
cd src && npx vitest run shared/__tests__/experiments.spec.ts

All tests pass successfully.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

No UI changes in this PR

Documentation Updates

  • No documentation updates are required.

Additional Notes

This feature is behind an experimental flag and will not affect users unless explicitly enabled. The deduplication logic has been thoroughly tested to ensure it doesn't accidentally remove important context while optimizing the conversation history.

Get in Touch

Please contact via GitHub


Important

Implements read_file history deduplication in Task.ts with a feature flag, optimizing context usage and adding comprehensive tests.

  • Feature:
    • Introduces read_file history deduplication in Task.ts to optimize context window usage.
    • Adds READ_FILE_DEDUPLICATION feature flag in experiment.ts and experiments.ts.
  • Implementation:
    • Implements deduplicateReadFileHistory() in Task class to remove duplicate file reads, keeping the most recent.
    • Handles inter-message and intra-message deduplication, preserving non-read_file content.
    • Supports legacy read_file format for backward compatibility.
  • Integration:
    • Integrates deduplication logic into recursivelyMakeClineRequests() in Task.ts.
  • Testing:
    • Adds unit tests in Task.spec.ts for various deduplication scenarios, including single/multi-file reads and legacy format.
    • Updates experiments.spec.ts to test the new feature flag configuration.
    • Ensures all tests pass successfully.

This description was created by Ellipsis for 2712cba.

- Add READ_FILE_DEDUPLICATION experimental feature flag
- Implement deduplicateReadFileHistory method in Task class
- Integrate deduplication into recursivelyMakeClineRequests flow
- Add comprehensive unit tests for deduplication logic
- Handle both inter-message and intra-message deduplication
- Preserve non-read_file content when removing duplicates
Copilot AI review requested due to automatic review settings July 28, 2025 22:27
@hannesrudolph hannesrudolph requested review from cte, jr and mrubens as code owners July 28, 2025 22:27
@dosubot dosubot bot added the size:XL (this PR changes 500-999 lines, ignoring generated files) and enhancement (new feature or request) labels Jul 28, 2025

Copilot AI left a comment

Pull Request Overview

This PR implements a read_file history deduplication feature to optimize context window usage by removing duplicate file reads from conversation history. The feature is controlled by an experimental flag READ_FILE_DEDUPLICATION and only keeps the most recent occurrence of each file read.

Key changes:

  • Added new experimental feature flag READ_FILE_DEDUPLICATION with proper configuration
  • Implemented deduplication logic in the Task class that handles both inter-message and intra-message duplicates
  • Integrated deduplication into the conversation flow after messages are added to history

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/shared/experiments.ts | Added READ_FILE_DEDUPLICATION experiment configuration |
| packages/types/src/experiment.ts | Added readFileDeduplication to experiment types and schema |
| src/core/task/Task.ts | Implemented core deduplication logic and integration into conversation flow |
| src/core/task/__tests__/Task.spec.ts | Added comprehensive unit tests for deduplication functionality |
| src/shared/__tests__/experiments.spec.ts | Added tests for the new experiment configuration |
| webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx | Updated test data to include new experiment flag |
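
As a rough picture of what the flag wiring in the first two files might look like, here is a minimal sketch. Only the names READ_FILE_DEDUPLICATION and readFileDeduplication come from the summary above; the registry shape, `ExperimentConfig`, and `isEnabled` are assumptions for illustration, and the disabled default reflects the PR's statement that the feature has no effect unless explicitly enabled.

```typescript
// Assumed registry shape; only the two identifiers below appear in the PR.
const EXPERIMENT_IDS = {
	READ_FILE_DEDUPLICATION: "readFileDeduplication",
} as const

type ExperimentKey = keyof typeof EXPERIMENT_IDS
type ExperimentId = (typeof EXPERIMENT_IDS)[ExperimentKey]

interface ExperimentConfig {
	enabled: boolean
}

// Experimental features start disabled.
const experimentConfigsMap: Record<ExperimentKey, ExperimentConfig> = {
	READ_FILE_DEDUPLICATION: { enabled: false },
}

const experimentDefault: Record<ExperimentId, boolean> = {
	readFileDeduplication: experimentConfigsMap.READ_FILE_DEDUPLICATION.enabled,
}

// Hypothetical lookup: a user setting wins, otherwise the default applies.
function isEnabled(settings: Partial<Record<ExperimentId, boolean>>, id: ExperimentId): boolean {
	return settings[id] ?? experimentDefault[id]
}
```

A caller would then guard the deduplication pass with something like `isEnabled(userSettings, EXPERIMENT_IDS.READ_FILE_DEDUPLICATION)`.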

Comment on lines +469 to +471
const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
if (headerMatch && paths.length === 0) {
  paths.push(headerMatch[1])

Copilot AI Jul 28, 2025

The regex pattern only matches a single quoted file path, but the test cases show multi-file reads use comma-separated, single-quoted paths like "[read_file for 'file1.ts', 'file2.ts']". This will miss file paths in multi-file scenarios.

Suggested change
- const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
- if (headerMatch && paths.length === 0) {
-   paths.push(headerMatch[1])
+ const headerMatch = text.match(/\[read_file for ['"]([^'"]+(?:,\s*[^'"]+)*)['"]\]/)
+ if (headerMatch && paths.length === 0) {
+   const extractedPaths = headerMatch[1].split(',').map((filePath) => filePath.trim());
+   paths.push(...extractedPaths);

Comment on lines +468 to +471
// Also handle legacy format where path might be in the header
const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
if (headerMatch && paths.length === 0) {
  paths.push(headerMatch[1])

Copilot AI Jul 28, 2025

The legacy format handling should extract all file paths from the header, not just a single one. For multi-file reads like "[read_file for 'file1.ts', 'file2.ts']", the current pattern requires "]" immediately after the first closing quote, so it does not match and no path is captured.

Suggested change
- // Also handle legacy format where path might be in the header
- const headerMatch = text.match(/\[read_file for '([^']+)'\]/)
- if (headerMatch && paths.length === 0) {
-   paths.push(headerMatch[1])
+ // Also handle legacy format where paths might be in the header
+ const legacyFormatRegex = /\[read_file for '([^']+?)'(?:, '([^']+?)')*\]/g
+ let legacyMatch
+ while ((legacyMatch = legacyFormatRegex.exec(text)) !== null) {
+   // Extract all file paths from the match
+   const matchedPaths = legacyMatch[0].match(/'([^']+)'/g)?.map((p) => p.replace(/'/g, ""))
+   if (matchedPaths) {
+     paths.push(...matchedPaths)
+   }
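
For comparison, one way to pull every path out of such a header is to match the header first and then collect each quoted path in a second step. This is a sketch under the assumption that headers look like the examples quoted in the comments above; `extractHeaderPaths` is a hypothetical helper, not a function from the PR. A capture group built from `[^'"]+` cannot cross the quote characters that sit between two paths, which is why the sketch does not try to capture the whole list in one quote-free group.

```typescript
// Assumed header shape, taken from the review comments:
//   [read_file for 'file1.ts', 'file2.ts']
const HEADER = /\[read_file for ((?:'[^']+'(?:,\s*)?)+)\]/

// Hypothetical helper: returns every single-quoted path in the first header found.
function extractHeaderPaths(text: string): string[] {
	const header = text.match(HEADER)
	if (!header) return []

	// header[1] holds the quoted, comma-separated list between "for " and "]".
	const quotedList = header[1] ?? ""
	const quoted = /'([^']+)'/g
	const paths: string[] = []
	for (let m = quoted.exec(quotedList); m !== null; m = quoted.exec(quotedList)) {
		if (m[1] !== undefined) paths.push(m[1])
	}
	return paths
}
```

With this helper, `extractHeaderPaths("[read_file for 'file1.ts', 'file2.ts'] Result:")` yields both paths, a single-file header yields one, and text without a header yields an empty array.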

@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 28, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Jul 28, 2025

Labels

enhancement: New feature or request
size:XL: This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Feature Proposal: Implement read_file history deduplication to increase context quality and longevity

2 participants