
@hannesrudolph
Collaborator

@hannesrudolph hannesrudolph commented Jul 28, 2025

Description

Fixes #6279

This PR implements a read_file history deduplication feature that removes duplicate file reads from the conversation history while preserving the most recent content for each file. This helps reduce context size and improves efficiency when files are read multiple times during a conversation.

Changes Made

  • Added READ_FILE_DEDUPLICATION experimental feature flag in src/shared/experiments.ts and packages/types/src/experiment.ts
  • Implemented deduplicateReadFileHistory method in src/core/task/Task.ts that:
    • Uses a two-pass approach to identify and remove duplicate file reads
    • Preserves the most recent read for each file path
    • Respects a 5-minute cache window (recent messages are not deduplicated)
    • Handles single files, multi-file reads, and legacy formats
  • Integrated deduplication into src/core/tools/readFileTool.ts to trigger after successful file reads
  • Added comprehensive unit tests in src/core/task/__tests__/Task.spec.ts
  • Updated related test files to include the new experiment flag
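Below is a minimal sketch of how an experiment flag can gate the deduplication call after a successful read. The ExperimentsConfig shape, isExperimentEnabled, DedupTask, and maybeDeduplicate are hypothetical stand-ins, not the code in src/shared/experiments.ts or readFileTool.ts. Only READ_FILE_DEDUPLICATION, readFileDeduplication, and deduplicateReadFileHistory come from this PR.

```typescript
// Hypothetical experiment registry; the real definitions in
// src/shared/experiments.ts may be shaped differently.
const EXPERIMENT_IDS = {
	READ_FILE_DEDUPLICATION: "readFileDeduplication",
} as const

type ExperimentId = (typeof EXPERIMENT_IDS)[keyof typeof EXPERIMENT_IDS]
type ExperimentsConfig = Partial<Record<ExperimentId, boolean>>

// Experimental features are disabled unless the user turns them on.
const experimentDefaults: Record<ExperimentId, boolean> = {
	readFileDeduplication: false,
}

function isExperimentEnabled(config: ExperimentsConfig, id: ExperimentId): boolean {
	return config[id] ?? experimentDefaults[id]
}

// Hypothetical minimal view of Task: only the method this sketch calls.
interface DedupTask {
	deduplicateReadFileHistory(): void
}

// Intended to run after a successful read. Returns whether deduplication ran.
function maybeDeduplicate(task: DedupTask, config: ExperimentsConfig): boolean {
	if (!isExperimentEnabled(config, EXPERIMENT_IDS.READ_FILE_DEDUPLICATION)) {
		return false
	}
	task.deduplicateReadFileHistory()
	return true
}
```

Keeping the check at the call site means that with the flag off, the history is never touched.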

Testing

  • All existing tests pass
  • Added tests for deduplication logic:
    • Single file deduplication
    • Multi-file read handling
    • Legacy format support
    • 5-minute cache window behavior
    • Preservation of non-read_file content
  • Manual testing completed:
    • Feature works correctly when enabled
    • No impact when feature is disabled
    • Conversation history remains intact

Verification of Acceptance Criteria

  • Criterion 1: Deduplication removes older duplicate read_file entries while preserving the most recent
  • Criterion 2: 5-minute cache window is respected - recent reads are not deduplicated
  • Criterion 3: Multi-file reads are handled correctly as atomic units
  • Criterion 4: Legacy single-file format is supported
  • Criterion 5: Feature is behind experimental flag and disabled by default
  • Criterion 6: Non-read_file content blocks are preserved

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Comments added for complex logic
  • Documentation updated (if needed)
  • No breaking changes (or documented if any)
  • Accessibility checked (for UI changes)

Additional Notes

This implementation uses a two-pass algorithm: the first pass identifies duplicate read_file results, and the second removes all but the most recent read for each file path. The feature is disabled by default and can be enabled through the experimental features settings.
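The two-pass idea can be sketched over a simplified, hypothetical message model in which read_file results already carry their parsed paths. In the PR, paths are extracted from text blocks with the regexes quoted in the review. The treatment of multi-file blocks is an assumption: a block is kept as a unit if it holds the latest read of at least one of its files.

```typescript
// Hypothetical message model: read_file results arrive with parsed paths;
// every other block is opaque and always preserved.
type Block =
	| { type: "read_file_result"; paths: string[]; content: string }
	| { type: "text"; text: string }

interface Message {
	role: "user" | "assistant"
	content: Block[]
}

function deduplicateReadFileHistory(messages: Message[]): Message[] {
	const latest = new Map<string, string>() // path -> "messageIndex:blockIndex"

	// Pass 1: scan forward so later reads of a path overwrite earlier ones.
	messages.forEach((message, i) => {
		message.content.forEach((block, j) => {
			if (block.type === "read_file_result") {
				for (const path of block.paths) latest.set(path, `${i}:${j}`)
			}
		})
	})

	// Pass 2: keep non-read_file blocks, blocks with no parsed paths, and any
	// read_file block that is still the newest read for at least one path.
	return messages.map((message, i) => ({
		...message,
		content: message.content.filter((block, j) => {
			if (block.type !== "read_file_result" || block.paths.length === 0) return true
			return block.paths.some((path) => latest.get(path) === `${i}:${j}`)
		}),
	}))
}
```

This sketch returns a new history and leaves the input untouched. Whether the PR's method mutates in place is not stated here.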

Get in Touch

@hrudolph


Important

Implements file read history deduplication in conversations, controlled by an experimental flag, with comprehensive testing and integration into the file reading tool.

  • Behavior:
    • Implements deduplicateReadFileHistory in Task.ts to remove duplicate file reads, preserving the most recent.
    • Respects a 5-minute cache window; recent reads are not deduplicated.
    • Handles single-file reads, multi-file reads, and legacy formats.
    • Integrated into readFileTool.ts to trigger after file reads.
  • Feature Flag:
    • Adds READ_FILE_DEDUPLICATION flag in experiments.ts and experiment.ts.
  • Testing:
    • Adds unit tests in Task.spec.ts for deduplication logic, including single/multi-file reads, cache behavior, and legacy support.
    • Updates tests to include the new experiment flag.
  • Misc:
    • Fixes TypeScript errors in experiments.spec.ts by adding missing experiment IDs.

This description was created by Ellipsis for 935677a.

- Add READ_FILE_DEDUPLICATION experimental feature flag
- Implement deduplicateReadFileHistory method in Task class
- Integrate deduplication into readFileTool after successful reads
- Add comprehensive unit tests for deduplication logic
- Update readFileTool tests to include mock deduplication method

This feature removes duplicate read_file entries from conversation history
while preserving the most recent content for each file. It respects a 5-minute
cache window and handles single files, multi-file reads, and legacy formats.
- Cast clientConfig to any when setting token property
- This is unrelated to the read_file deduplication feature but needed for build to pass
- Update ExtensionStateContext test to include the new experiment flag
- This ensures TypeScript types match the expected experiment structure
Copilot AI review requested due to automatic review settings July 28, 2025 18:47
@hannesrudolph hannesrudolph requested review from cte, jr and mrubens as code owners July 28, 2025 18:47
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jul 28, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements a read_file history deduplication feature that removes duplicate file reads from the conversation history while preserving the most recent content for each file. The feature is gated behind an experimental flag and is disabled by default.

Key changes:

  • Added experimental feature flag READ_FILE_DEDUPLICATION across multiple configuration files
  • Implemented deduplicateReadFileHistory method in Task.ts with a two-pass algorithm that respects a 5-minute cache window
  • Integrated deduplication trigger in readFileTool.ts after successful file reads

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

File | Description
src/shared/experiments.ts | Added READ_FILE_DEDUPLICATION experiment flag configuration
packages/types/src/experiment.ts | Added readFileDeduplication to experiment types schema
src/core/task/Task.ts | Implemented core deduplication logic with comprehensive file path handling
src/core/tools/readFileTool.ts | Added deduplication call after successful file reads
src/core/task/__tests__/Task.spec.ts | Added extensive test coverage for deduplication scenarios
src/core/tools/__tests__/readFileTool.spec.ts | Updated mock to include deduplication method
webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx | Updated test data to include new experiment flag
src/shared/__tests__/experiments.spec.ts | Added tests for new experiment configuration
src/api/providers/bedrock.ts | Fixed TypeScript compilation issues with AWS client config
Comments suppressed due to low confidence (1)

src/core/task/Task.ts:342

  • [nitpick] The variable name blocksToRemove could be more descriptive. Consider renaming to messageBlocksToRemove or duplicateBlocks to better indicate its purpose.
		const blocksToRemove = new Map<number, Set<number>>() // messageIndex -> Set of blockIndexes to remove
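This nitpick concerns the map that connects the two passes. Here is a minimal sketch of how a messageIndex -> Set of block indexes map might be filled and then applied. markForRemoval, applyRemovals, and HistoryMessage are hypothetical names, and the real method may mutate the history in place rather than copy it.

```typescript
// messageIndex -> indexes of the blocks to drop from that message.
type BlocksToRemove = Map<number, Set<number>>

// Hypothetical minimal message shape for this sketch.
interface HistoryMessage<B> {
	role: string
	content: B[]
}

// Record one block for removal, creating the per-message set on first use.
function markForRemoval(blocksToRemove: BlocksToRemove, messageIndex: number, blockIndex: number): void {
	let blockIndexes = blocksToRemove.get(messageIndex)
	if (!blockIndexes) {
		blockIndexes = new Set<number>()
		blocksToRemove.set(messageIndex, blockIndexes)
	}
	blockIndexes.add(blockIndex)
}

// Return a copy of the history with the marked blocks filtered out.
function applyRemovals<B>(messages: HistoryMessage<B>[], blocksToRemove: BlocksToRemove): HistoryMessage<B>[] {
	return messages.map((message, i) => {
		const blockIndexes = blocksToRemove.get(i)
		if (!blockIndexes) return message // untouched messages keep their identity
		return { ...message, content: message.content.filter((_, j) => !blockIndexes.has(j)) }
	})
}
```

A Set makes repeated marks of the same block harmless, which matters when one block covers several duplicated paths.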

Comment on lines +227 to +228
;(clientConfig as any).token = { token: this.options.awsApiKey }
;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems.

Copilot AI Jul 28, 2025


Using 'as any' type assertions should be avoided. Consider properly typing the clientConfig or using a more specific type assertion that maintains type safety.

Suggested change
- ;(clientConfig as any).token = { token: this.options.awsApiKey }
- ;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems.
+ clientConfig.token = { token: this.options.awsApiKey }
+ clientConfig.authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems.
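The commit note says the cast was needed for the build to pass, which suggests the installed SDK's config type may not declare these fields. If so, a narrower option than 'as any' is to extend the type locally with only the two assigned fields. BaseClientConfig and withBearerToken are hypothetical stand-ins, not the AWS SDK's actual types. The field shapes mirror the assignments in the diff.

```typescript
// Hypothetical stand-in for the SDK's client config interface.
interface BaseClientConfig {
	region?: string
}

// Declares only the two fields the diff assigns, instead of erasing all type checks.
type BearerTokenClientConfig = BaseClientConfig & {
	token?: { token: string }
	authSchemePreference?: string[]
}

function withBearerToken(config: BaseClientConfig, apiKey: string): BearerTokenClientConfig {
	// Build a new object so the caller's config keeps its original type.
	return {
		...config,
		token: { token: apiKey },
		authSchemePreference: ["httpBearerAuth"],
	}
}
```

A misspelled field name still fails to compile here, whereas 'as any' would accept it silently.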

Comment on lines +227 to +228
;(clientConfig as any).token = { token: this.options.awsApiKey }
;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems.

Copilot AI Jul 28, 2025


Using 'as any' type assertions should be avoided. Consider properly typing the clientConfig or using a more specific type assertion that maintains type safety.

Suggested change
- ;(clientConfig as any).token = { token: this.options.awsApiKey }
- ;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems.
+ clientConfig.token = { token: this.options.awsApiKey }
+ clientConfig.authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems.

const resultContent = block.text.substring(block.text.indexOf("Result:") + 7).trim()

// Handle new XML format
const xmlFileMatches = resultContent.matchAll(/<file>\s*<path>([^<]+)<\/path>/g)

Copilot AI Jul 28, 2025


The regex pattern /<file>\s*<path>([^<]+)<\/path>/g is a magic string that could be extracted as a constant with a descriptive name for better maintainability.

Suggested change
- const xmlFileMatches = resultContent.matchAll(/<file>\s*<path>([^<]+)<\/path>/g)
+ const xmlFileMatches = resultContent.matchAll(XML_FILE_PATH_REGEX)
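Here is a sketch of the suggested refactor, with the pattern hoisted into XML_FILE_PATH_REGEX (the name from Copilot's suggestion). extractXmlFilePaths and RESULT_MARKER are hypothetical names. The sketch also guards against a missing "Result:" marker. In the quoted line, indexOf would return -1, and the +7 offset would silently start reading at index 6.

```typescript
// Named constant for the flagged pattern; group 1 captures the text inside <path>.
// The global flag is required by String.prototype.matchAll.
const XML_FILE_PATH_REGEX = /<file>\s*<path>([^<]+)<\/path>/g

const RESULT_MARKER = "Result:"

// Returns every file path listed in the XML body of a read_file result.
function extractXmlFilePaths(blockText: string): string[] {
	const markerIndex = blockText.indexOf(RESULT_MARKER)
	if (markerIndex === -1) return []
	// RESULT_MARKER.length replaces the literal 7, keeping offset and marker in sync.
	const resultContent = blockText.substring(markerIndex + RESULT_MARKER.length).trim()
	return Array.from(resultContent.matchAll(XML_FILE_PATH_REGEX), (match) => (match[1] ?? "").trim())
}
```

A shared global regex is safe here because matchAll iterates over an internal copy and leaves the constant's lastIndex untouched. That guarantee does not hold if the same constant is also used with exec or test.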

const block = message.content[j]
if (block.type === "text" && typeof block.text === "string") {
// Check for read_file results in text blocks
const readFileMatch = block.text.match(/\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i)

Copilot AI Jul 28, 2025


The regex pattern /\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i is a magic string that could be extracted as a constant with a descriptive name for better maintainability.

Suggested change
- const readFileMatch = block.text.match(/\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i)
+ const readFileMatch = block.text.match(READ_FILE_REGEX)
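Here is a sketch of the header check using the suggested READ_FILE_REGEX constant. parseReadFileHeader and ReadFileHeader are hypothetical names. Group 1 holds the quoted path only when the "for '...'" clause is present, which is how a legacy single-file header can be told apart from one whose paths must come from the XML body.

```typescript
// Named constant for the flagged header pattern. Deliberately non-global:
// String.prototype.match with a /g regex would drop the capture groups.
const READ_FILE_REGEX = /\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i

interface ReadFileHeader {
	isReadFileResult: boolean
	legacyPath?: string | undefined
}

function parseReadFileHeader(blockText: string): ReadFileHeader {
	const readFileMatch = blockText.match(READ_FILE_REGEX)
	if (!readFileMatch) return { isReadFileResult: false }
	// Without the "for '...'" clause, group 1 is undefined.
	return { isReadFileResult: true, legacyPath: readFileMatch[1] }
}
```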

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jul 28, 2025
…6279)

- Add readFileDeduplicationCacheMinutes to global settings
- Implement UI for configuring cache time in experimental settings
- Add translations for all supported languages
- Update tests to include new setting
- Default cache time is 5 minutes, can be set to 0 to disable
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jul 28, 2025
@hannesrudolph
Collaborator Author

Update: Added Configurable Cache Time Limit

I've added the configurable cache time limit feature as discussed. The changes include:

Backend Changes:

  • Added readFileDeduplicationCacheMinutes to global settings (default: 5 minutes)
  • Updated the deduplication logic to use this configurable value instead of hardcoded 5 minutes
  • Setting to 0 disables time-based deduplication entirely
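Here is a sketch of a time-window check matching the behavior described above: reads newer than the window are exempt from deduplication. The helper name is hypothetical, and so is the reading of 0. The comment does not show the implementation, so this sketch assumes 0 (or less) means no read is exempt.

```typescript
const MS_PER_MINUTE = 60_000

// Hypothetical helper: true when a read is recent enough to be left alone.
// Assumption: cacheMinutes <= 0 means no read is protected by the window.
function isWithinCacheWindow(readTimestampMs: number, nowMs: number, cacheMinutes: number): boolean {
	if (cacheMinutes <= 0) return false
	return nowMs - readTimestampMs < cacheMinutes * MS_PER_MINUTE
}
```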

Frontend Changes:

  • Added UI input field in Experimental Settings that appears when READ_FILE_DEDUPLICATION is enabled
  • Users can now configure the cache time in minutes
  • Added proper validation (minimum 0, integers only)
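Here is a minimal sketch of input parsing that enforces the stated rule (minimum 0, integers only). parseCacheMinutes is a hypothetical name. It returns undefined for invalid input so a caller could keep the previous value instead of saving a bad one.

```typescript
// Hypothetical parser for the cache-minutes field: accepts non-negative
// integers only and rejects signs, decimals, and empty input.
function parseCacheMinutes(input: string): number | undefined {
	const trimmed = input.trim()
	if (!/^\d+$/.test(trimmed)) return undefined
	const value = Number(trimmed)
	return Number.isSafeInteger(value) ? value : undefined
}
```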

Translations:

  • Added all necessary translation keys for the new UI elements
  • Translations completed for all 18 supported languages

Testing:

  • All tests updated and passing
  • Both backend (src) and frontend (webview-ui) test suites pass successfully

The feature is now fully configurable through the settings UI, giving users control over the deduplication time window based on their needs.

- Changed from reactive to proactive deduplication approach
- Added getRecentFileContent method to check cache before reading files
- Modified readFileTool to use cached content when available
- Added comprehensive tests for the new caching functionality
- Fixed legacy format handling in getRecentFileContent
- Updated test mocks to include new methods
- Removed cache window logic from deduplicateReadFileHistory method
- Removed getRecentFileContent method from Task.ts
- Removed cache-related code from readFileTool.ts
- Removed readFileDeduplicationCacheMinutes setting from all type definitions
- Updated tests to remove cache-related test cases
- Verified all cache-related code has been removed

The deduplication feature now deduplicates all duplicate read_file results regardless of their age.
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Jul 28, 2025
…om UI

- Removed from ExperimentalSettings component props and UI
- Removed unused imports from ExperimentalSettings
- Removed from SettingsView component
- Removed from ExtensionStateContext default state
- Removed from ExtensionStateContext test file
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Jul 28, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 28, 2025

Labels

enhancement New feature or request Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Feature Proposal: Implement read_file history deduplication to increase context quality and longevity

2 participants