-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: implement read_file history deduplication (#6279) #6306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add READ_FILE_DEDUPLICATION experimental feature flag - Implement deduplicateReadFileHistory method in Task class - Integrate deduplication into readFileTool after successful reads - Add comprehensive unit tests for deduplication logic - Update readFileTool tests to include mock deduplication method This feature removes duplicate read_file entries from conversation history while preserving the most recent content for each file. It respects a 5-minute cache window and handles single files, multi-file reads, and legacy formats.
- Cast clientConfig to any when setting token property - This is unrelated to the read_file deduplication feature but needed for build to pass
- Update ExtensionStateContext test to include the new experiment flag - This ensures TypeScript types match the expected experiment structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a read_file history deduplication feature that removes duplicate file reads from the conversation history while preserving the most recent content for each file. The feature is gated behind an experimental flag and is disabled by default.
Key changes:
- Added experimental feature flag
READ_FILE_DEDUPLICATIONacross multiple configuration files - Implemented
deduplicateReadFileHistorymethod inTask.tswith a two-pass algorithm that respects a 5-minute cache window - Integrated deduplication trigger in
readFileTool.tsafter successful file reads
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/shared/experiments.ts |
Added READ_FILE_DEDUPLICATION experiment flag configuration |
packages/types/src/experiment.ts |
Added readFileDeduplication to experiment types schema |
src/core/task/Task.ts |
Implemented core deduplication logic with comprehensive file path handling |
src/core/tools/readFileTool.ts |
Added deduplication call after successful file reads |
src/core/task/__tests__/Task.spec.ts |
Added extensive test coverage for deduplication scenarios |
src/core/tools/__tests__/readFileTool.spec.ts |
Updated mock to include deduplication method |
webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx |
Updated test data to include new experiment flag |
src/shared/__tests__/experiments.spec.ts |
Added tests for new experiment configuration |
src/api/providers/bedrock.ts |
Fixed TypeScript compilation issues with AWS client config |
Comments suppressed due to low confidence (1)
src/core/task/Task.ts:342
- [nitpick] The variable name
blocksToRemovecould be more descriptive. Consider renaming tomessageBlocksToRemoveorduplicateBlocksto better indicate its purpose.
const blocksToRemove = new Map<number, Set<number>>() // messageIndex -> Set of blockIndexes to remove
| ;(clientConfig as any).token = { token: this.options.awsApiKey } | ||
| ;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems. |
Copilot
AI
Jul 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using as any type assertions should be avoided. Consider properly typing the clientConfig or using a more specific type assertion that maintains type safety.
| ;(clientConfig as any).token = { token: this.options.awsApiKey } | |
| ;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems. | |
| clientConfig.token = { token: this.options.awsApiKey } | |
| clientConfig.authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems. |
| ;(clientConfig as any).token = { token: this.options.awsApiKey } | ||
| ;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems. |
Copilot
AI
Jul 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using as any type assertions should be avoided. Consider properly typing the clientConfig or using a more specific type assertion that maintains type safety.
| ;(clientConfig as any).token = { token: this.options.awsApiKey } | |
| ;(clientConfig as any).authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems. | |
| clientConfig.token = { token: this.options.awsApiKey } | |
| clientConfig.authSchemePreference = ["httpBearerAuth"] // Otherwise there's no end of credential problems. |
| const resultContent = block.text.substring(block.text.indexOf("Result:") + 7).trim() | ||
|
|
||
| // Handle new XML format | ||
| const xmlFileMatches = resultContent.matchAll(/<file>\s*<path>([^<]+)<\/path>/g) |
Copilot
AI
Jul 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern /<file>\s*<path>([^<]+)<\/path>/g is a magic string that could be extracted as a constant with a descriptive name for better maintainability.
| const xmlFileMatches = resultContent.matchAll(/<file>\s*<path>([^<]+)<\/path>/g) | |
| const xmlFileMatches = resultContent.matchAll(XML_FILE_PATH_REGEX) |
| const block = message.content[j] | ||
| if (block.type === "text" && typeof block.text === "string") { | ||
| // Check for read_file results in text blocks | ||
| const readFileMatch = block.text.match(/\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i) |
Copilot
AI
Jul 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern /\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i is a magic string that could be extracted as a constant with a descriptive name for better maintainability.
| const readFileMatch = block.text.match(/\[read_file(?:\s+for\s+'([^']+)')?.*?\]\s*Result:/i) | |
| const readFileMatch = block.text.match(READ_FILE_REGEX) |
…6279) - Add readFileDeduplicationCacheMinutes to global settings - Implement UI for configuring cache time in experimental settings - Add translations for all supported languages - Update tests to include new setting - Default cache time is 5 minutes, can be set to 0 to disable
Update: Added Configurable Cache Time LimitI've added the configurable cache time limit feature as discussed. The changes include: Backend Changes:
Frontend Changes:
Translations:
Testing:
The feature is now fully configurable through the settings UI, giving users control over the deduplication time window based on their needs. |
- Changed from reactive to proactive deduplication approach - Added getRecentFileContent method to check cache before reading files - Modified readFileTool to use cached content when available - Added comprehensive tests for the new caching functionality - Fixed legacy format handling in getRecentFileContent - Updated test mocks to include new methods
- Removed cache window logic from deduplicateReadFileHistory method - Removed getRecentFileContent method from Task.ts - Removed cache-related code from readFileTool.ts - Removed readFileDeduplicationCacheMinutes setting from all type definitions - Updated tests to remove cache-related test cases - Verified all cache-related code has been removed The deduplication feature now deduplicates all duplicate read_file results regardless of their age.
…om UI - Removed from ExperimentalSettings component props and UI - Removed unused imports from ExperimentalSettings - Removed from SettingsView component - Removed from ExtensionStateContext default state - Removed from ExtensionStateContext test file
Description
Fixes #6279
This PR implements a read_file history deduplication feature that removes duplicate file reads from the conversation history while preserving the most recent content for each file. This helps reduce context size and improves efficiency when files are read multiple times during a conversation.
Changes Made
READ_FILE_DEDUPLICATIONexperimental feature flag insrc/shared/experiments.tsandpackages/types/src/experiment.tsdeduplicateReadFileHistorymethod insrc/core/task/Task.tsthat:src/core/tools/readFileTool.tsto trigger after successful file readssrc/core/task/__tests__/Task.spec.tsTesting
Verification of Acceptance Criteria
Checklist
Additional Notes
This implementation takes a fresh approach to the deduplication problem, using a clean two-pass algorithm that ensures correctness while maintaining performance. The feature is disabled by default and can be enabled through the experimental features settings.
Get in Touch
@hrudolph
Important
Implements file read history deduplication in conversations, controlled by an experimental flag, with comprehensive testing and integration into the file reading tool.
deduplicateReadFileHistoryinTask.tsto remove duplicate file reads, preserving the most recent.readFileTool.tsto trigger after file reads.READ_FILE_DEDUPLICATIONflag inexperiments.tsandexperiment.ts.Task.spec.tsfor deduplication logic, including single/multi-file reads, cache behavior, and legacy support.experiments.spec.tsby adding missing experiment IDs.This description was created by
for 935677a. You can customize this summary. It will automatically update as commits are pushed.