Feature Proposal: Implement `read_file` history deduplication to increase context quality and longevity #6279

## What specific problem does this solve?

When large language models work on tasks that require re-reading files multiple times, the `read_file` results accumulate in the conversation history (`apiConversationHistory`). This causes several problems:

**Who is affected:** All users working with Roo on tasks that involve multiple file reads, especially those working with large codebases or long-running tasks.

**When this happens:** 
- During iterative development where the AI needs to reference the same files repeatedly
- When debugging issues that require checking file contents multiple times
- During refactoring tasks that involve reading and modifying the same files

**Current behavior:** Each time a file is read, the complete file content is added to the conversation history, even if the same file was read recently. This leads to:
- Rapid consumption of the context window (e.g., reading a 500-line file 5 times uses 2,500 lines of context)
- Potential confusion when the AI sees multiple versions of the same file
- Reduced effectiveness as important context gets pushed out by redundant file reads

## Current behavior

Currently, every `read_file` operation adds its complete result to the conversation history, regardless of whether that exact file was read before. There is no deduplication mechanism in place.

## Proposed solution

Implement a deduplication mechanism that keeps only the most recent read of each file in the conversation history. When the experimental feature `READ_FILE_DEDUPLICATION` is enabled, the system will automatically remove older reads of the same file whenever a new read occurs.

**Key aspects:**
- Deduplication happens immediately after each file read
- Only the most recent read of each file is preserved
- The feature is opt-in via experimental settings
- Works with both single and multi-file read operations

## Impact

**Who benefits:** All Roo users, especially those working on:
- Large codebases where files are frequently re-read
- Long-running tasks that reference the same files multiple times
- Iterative development workflows
- Debugging sessions that require checking file state repeatedly

**How it helps:**
- **Increased context efficiency:** Reduces redundant information in conversation history
- **Extended conversation longevity:** Tasks can run longer before hitting context limits
- **Improved accuracy:** AI sees only the most recent file state, reducing confusion
- **Better performance:** Smaller context means faster processing

## Technical Context

Based on codebase analysis:
- The `apiConversationHistory` stores all tool results including file reads
- File reads are stored as text blocks with format: `[read_file for 'path'] Result:`
- The experimental feature system is already in place
- Existing tests expect a `deduplicateReadFileHistory` method, but it is not yet implemented
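
To make the text-block format above concrete, here is a minimal sketch of how a read result could be recognized and its path extracted. The helper name `extractReadFilePath` and the regular expression are assumptions for illustration only. The real header may carry extra detail (for example for multi-file reads, or paths containing a single quote), so the pattern would need to be checked against actual history entries.

```typescript
// Hypothetical pattern for a header of the form "[read_file for 'path'] Result:".
const READ_FILE_HEADER = /^\[read_file for '([^']+)'\] Result:/

// Hypothetical helper: returns the path named in the header,
// or undefined when the text is not a read_file result.
function extractReadFilePath(text: string): string | undefined {
  const match = READ_FILE_HEADER.exec(text)
  return match ? match[1] : undefined
}
```

Called with `"[read_file for 'src/app.ts'] Result:"` followed by file content, this returns `src/app.ts`; any other text block yields `undefined` and would be left untouched by deduplication.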

**Implementation approach:**
1. Add `READ_FILE_DEDUPLICATION` to experimental features
2. Implement `deduplicateReadFileHistory` method in Task class
3. Call deduplication after successful file reads in `readFileTool`

## 🔍 Comprehensive Issue Scoping

### Root Cause / Implementation Target
When large language models re-read files multiple times during a conversation, the `read_file` results accumulate in the `apiConversationHistory`, causing excessive context consumption and potential confusion. The system currently lacks a mechanism to deduplicate these redundant file reads.

### Affected Components
- **Primary Files:**
  - `src/shared/experiments.ts`: Add new experimental feature flag
  - `src/core/task/Task.ts` (lines ~330-350): Add `deduplicateReadFileHistory` method
  - `src/core/tools/readFileTool.ts` (lines ~610-615): Integrate deduplication call

- **Secondary Impact:**
  - Test files that mock or use Task class
  - Any tools that rely on conversation history structure
  - Experimental settings UI

### Current Implementation Analysis
The system uses `apiConversationHistory` to store all API messages. When `readFileTool` executes, it adds results using `pushToolResult`, which creates text blocks in user messages with format: `[read_file ...] Result:` followed by XML-structured file content. These accumulate without any deduplication.

### Proposed Implementation

#### Step 1: Add experimental feature flag
- File: `src/shared/experiments.ts`
- Changes: Add `READ_FILE_DEDUPLICATION` to `EXPERIMENT_IDS` and `experimentConfigsMap`
- Rationale: Allows safe opt-in testing without affecting all users
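
A rough sketch of what Step 1 might look like. The shapes of `EXPERIMENT_IDS` and `experimentConfigsMap` below are assumptions (an id map plus a config map), and the id strings and the `isEnabled` helper are hypothetical. The actual definitions in `src/shared/experiments.ts` should be followed instead.

```typescript
// Assumed shape of the experiment registry; not copied from the codebase.
const EXPERIMENT_IDS = {
  POWER_STEERING: "powerSteering", // assumed id string
  READ_FILE_DEDUPLICATION: "readFileDeduplication", // new flag (hypothetical id string)
} as const

type ExperimentKey = keyof typeof EXPERIMENT_IDS
type ExperimentId = (typeof EXPERIMENT_IDS)[ExperimentKey]

interface ExperimentConfig {
  enabled: boolean
}

// Every experiment defaults to disabled, so the feature stays opt-in.
const experimentConfigsMap: Record<ExperimentKey, ExperimentConfig> = {
  POWER_STEERING: { enabled: false },
  READ_FILE_DEDUPLICATION: { enabled: false },
}

// Hypothetical lookup: a user setting overrides the default from the config map.
function isEnabled(userSettings: Partial<Record<ExperimentId, boolean>>, id: ExperimentId): boolean {
  const key = (Object.keys(EXPERIMENT_IDS) as ExperimentKey[]).find((k) => EXPERIMENT_IDS[k] === id)
  const fallback = key ? experimentConfigsMap[key].enabled : false
  return userSettings[id] ?? fallback
}
```

Defaulting the new flag to `false` keeps existing behavior unchanged for anyone who has not opted in.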

#### Step 2: Add deduplicateReadFileHistory method to Task class
- File: `src/core/task/Task.ts`
- Changes: Add a public method that checks the experimental flag, then iterates through `apiConversationHistory` in reverse and removes older reads of any file that appears more than once
- Rationale: Keeps only the most recent read of each file while preserving message structure
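
The reverse traversal described in Step 2 could be sketched as follows. This is a simplified, pure-function version under several assumptions: the `ApiMessage` and `ContentBlock` types are stand-ins for the real message types, the header and file content are assumed to live in a single text block, and the flag is passed in as a boolean rather than read from the experiment settings. A real method on `Task` would operate on its own history instead.

```typescript
// Simplified stand-ins for the real message types (assumption).
type ContentBlock = { type: "text"; text: string } | { type: "image" | "tool_use" | "tool_result" }

interface ApiMessage {
  role: "user" | "assistant"
  content: string | ContentBlock[]
}

// Assumed header format: "[read_file for 'path'] Result:".
const READ_FILE_HEADER = /^\[read_file for '([^']+)'\] Result:/

function deduplicateReadFileHistory(history: ApiMessage[], enabled: boolean): ApiMessage[] {
  if (!enabled) return history // feature off: leave history untouched

  const seenPaths = new Set<string>()
  const newestFirst: ApiMessage[] = []

  // Walk newest to oldest so the first read seen for a path is the most recent one.
  for (const message of [...history].reverse()) {
    if (message.role !== "user" || typeof message.content === "string") {
      newestFirst.push(message)
      continue
    }
    const keptBlocks: ContentBlock[] = []
    for (const block of [...message.content].reverse()) {
      const path = block.type === "text" ? READ_FILE_HEADER.exec(block.text)?.[1] : undefined
      if (path !== undefined) {
        if (seenPaths.has(path)) continue // older read of an already-seen file: drop it
        seenPaths.add(path)
      }
      keptBlocks.push(block)
    }
    // The message itself is kept, even if all of its blocks were dropped.
    newestFirst.push({ ...message, content: keptBlocks.reverse() })
  }
  return newestFirst.reverse()
}
```

The sketch keeps every message in place so the user/assistant ordering is preserved. How to treat a user message left with no blocks is an open design question that the sketch does not settle.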

#### Step 3: Integrate deduplication into readFileTool
- File: `src/core/tools/readFileTool.ts`
- Changes: Call `cline.deduplicateReadFileHistory()` after successful file reads
- Rationale: Ensures deduplication happens immediately after new reads are added
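
The call order in Step 3 might look roughly like the sketch below. `TaskLike` and `finishReadFile` are hypothetical names used only to show the sequencing; the real `readFileTool` has a different signature, and only the `cline.deduplicateReadFileHistory()` call comes from the proposal.

```typescript
// Minimal stand-in for the part of Task that this step touches (hypothetical shape).
interface TaskLike {
  deduplicateReadFileHistory(): void
}

// Hypothetical wrapper showing call order only: the new result is pushed first,
// and deduplication runs afterwards, and only when the read succeeded.
function finishReadFile(
  cline: TaskLike,
  pushToolResult: (result: string) => void,
  result: string,
  readSucceeded: boolean,
): void {
  pushToolResult(result)
  if (readSucceeded) {
    cline.deduplicateReadFileHistory()
  }
}
```

Skipping the call on failed reads avoids discarding an earlier good copy of a file when the newest attempt returned only an error.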

### Code Architecture Considerations
- Follow existing patterns for experimental features (see `POWER_STEERING` implementation)
- Follow existing patterns for message manipulation (see `overwriteApiConversationHistory`)
- Preserve message structure integrity
- Handle both single and multi-file read operations
- Ensure deduplication works across all file read patterns
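
Multi-file reads are the trickier case: an older result may contain several files, only some of which were read again later. One possible approach is to strip just the superseded entries from the older result. The sketch below assumes a hypothetical XML layout of `<files><file><path>…</path>…</file></files>`; the actual structure of the tool output is not specified here and must be checked. A regular expression is used for brevity and would misfire on file content that itself contains `</file>`.

```typescript
// Assumed layout (hypothetical): <files><file><path>…</path>…</file>…</files>
const FILE_ENTRY = /<file>\s*<path>([^<]*)<\/path>[\s\S]*?<\/file>\s*/g

// Hypothetical helper: removes only the <file> entries whose path was read again later,
// leaving the other files in the same multi-file result intact.
function dropSupersededEntries(xml: string, superseded: ReadonlySet<string>): string {
  return xml.replace(FILE_ENTRY, (entry: string, path: string) => (superseded.has(path) ? "" : entry))
}
```

For a result holding `a.ts` and `b.ts`, passing a set containing only `a.ts` leaves the `b.ts` entry and the surrounding `<files>` wrapper in place.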

### Testing Requirements
- Unit Tests:
  - [ ] Test experimental flag enables/disables feature
  - [ ] Test basic deduplication with duplicate file reads
  - [ ] Test multi-file read handling
  - [ ] Test message structure preservation
  - [ ] Test edge cases (empty messages, malformed content)
  - [ ] Test deduplication with mixed file operations
- Integration Tests:
  - [ ] Test readFileTool integration
  - [ ] Test performance with large conversation histories

### Performance Impact
- Expected performance change: Minimal (O(n) traversal of messages)
- Optimization: Stop processing once all unique files are found
- Memory impact: Reduced due to smaller conversation history

### Security Considerations
- No security implications - only removes redundant entries from existing conversation data
- No external data exposure
- No authentication/authorization changes

### Migration Strategy
Not applicable - feature is backwards compatible and handles existing conversation formats.

### Rollback Plan
Feature can be disabled by toggling off the experimental setting in the UI.

### Dependencies and Breaking Changes
- No external dependencies affected
- No API contract changes
- No breaking changes for users
