

Contributor

@Mnehmos Mnehmos commented Jun 10, 2025

PR for feature/4009-pr2-file-read-caching

Related GitHub Issue

Closes: #4009

Description

This PR introduces a file read caching service to improve performance by caching the content of files that have been read. This helps to avoid re-reading the same file multiple times during a conversation.

This PR specifically addresses the following feedback on the original implementation:

  • Memory management: A cache size limit and a Least-Recently-Used (LRU) eviction policy have been implemented to prevent memory issues.
  • Error handling: Proper error handling has been added for file system operations when checking the modification time of a file.
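The size-limited LRU eviction described above can be sketched with a JavaScript `Map`, whose iteration order is insertion order: re-inserting a key on every access moves it to the end, so the first key is always the least recently used. This is a minimal sketch under stated assumptions, not the PR's lruCache.ts or its MemoryAwareCache; the class name `ByteBudgetLruCache` and the caller-supplied size estimate are hypothetical.

```typescript
// Hypothetical sketch: an LRU cache bounded by an approximate byte budget.
// Not the PR's lruCache.ts / MemoryAwareCache; all names are illustrative.
class ByteBudgetLruCache<V> {
	private readonly entries = new Map<string, { value: V; bytes: number }>()
	private totalBytes = 0
	private readonly maxBytes: number
	// Caller-supplied estimate of an entry's memory footprint (assumption).
	private readonly sizeOf: (key: string, value: V) => number

	constructor(maxBytes: number, sizeOf: (key: string, value: V) => number) {
		this.maxBytes = maxBytes
		this.sizeOf = sizeOf
	}

	get(key: string): V | undefined {
		const entry = this.entries.get(key)
		if (entry === undefined) return undefined
		// Re-insert to mark as most recently used (Map preserves insertion order).
		this.entries.delete(key)
		this.entries.set(key, entry)
		return entry.value
	}

	set(key: string, value: V): void {
		this.delete(key)
		const bytes = this.sizeOf(key, value)
		if (bytes > this.maxBytes) return // Larger than the whole budget: don't cache.
		this.entries.set(key, { value, bytes })
		this.totalBytes += bytes
		// Evict from the front of the Map (least recently used) until under budget.
		while (this.totalBytes > this.maxBytes) {
			const oldestKey = this.entries.keys().next().value as string
			this.delete(oldestKey)
		}
	}

	delete(key: string): boolean {
		const entry = this.entries.get(key)
		if (entry === undefined) return false
		this.totalBytes -= entry.bytes
		return this.entries.delete(key)
	}

	get size(): number {
		return this.entries.size
	}
}
```

Evicting by accumulated bytes rather than entry count is what a "100MB limit" (as mentioned in the later commit message) implies; how each entry's size is estimated is left to the caller here.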

Test Procedure

  1. Run pnpm test to execute all tests and ensure that the new tests for the caching service pass and that there are no regressions.
  2. Manually test file reading scenarios, including cache hits, cache misses, and scenarios where files are modified or deleted, to ensure the cache behaves as expected.

Type of Change

  • ✨ New Feature: Non-breaking change that adds functionality.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

N/A

Documentation Updates

None required.

Additional Notes

This is the second of three PRs to address the work in issue #4009.


Important

Introduces a file read caching service with LRU cache, integrates it with existing tools, and adds tests for improved performance and error handling.

  • Caching Service:
    • Introduces fileReadCacheService.ts for caching file contents with LRU cache.
    • Implements processAndFilterReadRequest() to manage cache hits and misses.
  • Error Handling:
    • Adds error handling for file system operations in fileReadCacheService.ts.
  • Integration:
    • Integrates caching with readFileTool.ts, applyDiffTool.ts, and writeToFileTool.ts.
    • Updates readFileTool.ts to use cache results for reading files.
  • Testing:
    • Adds tests in fileReadCacheService.spec.ts and readFileTool.test.ts for caching logic.
  • Miscellaneous:
    • Adds lruCache.ts utility for cache management.
    • Updates esbuild.mjs to clean assets directory.

This description was created by Ellipsis for 2071612.

@Mnehmos Mnehmos requested review from cte, jr and mrubens as code owners June 10, 2025 15:20
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jun 10, 2025
// Track file read
await cline.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource)

const stats = fs.statSync(fullPath)
Contributor


Avoid using synchronous fs.statSync inside an async function to prevent blocking the event loop. Consider using fs.promises.stat for a non‐blocking alternative.

Suggested change
const stats = fs.statSync(fullPath)
const stats = await fs.promises.stat(fullPath)
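A minimal sketch of the suggested change in context, assuming Node's built-in `fs/promises` module: the `stat` call is awaited instead of blocking the event loop, and only the metadata a cache would need is kept. The helper name `readFileMetadata` and the ISO-string `mtime` format are hypothetical.

```typescript
import { stat } from "fs/promises"

// Hypothetical helper: non-blocking replacement for fs.statSync(fullPath).
async function readFileMetadata(fullPath: string): Promise<{ mtime: string; size: number }> {
	// The event loop keeps running while the OS-level stat call completes.
	const stats = await stat(fullPath)
	return { mtime: stats.mtime.toISOString(), size: stats.size }
}
```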

Mnehmos added 2 commits June 10, 2025 09:34
- Implement MemoryAwareCache with 100MB limit and LRU eviction
- Fix syntax error in processAndFilterReadRequest function
- Add proper error handling for file permissions (EACCES, EPERM)
- Handle file deletion scenarios by removing from cache
- Add logging for cache evictions and errors
- Update imports to use fs/promises for test compatibility

All tests passing (12/12)
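The permission and deletion handling listed in the commit message can be sketched as a pure function that maps a failed `stat` call to a cache action: `ENOENT` (file deleted) evicts the stale entry, `EACCES`/`EPERM` leave the cache untouched, and anything else is rethrown. This is an illustration under assumptions; the `CacheAction` type, the function names, and the rethrow policy for unknown codes are hypothetical, not taken from the PR.

```typescript
// Hypothetical mapping from a filesystem error to what the cache should do.
type CacheAction = "evict" | "skip"

// Extracts Node's errno-style `code` property, if present.
function errorCode(error: unknown): unknown {
	return typeof error === "object" && error !== null && "code" in error
		? (error as { code?: unknown }).code
		: undefined
}

function cacheActionForStatError(error: unknown): CacheAction {
	switch (errorCode(error)) {
		case "ENOENT": // File deleted since it was cached: drop the stale entry.
			return "evict"
		case "EACCES": // Permission problems: don't cache, let the caller report it.
		case "EPERM":
			return "skip"
		default:
			// Unknown failures are surfaced instead of being swallowed (assumption).
			throw error
	}
}
```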
@daniel-lxs daniel-lxs changed the title from "Feature/4009 pr2 file read caching" to "feat: Add file read caching to prevent redundant reads in conversation history" Jun 10, 2025
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 10, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jun 10, 2025
Member

@daniel-lxs daniel-lxs left a comment


Hey @Mnehmos, thank you for taking this issue.

Overall, I think it's worth figuring out whether the issue really needs an implementation this complex just to prevent re-reading files that were read recently.

Keeping the content of the files in a cache only to decide whether a file read should be rejected might be overkill.

I would like to hear your thoughts about this.

const isMultipleReadsEnabled = maxConcurrentReads > 1

return `## read_file
Description: Request to read the contents of ${isMultipleReadsEnabled ? "one or more files" : "a file"}. The tool outputs line-numbered content (e.g. "1 | const x = 1") for easy reference when creating diffs or discussing code.${args.partialReadsEnabled ? " Use line ranges to efficiently read specific portions of large files." : ""} Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.
Member


I couldn't find any mention of why partial reads are being permanently enabled. Is this change intentional, given that partial reads can be disabled in the settings?

Contributor Author


This change is intentional and addresses Issue #4009. Let me clarify what's actually happening:

The Problem Being Solved:
The "Always read entire file" setting (maxReadFileLine = -1) was prohibiting line-range reads entirely, forcing users to always read complete files even when they had specific line numbers from:

  • git grep -n results
  • Compiler/linter error messages
  • search_files output
  • Manual diffs with line references
What This Change Does:

  • Preserves existing behavior: When no <line_range> is specified, entire files are still read.
  • Adds targeted reads: The model can now choose line ranges when they are contextually appropriate.
  • Maintains the setting's intent: "Always read entire file" becomes the default, not an absolute restriction.
Technical Detail:
  • Previously: partialReadsEnabled = maxReadFileLine !== -1, so when reads were unlimited, the line-range options were hidden from the model.
  • Now: Line ranges are always available in the tool interface, and the model decides from context whether to use them.

This turns a rigid limitation into a sensible default: the model gets entire files unless it has specific line numbers to target.
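The read behavior described above (entire file by default, a line range only when one is requested) can be sketched as follows. The `LineRange` shape, 1-based inclusive indexing, clamping of out-of-range values, and the function name `formatRead` are assumptions for illustration; the `N | text` output format follows the read_file tool description in this PR.

```typescript
// Hypothetical: 1-based, inclusive line range requested by the model.
interface LineRange {
	start: number
	end: number
}

// Returns line-numbered content such as "1 | const x = 1".
function formatRead(content: string, range?: LineRange): string {
	const lines = content.split("\n")
	// No range requested: read the whole file (the "Always read entire file" default).
	const start = range ? Math.max(1, range.start) : 1
	const end = range ? Math.min(lines.length, range.end) : lines.length
	const out: string[] = []
	for (let n = start; n <= end; n++) {
		out.push(`${n} | ${lines[n - 1]}`)
	}
	return out.join("\n")
}
```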

Member


Is there a simpler way to check whether a recently read file has changed? In codebase indexing we use hashes to verify that a file's content hasn't changed. If the file read is being rejected, do we need to keep a cache of the whole file, or would keeping a hash be a better option?

Contributor Author


After reviewing the code, I need to clarify the current implementation:

Current Implementation Reality:

  • The cache does NOT store full file content.
  • It only stores metadata: { mtime: string, size: number }.
  • Cache decisions are based on conversation history analysis plus mtime comparison.
  • Memory tracking enforces size limits on the stored metadata, not on file content.
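Under the metadata-only design described above, the hit/miss decision reduces to comparing the cached `{ mtime, size }` with a fresh stat. A minimal sketch; the function name `isCacheHit` is hypothetical, and the conversation-history analysis is deliberately left out.

```typescript
// Cached metadata, matching the shape described above.
interface FileMetadata {
	mtime: string // e.g. stats.mtime.toISOString()
	size: number // bytes
}

// Hypothetical check: true if the file appears unchanged since it was cached.
// Matching mtime and size do not prove the content is identical.
function isCacheHit(cached: FileMetadata | undefined, current: FileMetadata): boolean {
	return cached !== undefined && cached.mtime === current.mtime && cached.size === current.size
}
```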
Your Hash Suggestion Benefits:

  • More reliable: Hashes detect actual content changes instead of relying on mtime, which can be manipulated.
  • Already proven: Works well in your codebase indexing.
  • Potentially simpler: Could replace the mtime comparison and the conversation history analysis.
Current Approach Issues:

  • mtime comparison can miss cases where the file content changes but the timestamp is preserved.
  • Conversation history parsing is complex.
  • It still requires file stat calls.
Hash-Based Alternative:

```typescript
interface LineRange { start: number; end: number } // Assumed shape: 1-based, inclusive

interface HashCacheEntry {
	hash: string; // Content hash
	lastRead: number; // Timestamp of last read
	lineRanges: LineRange[]; // Ranges read at this hash
}
```
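A sketch of the hash comparison, assuming Node's built-in `crypto` and `fs/promises` modules and SHA-256 (which algorithm codebase indexing uses is not stated in this thread). It also makes the I/O trade-off concrete: every check reads the whole file. The names `contentHash` and `isUnchangedSince` are hypothetical.

```typescript
import { createHash } from "crypto"
import { readFile } from "fs/promises"

// Hex-encoded SHA-256 digest (algorithm choice is an assumption).
function contentHash(data: string | Uint8Array): string {
	return createHash("sha256").update(data).digest("hex")
}

// Hypothetical check: unlike an mtime comparison, this requires a full file read.
async function isUnchangedSince(fullPath: string, previousHash: string): Promise<boolean> {
	const data = await readFile(fullPath)
	return contentHash(data) === previousHash
}
```

Storing only the digest keeps each entry to a fixed 64 hex characters regardless of file size, which addresses the memory concern raised in review.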

Trade-off Question:
The hash approach requires reading files to generate hashes, which adds I/O cost. However, it provides stronger content change detection than mtime alone.

Would the hash generation cost be acceptable given the improved reliability and potential to simplify the conversation history analysis logic?

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Jun 10, 2025
@daniel-lxs daniel-lxs marked this pull request as draft June 10, 2025 18:07
@Mnehmos Mnehmos closed this Jun 12, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 12, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Jun 12, 2025
@Mnehmos Mnehmos deleted the feature/4009-pr2-file-read-caching branch June 12, 2025 02:03

Labels

enhancement New feature or request PR - Draft / In Progress size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

"Always read entire file" setting prevents line-range reads

3 participants