Closed
Changes from 5 commits
8 changes: 8 additions & 0 deletions src/core/config/envConfig.ts
@@ -0,0 +1,8 @@
function getFromEnv(key: string, defaultValue: string): string {
const value = process.env[key]
return value === undefined ? defaultValue : value
}

export const ROO_AGENT_CONFIG = {
fileReadCacheSize: () => parseInt(getFromEnv("ROO_FILE_READ_CACHE_SIZE", "100")),
}
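As a quick illustration of how this env-backed config behaves, here is a standalone sketch that re-declares the snippet above (not part of the diff; the explicit radix on `parseInt` is an addition):

```typescript
// Standalone sketch of the env-backed config pattern above.
function getFromEnv(key: string, defaultValue: string): string {
	const value = process.env[key]
	return value === undefined ? defaultValue : value
}

const ROO_AGENT_CONFIG = {
	// Radix given explicitly here; the diff relies on parseInt's default.
	fileReadCacheSize: () => parseInt(getFromEnv("ROO_FILE_READ_CACHE_SIZE", "100"), 10),
}

// Default applies when the variable is unset.
delete process.env.ROO_FILE_READ_CACHE_SIZE
console.log(ROO_AGENT_CONFIG.fileReadCacheSize()) // 100

// The environment overrides the default.
process.env.ROO_FILE_READ_CACHE_SIZE = "250"
console.log(ROO_AGENT_CONFIG.fileReadCacheSize()) // 250
```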
228 changes: 144 additions & 84 deletions src/core/prompts/__tests__/__snapshots__/system.test.ts.snap

Large diffs are not rendered by default.

28 changes: 10 additions & 18 deletions src/core/prompts/tools/read-file.ts
@@ -5,22 +5,22 @@ export function getReadFileDescription(args: ToolArgs): string {
const isMultipleReadsEnabled = maxConcurrentReads > 1

return `## read_file
Description: Request to read the contents of ${isMultipleReadsEnabled ? "one or more files" : "a file"}. The tool outputs line-numbered content (e.g. "1 | const x = 1") for easy reference when creating diffs or discussing code.${args.partialReadsEnabled ? " Use line ranges to efficiently read specific portions of large files." : ""} Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.
Member:

I couldn't find any mention of why partial reads are being permanently enabled. Is this change intentional, given that partial reads can be disabled in the settings?

Contributor Author:

This change is intentional and addresses Issue #4009. Let me clarify what's actually happening:

The Problem Being Solved:
The "Always read entire file" setting (maxReadFileLine = -1) prohibited line-range reads entirely, forcing users to always read complete files even when they had specific line numbers from:

- git grep -n results
- Compiler/linter error messages
- search_files output
- Manual diffs with line references

What This Change Does:

- Preserves existing behavior: when no <line_range> is specified, entire files are still read
- Adds intelligent choice: the model can now choose line ranges when contextually appropriate
- Maintains the setting's intent: "Always read entire file" becomes the default, not an absolute restriction

Technical Detail:

- Previously: partialReadsEnabled = maxReadFileLine !== -1 meant unlimited readers couldn't see line-range options
- Now: line ranges are always available in the tool interface, letting the model make informed decisions based on context

This turns a rigid restriction into a flexible default: the model reads entire files by default but can target specific lines when it has line numbers to work with.
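The before/after gating logic described in this comment can be sketched as follows (an illustrative sketch; `maxReadFileLine` is from the source, but the function names are hypothetical):

```typescript
// Illustrative sketch of the described change; function names are hypothetical.

// Before: line-range support in the tool description was gated on the setting,
// so "always read entire file" (maxReadFileLine = -1) removed the
// <line_range> option from the prompt entirely.
function partialReadsEnabledBefore(maxReadFileLine: number): boolean {
	return maxReadFileLine !== -1
}

// After: the <line_range> option is always present in the tool description;
// the setting only shapes the default behavior.
function partialReadsEnabledAfter(_maxReadFileLine: number): boolean {
	return true
}

console.log(partialReadsEnabledBefore(-1)) // false
console.log(partialReadsEnabledAfter(-1)) // true
```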

Description: Request to read the contents of ${isMultipleReadsEnabled ? "one or more files" : "a file"}. The tool outputs line-numbered content (e.g. "1 | const x = 1") for easy reference when creating diffs or discussing code. Use line ranges to efficiently read specific portions of large files. Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.

${isMultipleReadsEnabled ? `**IMPORTANT: You can read a maximum of ${maxConcurrentReads} files in a single request.** If you need to read more files, use multiple sequential read_file requests.` : "**IMPORTANT: Multiple file reads are currently disabled. You can only read one file at a time.**"}

${args.partialReadsEnabled ? `By specifying line ranges, you can efficiently read specific portions of large files without loading the entire file into memory.` : ""}
By specifying line ranges, you can efficiently read specific portions of large files without loading the entire file into memory.
Parameters:
- args: Contains one or more file elements, where each file contains:
- path: (required) File path (relative to workspace directory ${args.cwd})
${args.partialReadsEnabled ? `- line_range: (optional) One or more line range elements in format "start-end" (1-based, inclusive)` : ""}
- line_range: (optional) One or more line range elements in format "start-end" (1-based, inclusive)

Usage:
<read_file>
<args>
<file>
<path>path/to/file</path>
${args.partialReadsEnabled ? `<line_range>start-end</line_range>` : ""}
<line_range>start-end</line_range>
</file>
</args>
</read_file>
@@ -32,7 +32,7 @@ Examples:
<args>
<file>
<path>src/app.ts</path>
${args.partialReadsEnabled ? `<line_range>1-1000</line_range>` : ""}
<line_range>1-1000</line_range>
</file>
</args>
</read_file>
@@ -44,16 +44,12 @@ ${isMultipleReadsEnabled ? `2. Reading multiple files (within the ${maxConcurren
<args>
<file>
<path>src/app.ts</path>
${
args.partialReadsEnabled
? `<line_range>1-50</line_range>
<line_range>100-150</line_range>`
: ""
}
<line_range>1-50</line_range>
<line_range>100-150</line_range>
</file>
<file>
<path>src/utils.ts</path>
${args.partialReadsEnabled ? `<line_range>10-20</line_range>` : ""}
<line_range>10-20</line_range>
</file>
</args>
</read_file>`
@@ -72,14 +68,10 @@ ${isMultipleReadsEnabled ? "3. " : "2. "}Reading an entire file:
IMPORTANT: You MUST use this Efficient Reading Strategy:
- ${isMultipleReadsEnabled ? `You MUST read all related files and implementations together in a single operation (up to ${maxConcurrentReads} files at once)` : "You MUST read files one at a time, as multiple file reads are currently disabled"}
- You MUST obtain all necessary context before proceeding with changes
${
args.partialReadsEnabled
? `- You MUST use line ranges to read specific portions of large files, rather than reading entire files when not needed
${`- You MUST use line ranges to read specific portions of large files, rather than reading entire files when not needed
- You MUST combine adjacent line ranges (<10 lines apart)
- You MUST use multiple ranges for content separated by >10 lines
- You MUST include sufficient line context for planned modifications while keeping ranges minimal
`
: ""
}
`}
${isMultipleReadsEnabled ? `- When you need to read more than ${maxConcurrentReads} files, prioritize the most critical files first, then use subsequent read_file requests for additional files` : ""}`
}
140 changes: 140 additions & 0 deletions src/core/services/__tests__/fileReadCacheService.spec.ts
@@ -0,0 +1,140 @@
import { vi, describe, it, expect, beforeEach } from "vitest"
import {
processAndFilterReadRequest,
subtractRange,
subtractRanges,
ConversationMessage,
} from "../fileReadCacheService"
import { stat } from "fs/promises"
vi.mock("fs/promises", () => ({
stat: vi.fn(),
}))
vi.mock("../../utils/lruCache")
vi.mock("../../config/envConfig", () => ({
ROO_AGENT_CONFIG: {
fileReadCacheSize: () => 10,
},
}))
const mockedStat = vi.mocked(stat)
describe("fileReadCacheService", () => {
describe("subtractRange", () => {
it("should return the original range if there is no overlap", () => {
const original = { start: 1, end: 10 }
const toRemove = { start: 11, end: 20 }
expect(subtractRange(original, toRemove)).toEqual([original])
})
it("should return an empty array if the range is completely removed", () => {
const original = { start: 1, end: 10 }
const toRemove = { start: 1, end: 10 }
expect(subtractRange(original, toRemove)).toEqual([])
})
it("should subtract from the beginning", () => {
const original = { start: 1, end: 10 }
const toRemove = { start: 1, end: 5 }
expect(subtractRange(original, toRemove)).toEqual([{ start: 6, end: 10 }])
})
it("should subtract from the end", () => {
const original = { start: 1, end: 10 }
const toRemove = { start: 6, end: 10 }
expect(subtractRange(original, toRemove)).toEqual([{ start: 1, end: 5 }])
})
it("should subtract from the middle, creating two new ranges", () => {
const original = { start: 1, end: 10 }
const toRemove = { start: 4, end: 6 }
expect(subtractRange(original, toRemove)).toEqual([
{ start: 1, end: 3 },
{ start: 7, end: 10 },
])
})
})
describe("subtractRanges", () => {
it("should subtract multiple ranges from a single original range", () => {
const originals = [{ start: 1, end: 20 }]
const toRemoves = [
{ start: 1, end: 5 },
{ start: 15, end: 20 },
]
expect(subtractRanges(originals, toRemoves)).toEqual([{ start: 6, end: 14 }])
})
})
describe("processAndFilterReadRequest", () => {
const MOCK_FILE_PATH = "/test/file.txt"
const CURRENT_MTIME = new Date().toISOString()
beforeEach(() => {
vi.clearAllMocks()
mockedStat.mockResolvedValue({ mtime: { toISOString: () => CURRENT_MTIME } } as any)
})
it("should allow all when history is empty", async () => {
const requestedRanges = [{ start: 1, end: 10 }]
const result = await processAndFilterReadRequest(MOCK_FILE_PATH, requestedRanges, [])
expect(result.status).toBe("ALLOW_ALL")
expect(result.rangesToRead).toEqual(requestedRanges)
})
it("should reject all when a full cache hit occurs", async () => {
const requestedRanges = [{ start: 1, end: 10 }]
const conversationHistory: ConversationMessage[] = [
{
files: [
{
fileName: MOCK_FILE_PATH,
mtime: new Date(CURRENT_MTIME).getTime(),
lineRanges: [{ start: 1, end: 10 }],
},
],
} as any,
]
const result = await processAndFilterReadRequest(MOCK_FILE_PATH, requestedRanges, conversationHistory)
expect(result.status).toBe("REJECT_ALL")
expect(result.rangesToRead).toEqual([])
})
it("should allow partial when a partial cache hit occurs", async () => {
const requestedRanges = [{ start: 1, end: 20 }]
const conversationHistory: ConversationMessage[] = [
{
files: [
{
fileName: MOCK_FILE_PATH,
mtime: new Date(CURRENT_MTIME).getTime(),
lineRanges: [{ start: 1, end: 10 }],
},
],
} as any,
]
const result = await processAndFilterReadRequest(MOCK_FILE_PATH, requestedRanges, conversationHistory)
expect(result.status).toBe("ALLOW_PARTIAL")
expect(result.rangesToRead).toEqual([{ start: 11, end: 20 }])
})
it("should allow all when mtime is older in history", async () => {
const requestedRanges = [{ start: 1, end: 10 }]
const conversationHistory: ConversationMessage[] = [
{
files: [
{
fileName: MOCK_FILE_PATH,
mtime: new Date(CURRENT_MTIME).getTime() - 100, // Older mtime
lineRanges: [{ start: 1, end: 10 }],
},
],
} as any,
]
const result = await processAndFilterReadRequest(MOCK_FILE_PATH, requestedRanges, conversationHistory)
expect(result.status).toBe("ALLOW_ALL")
expect(result.rangesToRead).toEqual(requestedRanges)
})
it("should allow all when file does not exist", async () => {
mockedStat.mockRejectedValue({ code: "ENOENT" })
const requestedRanges = [{ start: 1, end: 10 }]
const result = await processAndFilterReadRequest(MOCK_FILE_PATH, requestedRanges, [])
expect(result.status).toBe("ALLOW_ALL")
expect(result.rangesToRead).toEqual(requestedRanges)
})
it("should fall back to allowing all reads for non-ENOENT stat errors", async () => {
const error = new Error("EPERM")
mockedStat.mockRejectedValue(error)
const requestedRanges = [{ start: 1, end: 10 }]
const result = await processAndFilterReadRequest(MOCK_FILE_PATH, requestedRanges, [])
expect(result.status).toBe("ALLOW_ALL") // Fallback to allow all
})
})
})
169 changes: 169 additions & 0 deletions src/core/services/fileReadCacheService.ts
Member:

Is there a simpler way of checking that a recently read file hasn't changed? For codebase indexing we use hashes to verify that the content of a file hasn't changed. If a file read is being rejected, do we need to keep a cache of the whole file, or would keeping a hash be a better option?

Contributor Author:

After reviewing the code, I need to clarify the current implementation:

Current Implementation Reality:

- The cache does NOT store full file content
- It stores only metadata: { mtime: string, size: number }
- Cache decisions are based on conversation-history analysis plus mtime comparison
- Memory tracking covers metadata size limits, not content storage

Your Hash Suggestion's Benefits:

- More reliable: hashes detect actual content changes, whereas mtime can be preserved or manipulated
- Already proven: works well in your codebase indexing
- Potentially simpler: could replace mtime comparison plus conversation-history analysis

Current Approach Issues:

- mtime comparison can miss cases where file content changes but the timestamp is preserved
- Conversation-history parsing is complex
- It still requires file stat calls

Hash-Based Alternative:

interface HashCacheEntry {
	hash: string; // Content hash
	lastRead: number; // Timestamp of last read
	lineRanges: LineRange[]; // Ranges read at this hash
}

Trade-off Question:
The hash approach requires reading files to generate hashes, which adds I/O cost. However, it provides stronger content-change detection than mtime alone.

Would the hash-generation cost be acceptable, given the improved reliability and the potential to simplify the conversation-history analysis logic?
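A minimal sketch of the hash-based check under discussion (illustrative only: sha256 via Node's crypto module is an assumption here, not anything the PR implements):

```typescript
import { createHash } from "crypto"
import { readFile } from "fs/promises"

interface LineRange {
	start: number
	end: number
}

// Hypothetical cache entry, mirroring the proposal above.
interface HashCacheEntry {
	hash: string // Content hash
	lastRead: number // Timestamp of last read
	lineRanges: LineRange[] // Ranges read at this hash
}

// Compute a content hash for change detection (sha256 chosen arbitrarily).
async function hashFile(filePath: string): Promise<string> {
	const content = await readFile(filePath)
	return createHash("sha256").update(content).digest("hex")
}

// A cached read stays valid only while the content hash matches,
// regardless of what the mtime says.
async function isCacheValid(filePath: string, entry: HashCacheEntry): Promise<boolean> {
	return (await hashFile(filePath)) === entry.hash
}
```

Note the trade-off mentioned above: every validity check re-reads the file to hash it, trading I/O for stronger change detection than a stat call.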

@@ -0,0 +1,169 @@
import fs from "fs"
import { lruCache } from "../utils/lruCache"
import { ROO_AGENT_CONFIG } from "../config/envConfig"

const CACHE_SIZE = ROO_AGENT_CONFIG.fileReadCacheSize()

// Types
export interface LineRange {
start: number
end: number
}

export interface FileMetadata {
fileName: string
mtime: number
lineRanges: LineRange[]
}

export interface ConversationMessage {
role: "user" | "assistant"
content: any
ts: number
tool?: {
name: string
options: any
}
files?: FileMetadata[]
}

type CacheResult =
| { status: "ALLOW_ALL"; rangesToRead: LineRange[] }
| { status: "ALLOW_PARTIAL"; rangesToRead: LineRange[] }
| { status: "REJECT_ALL"; rangesToRead: LineRange[] }

// Initialize a new LRU cache for file modification times.
const mtimeCache = new lruCache<string, string>(CACHE_SIZE)

/**
* Checks if two line ranges overlap.
* @param r1 - The first line range.
* @param r2 - The second line range.
* @returns True if the ranges overlap, false otherwise.
*/
function rangesOverlap(r1: LineRange, r2: LineRange): boolean {
return r1.start <= r2.end && r1.end >= r2.start
}

/**
* Subtracts one line range from another.
* @param from - The range to subtract from.
* @param toSubtract - The range to subtract.
* @returns An array of ranges remaining after subtraction.
*/
export function subtractRange(from: LineRange, toSubtract: LineRange): LineRange[] {
// No overlap
if (from.end < toSubtract.start || from.start > toSubtract.end) {
return [from]
}
const remainingRanges: LineRange[] = []
// Part of 'from' is before 'toSubtract'
if (from.start < toSubtract.start) {
remainingRanges.push({ start: from.start, end: toSubtract.start - 1 })
}
// Part of 'from' is after 'toSubtract'
if (from.end > toSubtract.end) {
remainingRanges.push({ start: toSubtract.end + 1, end: from.end })
}
return remainingRanges
}

/**
* Subtracts a set of ranges from another set of ranges.
*/
export function subtractRanges(originals: LineRange[], toRemoves: LineRange[]): LineRange[] {
let remaining = [...originals]

for (const toRemove of toRemoves) {
remaining = remaining.flatMap((original) => subtractRange(original, toRemove))
}

return remaining
}

/**
 * Gets the mtime of a file, using the LRU mtime cache when available.
* @param filePath The path to the file to get the mtime for.
* @returns The mtime of the file as an ISO string, or null if the file does not exist.
* @throws An error if there is an error getting the file stats, other than the file not existing.
*/
async function getFileMtime(filePath: string): Promise<string | null> {
const cachedMtime = mtimeCache.get(filePath)
if (cachedMtime) {
return cachedMtime
}
try {
const stats = await fs.promises.stat(filePath)
const mtime = stats.mtime.toISOString()
mtimeCache.set(filePath, mtime)
return mtime
} catch (error) {
if (error instanceof Error && "code" in error && error.code === "ENOENT") {
return null // File does not exist, so no mtime.
}
// For other errors, we want to know about them.
throw error
}
}

/**
* Processes a read request against cached file data in conversation history.
* @param requestedFilePath - The full path of the file being requested.
* @param requestedRanges - The line ranges being requested.
* @param conversationHistory - The history of conversation messages.
* @returns A CacheResult indicating whether to allow, partially allow, or reject the read.
*/
export async function processAndFilterReadRequest(
requestedFilePath: string,
requestedRanges: LineRange[],
conversationHistory: ConversationMessage[],
): Promise<CacheResult> {
try {
const currentMtime = await getFileMtime(requestedFilePath)
if (currentMtime === null) {
// If file does not exist, there's nothing to read from cache. Let the tool handle it.
return { status: "ALLOW_ALL", rangesToRead: requestedRanges }
}

let rangesToRead = [...requestedRanges]

// If no specific ranges are requested, treat it as a request for the whole file.
if (rangesToRead.length === 0) {
// We need to know the number of lines to create a full range.
// This logic is simplified; in a real scenario, you'd get the line count.
// For this example, we'll assume we can't determine the full range without reading the file,
// so we proceed with ALLOW_ALL if no ranges are specified.
return { status: "ALLOW_ALL", rangesToRead: requestedRanges }
}

for (const message of conversationHistory) {
if (message.files) {
for (const file of message.files) {
if (file.fileName === requestedFilePath && new Date(file.mtime) >= new Date(currentMtime)) {
// File in history is up-to-date. Check ranges.
for (const cachedRange of file.lineRanges) {
rangesToRead = rangesToRead.flatMap((reqRange) => {
if (rangesOverlap(reqRange, cachedRange)) {
return subtractRange(reqRange, cachedRange)
}
return [reqRange]
})
}
}
}
}
}

if (rangesToRead.length === 0) {
return { status: "REJECT_ALL", rangesToRead: [] }
} else if (rangesToRead.length < requestedRanges.length) {
return { status: "ALLOW_PARTIAL", rangesToRead }
} else {
return { status: "ALLOW_ALL", rangesToRead: requestedRanges }
}
} catch (error) {
console.error(`Error processing file read request for ${requestedFilePath}:`, error)
// On other errors, allow the read to proceed to let the tool handle it.
return { status: "ALLOW_ALL", rangesToRead: requestedRanges }
}
}