feat: add safeguard for large files in readFileTool when maxReadFileLine is -1 #6174

roomote · 2025-07-24T15:36:31Z

Fixes #6155
This PR implements a safeguard to prevent consuming the entire context window when reading large files with maxReadFileLine set to -1.

Problem

When maxReadFileLine is set to -1, the readFileTool reads the full contents of a file. This can be problematic if the file is huge since it could consume the entire context window.

Solution

Added a basic safeguard that:

- Checks if a file has more than 1000 lines when maxReadFileLine is -1
- Uses tiktoken to count tokens in the file content
- If the token count exceeds 50,000 (approximately 50% of a typical 100k context window), automatically switches to partial read mode
- Reads only the first 2000 lines, with an informative notice explaining why
- Includes a fallback for very large files (>5000 lines) when token counting fails
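
The decision flow in the list above can be sketched roughly as follows. This is an illustrative sketch, not the code in readFileTool.ts: `planFullRead`, `ReadPlan`, `TokenCounter`, and `VERY_LARGE_FILE_LINE_THRESHOLD` are hypothetical names, the token counter is injected rather than calling tiktoken directly, and the notice text is simplified. Only the numeric thresholds come from the description.

```typescript
// Thresholds as stated in the PR description; every name except the three
// constants quoted in the review comments is hypothetical.
const LARGE_FILE_LINE_THRESHOLD = 1000
const MAX_TOKEN_THRESHOLD = 50_000
const FALLBACK_MAX_LINES = 2000
const VERY_LARGE_FILE_LINE_THRESHOLD = 5000

// Injected so the sketch does not depend on a specific tokenizer package.
type TokenCounter = (text: string) => number

interface ReadPlan {
	linesToRead: number // -1 means "read the whole file"
	notice?: string
}

function planFullRead(content: string, countTokens: TokenCounter): ReadPlan {
	const totalLines = content.split("\n").length

	// Small files skip token counting entirely.
	if (totalLines <= LARGE_FILE_LINE_THRESHOLD) {
		return { linesToRead: -1 }
	}

	try {
		const tokenCount = countTokens(content)
		if (tokenCount > MAX_TOKEN_THRESHOLD) {
			return {
				linesToRead: FALLBACK_MAX_LINES,
				notice: `Showing only the first ${FALLBACK_MAX_LINES} of ${totalLines} lines (~${tokenCount} tokens).`,
			}
		}
		return { linesToRead: -1 }
	} catch {
		// Token counting failed: fall back to a pure line-count check.
		if (totalLines > VERY_LARGE_FILE_LINE_THRESHOLD) {
			return {
				linesToRead: FALLBACK_MAX_LINES,
				notice: `Showing only the first ${FALLBACK_MAX_LINES} of ${totalLines} lines.`,
			}
		}
		return { linesToRead: -1 }
	}
}
```

Passing the counter in as a parameter is a choice made for this sketch so the fallback path is easy to exercise in isolation.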

Changes

- Modified src/core/tools/readFileTool.ts to add the safeguard logic
- Added comprehensive tests in src/core/tools/__tests__/readFileTool.spec.ts

Testing

All existing tests pass
Added new test cases covering:
- Files with high token count that trigger the safeguard
- Files with low token count that do not trigger the safeguard
- Files under the line threshold that skip token counting entirely
- Very large files when token counting fails
- Safeguard not applying when maxReadFileLine is not -1
- Line ranges bypassing the safeguard
- Boundary conditions for thresholds

Performance Considerations

- Token counting is only performed for files over 1000 lines to minimize overhead
- The safeguard preserves the existing behavior for smaller files
- The thresholds are configurable constants that can be adjusted if needed

Important

Adds a safeguard in readFileTool to handle large files by limiting lines read when maxReadFileLine is -1, with new utility and tests.

Behavior:
- Adds safeguard in readFileTool.ts to limit lines read for large files when maxReadFileLine is -1.
- Uses tiktoken to count tokens and applies safeguard if token count exceeds 50,000.
- Reads only the first 2000 lines if safeguard is triggered, with a notice.
- Fallback for very large files (>5000 lines) when token counting fails.
Utilities:
- Introduces readLinesWithCharLimit in read-lines-char-limit.ts to read lines up to a character limit.
Testing:
- Adds tests in readFileTool.spec.ts for various scenarios including large files, token count checks, and line range handling.
- Adds tests in read-lines-char-limit.spec.ts for the new utility function.
Localization:
- Updates localization files to include safeguard notice in multiple languages.

This description was created automatically for f52f374. It will automatically update as commits are pushed.

…ine is -1

- Add token counting check using tiktoken for files over 1000 lines
- Automatically switch to partial read (first 2000 lines) when token count exceeds 50k
- Add fallback safeguard for very large files (>5000 lines) when token counting fails
- Include informative notice explaining why partial read is being used
- Add comprehensive test coverage for all safeguard scenarios

This prevents consuming the entire context window when reading very large files.

cte · 2025-07-24T15:40:04Z

@roomote-agent This looks promising, but can we get access to the actual context window size of the current model via the reference to the Task object and use that instead of assuming a 100K window size? Additionally, 1,000 lines is a bit low; let's bump that up to 10K lines.

roomote · 2025-07-24T15:41:50Z

Hi @cte! I see your feedback. I'll work on:

- Getting the actual context window size from the Task object instead of assuming 100K
- Increasing the line threshold from 1,000 to 10,000 lines

Let me check out the branch and make these changes.

ellipsis-dev · 2025-07-24T15:42:08Z

src/core/tools/readFileTool.ts

-				const lineRangeAttr = ` lines="1-${totalLines}"`
+				// Handle normal file read with safeguard for large files
+				// Define thresholds for the safeguard
+				const LARGE_FILE_LINE_THRESHOLD = 1000 // Consider files with more than 1000 lines as "large"


Consider moving the safeguard threshold constants (LARGE_FILE_LINE_THRESHOLD, MAX_TOKEN_THRESHOLD, FALLBACK_MAX_LINES) to a shared configuration or constants module to avoid magic numbers and improve maintainability.

This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.

ellipsis-dev · 2025-07-24T15:42:08Z

src/core/tools/readFileTool.ts

+						if (tokenCount > MAX_TOKEN_THRESHOLD) {
+							shouldApplySafeguard = true
+							linesToRead = FALLBACK_MAX_LINES
+							safeguardNotice = `<notice>This file contains ${totalLines} lines and approximately ${tokenCount.toLocaleString()} tokens, which could consume a significant portion of the context window. Showing only the first ${FALLBACK_MAX_LINES} lines to preserve context space. Use line_range if you need to read specific sections.</notice>\n`


The safeguard notice message is hardcoded in English. For internationalization, consider using the i18n function (e.g., t()) so that the message can be translated for all supported languages.

This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.

cte · 2025-07-24T15:48:34Z

@roomote-agent Instead of line count let's use file size to detect files that potentially contain a lot of tokens; this should handle the case where there's a file with a huge amount of content on a single line.

- Access actual context window size from Task object via cline.api.getModel().info.contextWindow
- Increase LARGE_FILE_LINE_THRESHOLD from 1,000 to 10,000 lines
- Calculate MAX_TOKEN_THRESHOLD as 50% of actual context window instead of hardcoded 50K
- Update tests to reflect new thresholds and mock api.getModel() properly

roomote · 2025-07-24T15:51:51Z

Thanks for the feedback @cte! I understand - switching from line count to file size makes sense to better handle files with large amounts of content on single lines. I'll update the implementation to use file size as the primary detection method for potentially token-heavy files. Working on this now.

roomote · 2025-07-24T15:58:48Z

Hi @cte! I've successfully implemented both requested changes:

✅ Changes made:

- Dynamic context window sizing: Now accessing the actual context window size from cline.api.getModel().info.contextWindow instead of assuming 100K
- Increased line threshold: Bumped LARGE_FILE_LINE_THRESHOLD from 1,000 to 10,000 lines as requested

📝 Implementation details:

- The MAX_TOKEN_THRESHOLD is now calculated as 50% of the actual model's context window
- Added fallback to 100K if context window info is not available
- Updated all related tests to mock the api.getModel() method properly
- All tests are passing locally
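
As a rough illustration of the threshold calculation described above, the sketch below derives the token limit from a model's context window and falls back to 100K when that information is missing. The names `ModelInfoLike`, `DEFAULT_CONTEXT_WINDOW`, `CONTEXT_WINDOW_FRACTION`, and `maxTokenThreshold` are hypothetical. In the PR the value is said to come from cline.api.getModel().info.contextWindow; here a plain object stands in for it.

```typescript
// Hypothetical stand-in for the model info object mentioned in the PR.
interface ModelInfoLike {
	contextWindow?: number
}

const DEFAULT_CONTEXT_WINDOW = 100_000 // fallback stated in the comment above
const CONTEXT_WINDOW_FRACTION = 0.5 // "50% of the actual context window"

function maxTokenThreshold(info?: ModelInfoLike): number {
	// Treat a missing or non-positive context window as "not available".
	const contextWindow =
		info?.contextWindow !== undefined && info.contextWindow > 0
			? info.contextWindow
			: DEFAULT_CONTEXT_WINDOW
	return Math.floor(contextWindow * CONTEXT_WINDOW_FRACTION)
}
```

Under these assumptions, a model reporting a 200,000-token window yields a threshold of 100,000 tokens, and a model with no reported window yields 50,000.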

🔍 CI Status:

Most checks have passed ✅
The Windows unit test failure appears to be a pre-existing issue (Ubuntu tests passed fine)

The changes ensure that the safeguard adapts to different model context windows while being less aggressive about triggering for moderately-sized files.

@cte

- Replace line count threshold with file size threshold (100KB)
- Files larger than 100KB trigger token count check
- Files larger than 1MB automatically apply safeguard if token counting fails
- Update tests to reflect new file size-based approach
- This better handles files with large amounts of content on single lines

As requested by @cte in PR comment

roomote · 2025-07-24T16:06:37Z

Hi @cte! I've successfully implemented the requested change to use file size instead of line count for detecting large files.

Summary of changes:

- Replaced line count threshold with file size threshold: Files larger than 100KB now trigger the token count check (instead of files with more than 10,000 lines)
- Added file size-based safeguard: Files larger than 1MB automatically apply the safeguard if token counting fails
- Updated notice messages: The safeguard notices now show file size in KB instead of line count
- Updated all tests: Modified test suite to use file size mocking and updated test descriptions accordingly

This approach better handles edge cases where files might have a huge amount of content on single lines, as you mentioned. The implementation still respects the dynamic context window calculation that was added in the previous commit.
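
A minimal sketch of the size-based check described in this comment might look like the following. The constant and function names are hypothetical, and the sketch assumes 1KB = 1024 bytes, which the PR does not specify. Token counting is passed in as a callback so the failure path can be shown without a real tokenizer.

```typescript
// Assumed byte values for the 100KB and 1MB thresholds named in the comment.
const TOKEN_CHECK_SIZE_BYTES = 100 * 1024
const FALLBACK_SIZE_BYTES = 1024 * 1024

type SizeDecision = "read-all" | "truncate"

function decideBySize(
	fileSizeBytes: number,
	maxTokens: number,
	countTokens: () => number,
): SizeDecision {
	// Files at or below 100KB are read in full without counting tokens.
	if (fileSizeBytes <= TOKEN_CHECK_SIZE_BYTES) {
		return "read-all"
	}
	try {
		return countTokens() > maxTokens ? "truncate" : "read-all"
	} catch {
		// Token counting failed: only files over 1MB are truncated.
		return fileSizeBytes > FALLBACK_SIZE_BYTES ? "truncate" : "read-all"
	}
}
```

Because the check keys on byte size rather than line count, a single-line file of several megabytes goes through the same path as a file with many short lines.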

All tests are passing locally, and CI checks are currently running. The changes have been pushed to this PR branch.

ellipsis-dev · 2025-07-24T16:08:26Z

src/core/tools/readFileTool.ts

+							if (tokenCount > MAX_TOKEN_THRESHOLD) {
+								shouldApplySafeguard = true
+								linesToRead = FALLBACK_MAX_LINES
+								safeguardNotice = `<notice>This file is ${fileSizeKB}KB and contains approximately ${tokenCount.toLocaleString()} tokens, which could consume a significant portion of the context window. Showing only the first ${FALLBACK_MAX_LINES} lines to preserve context space. Use line_range if you need to read specific sections.</notice>\n`


Consider using the i18n translation function for the safeguard notice message (e.g. on line 550) instead of hardcoded English text, to ensure consistency with other user‐facing messages.

This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.

…d i18n

- Replace line-based limiting with character-based limiting to handle files with very long lines
- Move threshold constants to packages/types/src/file-limits.ts for better organization
- Add readLinesWithCharLimit function that truncates at complete line boundaries
- Optimize file reading to avoid double reads when checking token count
- Add i18n support for safeguard notice messages in all 18 supported languages
- Update tests to match new character-based implementation
- Safeguard now limits by character count (200KB default) instead of line count
- Ensures files are never truncated in the middle of a line
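
To illustrate what truncating at complete line boundaries under a character limit means, here is a small in-memory sketch. It is not the readLinesWithCharLimit function from the PR, whose signature and file handling are not shown in this thread. `takeLinesWithinCharLimit` and `CharLimitedResult` are hypothetical names, and the sketch works on a string that has already been read.

```typescript
interface CharLimitedResult {
	content: string
	linesIncluded: number
	truncated: boolean
}

function takeLinesWithinCharLimit(text: string, maxChars: number): CharLimitedResult {
	const lines = text.split("\n")
	const kept: string[] = []
	let used = 0

	for (const line of lines) {
		// Every line after the first also costs one character for the joining newline.
		const cost = line.length + (kept.length > 0 ? 1 : 0)
		if (used + cost > maxChars) {
			break // stop before a line that would not fit in full
		}
		kept.push(line)
		used += cost
	}

	return {
		content: kept.join("\n"),
		linesIncluded: kept.length,
		truncated: kept.length < lines.length,
	}
}
```

In this sketch, a first line that alone exceeds the limit produces empty content with `truncated` set to true. Whether the PR's implementation behaves the same way in that case is not stated here.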

daniel-lxs · 2025-07-28T23:27:26Z

Closing in favor of #6319

roomote bot requested review from cte, jr and mrubens as code owners July 24, 2025 15:36

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Jul 24, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Jul 24, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Jul 24, 2025

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Jul 24, 2025

ellipsis-dev bot reviewed Jul 24, 2025


daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jul 24, 2025

hannesrudolph added the PR - Needs Preliminary Review label Jul 24, 2025

ellipsis-dev bot reviewed Jul 24, 2025


dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jul 24, 2025

daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Jul 24, 2025

hannesrudolph added PR - Draft / In Progress and removed PR - Needs Preliminary Review labels Jul 24, 2025

daniel-lxs moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Jul 28, 2025

daniel-lxs mentioned this pull request Jul 28, 2025

fix: add file size validation to prevent memory exhaustion #6157

Closed

hannesrudolph added the PR - Needs Preliminary Review label Jul 28, 2025

hannesrudolph removed the PR - Draft / In Progress label Jul 28, 2025

daniel-lxs closed this Jul 28, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 28, 2025

github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Jul 28, 2025
