fix: stop reading big files that crash context #6667
Conversation
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Thank you for your contribution! I've reviewed the changes and the implementation looks solid overall. The approach to prevent infinite hanging on large files is well thought out. I've left some suggestions inline that could improve performance and robustness.
```ts
/**
 * Conservative buffer percentage for file reading.
 * We use a very conservative estimate to ensure files fit in context.
 */
const FILE_READ_BUFFER_PERCENTAGE = 0.4 // 40% buffer for safety
```
Is the 40% buffer intentionally this conservative? It might be worth making this configurable or adjusting based on model capabilities. Some models might handle closer-to-limit content better than others.
For now yes.
It seems like we shouldn’t need to be so conservative here if the rest of the logic is working right
yeah sorry-- I think I just picked a big number for the simple version
- Fix Hindi translation punctuation
- Fix race condition by checking stream.destroyed
- Optimize newline counting with regex
- Performance improvements for large file handling
- Defensive programming for end parameter already in place
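The "optimize newline counting with regex" item above can be sketched as follows. This is a minimal illustration of the technique, not the PR's actual code; `countLines` is a hypothetical name.

```typescript
// Sketch: count lines without allocating one string per line.
// `content.split("\n")` materializes every line, which is wasteful on
// multi-megabyte files; a global regex match only counts separators.
function countLines(content: string): number {
	if (content.length === 0) return 0
	const newlines = (content.match(/\n/g) ?? []).length
	// A trailing newline does not start an additional line of content.
	return content.endsWith("\n") ? newlines : newlines + 1
}
```

The regex approach also avoids holding the split array alive while the caller only needs a count.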
daniel-lxs left a comment
Thank you @liwilliam2021 this finally defeated my file
```ts
const remainingTokens = contextWindow - currentlyUsed
const usableTokens = Math.floor(remainingTokens * (1 - FILE_READ_BUFFER_PERCENTAGE))

// Reserve space for response (use 25% of remaining or 4096, whichever is smaller)
```
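The reserve described in that comment ("25% of remaining or 4096, whichever is smaller") could be computed as below. This is an illustrative sketch; `computeResponseReserve` is a hypothetical name, not the PR's actual API.

```typescript
// Sketch: hold back tokens for the model's reply before budgeting the
// file read. Takes 25% of the remaining tokens, capped at 4096.
function computeResponseReserve(remainingTokens: number): number {
	return Math.min(Math.floor(remainingTokens * 0.25), 4096)
}
```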
We should use the common logic for this
```ts
// For large files or when approaching limits, always limit
if (fileSizeBytes > safeCharLimit || fileSizeBytes > LARGE_FILE_SIZE) {
	// Use a very conservative limit
	const finalLimit = Math.min(safeCharLimit, 100000) // Cap at 100K chars
```
I think this might annoy people who are trying to use a model with a large context window to read large files
I think the plan for this PR was to do something stupid to fix the temporary error-- basically never reading big files
This PR has the full implementation and doesn't have that limit
#6319
Let's get this across the finish line
Implements a simple, token-budget based file reading system that prevents context window overflow and tokenizer crashes.

Problem:
- Files could fill the entire context window, causing issues
- tiktoken crashes with an 'unreachable' error on files >5MB
- PR #6667's approach was too complex, with magic numbers

Solution - Multi-Layer Defense:
1. Fast path: files <100KB skip validation (no overhead)
2. Token validation: 100KB-5MB files use real token counting
   - Budget: (contextWindow - currentTokens) * 0.6
   - Smart truncation if the file exceeds the budget
3. Preview mode: files >5MB get a 100KB preview (prevents crashes)
4. Error recovery: catch tokenizer 'unreachable' errors gracefully

Key Features:
- No magic numbers - dynamic based on actual context
- Real token counting using the existing tokenizer
- 100KB previews for large files (perfect size for structure visibility)
- Graceful error handling prevents conversation crashes
- Simple implementation (~160 lines vs complex heuristics)

Testing:
- 17 comprehensive tests covering all scenarios
- All tests passing, including edge cases and error conditions

Files:
- src/core/tools/helpers/fileTokenBudget.ts: core validation logic
- src/core/tools/helpers/__tests__/fileTokenBudget.spec.ts: test suite
- src/core/tools/readFileTool.ts: integration into the read file tool
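The multi-layer defense described in the commit message could be sketched as a single decision function. Only the thresholds (100KB, 5MB) and the 0.6 budget factor come from the description; `planFileRead`, `ReadPlan`, and the constant names are hypothetical, not the PR's actual API.

```typescript
// Sketch of the three-tier file-read policy, under assumed names.
const FAST_PATH_LIMIT = 100 * 1024        // <100KB: skip validation entirely
const PREVIEW_THRESHOLD = 5 * 1024 * 1024 // >5MB: never tokenize, preview only
const PREVIEW_SIZE = 100 * 1024           // 100KB preview for huge files
const BUDGET_FACTOR = 0.6                 // use 60% of remaining context

type ReadPlan =
	| { mode: "full" }
	| { mode: "validate"; tokenBudget: number }
	| { mode: "preview"; previewBytes: number }

function planFileRead(fileSizeBytes: number, contextWindow: number, currentTokens: number): ReadPlan {
	if (fileSizeBytes < FAST_PATH_LIMIT) {
		// Fast path: small files never pay the token-counting overhead.
		return { mode: "full" }
	}
	if (fileSizeBytes > PREVIEW_THRESHOLD) {
		// Huge files would crash the tokenizer, so only a preview is read.
		return { mode: "preview", previewBytes: PREVIEW_SIZE }
	}
	// Mid-size files: count real tokens against 60% of the remaining context.
	const tokenBudget = Math.floor((contextWindow - currentTokens) * BUDGET_FACTOR)
	return { mode: "validate", tokenBudget }
}
```

The discriminated union keeps the three outcomes explicit, so the read-file tool can switch on `mode` without re-deriving thresholds.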
Implemented a simple version of: #6319
We now always stop reading after a limited read. We also use simpler heuristics that do not call the tokenizer to validate while remaining conservative.
Goal: prevent infinite hanging on large files when partial reads are off. This may limit the ability of the model to read large files which is addressed in the next PR.
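A tokenizer-free heuristic of the kind described above might look like the following sketch. The ~3 characters-per-token ratio and all names are assumptions for illustration, not values taken from the PR; English text typically averages closer to 4 characters per token, so dividing by 3 overestimates token usage, which keeps the check conservative.

```typescript
// Sketch: estimate whether a file fits in the budget without calling
// the tokenizer. Assumes ~3 chars/token, deliberately conservative.
const CONSERVATIVE_CHARS_PER_TOKEN = 3

function estimateTokens(charCount: number): number {
	return Math.ceil(charCount / CONSERVATIVE_CHARS_PER_TOKEN)
}

function fitsInBudget(charCount: number, tokenBudget: number): boolean {
	return estimateTokens(charCount) <= tokenBudget
}
```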
Important

Introduces file size validation in readFileTool.ts to prevent crashes from large files by limiting reads based on context size.

- readFileTool.ts: validates file size using validateFileSizeForContext from the new contextValidator.ts.
- readFileTool.spec.ts: tests the new validation logic and partial read notices.
- read-lines.spec.ts: covers the maxChars parameter to ensure character limit handling.
- tools.json (multiple languages): adds the showingOnlyLines translation.

This description was created automatically for 75fb09a.