feat: prevent large text file read context pollution #17468

Nubebuster · 2026-01-25T01:21:39Z

Re-opening #15175 as I cannot re-open it.

Update: This is still an important feature. Please review. It already received an LGTM:

LGTM

Originally posted by @LIHUA919 in #15175 (review)

Summary

Add a configurable text file read threshold (512 KiB default) that fails overly broad text file reads with guidance, preventing context pollution from massive file reads. Also remove the old automatic line-count and line-length truncation logic.

Details

Problem: The agent often reads large text files (logs, JSON, JSONL, CSV, etc.) that consume excessive context tokens, degrading attention and increasing latency. The current 2k-lines/2k-chars truncation doesn't adequately address this: truncated content often strips key information while retaining useless metadata.

Solution: Fail suspected overly-broad reads before they happen, guiding the agent to adapt:

Read in chunks via offset/limit
Use more specific query tools (grep, head, tail, jq)
Explicitly bypass with limit=-1 when full read is intentional
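
The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `decideTextRead`, the `ReadDecision` type, and the message wording are hypothetical, and only the behavior (512 KiB default, any explicit limit bypasses the check, a threshold of -1 disables it) is taken from the description.

```typescript
// Hypothetical sketch; names are illustrative, behavior follows the PR description.
const DEFAULT_THRESHOLD_BYTES = 512 * 1024; // 512 KiB default

type ReadDecision =
  | { allowed: true }
  | { allowed: false; errorType: 'TEXT_FILE_READ_TOO_BROAD'; message: string };

function decideTextRead(
  fileSizeBytes: number,
  limit: number | undefined,
  thresholdBytes: number = DEFAULT_THRESHOLD_BYTES,
): ReadDecision {
  // A threshold of -1 disables the check entirely.
  if (thresholdBytes === -1) return { allowed: true };
  // Any explicit limit (including -1 for "whole file") bypasses the check.
  if (limit !== undefined) return { allowed: true };
  // Files at or under the threshold are read in full.
  if (fileSizeBytes <= thresholdBytes) return { allowed: true };
  return {
    allowed: false,
    errorType: 'TEXT_FILE_READ_TOO_BROAD',
    message:
      `File is ${fileSizeBytes} bytes, over the ${thresholdBytes}-byte threshold. ` +
      'Read in chunks with offset/limit, use grep/head/tail/jq, ' +
      'or pass limit=-1 to read the whole file.',
  };
}
```

With these assumptions, a 600 KiB file read without a limit is rejected, while the same read with `limit=-1` goes through.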

Implementation notes:

The formatMemoryUsage → formatBytes rename is partial (core package only); PR #14997 ("refactor: rename formatMemoryUsage to formatBytes") completes the rename project-wide
Removed line-length truncation (threshold prevents unintended large file reads instead)

Open questions for reviewers:

Is 512 KiB the right default threshold?
Should configuration be via settings.json instead of env var?

Related Issues

Closes #14991

How to Validate

Create a large text file (>512 KiB):

dd if=/dev/zero bs=1024 count=600 | tr '\0' 'x' > large.txt

Start Gemini CLI and ask it to read the file:
```
> read large.txt
```
Expected: Read fails with an error that points the agent to offset/limit or limit=-1
Ask with explicit limit:
```
> read large.txt with limit=-1
```
Expected: Read succeeds, full content returned

Test threshold override:

GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES=-1 gemini
> read large.txt

Expected: Read succeeds (threshold disabled)
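
For reviewers thinking about the override, here is one way the environment value could be parsed so that a misconfigured value falls back to the default. This is a sketch under stated assumptions, not the code in this PR: `parseThresholdBytes` and `FALLBACK_THRESHOLD_BYTES` are hypothetical names, and treating `0` and negative values other than `-1` as invalid is an assumption made here for illustration.

```typescript
// Hypothetical sketch of robust parsing for the threshold override.
const FALLBACK_THRESHOLD_BYTES = 512 * 1024; // 512 KiB default

function parseThresholdBytes(raw: string | undefined): number {
  if (raw === undefined) return FALLBACK_THRESHOLD_BYTES;
  const trimmed = raw.trim();
  // Accept only plain integers; values such as "12kb" or "1e6" are rejected.
  if (!/^-?\d+$/.test(trimmed)) return FALLBACK_THRESHOLD_BYTES;
  const value = Number(trimmed);
  if (!Number.isSafeInteger(value)) return FALLBACK_THRESHOLD_BYTES;
  // -1 is the documented "disabled" value.
  if (value === -1) return -1;
  // Assumption: zero and other negatives are treated as misconfiguration.
  return value > 0 ? value : FALLBACK_THRESHOLD_BYTES;
}
```

A caller would pass in the raw string, for example `parseThresholdBytes(process.env['GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES'])`.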

Implement a configurable byte threshold (512 KiB default) for text file reads. When exceeded without an explicit limit parameter, the read fails with guidance to use chunked reading or surgical tools.

- Add TEXT_FILE_READ_TOO_BROAD error type
- Allow limit=-1 to explicitly read entire file
- Any explicit limit bypasses threshold check
- Remove 2000-char line truncation (threshold gates large files instead)
- Remove 2000-line default limit
- Rename formatMemoryUsage to formatBytes (core package only)
- Configurable via GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES env var

Closes google-gemini#14991

Update documentation for the new text file read threshold feature:

- file-system.md: Update read_file tool with large file handling section, remove outdated 2000-line default references
- configuration.md: Add GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES env var

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

google-cla · 2026-01-25T01:21:46Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

gemini-code-assist · 2026-01-25T01:22:04Z

Summary of Changes

Hello @Nubebuster, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's ability to manage context and prevent performance degradation when interacting with large text files. By introducing a configurable size threshold for text file reads, the system now proactively guides the agent towards more efficient and targeted data access methods, such as reading in chunks or using specialized tools. This change also streamlines file processing by removing previous automatic truncation mechanisms, ensuring that when a file is read, its content is fully preserved up to the defined threshold or explicit limits.

Highlights

Text File Read Threshold: Introduced a configurable text file read threshold (default 512 KiB) for the read_file tool. If an agent attempts to read a text file larger than this without specifying a limit, the read will fail with guidance on how to read in chunks, use surgical tools, or explicitly bypass the threshold.
Removal of Automatic Truncation: The old automatic content line count (2000 lines) and line length (2000 characters) truncation logic for text files has been removed. This means files under the new byte threshold will be read in full, preserving long lines.
Enhanced read_file Tool Parameters: The limit parameter for the read_file tool now accepts -1 as a special value to explicitly read the entire file, bypassing the new size threshold. The offset parameter no longer strictly requires limit to be set.
Configuration via Environment Variable: The text file read threshold can be configured via the GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES environment variable, allowing users to adjust or disable it (by setting to -1).
Utility Function Rename: The formatMemoryUsage utility function has been renamed to formatBytes for clarity and consistency.
New Error Type: A new ToolErrorType.TEXT_FILE_READ_TOO_BROAD has been added to specifically handle cases where text file reads exceed the configured threshold.
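
To make the offset/limit semantics in the highlights concrete, the following sketch shows line-based chunked reading. It is illustrative only: `sliceLines` is a hypothetical helper, and the 0-based offset and the rejection of limits below -1 are assumptions rather than confirmed behavior of read_file.

```typescript
// Hypothetical helper; assumes a 0-based line offset.
function sliceLines(content: string, offset = 0, limit?: number): string[] {
  const lines = content.split('\n');
  const start = Math.max(0, offset);
  // Omitted limit or -1: read from the offset to the end of the file.
  if (limit === undefined || limit === -1) return lines.slice(start);
  // Assumption: other negative limits are rejected rather than guessed at.
  if (limit < -1) throw new RangeError(`Invalid limit: ${limit}`);
  return lines.slice(start, start + limit);
}
```

Under these assumptions an agent could page through a large log by calling `sliceLines(text, 0, 500)`, then `sliceLines(text, 500, 500)`, and so on, and an offset can be given without a limit.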

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Update test to reflect that files under the byte threshold are read in full without automatic line truncation.

Nubebuster · 2026-01-25T01:24:52Z

Removed Claude Code as co-author, which was causing the CLA check to fail.

gemini-code-assist

Code Review

This pull request introduces a valuable feature to prevent context pollution by adding a configurable size threshold for reading text files. It also correctly removes the previous, less effective line-length truncation logic. The implementation is generally solid, and the documentation updates are clear. My review identified two high-severity issues, which are detailed in the comments. First, the parsing of the new environment variable for the threshold is not fully robust and could lead to confusing behavior if misconfigured. Second, the read_many_files tool does not appear to handle cases where a file exceeds the new size threshold, which could cause the tool to fail unexpectedly. I have provided detailed comments and suggestions to address these points.
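
As an illustration of the second point, a multi-file reader could skip oversized files and report them instead of failing the whole call. This is a hedged sketch, not what read_many_files does or what the review suggested verbatim: `FileInfo`, `Partition`, and `partitionBySize` are hypothetical names introduced here.

```typescript
// Hypothetical sketch: split candidate files into readable and skipped sets.
interface FileInfo {
  path: string;
  sizeBytes: number;
}

interface Partition {
  toRead: string[];
  skipped: { path: string; reason: string }[];
}

function partitionBySize(files: FileInfo[], thresholdBytes: number): Partition {
  const result: Partition = { toRead: [], skipped: [] };
  for (const file of files) {
    // A threshold of -1 means the size check is disabled.
    if (thresholdBytes !== -1 && file.sizeBytes > thresholdBytes) {
      result.skipped.push({
        path: file.path,
        reason: `${file.sizeBytes} bytes exceeds the ${thresholdBytes}-byte threshold`,
      });
    } else {
      result.toRead.push(file.path);
    }
  }
  return result;
}
```

Reporting the skipped paths with a reason would let the agent follow up with a targeted read_file call using offset/limit.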

packages/core/src/utils/fileUtils.ts

packages/core/src/tools/read-many-files.test.ts

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-cli · 2026-01-27T02:45:41Z

Hi there! Thank you for your contribution to Gemini CLI. We really appreciate the time and effort you've put into this pull request.

To keep our backlog manageable and ensure we're focusing on current priorities, we are closing pull requests that haven't seen maintainer activity for 30 days. Currently, the team is prioritizing work associated with 🔒 maintainer only or help wanted issues.

If you believe this change is still critical, please feel free to comment with updated details. Otherwise, we encourage contributors to focus on open issues labeled as help wanted. Thank you for your understanding!

Nubebuster and others added 3 commits January 25, 2026 01:47

Handle invalid setting value, fallback to default value

2870df3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Nubebuster requested review from a team as code owners January 25, 2026 01:21

Nubebuster changed the title ~~Nubebuster/feat/text file read threshold~~ feat: prevent large text file read context pollution Jan 25, 2026

test(core): update read-many-files test for byte threshold behavior

281e383

Update test to reflect that files under the byte threshold are read in full without automatic line truncation.

Nubebuster force-pushed the nubebuster/feat/text-file-read-threshold branch from 77ddc52 to 281e383 January 25, 2026 01:24

gemini-code-assist bot reviewed Jan 25, 2026

packages/core/src/utils/fileUtils.ts (outdated)

packages/core/src/tools/read-many-files.test.ts

gemini-cli bot added labels priority/p1 (Important and should be addressed in the near term) and area/agent (Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality) Jan 25, 2026

Nubebuster and others added 2 commits January 25, 2026 03:31

Update packages/core/src/utils/fileUtils.ts

f6eb2c3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Merge branch 'main' into nubebuster/feat/text-file-read-threshold

42ec88a

gemini-cli bot closed this Jan 27, 2026

jackwotherspoon reopened this Jan 27, 2026
