Conversation

@Nubebuster
Contributor

Re-opening #15175 as I cannot re-open it.

Update: This is still an important feature. Please review. It already received an LGTM:

LGTM

Originally posted by @LIHUA919 in #15175 (review)

Summary

Add a configurable text file read threshold (512 KiB default) that fails overly broad text file reads with guidance, preventing context pollution from massive file reads. This change also removes the old automatic line-count and line-length truncation logic.

Details

Problem: The agent often reads large text files (logs, JSON, JSONL, CSV, etc.) that consume excessive context tokens, degrading attention and increasing latency. The current 2k-lines/2k-chars truncation doesn't adequately address this: truncated content often strips key information while retaining useless metadata.

Solution: Fail suspected overly-broad reads before they happen, guiding the agent to adapt:

  • Read in chunks via offset/limit
  • Use more specific query tools (grep, head, tail, jq)
  • Explicitly bypass with limit=-1 when full read is intentional
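
The gating rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `checkTextFileRead`, `ReadDecision`, and `DEFAULT_THRESHOLD_BYTES` are hypothetical names, and the wording of the guidance message is invented for the example.

```typescript
// Hypothetical sketch; names and message text are illustrative, not taken from the PR diff.
const DEFAULT_THRESHOLD_BYTES = 512 * 1024; // 512 KiB, the default named in the summary

type ReadDecision = { ok: true } | { ok: false; message: string };

function checkTextFileRead(
  fileSizeBytes: number,
  limit: number | undefined,
  thresholdBytes: number = DEFAULT_THRESHOLD_BYTES,
): ReadDecision {
  // Any explicit limit, including -1 for "whole file", bypasses the check.
  if (limit !== undefined) {
    return { ok: true };
  }
  // Assumption: a negative threshold means the check is disabled.
  if (thresholdBytes < 0) {
    return { ok: true };
  }
  // Files at or under the threshold are read in full.
  if (fileSizeBytes <= thresholdBytes) {
    return { ok: true };
  }
  return {
    ok: false,
    message:
      `File is ${fileSizeBytes} bytes, above the ${thresholdBytes}-byte threshold. ` +
      'Read in chunks with offset/limit, use a narrower tool (grep, head, tail, jq), ' +
      'or pass limit=-1 to read the whole file.',
  };
}
```

Under these assumptions, a 600 KiB file with no limit is rejected with guidance, while the same file with `limit=-1` or with a disabled threshold passes the check.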

Implementation notes:

Open questions for reviewers:

  1. Is 512 KiB the right default threshold?
  2. Should configuration be via settings.json instead of env var?

Related Issues

Closes #14991

How to Validate

  1. Create a large text file (>512 KiB):

    dd if=/dev/zero bs=1024 count=600 | tr '\0' 'x' > large.txt
  2. Start Gemini CLI and ask it to read the file:

    > read large.txt
    
  3. Expected: Read fails with error guiding to use offset/limit or limit=-1

  4. Ask with explicit limit:

    > read large.txt with limit=-1
    
  5. Expected: Read succeeds, full content returned

  6. Test threshold override:

    GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES=-1 gemini
    > read large.txt
  7. Expected: Read succeeds (threshold disabled)

Nubebuster and others added 3 commits January 25, 2026 01:47
Implement a configurable byte threshold (512 KiB default) for text file
reads. When exceeded without an explicit limit parameter, the read fails
with guidance to use chunked reading or surgical tools.

- Add TEXT_FILE_READ_TOO_BROAD error type
- Allow limit=-1 to explicitly read entire file
- Any explicit limit bypasses threshold check
- Remove 2000-char line truncation (threshold gates large files instead)
- Remove 2000-line default limit
- Rename formatMemoryUsage to formatBytes (core package only)
- Configurable via GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES env var

Closes google-gemini#14991

Update documentation for the new text file read threshold feature:

- file-system.md: Update read_file tool with large file handling section,
  remove outdated 2000-line default references
- configuration.md: Add GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES env var
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Nubebuster Nubebuster requested review from a team as code owners January 25, 2026 01:21
@google-cla

google-cla bot commented Jan 25, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@Nubebuster Nubebuster changed the title from "Nubebuster/feat/text file read threshold" to "feat: prevent large text file read context pollution" on Jan 25, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @Nubebuster, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's ability to manage context and prevent performance degradation when interacting with large text files. By introducing a configurable size threshold for text file reads, the system now proactively guides the agent towards more efficient and targeted data access methods, such as reading in chunks or using specialized tools. This change also streamlines file processing by removing previous automatic truncation mechanisms, ensuring that when a file is read, its content is fully preserved up to the defined threshold or explicit limits.

Highlights

  • Text File Read Threshold: Introduced a configurable text file read threshold (default 512 KiB) for the read_file tool. If an agent attempts to read a text file larger than this without specifying a limit, the read will fail with guidance on how to read in chunks, use surgical tools, or explicitly bypass the threshold.
  • Removal of Automatic Truncation: The old automatic content line count (2000 lines) and line length (2000 characters) truncation logic for text files has been removed. This means files under the new byte threshold will be read in full, preserving long lines.
  • Enhanced read_file Tool Parameters: The limit parameter for the read_file tool now accepts -1 as a special value to explicitly read the entire file, bypassing the new size threshold. The offset parameter no longer strictly requires limit to be set.
  • Configuration via Environment Variable: The text file read threshold can be configured via the GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES environment variable, allowing users to adjust or disable it (by setting to -1).
  • Utility Function Rename: The formatMemoryUsage utility function has been renamed to formatBytes for clarity and consistency.
  • New Error Type: A new ToolErrorType.TEXT_FILE_READ_TOO_BROAD has been added to specifically handle cases where text file reads exceed the configured threshold.
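
To make the offset/limit behavior in the highlights concrete, here is a small sketch of line-based chunking. It is an illustration only: `sliceLines` and `LineSlice` are hypothetical names, and the sketch assumes that `offset` is a 0-based line index, that a missing `limit` or `limit = -1` means "read to the end", and that lines are separated by `\n` or `\r\n`.

```typescript
// Hypothetical sketch of offset/limit chunking; not the read_file implementation.
interface LineSlice {
  lines: string[]; // the selected lines
  totalLines: number; // number of lines in the whole content
  hasMore: boolean; // true when lines remain after this chunk
}

function sliceLines(content: string, offset = 0, limit?: number): LineSlice {
  const all = content.split(/\r?\n/);
  // Clamp the start so out-of-range offsets yield an empty chunk instead of throwing.
  const start = Math.min(Math.max(offset, 0), all.length);
  // Assumption: undefined or negative limit (such as -1) reads through the last line.
  const end =
    limit === undefined || limit < 0
      ? all.length
      : Math.min(start + limit, all.length);
  return {
    lines: all.slice(start, end),
    totalLines: all.length,
    hasMore: end < all.length,
  };
}
```

A caller could page through a large file by repeating the call with `offset` advanced by the number of lines returned until `hasMore` is false. Note that content ending in a newline produces a final empty line with this simple split.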

Update test to reflect that files under the byte threshold are read in
full without automatic line truncation.
@Nubebuster Nubebuster force-pushed the nubebuster/feat/text-file-read-threshold branch from 77ddc52 to 281e383 on January 25, 2026 01:24
@Nubebuster
Contributor Author

Removed Claude Code as co-author, which was causing the CLA failure.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature to prevent context pollution by adding a configurable size threshold for reading text files. It also correctly removes the previous, less effective line-length truncation logic. The implementation is generally solid, and the documentation updates are clear. My review identified two high-severity issues, which are detailed in the comments. First, the parsing of the new environment variable for the threshold is not fully robust and could lead to confusing behavior if misconfigured. Second, the read_many_files tool does not appear to handle cases where a file exceeds the new size threshold, which could cause the tool to fail unexpectedly. I have provided detailed comments and suggestions to address these points.
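
As one way to address the first point, stricter parsing of the threshold value might look like the sketch below. This is an assumption-laden illustration, not the suggestion posted in the review: `parseThresholdBytes` is a hypothetical helper, and the policy it encodes (fall back to the default on anything that is not a plain integer, accept `-1` as "disabled", reject other negative values) is a guess at reasonable behavior.

```typescript
// Hypothetical sketch of strict threshold parsing; the fallback policy is an assumption.
const FALLBACK_THRESHOLD_BYTES = 512 * 1024; // 512 KiB default from the PR summary

function parseThresholdBytes(raw: string | undefined): number {
  if (raw === undefined) {
    return FALLBACK_THRESHOLD_BYTES;
  }
  const trimmed = raw.trim();
  // Only plain base-10 integers are accepted, so "512KB" or "1e6" fall back
  // to the default instead of being partially parsed.
  if (!/^-?\d+$/.test(trimmed)) {
    return FALLBACK_THRESHOLD_BYTES;
  }
  const value = Number(trimmed);
  if (!Number.isSafeInteger(value)) {
    return FALLBACK_THRESHOLD_BYTES;
  }
  // -1 disables the threshold; any other negative value is treated as a mistake.
  if (value < 0 && value !== -1) {
    return FALLBACK_THRESHOLD_BYTES;
  }
  return value;
}
```

In a Node.js process the raw value would come from `process.env['GEMINI_TEXT_FILE_READ_THRESHOLD_BYTES']`. Keeping the parser a pure function of a string makes the misconfiguration cases easy to unit test.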

@gemini-cli gemini-cli bot added the priority/p1 (Important and should be addressed in the near term.) and area/agent (Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality) labels on Jan 25, 2026
Nubebuster and others added 2 commits January 25, 2026 03:31
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@gemini-cli
Contributor

gemini-cli bot commented Jan 27, 2026

Hi there! Thank you for your contribution to Gemini CLI. We really appreciate the time and effort you've put into this pull request.

To keep our backlog manageable and ensure we're focusing on current priorities, we are closing pull requests that haven't seen maintainer activity for 30 days. Currently, the team is prioritizing work associated with 🔒 maintainer only or help wanted issues.

If you believe this change is still critical, please feel free to comment with updated details. Otherwise, we encourage contributors to focus on open issues labeled as help wanted. Thank you for your understanding!

@gemini-cli gemini-cli bot closed this Jan 27, 2026
Labels

area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality priority/p1 Important and should be addressed in the near term.

Development

Successfully merging this pull request may close these issues.

Feature: ginormous file read calls succeed undesirably ­— context poison and confident continuation

2 participants