Critical Financial Risk in Gemini CLI: Uncontrolled Token Consumption & Misleading Errors with Enterprise/Paid API Keys #4841
Replies: 4 comments
-
See also the perennial "loop" and "costs" posts in this very corner: https://github.com/google-gemini/gemini-cli/discussions, with some answers, some of them mine.
-
All of the GenAI agent tools on the market are developer companion apps: they help you solve problems, not solve unknown problems for you. The Google Cloud API documentation already tells users this. Be specific and clear in your prompts, and don't ask open-ended questions that let the LLM do your thinking for you. This is why, in its current state, generative AI cannot replace a human; it is an assistant that helps a human fix problems.
-
I have filed an Issue and a Discussion post about this problem, because it happened to me. I was using the API key method of authentication on a project I created just for Gemini CLI. I was getting ready to create a tutorial on using the CLI in a commercial environment, and I used it to build a complete weather website that pulled live data from the National Weather Service: radar images and real-time alerts. I used Gemini CLI for about 5 hours. I had set the project's billing threshold to $100, thinking I would never exceed the free tier of 1,000 requests per day, because Google has said they build the pricing models to exceed what a developer could possibly use. The next morning I woke up to a $142 bill, with 97 MILLION input tokens used and 500+ requests for a weather website. I made only about 70 of those requests myself; the CLI made the rest.

Using Gemini CLI with Pro in any development environment will make Claude AI look like a charity case: at 8 to 10 hours a day, it would cost you $2k to $3k a month or more. I stopped using it. It's a shame, too, because it's an awesome tool, but the huge free tier Google advertises is misleading. If you log in with a Google account and no API key, you get 50 requests and are then dropped to the Flash model, and the Flash model makes huge mistakes, gets confused, and is a very bad listener to instructions.

These screenshots are from this morning; I wanted to see if anything had changed. It still only lets you have 50 requests. I made 7 requests myself, the session lasted about 10 minutes, and the CLI blew through the rate limit and dropped me to Flash, at which point I closed the terminal. What they don't tell you is which model those 1,000 daily requests apply to: after you reach 50 on Pro, it's the Flash model, and at that point you may as well use ChatGPT-4o, which is way better than Flash.

I am sorry to say that, because I really like the Gemini models and the CLI was a game changer, but it's not affordable.
-
@Theorist100 - Thank you for the thoughtful post. I filed #4876 to track the misleading error reporting issue. It sounds like you had enabled auto-edit or yolo mode, correct? By default, you have to manually approve tool calls.






-
Summary (TL;DR)
The Gemini CLI (aistudio) tool, when used with an Enterprise/paid API key (e.g., from a Vertex AI subscription), exhibits dangerous default behavior that can lead to unexpected and astronomical token consumption. The tool autonomously uses file system tools (ReadFile, Shell, etc.) in a loop, causing the context to grow exponentially without the user's explicit consent or awareness.
Furthermore, when the actual rate limit (Tokens Per Minute) of the Enterprise account is hit, the CLI displays a misleading error message referencing the free daily quota of AI Studio, completely confusing the user and masking the real issue.
This combination of uncontrolled, hidden token usage and incorrect error reporting makes the tool financially unsafe for any professional use with a paid API key. A simple task of refactoring 5 tests can secretly consume over 47 million tokens, which could translate to a surprise bill of nearly $1,000 USD.
Steps to Reproduce
Setup: Configure Gemini CLI to use a paid API key from a Google Cloud (Vertex AI) project.
Environment: Place the CLI in a directory containing a software project (e.g., the root of a Git repository).
Action: Give the CLI a seemingly simple but open-ended task related to the codebase. For example:
gemini "Analyze the tests in this directory and identify which ones need to be refactored."
gemini "Refactor these 5 tests" (and then paste the code).
Observation: Do not use any special flags like -i or -p. Just run it in the default interactive agent mode. The agent will start using tools to read files and run commands.
Result: After the interaction, check the Interaction Summary. You will see an anomalously high Input Tokens count and a long Tool Time.
Expected Behavior
When given a simple task, the CLI should only consume tokens proportional to the prompt and the immediate context provided. For a task like "refactor 5 tests," the expected input token count should be in the low thousands.
If an Enterprise quota limit (like Tokens Per Minute) is exceeded, the error message should clearly state 429 Resource Exhausted and reference the specific TPM limit.
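The CLI described here offers no such safeguard. Purely as an illustration of the kind of client-side guard that would enforce the first expectation, here is a hypothetical hard per-session token budget; everything below (the `SessionBudget` name, the 500,000-token limit) is invented for this sketch, not an existing Gemini CLI feature:

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a session crosses its hard input-token ceiling."""


class SessionBudget:
    """Hypothetical client-side guard: stop the agent loop outright once
    cumulative input tokens for the session exceed a fixed ceiling."""

    def __init__(self, max_input_tokens: int):
        self.max_input_tokens = max_input_tokens
        self.used = 0

    def charge(self, input_tokens: int) -> None:
        """Record the input tokens of one model call; abort if over budget."""
        self.used += input_tokens
        if self.used > self.max_input_tokens:
            raise TokenBudgetExceeded(
                f"session used {self.used:,} input tokens "
                f"(limit {self.max_input_tokens:,}); stopping agent loop"
            )


budget = SessionBudget(max_input_tokens=500_000)
budget.charge(200_000)      # well under budget, loop continues
try:
    budget.charge(400_000)  # 600,000 cumulative: over budget, loop aborts
except TokenBudgetExceeded as err:
    print(err)
```

A ceiling like this would have capped the first incident at roughly $10 of spend (500,000 tokens at the $20-per-million rate assumed below) instead of several hundred dollars.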
Actual Behavior
First Incident:
Task: Refactor a few test files.
Result: The tool secretly consumed 47,185,013 input tokens over 251 requests.
Error: When the limit was hit, the CLI displayed the error for the AI Studio free daily quota, which was incorrect and misleading.
Second Incident (after starting a new session):
Task: A short task to continue editing the same tests.
Real-Time Duration: Wall Time: 13m 38s
Result: The agent initiated 47 tool calls (reading files, running pytest, editing files in a loop) and consumed 721,943 input tokens. The Tool Time was over an hour, indicating massive background activity.
Evidence (Interaction Summary):
First incident:

Model Usage       Reqs  Input Tokens  Output Tokens
───────────────────────────────────────────────────────────────
gemini-2.5-pro     251    47,185,013        177,547
gemini-2.5-flash     5     1,483,889          1,829

Second incident:

Tool Calls: 47 ( ✔ 47 ✖ 0 )
Tool Time: 1h 14m 9s (97.1%)

Model Usage       Reqs  Input Tokens  Output Tokens
───────────────────────────────────────────────────────────────
gemini-1.5-flash    32       721,943          6,786
Root Cause Analysis
The core of the problem is that Gemini CLI is designed as an autonomous agent, not a simple assistant. By default, it operates on a Reason-Act (ReAct) loop: the model reasons about what additional context it needs, invokes tools (ReadFile, Shell) to get that context, appends the results to the conversation, and repeats. This creates exponential context growth that is hidden from the user. This "helpful" behavior, while perhaps acceptable in a free, rate-limited environment, is disastrous when attached to a paid API.
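Even without a genuinely exponential blow-up, the arithmetic of resending history is brutal: if each tool call appends output to the conversation and every model call resends the whole conversation, cumulative input tokens grow quadratically with the number of turns. A minimal simulation (the per-turn figures are illustrative guesses, not measurements):

```python
def simulate_react_loop(turns: int, tool_output_tokens: int,
                        prompt_tokens: int = 1_000) -> int:
    """Cumulative input tokens for an agent that resends the entire
    conversation history on every model call."""
    context = prompt_tokens   # tokens currently in the conversation history
    total_input = 0
    for _ in range(turns):
        total_input += context          # whole history sent as input
        context += tool_output_tokens   # tool result appended for next turn
    return total_input

# 251 requests at ~1,500 tokens of tool output per turn lands in the same
# order of magnitude as the 47M input tokens reported above.
print(f"{simulate_react_loop(251, 1_500):,}")  # 47,313,500
```

Note the shape of the result: halving the per-turn tool output only halves the bill, while the quadratic dependence on turn count is what makes long autonomous sessions ruinous.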
WARNING: Severe Financial Risk
This is not a minor bug. This is a critical issue that poses a significant financial risk to your customers. At a conservative average rate of $20 per million tokens (for Pro model I/O), the first incident described above could have resulted in an unexpected bill of approximately $940 USD for a simple development task.
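The dollar estimate follows directly from the first incident's usage table; as a quick check (the $20-per-million blended rate is this report's assumption, not a quoted price):

```python
input_tokens = 47_185_013   # gemini-2.5-pro input tokens, first incident
usd_per_million = 20        # assumed blended Pro I/O rate from this report
cost = input_tokens / 1_000_000 * usd_per_million
print(f"${cost:,.2f}")      # $943.70
```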
The Gemini CLI tool, in its current default configuration, should be considered financially unsafe to use with paid API keys.

Recommendations
Thank you for your attention to this critical issue.