Critical Financial Risk in Gemini CLI: Uncontrolled Token Consumption & Misleading Errors with Enterprise/Paid API Keys #4841
Replies: 4 comments
-
See also the perennial "loop" and "costs" posts in this very corner: https://github.com/google-gemini/gemini-cli/discussions, with some answers, some of them mine.
-
All of the GenAI agent tools on the market are developer companion apps: they help you solve problems, not solve unknown problems for you. The Google Cloud API documentation already tells users this. Be specific and clear in your prompts, and don't ask open-ended questions that let the LLM do your thinking for you. This is why, in its current state, generative AI cannot replace a human; it is an assistant that helps a human fix problems.
-
I have filed an Issue and a Discussion post about this problem, because it happened to me. I was using the API key method of authentication on a project I created just for Gemini CLI. I was getting ready to create a tutorial on using the CLI in a commercial environment, and I used it to build a complete weather website that pulled live data from the National Weather Service: radar images and real-time alerts. I used Gemini CLI for about 5 hours. I had set the project's billing threshold to $100, thinking I would never exceed the free tier of 1,000 requests per day, because Google has said they build the pricing models to exceed what a developer could possibly use. The next morning I woke up to a $142 bill, with 97 MILLION input tokens used and 500+ requests for a weather website. I made only about 70 of those requests myself; the CLI made the rest.

Using Gemini CLI with Pro in any development environment will make Claude AI look like a charity case: at 8 to 10 hours a day, it would cost you $2k to $3k a month or more. I stopped using it. It's a shame, too, because it's an awesome tool, but the huge free tier Google advertises is misleading. If you log in with a Google account and no API key, you get 50 requests and are then dropped to the Flash model, and the Flash model makes huge mistakes, gets confused, and is a very bad listener to instructions.

These screenshots are from this morning; I wanted to see if anything had changed. It still only lets you have 50 requests. I made 7 requests myself, the session lasted about 10 minutes, and the CLI blew through the rate limit and dropped me to Flash, at which point I closed the terminal. What they don't tell you is which model those 1,000 daily requests apply to: after you reach 50 on Pro, it's the Flash model, and at that point you may as well use ChatGPT-4o, which is way better than Flash.

I am sorry to say that, because I really like the Gemini models and the CLI was a game changer, but it's not affordable.
-
@Theorist100 - Thank you for the thoughtful post. I filed #4876 to track the misleading error reporting issue. It sounds like you had enabled auto-edit or yolo mode, correct? By default, you have to manually approve tool calls.






-
Summary (TL;DR)
The Gemini CLI (aistudio) tool, when used with an Enterprise/paid API key (e.g., from a Vertex AI subscription), exhibits dangerous default behavior that can lead to unexpected and astronomical token consumption. The tool autonomously uses file system tools (ReadFile, Shell, etc.) in a loop, causing the context to grow exponentially without the user's explicit consent or awareness.
Furthermore, when the actual rate limit (Tokens Per Minute) of the Enterprise account is hit, the CLI displays a misleading error message referencing the free daily quota of AI Studio, completely confusing the user and masking the real issue.
This combination of uncontrolled, hidden token usage and incorrect error reporting makes the tool financially unsafe for any professional use with a paid API key. A simple task of refactoring 5 tests can secretly consume over 47 million tokens, which could translate to a surprise bill of nearly $1,000 USD.
Steps to Reproduce
Setup: Configure Gemini CLI to use a paid API key from a Google Cloud (Vertex AI) project.
Environment: Place the CLI in a directory containing a software project (e.g., the root of a Git repository).
Action: Give the CLI a seemingly simple but open-ended task related to the codebase. For example:
gemini "Analyze the tests in this directory and identify which ones need to be refactored."
gemini "Refactor these 5 tests" (and then paste the code).
Observation: Do not use any special flags like -i or -p. Just run it in the default interactive agent mode. The agent will start using tools to read files and run commands.
Result: After the interaction, check the Interaction Summary. You will see an anomalously high Input Tokens count and a long Tool Time.
Expected Behavior
When given a simple task, the CLI should only consume tokens proportional to the prompt and the immediate context provided. For a task like "refactor 5 tests," the expected input token count should be in the low thousands.
If an Enterprise quota limit (like Tokens Per Minute) is exceeded, the error message should clearly state 429 Resource Exhausted and reference the specific TPM limit.
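The CLI described here offers no such safeguard. Purely as an illustration of the kind of client-side guard that would enforce the first expectation, here is a hypothetical hard per-session token budget; everything below (the `SessionBudget` name, the 500,000-token limit) is invented for this sketch, not an existing Gemini CLI feature:

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a session crosses its hard input-token ceiling."""


class SessionBudget:
    """Hypothetical client-side guard: stop the agent loop outright once
    cumulative input tokens for the session exceed a fixed ceiling."""

    def __init__(self, max_input_tokens: int):
        self.max_input_tokens = max_input_tokens
        self.used = 0

    def charge(self, input_tokens: int) -> None:
        """Record the input tokens of one model call; abort if over budget."""
        self.used += input_tokens
        if self.used > self.max_input_tokens:
            raise TokenBudgetExceeded(
                f"session used {self.used:,} input tokens "
                f"(limit {self.max_input_tokens:,}); stopping agent loop"
            )


budget = SessionBudget(max_input_tokens=500_000)
budget.charge(200_000)      # well under budget, loop continues
try:
    budget.charge(400_000)  # 600,000 cumulative: over budget, loop aborts
except TokenBudgetExceeded as err:
    print(err)
```

A ceiling like this would have capped the first incident at roughly $10 of spend (500,000 tokens at the $20-per-million rate assumed below) instead of several hundred dollars.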
Actual Behavior
First Incident:
Task: Refactor a few test files.
Result: The tool secretly consumed 47,185,013 input tokens over 251 requests.
Error: When the limit was hit, the CLI displayed the error for the AI Studio free daily quota, which was incorrect and misleading.
Second Incident (after starting a new session):
Task: A short task to continue editing the same tests.
Real-Time Duration: Wall Time: 13m 38s
Result: The agent initiated 47 tool calls (reading files, running pytest, editing files in a loop) and consumed 721,943 input tokens. The Tool Time was over an hour, indicating massive background activity.
Evidence (Interaction Summary):
First incident:

Model Usage       Reqs  Input Tokens  Output Tokens
───────────────────────────────────────────────────────────────
gemini-2.5-pro     251    47,185,013        177,547
gemini-2.5-flash     5     1,483,889          1,829

Second incident:

Tool Calls: 47 ( ✔ 47 ✖ 0 )
Tool Time: 1h 14m 9s (97.1%)

Model Usage       Reqs  Input Tokens  Output Tokens
───────────────────────────────────────────────────────────────
gemini-1.5-flash    32       721,943          6,786
Root Cause Analysis
The core of the problem is that Gemini CLI is designed as an autonomous agent, not a simple assistant. By default, it operates on a Reason-Act (ReAct) loop: the model reasons about what additional context it needs, invokes tools (ReadFile, Shell) to get that context, appends the results to the conversation, and repeats. This creates exponential context growth that is hidden from the user. This "helpful" behavior, while perhaps acceptable in a free, rate-limited environment, is disastrous when attached to a paid API.
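Even without a genuinely exponential blow-up, the arithmetic of resending history is brutal: if each tool call appends output to the conversation and every model call resends the whole conversation, cumulative input tokens grow quadratically with the number of turns. A minimal simulation (the per-turn figures are illustrative guesses, not measurements):

```python
def simulate_react_loop(turns: int, tool_output_tokens: int,
                        prompt_tokens: int = 1_000) -> int:
    """Cumulative input tokens for an agent that resends the entire
    conversation history on every model call."""
    context = prompt_tokens   # tokens currently in the conversation history
    total_input = 0
    for _ in range(turns):
        total_input += context          # whole history sent as input
        context += tool_output_tokens   # tool result appended for next turn
    return total_input

# 251 requests at ~1,500 tokens of tool output per turn lands in the same
# order of magnitude as the 47M input tokens reported above.
print(f"{simulate_react_loop(251, 1_500):,}")  # 47,313,500
```

Note the shape of the result: halving the per-turn tool output only halves the bill, while the quadratic dependence on turn count is what makes long autonomous sessions ruinous.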
WARNING: Severe Financial Risk
This is not a minor bug. This is a critical issue that poses a significant financial risk to your customers. At a conservative average rate of $20 per million tokens (for Pro model I/O), the first incident described above could have resulted in an unexpected bill of approximately $940 USD for a simple development task.
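The dollar estimate follows directly from the first incident's usage table; as a quick check (the $20-per-million blended rate is this report's assumption, not a quoted price):

```python
input_tokens = 47_185_013   # gemini-2.5-pro input tokens, first incident
usd_per_million = 20        # assumed blended Pro I/O rate from this report
cost = input_tokens / 1_000_000 * usd_per_million
print(f"${cost:,.2f}")      # $943.70
```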
The Gemini CLI tool, in its current default configuration, should be considered financially unsafe to use with paid API keys.

Recommendations
Thank you for your attention to this critical issue.