Gemini 2.5 Pro Rate-Limited on Free Tier After 10–15 Prompts (Also via API) #2436
-
Hey guys, on the free tier Gemini gives access to 2.5 Pro for a few prompts (10–15), but then quickly rate-limits and switches to Flash. This happens even via the API. Is this intended behavior? Can we get clarity on, or control over, model access? Thanks.
Replies: 11 comments 12 replies
-
Likely intended; there are some tickets here about it. It's a very reasonable limitation, business-wise.
-
I can understand if these limitations are due to the huge load from the influx of users. What I cannot understand is the complete lack of transparency or specifics on this issue. Why is it that in all the issues where people ask about this, there are zero comments from the developers or from Google, while the issues suggesting transparency improvements are simply closed without anyone addressing the core of the problem?

Meanwhile, the official documentation still says there are 1,000 requests per day, but it's unclear what these limits actually apply to. Is it the Pro model? Flash? Are search queries also limited to 15 requests...?

I was thrilled when I first saw the tool, thinking this was a real game changer from Google. But now it's starting to look more and more like just another marketing campaign with blatant deception...
-
From my brief testing, I think the way Gemini CLI works can sometimes include huge amounts of data as context, and so quickly consume the free limit for 2.5 Pro. When using it with a large existing codebase, I noticed its "context left" indicator went from 99% to less than 40% without having received all that much output. Since 40% left would mean around 600,000 tokens of context are in use, I can only assume it was attaching the entire codebase as context for each prompt, but maybe I'm mistaken. When I used it as my planning assistant (only one markdown document in the current directory), context usage dropped to only one or two percent, and I could keep using 2.5 Pro for much longer. Hopefully someone can provide more information, but my assumption is that either they are aiming for a fairly costly agent with very good overall "awareness" of a given codebase (because the full thing sits in its context window), or they are still working on making context management more efficient.
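For anyone who wants to sanity-check that arithmetic, here's a minimal sketch, assuming the often-cited 1M-token context window (the exact window size is an assumption on my part):

```python
# Back-of-envelope check: how many tokens does a given "context left"
# percentage imply? The 1,000,000-token window is an assumption here,
# not an official figure.
CONTEXT_WINDOW = 1_000_000

def tokens_used(percent_left: float) -> int:
    """Convert the CLI's 'context left' percentage into tokens consumed."""
    return round(CONTEXT_WINDOW * (1 - percent_left / 100))

print(tokens_used(99))  # 10000  -> a nearly fresh session
print(tokens_used(40))  # 600000 -> matches the ~600,000 estimate above
```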
-
I discovered what the issue was for me. I had an incomplete .gitignore file in the project where I experienced the rapid exhaustion of my free limit. By default, gemini-cli uses this file to set its own ignore pattern for files that should be excluded from its context. For me, the culprit was my node_modules folder, which was VERY large and explains the huge number of input tokens I was using. Have a look at your .gitignore to rule this out as a possible cause of your issue. Cheers.
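If you want to check for this quickly, a small sketch like the one below can flag heavy directories missing from your .gitignore (the directory list is just my guess at common offenders, and the matching is deliberately crude):

```python
# Quick check: warn about common heavy directories that exist on disk
# but are not mentioned in .gitignore. The directory names here are
# illustrative guesses, not an exhaustive list; adapt to your project.
from pathlib import Path

HEAVY_DIRS = ["node_modules", "dist", "build", ".venv", "target"]

def missing_ignores(project_root: str = ".") -> list[str]:
    root = Path(project_root)
    gitignore = root / ".gitignore"
    patterns = gitignore.read_text().splitlines() if gitignore.exists() else []
    return [
        d for d in HEAVY_DIRS
        if (root / d).is_dir() and not any(d in line for line in patterns)
    ]

if __name__ == "__main__":
    for d in missing_ignores():
        print(f"warning: {d}/ exists but is not in .gitignore")
```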
-
I was very enthusiastic about Gemini CLI. I tried it with my personal Google account, and everything looked good and promising.
-
Howdy, folks. 👋 Ryan here from the Gemini CLI team. Hopefully, I can demystify the behavior.

For those devs using the free tier by logging in with your Google account, our goal is to deliver the best possible experience at the keyboard – ideally, one where you never have to stop work because you hit a limit. To do that, we have to balance model choice with capacity. Thus, the free tier uses a blend of Gemini 2.5 Pro and Flash.

For example, we might use Flash to determine the complexity of a request before routing it to the model for the "official" response. After all, Pro is overkill for a lot of really simple steps (e.g. "start the npm server") better routed to Flash. Pro is better suited to big, complex tasks that require reasoning (e.g. "write integration tests for …"). We also fall back from Pro to Flash when there are two or more slow responses.

Because of the (frankly, overwhelming 🤗) developer response in the first week of availability, our service has returned more slow response times than we'd like. But we're working to add capacity quickly. Our error rate is now well below 1%.

We're at the beginning of our release journey. There are still a lot of improvements we can make to planning and orchestration. If we get it right, you won't have to think about which model is being used. But if you want to use a specific model, you can always use an API key.
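Purely to illustrate the routing idea described above (this is not Gemini CLI source code; classify_complexity, call_model, the slowness threshold, and the model names are all hypothetical stand-ins), the logic might look roughly like this sketch:

```python
# Illustrative sketch only -- NOT Gemini CLI source code. It mimics the
# behavior described above: a cheap triage step decides the model, and
# two or more slow responses trigger a fallback from Pro to Flash.
import time

SLOW_THRESHOLD_S = 10.0   # assumption: what counts as a "slow" response
MAX_SLOW_RESPONSES = 2    # "two or more slow responses" per the post
slow_count = 0

def classify_complexity(prompt: str) -> str:
    # Hypothetical stand-in for a Flash-based triage call; here it is
    # just a naive length heuristic.
    return "complex" if len(prompt.split()) > 20 else "simple"

def call_model(model: str, prompt: str) -> tuple[str, float]:
    # Hypothetical stand-in for a real model call; returns a reply and
    # the elapsed time in seconds.
    start = time.monotonic()
    reply = f"[{model}] response to: {prompt[:40]}"
    return reply, time.monotonic() - start

def route(prompt: str) -> str:
    if slow_count >= MAX_SLOW_RESPONSES:
        return "gemini-2.5-flash"  # fall back after repeated slowness
    is_complex = classify_complexity(prompt) == "complex"
    return "gemini-2.5-pro" if is_complex else "gemini-2.5-flash"

def handle(prompt: str) -> str:
    global slow_count
    model = route(prompt)
    reply, elapsed = call_model(model, prompt)
    if elapsed > SLOW_THRESHOLD_S:
        slow_count += 1
    return reply

print(handle("start the npm server"))  # simple step -> routed to Flash
```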
-
Thank you for the clarification. The plan sounds good and reasonable, assuming the Flash model works well. In practice, however, the Flash model is, frankly speaking, unusable. It forgets or ignores its own context, doesn't understand synonyms, lies, and can't handle elementary tasks like replacing text in a file: it either doesn't realize it has already made the change and gets stuck in an endless loop repeating the same action, or it fails to learn from its own mistakes, even though all the necessary context is present. I can't understand where you got the 1% figure from; it's at least ten times higher, if not more.

The only thing that helps is to write a detailed "poem" describing exactly what I want the model to do. But by the time I've written this "poem," taking into account all the model's previous mistakes, it would be easier to write a script to solve the problem, or just spend 20 minutes doing the task manually rather than 30 minutes crafting and testing a prompt while hoping the model won't destroy my data. Even the very first public models from OpenAI or Llama weren't this dumb.

With the Pro model there are far fewer such errors (though they still happen), and I'm willing to spend an extra five minutes with Pro rather than boxing with Flash. And if you want to train your models on user data, Flash will only create even more problems. It would be best to disable Flash altogether, if financially possible, or at the very least not switch users to the Flash model under the pretext of slow responses.
-
That is a very elegant summary of our experiences: #2436 (comment), thanks. BTW, speaking of loops (#2923): it has not happened to me today, after the CLI client update. A very informal tip: with Flash-level Gemini AIs I do what I have done for years with e.g. https://github.com/OpenInterpreter/open-interpreter and similar tools: more hand-holding (putting myself in THEIR shoes mentally, i.e. orchestrating) via an "anti-goldfish-memory-syndrome" set of artefacts or heuristics. E.g. this one, also informal for now, works reasonably well: #2386 (comment). But indeed, the user must remind the AIs to check these now and then (i.e. an active read_file function call), as otherwise the AIs tend to treat them as ornamentation, some faux QA bumf (a sin of many human junior PMs, too...).
-
When will the Google AI Ultra subscription be available to link with Gemini CLI, allowing much more use of the Pro version, similar to Claude Code and the Max plan? Until then, this is unusable for me!
-
BTW, still on the free tier (but using a couple of old accounts now and then), I figured out why the tokens may be consumed too fast. In a fresh session (new day), Gemini makes this mistake: in short, it tries to ingest (read in full) all the files as soon as possible, and of course that burns through the token budget quickly.
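To see how fast that read-everything behavior adds up, here's a rough sketch that estimates the token cost of ingesting a whole directory (the 4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer count):

```python
# Rough estimate of what it costs, in tokens, to ingest every file
# under a directory in full. The ~4 chars/token ratio is a rule-of-
# thumb assumption, not an exact figure for any particular tokenizer.
from pathlib import Path

CHARS_PER_TOKEN = 4  # rule-of-thumb assumption

def estimated_ingest_tokens(root: str = ".") -> int:
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                total_chars += len(path.read_text(errors="ignore"))
            except OSError:
                continue  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

print(f"~{estimated_ingest_tokens():,} tokens to read everything in full")
```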
-
The free-tier implementation seems to have a flaw in the CLI. Take a day or more off, then send a prompt with over 85% of the context window left, and it immediately outputs unhelpful messaging, staying in a poor user experience with a repeated loop of messages for over 30 minutes at a time. The request should simply fail the first time, terminate the process, and direct the user to try again in X time. I have had a request open for over 30 minutes of runtime.
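What's being asked for here is essentially a fail-fast pattern: report once, exit, and say when to retry. A minimal sketch of that behavior (RateLimitError and its retry_after_s field are hypothetical stand-ins, since the real client's error shape isn't documented in this thread):

```python
# Sketch of the fail-fast behavior suggested above: on a rate-limit
# error, stop immediately and tell the user when to retry, instead of
# looping for 30+ minutes. RateLimitError and retry_after_s are
# hypothetical stand-ins for whatever the real client library raises.
import sys

class RateLimitError(Exception):
    def __init__(self, retry_after_s: int):
        super().__init__(f"rate limited, retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s

def send_prompt(prompt: str) -> str:
    # Stand-in for a real request; always rate-limited in this demo.
    raise RateLimitError(retry_after_s=3600)

try:
    print(send_prompt("hello"))
except RateLimitError as e:
    # Fail fast: one clear message, then exit -- no silent retry loop.
    sys.exit(f"Rate limited. Try again in ~{e.retry_after_s // 60} minutes.")
```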