@@ -329,10 +329,19 @@ The proxy includes a powerful text-based UI for configuration and management.

 **Antigravity:**
 - Gemini 3 Pro with `thinkingLevel` support
+- Gemini 2.5 Flash/Flash Lite with thinking mode
 - Claude Opus 4.5 (thinking mode)
 - Claude Sonnet 4.5 (thinking and non-thinking)
+- GPT-OSS 120B Medium
 - Thought signature caching for multi-turn conversations
 - Tool hallucination prevention
+- Quota baseline tracking with background refresh
+- Parallel tool usage instruction injection
+- **Quota Groups**: Models that share quota are automatically grouped:
+  - Claude/GPT-OSS: `claude-sonnet-4-5`, `claude-opus-4-5`, `gpt-oss-120b-medium`
+  - Gemini 3 Pro: `gemini-3-pro-high`, `gemini-3-pro-low`, `gemini-3-pro-preview`
+  - Gemini 2.5 Flash: `gemini-2.5-flash`, `gemini-2.5-flash-thinking`, `gemini-2.5-flash-lite`
+  - All models in a group consume the group's shared quota at the same rate, so in the Claude/GPT-OSS group it pays to use only Opus and skip Sonnet and GPT-OSS.

 **Qwen Code:**
 - Dual auth (API key + OAuth Device Flow)
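
The shared-quota behavior described for the Antigravity Quota Groups can be sketched roughly as below. This is a minimal illustration, not the proxy's implementation: the group names (`claude`, `gemini-3-pro`, `gemini-2.5-flash`), the `GroupUsage` class, and the one-unit-per-request accounting are assumptions; only the model names come from the list in the diff.

```python
from collections import defaultdict

# Model names are taken from the diff; the group keys are hypothetical labels.
QUOTA_GROUPS = {
    "claude": ["claude-sonnet-4-5", "claude-opus-4-5", "gpt-oss-120b-medium"],
    "gemini-3-pro": ["gemini-3-pro-high", "gemini-3-pro-low", "gemini-3-pro-preview"],
    "gemini-2.5-flash": ["gemini-2.5-flash", "gemini-2.5-flash-thinking", "gemini-2.5-flash-lite"],
}

# Reverse index: model name -> group name.
MODEL_TO_GROUP = {m: g for g, models in QUOTA_GROUPS.items() for m in models}


class GroupUsage:
    """Hypothetical tracker that keeps one counter per quota group, not per model."""

    def __init__(self):
        self.used = defaultdict(int)

    def record(self, model: str) -> str:
        # Models outside every group are counted under their own name.
        group = MODEL_TO_GROUP.get(model, model)
        self.used[group] += 1  # assumption: each request costs one unit
        return group


usage = GroupUsage()
usage.record("claude-sonnet-4-5")
usage.record("claude-opus-4-5")
usage.record("gemini-3-pro-low")
print(dict(usage.used))  # {'claude': 2, 'gemini-3-pro': 1}
```

Because Sonnet and Opus land in the same counter, a Sonnet request reduces what is left for Opus, which is the reasoning behind the advice in the Quota Groups note.
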
@@ -394,6 +403,8 @@ The proxy includes a powerful text-based UI for configuration and management.
 | `CONCURRENCY_MULTIPLIER_<PROVIDER>_PRIORITY_<N>` | Concurrency multiplier per priority tier |
 | `QUOTA_GROUPS_<PROVIDER>_<GROUP>` | Models sharing quota limits |
 | `OVERRIDE_TEMPERATURE_ZERO` | `remove` or `set` to prevent tool hallucination |
+| `GEMINI_CLI_QUOTA_REFRESH_INTERVAL` | Quota baseline refresh interval in seconds (default: 300) |
+| `ANTIGRAVITY_QUOTA_REFRESH_INTERVAL` | Quota baseline refresh interval in seconds (default: 300) |

 </details>

@@ -512,14 +523,48 @@ Uses Google OAuth to access internal Gemini endpoints with higher rate limits.
 - Automatic free-tier project onboarding
 - Paid vs free tier detection
 - Smart fallback on rate limits
+- Quota baseline tracking with background refresh (accurate remaining quota estimates)
+- Sequential rotation mode (uses each credential until its quota is exhausted)
+
+**Quota Groups:** Models that share quota are automatically grouped:
+- **Pro**: `gemini-2.5-pro`, `gemini-3-pro-preview`
+- **2.5-Flash**: `gemini-2.0-flash`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`
+- **3-Flash**: `gemini-3-flash-preview`
+
+All models in a group deplete the shared quota equally. Quota windows are 24 hours long and tracked per model.

 **Environment Variables (for stateless deployment):**
+
+Single credential (legacy):
 ```env
 GEMINI_CLI_ACCESS_TOKEN="ya29.your-access-token"
 GEMINI_CLI_REFRESH_TOKEN="1//your-refresh-token"
 GEMINI_CLI_EXPIRY_DATE="1234567890000"
 GEMINI_CLI_EMAIL="your-email@gmail.com"
 GEMINI_CLI_PROJECT_ID="your-gcp-project-id" # Optional
+GEMINI_CLI_TIER="standard-tier" # Optional: standard-tier or free-tier
+```
+
+Multiple credentials (insert `_N_` after the `GEMINI_CLI` prefix, where N is 1, 2, 3...):
+```env
+GEMINI_CLI_1_ACCESS_TOKEN="ya29.first-token"
+GEMINI_CLI_1_REFRESH_TOKEN="1//first-refresh"
+GEMINI_CLI_1_EXPIRY_DATE="1234567890000"
+GEMINI_CLI_1_EMAIL="first@gmail.com"
+GEMINI_CLI_1_PROJECT_ID="project-1"
+GEMINI_CLI_1_TIER="standard-tier"
+
+GEMINI_CLI_2_ACCESS_TOKEN="ya29.second-token"
+GEMINI_CLI_2_REFRESH_TOKEN="1//second-refresh"
+GEMINI_CLI_2_EXPIRY_DATE="1234567890000"
+GEMINI_CLI_2_EMAIL="second@gmail.com"
+GEMINI_CLI_2_PROJECT_ID="project-2"
+GEMINI_CLI_2_TIER="free-tier"
+```
+
+**Feature Toggles:**
+```env
+GEMINI_CLI_QUOTA_REFRESH_INTERVAL=300 # Quota refresh interval in seconds (default: 300 = 5 min)
 ```

 </details>
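
To make the numbered-credential naming concrete, here is a rough sketch of how variables shaped like `GEMINI_CLI_<N>_<FIELD>` could be collected into one record per credential. The function name `load_gemini_cli_credentials`, the lower-cased field keys, and the rule that the legacy un-numbered variables are only used when no numbered ones exist are all assumptions for illustration; the proxy's actual loader may behave differently.

```python
import os
import re

# Field suffixes as they appear in the env examples in the diff.
FIELDS = ("ACCESS_TOKEN", "REFRESH_TOKEN", "EXPIRY_DATE", "EMAIL", "PROJECT_ID", "TIER")


def load_gemini_cli_credentials(env=None):
    """Return a list of credential dicts, ordered by their numeric index."""
    env = os.environ if env is None else env
    numbered = re.compile(r"^GEMINI_CLI_(\d+)_(%s)$" % "|".join(FIELDS))
    creds = {}
    for key, value in env.items():
        match = numbered.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            creds.setdefault(index, {})[field.lower()] = value
    if not creds:
        # Assumed fallback: the legacy single-credential variables.
        legacy = {
            f.lower(): env[f"GEMINI_CLI_{f}"]
            for f in FIELDS
            if f"GEMINI_CLI_{f}" in env
        }
        if legacy:
            creds[0] = legacy
    return [creds[i] for i in sorted(creds)]


sample = {
    "GEMINI_CLI_1_ACCESS_TOKEN": "ya29.first-token",
    "GEMINI_CLI_1_TIER": "standard-tier",
    "GEMINI_CLI_2_ACCESS_TOKEN": "ya29.second-token",
    "GEMINI_CLI_2_TIER": "free-tier",
    "GEMINI_CLI_QUOTA_REFRESH_INTERVAL": "300",  # not a credential field, ignored
}
credentials = load_gemini_cli_credentials(sample)
print([c["tier"] for c in credentials])  # ['standard-tier', 'free-tier']
```

Passing a plain dict instead of reading `os.environ` directly keeps the sketch easy to try without touching the real environment.
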
@@ -531,9 +576,11 @@ Access Google's internal Antigravity API for cutting-edge models.

 **Supported Models:**
 - **Gemini 3 Pro** — with `thinkingLevel` support (low/high)
+- **Gemini 2.5 Flash** — with thinking mode support
+- **Gemini 2.5 Flash Lite** — configurable thinking budget
 - **Claude Opus 4.5** — Anthropic's most powerful model (thinking mode only)
 - **Claude Sonnet 4.5** — supports both thinking and non-thinking modes
-- Gemini 2.5 Pro/Flash
+- **GPT-OSS 120B** — OpenAI-compatible model

 **Setup:**
 1. Run `python -m rotator_library.credential_tool`
@@ -545,6 +592,8 @@ Access Google's internal Antigravity API for cutting-edge models.
 - Tool hallucination prevention via parameter signature injection
 - Automatic thinking block sanitization for Claude
 - Credential prioritization (paid resets every 5 hours, free weekly)
+- Quota baseline tracking with background refresh (accurate remaining quota estimates)
+- Parallel tool usage instruction injection for Claude

 **Environment Variables:**
 ```env
@@ -556,6 +605,8 @@ ANTIGRAVITY_EMAIL="your-email@gmail.com"
 # Feature toggles
 ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
 ANTIGRAVITY_GEMINI3_TOOL_FIX=true
+ANTIGRAVITY_QUOTA_REFRESH_INTERVAL=300 # Quota refresh interval (seconds)
+ANTIGRAVITY_PARALLEL_TOOL_INSTRUCTION_CLAUDE=true # Parallel tool instruction for Claude
 ```

 > **Note:** Gemini 3 models require a paid-tier Google Cloud project.
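
As a rough illustration of how toggles like the ones above might be read, the sketch below parses a boolean flag and a positive integer interval with a fallback default. The helpers `env_bool` and `env_int` are hypothetical, and so are the accepted truthy spellings and the choice to fall back on invalid values. The 300-second default comes from the diff, while the `True` defaults for the boolean toggles are assumed.

```python
import os


def env_bool(name, default, env=None):
    """Read a boolean toggle; unset variables fall back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def env_int(name, default, env=None):
    """Read a positive integer; missing, invalid, or non-positive values use the default."""
    env = os.environ if env is None else env
    raw = env.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


sample = {
    "ANTIGRAVITY_QUOTA_REFRESH_INTERVAL": "600",
    "ANTIGRAVITY_GEMINI3_TOOL_FIX": "false",
}
print(env_int("ANTIGRAVITY_QUOTA_REFRESH_INTERVAL", 300, sample))  # 600
print(env_bool("ANTIGRAVITY_GEMINI3_TOOL_FIX", True, sample))  # False
print(env_bool("ANTIGRAVITY_ENABLE_SIGNATURE_CACHE", True, sample))  # True
```
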