Description
Bug Description
The local quota request_count can unexpectedly reset to 0 when the background quota refresh syncs with the API and receives remainingFraction: 1.0 (100% remaining).
Observed Behavior
From logs, quota tracking showed:
01:56:52 - quota: 0/400 [100%] (first request)
01:56:57 - quota: 1/400 [99%]
01:57:06 - quota: 2/400 [99%]
01:57:35 - quota: 3/400 [99%]
01:58:36 - quota: 4/400 [99%]
01:59:37 - quota: 0/400 [100%] <- UNEXPECTED RESET
02:00:42 - quota: 1/400 [99%]
The quota counter reset from 4 back to 0 mid-session, losing tracking of recent requests.
Root Cause
In src/rotator_library/usage_manager.py, the update_quota_baseline() function unconditionally overwrites the local request_count with the API's value:
# Line ~2214-2216
# Sync local request count to API's authoritative value
model_data["request_count"] = used_requests
model_data["requests_at_baseline"] = used_requests
Here, used_requests = int((1.0 - remaining_fraction) * max_requests).
When the background quota refresh runs and the API returns remainingFraction: 1.0 (100% remaining), used_requests becomes 0, and all local tracking is wiped.
Why the API might return 100%:
- API response caching/staleness
- Different quota window timing on server vs local expectations
- Requests not yet reflected in API's quota calculation
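The overwrite can be reproduced in isolation. The sketch below is not the code from usage_manager.py; `overwrite_sync` is a hypothetical stand-in that assumes only the two assignments and the used_requests formula quoted above, with model_data modelled as a plain dict.

```python
def used_from_fraction(remaining_fraction: float, max_requests: int) -> int:
    """Convert the API's remaining fraction into a used-request count."""
    return int((1.0 - remaining_fraction) * max_requests)


def overwrite_sync(model_data: dict, remaining_fraction: float, max_requests: int) -> None:
    """Hypothetical stand-in for the unconditional overwrite described above."""
    used_requests = used_from_fraction(remaining_fraction, max_requests)
    model_data["request_count"] = used_requests
    model_data["requests_at_baseline"] = used_requests


# Four requests have been counted locally since the last baseline.
model_data = {"request_count": 4, "requests_at_baseline": 0}

# The background refresh receives a (possibly stale) remainingFraction of 1.0.
overwrite_sync(model_data, remaining_fraction=1.0, max_requests=400)

print(model_data["request_count"])  # 0: the four local requests are lost
```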
Proposed Fixes
Option 1: Only allow API to increase count (simpler)
# Sync local request count to API's authoritative value
# Use max() to prevent API from resetting our count if it returns stale/cached 100%
# The API can only increase our count (if we missed requests), not decrease it
current_count = model_data.get("request_count", 0)
model_data["request_count"] = max(current_count, used_requests)
model_data["requests_at_baseline"] = model_data["request_count"]
Pros: Simple, directly addresses the bug, low risk
Cons: If quota genuinely resets server-side, local count won't decrease until local window reset logic triggers
Option 2: Track requests between syncs (more precise)
# Calculate requests made locally since last baseline
requests_since_baseline = max(0, model_data.get("request_count", 0) - model_data.get("requests_at_baseline", 0))
# New count = API's count + our local requests since last sync
model_data["request_count"] = used_requests + requests_since_baseline
model_data["requests_at_baseline"] = used_requests
Pros: More accurate, properly accounts for requests made between API fetches
Cons: More complex logic, potential for double-counting edge cases
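To make the trade-offs concrete, the sketch below runs both options against the same sequences of API values. The function names `sync_option1`, `sync_option2`, and `run` are hypothetical, model_data is again a plain dict, and used_requests is passed in directly rather than derived from remainingFraction. The two scenarios are assumptions chosen to exercise the cons listed above, not cases taken from the logs.

```python
def sync_option1(model_data: dict, used_requests: int) -> None:
    """Option 1: the API value may only raise the local count."""
    current_count = model_data.get("request_count", 0)
    model_data["request_count"] = max(current_count, used_requests)
    model_data["requests_at_baseline"] = model_data["request_count"]


def sync_option2(model_data: dict, used_requests: int) -> None:
    """Option 2: API value plus requests made locally since the last baseline."""
    requests_since_baseline = max(
        0,
        model_data.get("request_count", 0) - model_data.get("requests_at_baseline", 0),
    )
    model_data["request_count"] = used_requests + requests_since_baseline
    model_data["requests_at_baseline"] = used_requests


def run(sync, initial: dict, api_values: list) -> list:
    """Apply a sync function for each API value and record the resulting counts."""
    model_data = dict(initial)  # copy so scenarios do not share state
    counts = []
    for used_requests in api_values:
        sync(model_data, used_requests)
        counts.append(model_data["request_count"])
    return counts


# Scenario A: four unsynced local requests; the API first returns a stale 0,
# then catches up and reports 4.
unsynced = {"request_count": 4, "requests_at_baseline": 0}
a1 = run(sync_option1, unsynced, [0, 4])
a2 = run(sync_option2, unsynced, [0, 4])
print(a1)  # [4, 4]
print(a2)  # [4, 8]: the same four requests are counted twice

# Scenario B: the count is already synced at 4, then the quota genuinely
# resets server-side and the API reports 0.
synced = {"request_count": 4, "requests_at_baseline": 4}
b1 = run(sync_option1, synced, [0])
b2 = run(sync_option2, synced, [0])
print(b1)  # [4]: stays high until local window reset logic triggers
print(b2)  # [0]: follows the server-side reset
```

In this sketch both options survive the stale 100% response. Option 2 double-counts once the API catches up, and Option 1 cannot follow a genuine server-side reset.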
Environment
- Branch: dev (verified bug exists in upstream dev)
- Affected providers: Antigravity, Gemini CLI (any provider using update_quota_baseline)
Questions for Maintainer
- Which fix approach is preferred?
- Are there any edge cases we should consider (e.g., quota group syncing, cross-credential scenarios)?