Bug: Quota count resets unexpectedly when background refresh syncs with API #75

@b3nw

Bug Description

The local quota request_count can unexpectedly reset to 0 when the background quota refresh syncs with the API and receives remainingFraction: 1.0 (100% remaining).

Observed Behavior

From logs, quota tracking showed:

01:56:52 - quota: 0/400 [100%]  (first request)
01:56:57 - quota: 1/400 [99%]
01:57:06 - quota: 2/400 [99%]
01:57:35 - quota: 3/400 [99%]
01:58:36 - quota: 4/400 [99%]
01:59:37 - quota: 0/400 [100%]  ← UNEXPECTED RESET
02:00:42 - quota: 1/400 [99%]

The quota counter reset from 4 back to 0 mid-session, losing tracking of recent requests.

Root Cause

In src/rotator_library/usage_manager.py, the update_quota_baseline() function unconditionally overwrites the local request_count with the API's value:

# Line ~2214-2216
# Sync local request count to API's authoritative value
model_data["request_count"] = used_requests
model_data["requests_at_baseline"] = used_requests

Where used_requests = int((1.0 - remaining_fraction) * max_requests).

When the background quota refresh runs and the API returns remainingFraction: 1.0 (100% remaining), used_requests becomes 0, and all local tracking is wiped.
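To make the failure concrete, here is a minimal sketch of the overwrite in isolation. The helper names `used_from_fraction` and `update_quota_baseline_current` are hypothetical stand-ins written for this illustration; the real `update_quota_baseline()` has a different signature and does more work. Only the two assignments and the `used_requests` formula are taken from the snippet above.

```python
def used_from_fraction(remaining_fraction: float, max_requests: int) -> int:
    """Convert the API's remaining fraction into a used-request count."""
    return int((1.0 - remaining_fraction) * max_requests)


def update_quota_baseline_current(
    model_data: dict, remaining_fraction: float, max_requests: int
) -> None:
    """Hypothetical, simplified stand-in for the current sync behaviour."""
    used_requests = used_from_fraction(remaining_fraction, max_requests)
    # Unconditional overwrite: whatever was counted locally is discarded.
    model_data["request_count"] = used_requests
    model_data["requests_at_baseline"] = used_requests


# Four requests tracked locally, then the API reports 100% remaining.
model_data = {"request_count": 4, "requests_at_baseline": 0}
update_quota_baseline_current(model_data, remaining_fraction=1.0, max_requests=400)
print(model_data)  # {'request_count': 0, 'requests_at_baseline': 0}
```

This matches the 4 to 0 drop seen in the log at 01:59:37.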

Why the API might return 100%:

  • API response caching/staleness
  • Different quota window timing on server vs local expectations
  • Requests not yet reflected in API's quota calculation

Proposed Fixes

Option 1: Only allow API to increase count (simpler)

# Sync local request count to API's authoritative value
# Use max() to prevent API from resetting our count if it returns stale/cached 100%
# The API can only increase our count (if we missed requests), not decrease it
current_count = model_data.get("request_count", 0)
model_data["request_count"] = max(current_count, used_requests)
model_data["requests_at_baseline"] = model_data["request_count"]

Pros: Simple, directly addresses the bug, low risk
Cons: If quota genuinely resets server-side, local count won't decrease until local window reset logic triggers
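A small standalone sketch of Option 1, to show both the fix and the trade-off. The function name `sync_monotonic` and the dict layout are assumptions made for this illustration, not code from the repository.

```python
def sync_monotonic(model_data: dict, used_requests: int) -> None:
    """Option 1 sketch: the API value may raise the local count, never lower it."""
    current_count = model_data.get("request_count", 0)
    model_data["request_count"] = max(current_count, used_requests)
    model_data["requests_at_baseline"] = model_data["request_count"]


# Stale or cached 100% response: the local count of 4 is preserved.
stale = {"request_count": 4, "requests_at_baseline": 0}
sync_monotonic(stale, used_requests=0)
print(stale["request_count"])  # 4

# API has seen more usage than we tracked locally: the count is raised.
behind = {"request_count": 4, "requests_at_baseline": 0}
sync_monotonic(behind, used_requests=10)
print(behind["request_count"])  # 10
```

Note that a genuine server-side reset looks identical to the stale case here (`used_requests=0`), which is why the local count stays high until the local window reset logic runs.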

Option 2: Track requests between syncs (more precise)

# Calculate requests made locally since last baseline
requests_since_baseline = max(0, model_data.get("request_count", 0) - model_data.get("requests_at_baseline", 0))

# New count = API's count + our local requests since last sync  
model_data["request_count"] = used_requests + requests_since_baseline
model_data["requests_at_baseline"] = used_requests

Pros: More accurate, properly accounts for requests made between API fetches
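Cons: More complex logic, and it double-counts whenever the API's value already includes requests made since the last baseline (e.g. baseline 0, 4 local requests, API reports 4 used: the result is 8)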
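The following sketch runs Option 2 against two scenarios to show where the double-counting concern comes from. `sync_with_delta` is a hypothetical name used only for this illustration; the body is the Option 2 snippet wrapped in a function.

```python
def sync_with_delta(model_data: dict, used_requests: int) -> None:
    """Option 2 sketch: API count plus requests made locally since the last baseline."""
    requests_since_baseline = max(
        0,
        model_data.get("request_count", 0)
        - model_data.get("requests_at_baseline", 0),
    )
    model_data["request_count"] = used_requests + requests_since_baseline
    model_data["requests_at_baseline"] = used_requests


# Case A: the API lags and still reports 0 used. The 4 local requests survive.
lagging = {"request_count": 4, "requests_at_baseline": 0}
sync_with_delta(lagging, used_requests=0)
print(lagging["request_count"])  # 4

# Case B: the API is up to date and already reports those 4 requests.
# They are added on top of the API value, so they are counted twice.
up_to_date = {"request_count": 4, "requests_at_baseline": 0}
sync_with_delta(up_to_date, used_requests=4)
print(up_to_date["request_count"])  # 8
```

Case B suggests Option 2 is only accurate when the API value is known to exclude the requests made since the last sync, which the refresh cannot tell from `remainingFraction` alone.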

Environment

  • Branch: dev (verified bug exists in upstream dev)
  • Affected providers: Antigravity, Gemini CLI (any provider using update_quota_baseline)

Questions for Maintainer

  1. Which fix approach is preferred?
  2. Are there any edge cases we should consider (e.g., quota group syncing, cross-credential scenarios)?
