Conversation

@b3nw
Contributor

@b3nw b3nw commented Jan 16, 2026

Implement quota tracking for Chutes provider using a simple standalone mixin pattern:

Core Implementation:

  • ChutesQuotaTracker: Standalone mixin (no complex base class inheritance)
  • Tracks credential-level quota (1 request = 1 credit consumed)
  • Daily quota reset at 00:00 UTC
  • Automatic tier detection (Legacy=200, Base=300, Plus=2000, Pro=5000)
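A minimal sketch of how tier detection could work, assuming the tier is inferred purely from the daily quota value the API reports. The `detect_tier` helper and the "Unknown" fallback are hypothetical and are not taken from chutes_quota_tracker.py.

```python
# Hypothetical mapping from daily quota to tier name, using the values listed above.
TIER_BY_QUOTA = {200: "Legacy", 300: "Base", 2000: "Plus", 5000: "Pro"}


def detect_tier(quota: int) -> str:
    """Return the tier name for a daily quota, or "Unknown" if no tier matches."""
    return TIER_BY_QUOTA.get(quota, "Unknown")


for quota in (200, 300, 2000, 5000, 1234):
    print(quota, detect_tier(quota))
```

An exact-match lookup keeps the logic trivial, but it would report "Unknown" if Chutes ever changes a tier's quota, so a real implementation might prefer range-based matching.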

Features:

  • fetch_quota_usage(): Queries Chutes API for quota/used credits
  • Background refresh job: Periodic quota updates via run_background_job()
  • Integration with UsageManager using virtual model 'chutes/_quota'
  • Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL env var
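As a rough illustration of the configurable interval, the sketch below reads CHUTES_QUOTA_REFRESH_INTERVAL with the "300" default that appears in the diff. Treating the value as seconds, and falling back to the default on invalid input, are assumptions; `read_refresh_interval` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

DEFAULT_REFRESH_SECONDS = 300  # "300" is the default in the diff; seconds is an assumption


def read_refresh_interval(env: Optional[Mapping[str, str]] = None) -> int:
    """Parse CHUTES_QUOTA_REFRESH_INTERVAL, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("CHUTES_QUOTA_REFRESH_INTERVAL", str(DEFAULT_REFRESH_SECONDS))
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_REFRESH_SECONDS
    # Reject zero or negative intervals, which would make a background job spin.
    return value if value > 0 else DEFAULT_REFRESH_SECONDS


print(read_refresh_interval({"CHUTES_QUOTA_REFRESH_INTERVAL": "60"}))
```

Passing the environment as a parameter is only there to make the helper easy to test without mutating `os.environ`.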

Files:

  • NEW: chutes_quota_tracker.py (343 lines) - Standalone quota mixin
  • MODIFIED: chutes_provider.py - Add quota tracking + background job
  • MODIFIED: usage_manager.py - Add 'chutes' to _REQUEST_COUNT_PROVIDERS

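API Integration:

  • Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me
  • Auth: Raw API key in Authorization header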

Architecture: Simple standalone mixin for credential-level quota tracking.


Important

Adds quota tracking for Chutes provider using a mixin pattern, with daily resets and tier detection, integrated into UsageManager.

  • Behavior:
    • Introduces ChutesQuotaTracker mixin for quota tracking in chutes_quota_tracker.py.
    • Tracks credential-level quota with daily reset at 00:00 UTC.
    • Automatic tier detection based on quota values.
    • Integrates with UsageManager using virtual model chutes/_quota.
  • Features:
    • fetch_quota_usage(): Fetches quota usage from Chutes API.
    • Background job for periodic quota updates in chutes_provider.py.
    • Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL.
  • Files:
    • NEW: chutes_quota_tracker.py for quota tracking mixin.
    • MODIFIED: chutes_provider.py to include quota tracking and background job.
    • MODIFIED: usage_manager.py to add 'chutes' to _REQUEST_COUNT_PROVIDERS.
  • API Integration:
    • Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me.
    • Auth: Raw API key in Authorization header.
  • Architecture:
    • Simple standalone mixin for credential-level quota tracking.

This description was created by Ellipsis for 556658e.

@b3nw b3nw requested a review from Mirrowel as a code owner January 16, 2026 02:50
@mirrobot-agent
Contributor

I'm starting my review of the Chutes quota tracking implementation. It looks like a solid addition with the new mixin pattern. Diving into the details now!

Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Overall Assessment

This is a well-structured PR that implements quota tracking for the Chutes provider using a clean mixin pattern. The addition of the background refresh job and the integration with UsageManager via a virtual model are solid architectural choices.

However, there are a few points regarding performance and a potentially critical missing piece (quota grouping) that should be addressed to ensure the tracking works correctly across all models.

Architectural Feedback

The use of a mixin (ChutesQuotaTracker) is excellent for keeping the main provider class clean. The primary concern is the missing quota grouping. Without implementing get_model_quota_group in ChutesProvider, the UsageManager will treat each model as having its own independent quota, which contradicts the "credential-level" nature of Chutes quotas.
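To make the grouping concern concrete, here is a toy sketch of the difference. `ToyUsageTracker` is a hypothetical stand-in and does not reflect the real UsageManager internals. The only assumption carried over from the PR is that a provider may expose get_model_quota_group, and the "chutes_global" name is the one the later commit message mentions.

```python
from collections import defaultdict
from typing import Optional


class UngroupedProvider:
    """Provider without quota grouping: every model is tracked separately."""

    def get_model_quota_group(self, model: str) -> Optional[str]:
        return None


class GroupedProvider:
    """Provider that maps every model to one shared, credential-level pool."""

    def get_model_quota_group(self, model: str) -> Optional[str]:
        return "chutes_global"


class ToyUsageTracker:
    """Hypothetical stand-in for UsageManager: counts requests per quota key."""

    def __init__(self, provider) -> None:
        self.provider = provider
        self.counts = defaultdict(int)

    def record_request(self, model: str) -> None:
        # Use the shared group when the provider defines one, else the model name.
        key = self.provider.get_model_quota_group(model) or model
        self.counts[key] += 1


ungrouped = ToyUsageTracker(UngroupedProvider())
grouped = ToyUsageTracker(GroupedProvider())
for tracker in (ungrouped, grouped):
    tracker.record_request("chutes/model-a")
    tracker.record_request("chutes/model-b")

print(dict(ungrouped.counts))  # one counter per model
print(dict(grouped.counts))    # a single shared counter
```

With the ungrouped provider, two requests against different models look like one request each, which would understate how much of the shared daily quota has been spent.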

Key Suggestions

  • Implement Quota Grouping: Add get_model_quota_group to ChutesProvider to ensure all models share the same usage stats.
  • Parallelize Background Job: Use asyncio.gather in run_background_job to refresh multiple credentials efficiently.
  • Client Reuse: Allow passing an existing httpx.AsyncClient to fetch_quota_usage to avoid repeated overhead.

Questions for the Author

  • Is the authentication difference between get_models (Bearer) and fetch_quota_usage (Raw) intentional?
  • Does the Chutes API return any reset-related headers that we could use instead of the local 00:00 UTC calculation?
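For reference, the local 00:00 UTC calculation mentioned in the second question could look roughly like the sketch below. The function names are hypothetical, and this is not necessarily how the PR computes the reset time.

```python
from datetime import datetime, timedelta, timezone


def next_utc_midnight(now: datetime) -> datetime:
    """Return the first 00:00 UTC strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    start_of_day = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_day + timedelta(days=1)


def seconds_until_reset(now: datetime) -> float:
    """Seconds remaining until the next daily quota reset."""
    return (next_utc_midnight(now) - now).total_seconds()


example_now = datetime(2026, 1, 16, 2, 50, tzinfo=timezone.utc)
print(next_utc_midnight(example_now).isoformat())
print(seconds_until_reset(example_now))
```

A locally computed reset depends on the host clock being accurate, which is why a server-provided reset timestamp, if the API offers one, would be more reliable.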

This review was generated by an AI assistant.

Implement quota tracking for Chutes provider using a simple standalone mixin pattern:

Core Implementation:
- ChutesQuotaTracker: Standalone mixin (no complex base class inheritance)
- Tracks credential-level quota (1 request = 1 credit consumed)
- Daily quota reset at 00:00 UTC
- Automatic tier detection (Legacy=200, Base=300, Plus=2000, Pro=5000)

Features:
- fetch_quota_usage(): Queries Chutes API for quota/used credits
- Background refresh job: Parallel quota updates via asyncio.gather
- Integration with UsageManager using virtual model 'chutes/_quota'
- get_model_quota_group(): Returns 'chutes_global' for shared quota pool
- Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL env var

Performance:
- Shared httpx.AsyncClient for connection reuse
- Parallel credential fetching with semaphore (max 5 concurrent)
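The pattern described above can be sketched with the standard library alone. `refresh_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for fetch_quota_usage() so the example runs without network access. In the real code, the shared httpx.AsyncClient would presumably be passed to or captured by the fetch callable.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

MAX_CONCURRENT = 5  # the concurrency cap named in the commit message


async def refresh_all(
    credentials: List[str],
    fetch: Callable[[str], Awaitable[dict]],
) -> Dict[str, object]:
    """Fetch quota for every credential in parallel, at most MAX_CONCURRENT at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def refresh_one(api_key: str) -> dict:
        async with semaphore:
            return await fetch(api_key)

    # return_exceptions=True returns failures as values instead of raising the first one.
    results = await asyncio.gather(
        *(refresh_one(key) for key in credentials), return_exceptions=True
    )
    return dict(zip(credentials, results))


async def fake_fetch(api_key: str) -> dict:
    """Hypothetical stand-in for fetch_quota_usage(); performs no I/O."""
    await asyncio.sleep(0)
    if api_key == "bad-key":
        raise RuntimeError("simulated API failure")
    return {"quota": 300, "used": 12.0}


results = asyncio.run(refresh_all(["key-1", "bad-key", "key-2"], fake_fetch))
for key, outcome in results.items():
    print(key, outcome)
```

Because failures come back as exception objects, the caller has to check each result with `isinstance(outcome, Exception)` before using it. Otherwise errors are silently dropped.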

Files:
- NEW: chutes_quota_tracker.py - Standalone quota mixin
- MODIFIED: chutes_provider.py - Add quota tracking + background job
- MODIFIED: usage_manager.py - Add 'chutes' to _REQUEST_COUNT_PROVIDERS

API Integration:
- Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me
- Returns: { quota: int, used: float }
- Auth: Bearer token in Authorization header

Architecture: Simple standalone mixin for credential-level quota tracking.
No OAuth complexity, no cost learning algorithms, no per-model quotas.
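
A small sketch of the request headers and response handling implied by the API notes above. The helper names, the `QuotaStatus` shape, and the zero defaults for missing fields are assumptions. Only the endpoint, the Bearer scheme, and the { quota, used } payload come from the commit message.

```python
from typing import Dict, TypedDict

QUOTA_URL = "https://api.chutes.ai/users/me/quota_usage/me"


class QuotaStatus(TypedDict):
    quota: int
    used: float
    remaining: float
    remaining_fraction: float


def build_headers(api_key: str) -> Dict[str, str]:
    """Bearer auth, as described in the commit message."""
    return {"Authorization": f"Bearer {api_key}"}


def parse_quota_payload(payload: dict) -> QuotaStatus:
    """Normalize a { quota, used } payload; missing fields default to zero (assumption)."""
    quota = int(payload.get("quota", 0))
    used = float(payload.get("used", 0.0))
    # Clamp at zero in case reported usage exceeds the quota.
    remaining = max(0.0, quota - used)
    fraction = remaining / quota if quota > 0 else 0.0
    return {
        "quota": quota,
        "used": used,
        "remaining": remaining,
        "remaining_fraction": fraction,
    }


print(build_headers("example-key"))
print(parse_quota_payload({"quota": 200, "used": 50.0}))
```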
@b3nw b3nw force-pushed the feature/chutes-quota-system branch from 556658e to 8105c20 Compare January 16, 2026 03:07
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Follow-up Review

The author has done an excellent job of addressing the feedback from the previous review. The architecture is now more robust and efficient.

Assessment of New Changes:

  • Quota Grouping: Successfully implemented get_model_quota_group, ensuring shared usage tracking across all models.
  • Performance: Parallelized the background quota refresh job and implemented HTTP client reuse, which will reduce latency and overhead.
  • Consistency: Unified the authentication scheme to use Bearer tokens across all endpoints.
  • Integration: Correctly integrated 'chutes' into the UsageManager's request-count tracking.

Overall Status:
The PR is now in a high-quality state and ready for merging.

This review was generated by an AI assistant.

            os.environ.get("CHUTES_QUOTA_REFRESH_INTERVAL", "300")
        )

    def get_model_quota_group(self, model: str) -> Optional[str]:
Contributor

Fix verified: Implementing get_model_quota_group correctly ensures that all Chutes models share the same credential-level quota pool in the UsageManager. This addresses the grouping issue raised in the previous review.

tasks = [
    refresh_single_credential(api_key, client) for api_key in credentials
]
await asyncio.gather(*tasks, return_exceptions=True)
Contributor

Fix verified: The background job now correctly parallelizes quota fetches using asyncio.gather and a shared httpx.AsyncClient. This significantly improves efficiency when managing multiple credentials, as suggested.

Contributor

@mirrobot-agent mirrobot-agent bot left a comment

PR Review: Chutes Quota Tracking

Summary

Well-structured PR that adds quota tracking for the Chutes provider using a clean mixin pattern. The architecture follows established patterns in the codebase.

Positive Aspects

  • ✅ Clean mixin architecture following existing patterns (BaseQuotaTracker)
  • ✅ Proper quota grouping via get_model_quota_group() returning "chutes_global"
  • ✅ Performance optimizations (parallel fetching with semaphore, client reuse)
  • ✅ Good error handling with consistent response structures
  • ✅ Correct integration with UsageManager (added to _REQUEST_COUNT_PROVIDERS)
  • ✅ Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL env var

Issues Found

See inline comments below for specifics.

Verdict

APPROVE with minor suggestions - Ready for merge after addressing the minor issues noted in inline comments.

# Create a local logger for this module
import logging

lib_logger = logging.getLogger(__name__)
Contributor

Inconsistent Logger: This uses __name__ while chutes_quota_tracker.py uses "rotator_library". For consistency with other providers in this codebase, consider:

lib_logger = logging.getLogger("rotator_library")
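
To illustrate what the two naming choices mean in practice, the sketch below assumes chutes_provider.py lives in a package under rotator_library. The dotted module path is hypothetical, since the package layout is not shown in this PR.

```python
import logging

# Hypothetical module path; what __name__ would resolve to inside the provider module.
module_logger = logging.getLogger("rotator_library.providers.chutes_provider")
shared_logger = logging.getLogger("rotator_library")

# Dotted names form a hierarchy, so the module logger descends from the shared one
# and its records still propagate to handlers attached to "rotator_library".
print(module_logger.parent is shared_logger)

# It is a distinct object, though, so a level set directly on the module logger
# applies only to that module.
print(module_logger is shared_logger)

# Requesting the same name always returns the same logger instance.
print(logging.getLogger("rotator_library") is shared_logger)
```

Under that assumption, both choices deliver records to the same handlers. The suggestion is mainly about keeping a single logger name across providers.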

    return [
        f"chutes/{model['id']}" for model in response.json().get("data", [])
    ]
except httpx.RequestError as e:
Contributor

Missing HTTPStatusError handling: This only catches RequestError (connection issues) but not HTTPStatusError, which response.raise_for_status() raises on 4xx/5xx responses. If the API returns a 401 or 500 and the response is checked with raise_for_status(), the error will propagate as an unhandled exception.

Consider:

except (httpx.RequestError, httpx.HTTPStatusError) as e:

lib_logger.warning(f"Failed to refresh Chutes quota usage: {e}")

# Fetch all credentials in parallel with shared HTTP client
async with httpx.AsyncClient() as client:
Contributor

Minor: The httpx.AsyncClient() is created without an explicit timeout, so it relies on httpx's 5-second default. While individual requests in fetch_quota_usage() set a 30s timeout, consider setting the client's timeout explicitly for robustness:

async with httpx.AsyncClient(timeout=30.0) as client:

import httpx

if TYPE_CHECKING:
    from ...usage_manager import UsageManager
Contributor

Unused import: UsageManager is imported here under TYPE_CHECKING but never used in type hints in this file (it's only used in chutes_provider.py). Consider removing this unused import.

@Mirrowel
Owner

@b3nw Address the comments in the latest review and this can be merged.

2 participants