-
-
Notifications
You must be signed in to change notification settings - Fork 51
feat(chutes): add quota tracking with daily credit limits #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
|
I'm starting my review of the Chutes quota tracking implementation. It looks like a solid addition with the new mixin pattern. Diving into the details now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall Assessment
This is a well-structured PR that implements quota tracking for the Chutes provider using a clean mixin pattern. The addition of the background refresh job and the integration with UsageManager via a virtual model are solid architectural choices.
However, there are a few points regarding performance and a potentially critical missing piece (quota grouping) that should be addressed to ensure the tracking works correctly across all models.
Architectural Feedback
The use of a mixin (ChutesQuotaTracker) is excellent for keeping the main provider class clean. The primary concern is the missing quota grouping. Without implementing get_model_quota_group in ChutesProvider, the UsageManager will treat each model as having its own independent quota, which contradicts the "credential-level" nature of Chutes quotas.
Key Suggestions
- Implement Quota Grouping: Add
get_model_quota_grouptoChutesProviderto ensure all models share the same usage stats. - Parallelize Background Job: Use
asyncio.gatherinrun_background_jobto refresh multiple credentials efficiently. - Client Reuse: Allow passing an existing
httpx.AsyncClienttofetch_quota_usageto avoid repeated overhead.
Questions for the Author
- Is the authentication difference between
get_models(Bearer) andfetch_quota_usage(Raw) intentional? - Does the Chutes API return any reset-related headers that we could use instead of the local 00:00 UTC calculation?
This review was generated by an AI assistant.
src/rotator_library/providers/utilities/chutes_quota_tracker.py
Outdated
Show resolved
Hide resolved
src/rotator_library/providers/utilities/chutes_quota_tracker.py
Outdated
Show resolved
Hide resolved
Implement quota tracking for Chutes provider using a simple standalone mixin pattern: Core Implementation: - ChutesQuotaTracker: Standalone mixin (no complex base class inheritance) - Tracks credential-level quota (1 request = 1 credit consumed) - Daily quota reset at 00:00 UTC - Automatic tier detection (Legacy=200, Base=300, Plus=2000, Pro=5000) Features: - fetch_quota_usage(): Queries Chutes API for quota/used credits - Background refresh job: Parallel quota updates via asyncio.gather - Integration with UsageManager using virtual model 'chutes/_quota' - get_model_quota_group(): Returns 'chutes_global' for shared quota pool - Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL env var Performance: - Shared httpx.AsyncClient for connection reuse - Parallel credential fetching with semaphore (max 5 concurrent) Files: - NEW: chutes_quota_tracker.py - Standalone quota mixin - MODIFIED: chutes_provider.py - Add quota tracking + background job - MODIFIED: usage_manager.py - Add 'chutes' to _REQUEST_COUNT_PROVIDERS API Integration: - Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me - Returns: { quota: int, used: float } - Auth: Bearer token in Authorization header Architecture: Simple standalone mixin for credential-level quota tracking. No OAuth complexity, no cost learning algorithms, no per-model quotas.
556658e to
8105c20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up Review
The author has done an excellent job of addressing the feedback from the previous review. The architecture is now more robust and efficient.
Assessment of New Changes:
- Quota Grouping: Successfully implemented
get_model_quota_group, ensuring shared usage tracking across all models. - Performance: Parallelized the background quota refresh job and implemented HTTP client reuse, which will reduce latency and overhead.
- Consistency: Unified the authentication scheme to use
Bearertokens across all endpoints. - Integration: Correctly integrated 'chutes' into the
UsageManager's request-count tracking.
Overall Status:
The PR is now in a high-quality state and ready for merging.
This review was generated by an AI assistant.
| os.environ.get("CHUTES_QUOTA_REFRESH_INTERVAL", "300") | ||
| ) | ||
|
|
||
| def get_model_quota_group(self, model: str) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix verified: Implementing get_model_quota_group correctly ensures that all Chutes models share the same credential-level quota pool in the UsageManager. This addresses the grouping issue raised in the previous review.
| tasks = [ | ||
| refresh_single_credential(api_key, client) for api_key in credentials | ||
| ] | ||
| await asyncio.gather(*tasks, return_exceptions=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix verified: The background job now correctly parallelizes quota fetches using asyncio.gather and a shared httpx.AsyncClient. This significantly improves efficiency when managing multiple credentials, as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Review: Chutes Quota Tracking
Summary
Well-structured PR that adds quota tracking for the Chutes provider using a clean mixin pattern. The architecture follows established patterns in the codebase.
Positive Aspects
- ✅ Clean mixin architecture following existing patterns (
BaseQuotaTracker) - ✅ Proper quota grouping via
get_model_quota_group()returning"chutes_global" - ✅ Performance optimizations (parallel fetching with semaphore, client reuse)
- ✅ Good error handling with consistent response structures
- ✅ Correct integration with
UsageManager(added to_REQUEST_COUNT_PROVIDERS) - ✅ Configurable refresh interval via
CHUTES_QUOTA_REFRESH_INTERVALenv var
Issues Found
See inline comments below for specifics.
Verdict
APPROVE with minor suggestions - Ready for merge after addressing the minor issues noted in inline comments.
| # Create a local logger for this module | ||
| import logging | ||
|
|
||
| lib_logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent Logger: This uses __name__ while chutes_quota_tracker.py uses "rotator_library". For consistency with other providers in this codebase, consider:
lib_logger = logging.getLogger("rotator_library")| return [ | ||
| f"chutes/{model['id']}" for model in response.json().get("data", []) | ||
| ] | ||
| except httpx.RequestError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing HTTPStatusError handling: This only catches RequestError (connection issues) but not HTTPStatusError (4xx/5xx responses). If the API returns a 401 or 500, this will propagate as an unhandled exception.
Consider:
except (httpx.RequestError, httpx.HTTPStatusError) as e:| lib_logger.warning(f"Failed to refresh Chutes quota usage: {e}") | ||
|
|
||
| # Fetch all credentials in parallel with shared HTTP client | ||
| async with httpx.AsyncClient() as client: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The httpx.AsyncClient() is created without a timeout. While individual requests in fetch_quota_usage() have a 30s timeout, consider adding a default timeout to the client for robustness:
async with httpx.AsyncClient(timeout=30.0) as client:| import httpx | ||
|
|
||
| if TYPE_CHECKING: | ||
| from ...usage_manager import UsageManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused import: UsageManager is imported here under TYPE_CHECKING but never used in type hints in this file (it's only used in chutes_provider.py). Consider removing this unused import.
|
@b3nw address those comments in the latest review and it can be merged. |
Implement quota tracking for Chutes provider using a simple standalone mixin pattern:
Core Implementation:
Features:
Files:
API Integration:
Architecture: Simple standalone mixin for credential-level quota tracking.
Important
Adds quota tracking for Chutes provider using a mixin pattern, with daily resets and tier detection, integrated into
UsageManager.ChutesQuotaTrackermixin for quota tracking inchutes_quota_tracker.py.UsageManagerusing virtual modelchutes/_quota.fetch_quota_usage(): Fetches quota usage from Chutes API.chutes_provider.py.CHUTES_QUOTA_REFRESH_INTERVAL.chutes_quota_tracker.pyfor quota tracking mixin.chutes_provider.pyto include quota tracking and background job.usage_manager.pyto add 'chutes' to_REQUEST_COUNT_PROVIDERS.GET https://api.chutes.ai/users/me/quota_usage/me.This description was created by
for 556658e. You can customize this summary. It will automatically update as commits are pushed.