feat(server): add per-role rate limits to openai proxy #1698
Conversation
Summary of Changes (@gemini-code-assist): This pull request implements a role-based rate limiting system for the OpenAI proxy, allowing administrators to define distinct usage quotas for different user roles across the proxied OpenAI API functionality. It also includes a substantial refactoring of the proxy's internal architecture, transitioning to a more flexible and maintainable adapter pattern. Together, these changes improve resource management, help prevent abuse, and make the proxy service more stable and extensible.
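To make the adapter pattern described above concrete, here is a minimal sketch in Python. The interface names (IOpenAIChatCompletionProxyAdapter, IOpenAIEmbeddingProxyAdapter) come from the review below; the method signatures and the toy adapter are assumptions, not the PR's actual code.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class IOpenAIChatCompletionProxyAdapter(Protocol):
    """Adapter boundary for chat-completion backends (OpenAI, WatsonX, RITS, ...)."""

    def chat_completion(self, request: dict[str, Any]) -> dict[str, Any]: ...


@runtime_checkable
class IOpenAIEmbeddingProxyAdapter(Protocol):
    """Adapter boundary for embedding backends."""

    def embeddings(self, request: dict[str, Any]) -> dict[str, Any]: ...


class EchoChatAdapter:
    """Toy adapter used only to illustrate the pattern: echoes the last message."""

    def chat_completion(self, request: dict[str, Any]) -> dict[str, Any]:
        last = request["messages"][-1]["content"]
        return {"choices": [{"message": {"role": "assistant", "content": last}}]}
```

Because each provider only has to satisfy the Protocol, the API layer can route to OpenAI, WatsonX, RITS, Anthropic, Github, or Voyage without knowing provider-specific details.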
Code Review
This pull request introduces a robust role-based rate limiting mechanism for the OpenAI proxy endpoints, significantly enhancing the server's control over resource consumption. The changes involve a substantial refactoring of the model provider interaction logic, moving it behind new interfaces (IOpenAIProxy, IOpenAIChatCompletionProxyAdapter, IOpenAIEmbeddingProxyAdapter) and adapter implementations for various model providers (OpenAI, WatsonX, RITS, Anthropic, Github, Voyage). This refactoring greatly improves modularity, testability, and extensibility.
Key improvements include:
- Modular Design: The introduction of proxy interfaces and adapters cleanly separates the concerns of model interaction from the API layer.
- Role-Based Rate Limiting: Users are now subject to rate limits based on their assigned roles, with separate limits for chat completion tokens, chat completion requests, and embedding inputs.
- Token Cost Estimation: A heuristic-based token cost estimation is implemented for LLM and embedding requests, allowing for more granular rate limiting.
- Improved Error Handling: Rate limit exceeded errors are now properly caught and returned with appropriate HTTP status codes and X-RateLimit-* headers.
- Configuration Updates: The Configuration and Helm charts have been updated to support the new rate limiting parameters.
- Comprehensive Testing: New integration tests have been added to validate the rate limiting functionality.
Overall, this is a well-executed feature that improves the stability and manageability of the server, particularly for AI model interactions.
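The review above mentions heuristic token cost estimation feeding into per-role limits. The sketch below shows one plausible shape for that logic in Python; the role names, quota values, and the "about four characters per token" heuristic are illustrative assumptions, and a real implementation would use a sliding time window rather than this fixed budget.

```python
from dataclasses import dataclass, field


@dataclass
class RoleLimit:
    # Hypothetical per-role quota: maximum chat-completion tokens per window.
    tokens_per_window: int


# Example per-role limits; actual roles and values are configuration-dependent.
LIMITS = {"admin": RoleLimit(100_000), "user": RoleLimit(10_000)}


def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token (a common rule of thumb).
    return max(1, len(text) // 4)


@dataclass
class RateLimiter:
    used: dict[str, int] = field(default_factory=dict)

    def check(self, role: str, cost: int) -> tuple[bool, int]:
        """Return (allowed, remaining budget for an X-RateLimit-Remaining header)."""
        limit = LIMITS[role].tokens_per_window
        spent = self.used.get(role, 0)
        if spent + cost > limit:
            return False, limit - spent  # reject without consuming the budget
        self.used[role] = spent + cost
        return True, limit - self.used[role]
```

On rejection, the proxy would translate the False result into an HTTP 429 response carrying the remaining-budget value in an X-RateLimit-* header, matching the error handling described above.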
Force-pushed fabd04a to d826010
Signed-off-by: Radek Ježek <[email protected]>
Force-pushed d826010 to da7c128
Summary
Linked Issues
Ref: #1450
Documentation