Added ability to configure concurrent_requests in litellm_model.py #911
base: main
Hey! Thanks for the fix. If the tests pass, we are good to merge.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Nathan Habib <[email protected]>
Thanks for the review @NathanHB 🙇 I've updated with the suggested change.
Hey, this is good to merge. I just missed that you need to add the new parameter to the docstring, thanks!
Sorry, I totally should have thought of that. Update made now.
This pull request updates the way concurrent API requests are configured for the LiteLLM model endpoint. Instead of using a hardcoded value, the number of concurrent requests is now configurable through the model's configuration object, making the code more flexible and easier to tune for different environments.
Configuration improvements:
- Added `concurrent_requests` to the `LiteLLMModelConfig` class, allowing the number of concurrent API requests to be set via configuration instead of being hardcoded.
- Updated the `LiteLLMClient` class to use the new `concurrent_requests` configuration value, removing the old hardcoded `CONCURRENT_CALLS` attribute.

Concurrency handling:
- Updated the `__call_api_parallel` method to use the configurable `concurrent_requests` value when creating the `ThreadPoolExecutor`, improving flexibility and maintainability.
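
For readers who want to see the shape of the change, here is a minimal sketch of the pattern described above: a config field that sets the worker count of a `ThreadPoolExecutor`. It is not the code from this PR. `SketchModelConfig`, `SketchClient`, `_call_api`, and the default of 10 are hypothetical stand-ins chosen for illustration, and the API call is replaced by a local placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SketchModelConfig:
    """Hypothetical stand-in for LiteLLMModelConfig (not the real class)."""

    model_name: str
    # Assumed default for illustration only; the PR's actual default is not shown here.
    concurrent_requests: int = 10


class SketchClient:
    """Hypothetical stand-in for LiteLLMClient."""

    def __init__(self, config: SketchModelConfig):
        self.config = config
        # Read the limit from the config instead of a hardcoded class attribute.
        self.concurrent_requests = config.concurrent_requests

    def _call_api(self, prompt: str) -> str:
        # Placeholder for a real API request; returns a deterministic string.
        return f"{self.config.model_name}: {prompt}"

    def call_api_parallel(self, prompts: list[str]) -> list[str]:
        # The configurable value bounds how many requests run at once.
        with ThreadPoolExecutor(max_workers=self.concurrent_requests) as executor:
            # executor.map returns results in the same order as the inputs.
            return list(executor.map(self._call_api, prompts))


config = SketchModelConfig(model_name="demo-model", concurrent_requests=4)
client = SketchClient(config)
results = client.call_api_parallel(["a", "b", "c"])
```

Lowering `concurrent_requests` in the config is then enough to stay under a provider's rate limit, with no change to the client code.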