Added ability to configure concurrent_requests in litellm_model.py #911
This pull request updates the way concurrent API requests are configured for the LiteLLM model endpoint. Instead of using a hardcoded value, the number of concurrent requests is now configurable through the model's configuration object, making the code more flexible and easier to tune for different environments.
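For context, a minimal usage sketch; the import path and the `model` parameter are assumptions for illustration and not part of this PR:

```python
# Sketch only: `concurrent_requests` is the field added by this PR;
# the import path and `model` argument are assumed for illustration.
from litellm_model import LiteLLMModelConfig

config = LiteLLMModelConfig(
    model="gpt-4o-mini",       # assumed existing parameter
    concurrent_requests=20,    # previously hardcoded; now tunable per environment
)
```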
**Configuration improvements:**

- Added a `concurrent_requests` field to the `LiteLLMModelConfig` class, allowing the number of concurrent API requests to be set via configuration instead of being hardcoded.
- Updated the `LiteLLMClient` class to use the new `concurrent_requests` configuration value, removing the old hardcoded `CONCURRENT_CALLS` attribute.
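A hedged sketch of what the configuration change could look like; the surrounding field and the default value are assumptions, and only the `concurrent_requests` field itself (replacing the `CONCURRENT_CALLS` constant) is taken from this PR:

```python
from dataclasses import dataclass

@dataclass
class LiteLLMModelConfig:
    model: str                     # assumed existing field
    concurrent_requests: int = 10  # new: replaces the hardcoded CONCURRENT_CALLS
                                   # attribute on LiteLLMClient (default assumed)
```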
**Concurrency handling:**

- Updated the `__call_api_parallel` method to use the configurable `concurrent_requests` value when creating the `ThreadPoolExecutor` (see the sketch below), improving flexibility and maintainability.
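A minimal sketch of the updated concurrency path, assuming `__call_api_parallel` fans requests out over a thread pool; the method body and the `__call_api` helper are illustrative, and only the use of `concurrent_requests` to size the `ThreadPoolExecutor` reflects this PR:

```python
from concurrent.futures import ThreadPoolExecutor

class LiteLLMClient:
    def __init__(self, config: "LiteLLMModelConfig"):
        # concurrent_requests is now read from the config object
        # instead of a hardcoded CONCURRENT_CALLS class attribute.
        self.config = config

    def __call_api(self, prompt: str) -> str:
        ...  # single API call, elided

    def __call_api_parallel(self, prompts: list[str]) -> list[str]:
        # Pool size comes from configuration, so it can be tuned per environment.
        with ThreadPoolExecutor(self.config.concurrent_requests) as executor:
            return list(executor.map(self.__call_api, prompts))
```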