[Inference API] add service and task type aware rate limiting #125880
TreeMap<TaskType, MaxNodesPerGroupingStrategy> alibabaCloudSearchConfigs = new TreeMap<>();
var alibabaCloudSearchService = serviceRegistry.getService(AlibabaCloudSearchService.NAME);
if (alibabaCloudSearchService.isPresent()) {
    var alibabaCloudSearchTaskTypes = alibabaCloudSearchService.get().supportedTaskTypes();
I think eventually we'll want something like this, but at the moment we don't support cross-node streaming, so we'll definitely need to exclude the chat_completion task type.
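One way the suggested exclusion could look is sketched below. This is only an illustration under stated assumptions: `TaskType` and `Strategy` here are hypothetical stand-ins for the real `TaskType` enum and `MaxNodesPerGroupingStrategy`, and `buildConfigs` is an invented helper, not a method from the PR.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.TreeMap;

public class RateLimitConfigSketch {
    // Hypothetical stand-in for the real TaskType enum; only a few values are listed.
    enum TaskType { TEXT_EMBEDDING, SPARSE_EMBEDDING, RERANK, COMPLETION, CHAT_COMPLETION }

    // Hypothetical stand-in for MaxNodesPerGroupingStrategy.
    record Strategy(int maxNodes) {}

    // Builds per-task-type configs, skipping chat_completion because
    // cross-node streaming is not supported (per the review comment).
    static TreeMap<TaskType, Strategy> buildConfigs(Set<TaskType> supported, Strategy defaultStrategy) {
        TreeMap<TaskType, Strategy> configs = new TreeMap<>();
        for (TaskType taskType : supported) {
            if (taskType == TaskType.CHAT_COMPLETION) {
                continue; // streaming task type: excluded from node-local rate limiting
            }
            configs.put(taskType, defaultStrategy);
        }
        return configs;
    }

    public static void main(String[] args) {
        var configs = buildConfigs(
            EnumSet.of(TaskType.TEXT_EMBEDDING, TaskType.CHAT_COMPLETION),
            new Strategy(3)
        );
        System.out.println(configs.keySet());
    }
}
```

Filtering at the point where the map is built keeps the exclusion in one place, so the rest of the rate-limiting code never sees a config for the streaming task type.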
        alibabaCloudSearchConfigs.put(taskType, defaultStrategy);
    }
}
serviceNodeLocalRateLimitConfigs.put(AlibabaCloudSearchService.NAME, alibabaCloudSearchConfigs);
I doubt this will ever happen, but if the individual service is not present (isPresent() == false), do we still want to add the configs to the tree map?
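If the answer is no, one possible shape is to register the configs only inside the presence check. The sketch below is an assumption-laden illustration, not the PR's code: `registerIfPresent` is a hypothetical helper, task types are plain strings, the strategy is reduced to an integer, and the `Optional` argument stands in for the result of `serviceRegistry.getService(...)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

public class PresentServiceGuardSketch {
    // Adds an entry for the service only when the lookup returned a value,
    // so an absent service never gets an empty config map.
    static void registerIfPresent(
        Map<String, TreeMap<String, Integer>> allConfigs,
        String serviceName,
        Optional<Set<String>> supportedTaskTypes, // hypothetical stand-in for the registry lookup
        int defaultMaxNodes
    ) {
        supportedTaskTypes.ifPresent(taskTypes -> {
            TreeMap<String, Integer> configs = new TreeMap<>();
            for (String taskType : taskTypes) {
                configs.put(taskType, defaultMaxNodes);
            }
            allConfigs.put(serviceName, configs);
        });
    }

    public static void main(String[] args) {
        Map<String, TreeMap<String, Integer>> all = new HashMap<>();
        registerIfPresent(all, "missing-service", Optional.empty(), 3);
        registerIfPresent(all, "some-service", Optional.of(Set.of("rerank")), 3);
        System.out.println(all.keySet());
    }
}
```

Whether an empty entry for an absent service is harmful depends on how the map is consumed; the sketch only shows the stricter variant.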
public static DeepSeekRequestManager.RateLimitGrouping of(DeepSeekChatCompletionModel model) {
    Objects.requireNonNull(model);

    return new DeepSeekRequestManager.RateLimitGrouping(model.apiKey().hashCode());
This is correct: there is effectively no rate limiting for DeepSeek.
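For readers unfamiliar with the grouping in the hunk above, here is a minimal sketch of how a grouping derived only from an API key hash behaves. It assumes a simplified, hypothetical `RateLimitGrouping` record that takes the key as a plain string; the real class takes a `DeepSeekChatCompletionModel`.

```java
import java.util.Objects;

public class RateLimitGroupingSketch {
    // Hypothetical simplification of DeepSeekRequestManager.RateLimitGrouping:
    // the grouping is identified solely by the hash of the API key.
    record RateLimitGrouping(int apiKeyHash) {
        static RateLimitGrouping of(String apiKey) {
            Objects.requireNonNull(apiKey);
            return new RateLimitGrouping(apiKey.hashCode());
        }
    }

    public static void main(String[] args) {
        var first = RateLimitGrouping.of("key-1");
        var second = RateLimitGrouping.of("key-1");
        // Requests made with the same API key land in the same grouping.
        System.out.println(first.equals(second));
    }
}
```

Because a record derives `equals` and `hashCode` from its components, two groupings built from the same key compare equal and can share a single entry in a map of rate limiters.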
The goal of this PR is to address the rate-limiting follow-up TODOs introduced by this PR and tracked by this issue in order to support service and task type aware rate-limiting.