[Inference API] Support service and task type aware rate-limiting #122333
Conversation
```java
// TODO: add service() and taskType()
String service();

TaskType taskType();
```
I considered making these static, but then they can't be overridden, and we can't enforce that new sub-classes implement them, so I added them to this interface.
However, I was wondering if it makes sense to make these abstract methods of BaseRequestManager instead of adding them to the RequestManager interface. I see that the RequestExecutorService expects instances of RequestManager, so I'm wondering in what situations the RequestExecutorService's methods will be passed instances of RequestManager rather than BaseRequestManager, or whether we can restrict the type a bit more. Maybe @jonathan-buttner or @timgrein can help clear things up for me? I don't have a strong opinion either way, but mostly want to understand the implications.
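To make the trade-off concrete, here's a simplified, hypothetical sketch of the two pieces being discussed: the names mirror the PR, but the real Elasticsearch classes (ThreadPool, rate-limit settings, etc.) are omitted.

```java
// Simplified sketch; names mirror the PR, not the actual Elasticsearch classes.
enum TaskType { TEXT_EMBEDDING, COMPLETION, CHAT_COMPLETION }

interface RequestManager {
    // Declaring these as instance methods on the interface (rather than
    // statics) means the compiler forces every implementation to provide
    // them, and callers that only see the RequestManager type can still
    // call them.
    String service();
    TaskType taskType();
}

abstract class BaseRequestManager implements RequestManager {
    private final String service;
    private final TaskType taskType;

    BaseRequestManager(String service, TaskType taskType) {
        this.service = service;
        this.taskType = taskType;
    }

    @Override
    public String service() {
        return service;
    }

    @Override
    public TaskType taskType() {
        return taskType;
    }
}
```

With this shape, putting the declarations on the interface and the storage in the base class gives both guarantees at once: the contract is enforced on every implementation, and subclasses don't have to re-implement the accessors.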
I haven't looked at all the request managers recently, but I suspect that their base class already has the service name and task type information. It should be stored in the model classes.
So I think what we can do is modify the RequestManager interface like you have to include those new methods. Then modify the BaseRequestManager constructor to take the service name string and the task type and implement the required methods you just added to the interface.
Then in the service abstract class access the generic model to retrieve the information and pass it along to BaseRequestManager.
For example here are the changes I'm thinking:
```java
BaseRequestManager(
    ThreadPool threadPool,
    String inferenceEntityId,
    Object rateLimitGroup,
    RateLimitSettings rateLimitSettings,
    String serviceName,
    TaskType taskType
) { ... }

protected OpenAiRequestManager(ThreadPool threadPool, OpenAiModel model, CheckedSupplier<URI, URISyntaxException> uriBuilder) {
    super(
        threadPool,
        model.getInferenceEntityId(),
        RateLimitGrouping.of(model, uriBuilder),
        model.rateLimitServiceSettings().rateLimitSettings(),
        model.getConfigurations().getService(),
        model.getConfigurations().getTaskType()
    );
}
```
Unfortunately we have to do that for every request manager 😭 Hopefully soon we can do some refactoring to reduce the footprint of changes like this in the future.
Ahh, I see! Makes sense. Will refactor!
```java
@Override
public TaskType taskType() {
    return COMPLETION;
}
```
For classes like ElasticInferenceServiceUnifiedCompletionRequestManager, GoogleAiStudioCompletionRequestManager, and OpenAiCompletionRequestManager (all completion classes in general?), should we be using COMPLETION or CHAT_COMPLETION? The use of ChatCompletionInput in each of these classes points me to COMPLETION, but I wanted to make sure this is correct.
Yeah sorry our naming is confusing. I think my comment above should address this too though because we shouldn't need to add the method directly to these classes anymore. But in general when you see UnifiedChatInput that means chat_completion and then confusingly when you see ChatCompletionInput it means completion 😭
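For reference, the naming-to-task-type mapping described above can be summarized in a small sketch. This is an assumption inferred from the comment, not the real plugin code; the actual input classes live in the Elasticsearch inference plugin and are only name-checked here.

```java
// Assumed mapping, inferred from the reviewer's comment; not the real plugin code.
enum TaskType { COMPLETION, CHAT_COMPLETION }

final class InputTaskTypes {
    // Counter-intuitively, ChatCompletionInput corresponds to the
    // "completion" task type, while UnifiedChatInput corresponds to
    // "chat_completion".
    static TaskType forInput(String inputClassName) {
        return switch (inputClassName) {
            case "UnifiedChatInput" -> TaskType.CHAT_COMPLETION;
            case "ChatCompletionInput" -> TaskType.COMPLETION;
            default -> throw new IllegalArgumentException("unknown input: " + inputClassName);
        };
    }
}
```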
```java
@Override
public TaskType taskType() {
    return TaskType.TEXT_EMBEDDING;
}
```
For HuggingFaceRequestManager, I used TEXT_EMBEDDING as the TaskType since it is directly mentioned in the requestType within the HuggingFaceActionCreator and the only subclasses of HuggingFaceModel are HuggingFaceEmbeddingsModel and HuggingFaceElserModel.
Let me know if the model approach I mentioned addresses this.
Yep, should be good here. Thanks!
Force-pushed from bca4565 to 162763b
The goal of this PR is to address the rate-limiting follow-up TODOs introduced by this PR, in order to support service- and task-type-aware rate-limiting.