Skip to content

Conversation

@brendan-jugan-elastic
Copy link
Contributor

The goal of this PR is to address the rate-limiting follow-up TODOs introduced by this PR in order to support service and task type aware rate-limiting.

// TODO: add service() and taskType()
String service();

TaskType taskType();
Copy link
Contributor Author

@brendan-jugan-elastic brendan-jugan-elastic Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered possibly making these static, but then they can't be overridden and enforce that new sub-classes implement them, so I added them to this interface.

However, I was wondering if it makes sense to make these methods abstract methods of the BaseRequestManager instead of adding them to the RequestManager interface. I see in the RequestExecutorService that we expect instances of RequestManager, so I'm wondering in what situations the RequestExecutorService's methods will be passed instances of RequestManager and not BaseRequestManager's, or if we can restrict the type a bit more. Maybe @jonathan-buttner or @timgrein can help clear things up for me? I don't have a strong opinion either way, but more-so wanting to understand the implications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at all the request manager's recently but I suspect that their base class already has the service name and task type information. It should be stored in the model classes.

So I think what we can do is modify the RequestManager interface like you have to include those new methods. Then modify the BaseRequestManager constructor to take the service name string and the task type and implement the required methods you just added to the interface.

Then in the service abstract class access the generic model to retrieve the information and pass it along to BaseRequestManager.

For example here are the changes I'm thinking:

BaseRequestManager(ThreadPool threadPool, String inferenceEntityId, Object rateLimitGroup, RateLimitSettings rateLimitSettings, String serviceName, TaskType taskType) { ... }
    protected OpenAiRequestManager(ThreadPool threadPool, OpenAiModel model, CheckedSupplier<URI, URISyntaxException> uriBuilder) {
        super(
            threadPool,
            model.getInferenceEntityId(),
            RateLimitGrouping.of(model, uriBuilder),
            model.rateLimitServiceSettings().rateLimitSettings(),
            model.getConfigurations().getService(),
            model.getConfigurations().getTaskType()
        );
    }

Unfortunately we have to do that for every request manager 😭 Hopefully soon we can do some refactoring to reduce the footprint of changes like this in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see! Makes sense. Will refactor!

@Override
public TaskType taskType() {
return COMPLETION;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For classes like ElasticInferenceServiceUnifiedCompletionRequestManager, GoogleAiStudioCompletionRequestManager, and OpenAiCompletionRequestManager (all completions classes in general?) should we be using COMPLETION or CHAT_COMPLETION? The use of ChatCompletionInput in each of these classes point me to COMPLETION, but wanted to make sure this is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry our naming is confusing. I think my comment above should address this too though because we shouldn't need to add the method directly to these classes anymore. But in general when you see UnifiedChatInput that means chat_completion and then confusingly when you see ChatCompletionInput it means completion 😭

@brendan-jugan-elastic
Copy link
Contributor Author

I sourced TaskTypes from inference API docs, the use of ChatCompletioninput, and logging verbiage used in each RequestManager class.

@Override
public TaskType taskType() {
return TaskType.TEXT_EMBEDDING;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For HuggingFaceRequestManager, I used TEXT_EMBEDDING as the TaskType since it is directly mentioned in the requestType within the HuggingFaceActionCreator and the only subclasses of HuggingFaceModel are HuggingFaceEmbeddingsModel and HuggingFaceElserModel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if the model approach I mentioned addresses this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, should be good here. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants