Skip to content

Conversation

@jonathan-buttner
Copy link
Contributor

@jonathan-buttner jonathan-buttner commented Sep 9, 2025

This PR modifies the _inference/_services API to make a call directly to EIS to get the authorization information to determine the configuration information to return in the API call.

The authorization response will dictate whether we show EIS in the response and the task types that are included for the EIS configuration.

Follow up PRs will modify how authorization works in general but I'm going to try to do it in small chunks.

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team v9.2.0 labels Sep 9, 2025
components.add(httpClientManager);
components.add(inferenceStatsBinding);
components.add(authorizationHandler);
components.add(new PluginComponentBinding<>(Sender.class, elasicInferenceServiceFactory.get().createSender()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this I get a binding error when running

);

return new InferenceServiceConfiguration.Builder().setService(NAME)
.setName(NAME)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't really necessary but I was running into a test failure because this field didn't exist until I switch the sorting field. Figured I'd leave the fix in though.

@Override
public Set<TaskType> supportedStreamingTasks() {
return authorizationHandler.supportedStreamingTasks();
return EnumSet.of(TaskType.CHAT_COMPLETION);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously if the cluster wasn't authorized for chat completion we'd return a vague error about not being able to stream. With this change we'll allow the request to get sent to EIS and if it isn't authorized, EIS will return a failure.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review September 10, 2025 15:32
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Comment on lines 464 to 466
// This shouldn't be called because the configuration changes based on the authorization
// Instead, retrieve the authorization directly from the EIS gateway and use the static method
// ElasticInferenceService.Configuration#createConfiguration() to create a configuration based on the authorization response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a javadoc comment instead, so that the method you're referencing stays correct even if it's modified?

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Some minor comments

super.setUp();
// Ensure the mock EIS server has an authorized response ready before each test because each test will
// use the services API which makes a call to EIS
mockEISServer.enqueueAuthorizeAllModelsResponse();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init() method below also calls mockEISServer.enqueueAuthorizeAllModelsResponse(). Is it equivalent to change the annotation on that method to @Before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the original issue for using @BeforeClass is because I ran into some weird issues locally. The original PR that added the @BeforeClass is here: #128640

For some background, when the node for the test starts up it will reach out to EIS and get the auth response. If that fails (there isn't a response queued in the mock server), then the tests will fail. What I was observing is that the base classes static logic would only be executed once regardless of how many subclasses used the base. This resulted in the first test class succeeding but the second test class that leveraged the base would fail. To get around this I added the @BeforeClass and it seemed to fix the issue. The reason we need this in @BeforeClass is because we need a response queued before Elasticsearch is started. Elasticsearch is started only once at the beginning before all the tests run.

}
);

getServiceConfigurationsForServices(availableServices, mergeEisConfigListener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getServiceConfigurationsForServices() is a synchronous method and could be written with a return type instead of a listener. I think that would make this code easier to read as you wouldn't need to define the merge listener

}).<List<InferenceServiceConfiguration>>andThen((configurationListener, authorizationModel) -> {

                    var serviceConfigs = getServiceConfigurationsForServices(availableServices);

                    if (authorizationModel.isAuthorized() == false) {
                        delegate.onResponse(serviceConfigs);
                        return;
                    }


                    if (requestedTaskType != null && authorizationModel.getAuthorizedTaskTypes().contains(requestedTaskType) == false) {
                        delegate.onResponse(serviceConfigs);
                        return;
                    }

                    var config = ElasticInferenceService.createConfiguration(authorizationModel.getAuthorizedTaskTypes());
                    serviceConfigs.add(config);
                    serviceConfigs.sort(Comparator.comparing(InferenceServiceConfiguration::getService));
                    delegate.onResponse(serviceConfigs);
                }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ thank you, for some reason I thought it needed use a listener.

.stream()
.filter(
service -> service.getValue().hideFromConfigurationApi() == false
// exclude EIS here because the hideFromConfigurationApi() is not supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was slightly confused about "hideFromConfigurationApi() is not supported" in this comment

Suggested change
// exclude EIS here because the hideFromConfigurationApi() is not supported
// Exclude EIS as the EIS specific configurations are handled separately

configurationMap.put(
MODEL_ID,
new SettingsConfiguration.Builder(
EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.CHAT_COMPLETION, TaskType.RERANK, TaskType.TEXT_EMBEDDING)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this the intersection of enabledTaskTypes and the full set?

EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.CHAT_COMPLETION, TaskType.RERANK, TaskType.TEXT_EMBEDDING).retainAll(enabledTaskTypes);
``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of task types here tells the UI for which task types this field should be configurable. That should stay the same regardless of whether the user is authorized for a specific task type. There's a top level field for task types that indicate which ones are authorized and that's set here:

        return new InferenceServiceConfiguration.Builder().setService(NAME)
            .setName(SERVICE_NAME)
            .setTaskTypes(enabledTaskTypes) <--------
            .setConfigurations(configurationMap)
            .build();

@jonathan-buttner jonathan-buttner enabled auto-merge (squash) September 18, 2025 20:14
@jonathan-buttner jonathan-buttner merged commit aae1ffc into elastic:main Sep 18, 2025
34 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-eis-services-api-direct branch September 19, 2025 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:ml Machine learning >non-issue Team:ML Meta label for the ML team v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants