-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[ML] Transition EIS auth polling to persistent task on a single node #136713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Transition EIS auth polling to persistent task on a single node #136713
Conversation
|
Hi @jonathan-buttner, I've created a changelog YAML for you. |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use the master for admin tasks that don't actually need to run on the master. If you need a task to run approximately once in the cluster, use a persistent task instead.
b2cd14a to
10ad8f2
Compare
…sticsearch into ml-eis-auth-polling
…sticsearch into ml-eis-auth-polling
…sticsearch into ml-eis-auth-polling
|
Pinging @elastic/ml-core (Team:ML) |
I chatted with Dave offline and changed the implementation based on his feedback. Dave advised to not force the polling logic to occur on the master node and to do it within a persistent task which I've addressed.
| */ | ||
| public class AuthorizationTaskExecutorMultipleNodesIT extends ESIntegTestCase { | ||
|
|
||
| private static final String AUTH_TASK_ACTION = AuthorizationPoller.TASK_NAME + "[c]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| var serviceToInferenceIds = new HashMap<String, Set<String>>(); | ||
| for (var entry : modelMap.entrySet()) { | ||
| var settings = entry.getValue(); | ||
| var serviceName = settings.service(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
settings.service() can return null, is that a problem? it looks like the map would not throw an error, so idk when this would happen or be a problem 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not sure why we allow service to be null 🤔 I added a test for it. If it works correctly, it should just bucket the null ones together.
Test case
public void testGetServiceInferenceIds_AcceptsNullKeys() {
var serviceA = "service_a";
var endpointId1 = "endpointId1";
var endpointId2 = "endpointId2";
var nullEndpoint1 = "nullEndpoint1";
var nullEndpoint2 = "nullEndpoint2";
var settings1 = MinimalServiceSettings.chatCompletion(serviceA);
var settings2 = MinimalServiceSettings.sparseEmbedding(serviceA);
// I'm not sure why minimal service settings would have a null service name, but testing it anyway
var nullServiceNameSettings1 = MinimalServiceSettings.sparseEmbedding(null);
var nullServiceNameSettings2 = MinimalServiceSettings.sparseEmbedding(null);
var models = Map.of(
endpointId1,
settings1,
endpointId2,
settings2,
nullEndpoint1,
nullServiceNameSettings1,
nullEndpoint2,
nullServiceNameSettings2
);
var metadata = new ModelRegistryMetadata(ImmutableOpenMap.builder(models).build());
var serviceEndpoints = metadata.getServiceInferenceIds(serviceA);
assertThat(serviceEndpoints, is(Set.of(endpointId1, endpointId2)));
assertThat(metadata.getServiceInferenceIds(null), is(Set.of(nullEndpoint1, nullEndpoint2)));
| if (lastAuthTask.get() != null) { | ||
| lastAuthTask.get().cancel(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think it matters since scheduleAndSendAuthorizationRequest checks the shutdown status, but in theory one thread could set a different ScheduledCancellable in between 145 and 146
| if (lastAuthTask.get() != null) { | |
| lastAuthTask.get().cancel(); | |
| } | |
| var authTask = lastAuthTask.get(); | |
| if (authTask != null) { | |
| authTask.cancel(); | |
| } |
...va/org/elasticsearch/xpack/inference/services/elastic/authorization/AuthorizationPoller.java
Show resolved
Hide resolved
...va/org/elasticsearch/xpack/inference/services/elastic/authorization/AuthorizationPoller.java
Show resolved
Hide resolved
.../elasticsearch/xpack/inference/services/elastic/authorization/AuthorizationTaskExecutor.java
Show resolved
Hide resolved
…lastic#136713) * Creating new cluster state listener to kick off polling logic * Update docs/changelog/136713.yaml * [CI] Auto commit changes from spotless * Starting persistent tasks * Switching to a persistent task, need to create the action though * Adding master action * Successful task creation * Starting tests * More tests * Even more tests * [CI] Auto commit changes from spotless * Starting integration tests * Adding test stub * [CI] Auto commit changes from spotless * Adding integration test * Fixing relocation test * [CI] Auto commit changes from spotless * working test * Some clean up * Removing unneeded tests * [CI] Auto commit changes from spotless * Refactoring tests * updating transport version * [CI] Auto commit changes from spotless * Fixing transport version * Fixing check for preconfigured endpoints * [CI] Auto commit changes from spotless * Fixing tests * Fixing text embedding test * Addressing feedback * Marking task as failed * Fixing flaky test --------- Co-authored-by: elasticsearchmachine <[email protected]>
This PR is based on: #136569Already mergedThis PR moves the EIS authorization polling logic to a persistent task on a single node.
Notable changes:
Testing
Start EIS
Start ES pointing at EIS
Retrieve all the endpoints from the inference API should return some EIS endpoints now
A task should be present in the list
eis-authorization-poller[c]