[ML] Transition EIS auth polling to persistent task on a single node #136713

jonathan-buttner · 2025-10-16T18:32:51Z

~~This PR is based on: #136569~~ Already merged

This PR moves the EIS authorization polling logic to a persistent task on a single node.

Notable changes:

* The polling logic no longer runs on every node.
* A cluster state listener is registered that checks whether the task exists and, if it doesn't, creates it.
* If the node running the task shuts down, the persistent task framework moves the task to another node.
* If the EIS URL is empty or null, the persistent task is not created.
* If a cluster is no longer authorized to access certain preconfigured endpoints, those endpoints remain rather than being removed.
* The polling logic compares the authorized models it receives with the preconfigured inference endpoints already stored in cluster state to determine which are new. Only new preconfigured inference endpoints are stored.
* The polling logic uses a new action to send the new inference endpoints to the master node to be stored. The master node must do this because the operation updates the cluster state.
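Two of the decisions listed above can be illustrated in isolation: whether the listener should create the task, and which authorized endpoints count as new. The following is a minimal, self-contained Java sketch under those assumptions. The class and method names (`EisAuthPollingSketch`, `shouldCreateTask`, `newPreconfiguredEndpointIds`) are hypothetical and are not the PR's code. The real implementation works against cluster state and the persistent tasks framework.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the decision logic; not the classes from the PR.
final class EisAuthPollingSketch {

    private EisAuthPollingSketch() {}

    /**
     * The listener only needs to create the persistent task when an EIS URL is
     * configured and the task does not already exist.
     */
    static boolean shouldCreateTask(String eisUrl, boolean taskAlreadyExists) {
        if (eisUrl == null || eisUrl.isEmpty()) {
            return false; // no EIS configured: never create the task
        }
        return taskAlreadyExists == false;
    }

    /**
     * Returns the authorized endpoint IDs that are not stored yet. Stored endpoints
     * that are no longer authorized are ignored, so nothing is ever removed.
     */
    static Set<String> newPreconfiguredEndpointIds(Set<String> authorizedIds, Set<String> storedIds) {
        Set<String> newIds = new TreeSet<>(authorizedIds); // sorted for stable output
        newIds.removeAll(storedIds);
        return newIds;
    }

    public static void main(String[] args) {
        System.out.println(shouldCreateTask("", false));                       // false
        System.out.println(shouldCreateTask("https://localhost:8443", true));  // false
        System.out.println(shouldCreateTask("https://localhost:8443", false)); // true

        var authorized = Set.of("endpoint-a", "endpoint-b", "endpoint-c");
        var stored = Set.of("endpoint-a", "endpoint-old");
        System.out.println(newPreconfiguredEndpointIds(authorized, stored)); // [endpoint-b, endpoint-c]
    }
}
```

Computing only the difference (authorized minus stored) is what lets the master-side update be additive. An endpoint that drops out of the authorization response is left untouched rather than deleted.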

Testing

Start EIS

```shell
cd eis-gateway
make TLS_VERIFY_CLIENT_CERTS=false run
```

Start ES pointing at EIS

```shell
run-es -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 \
  -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none \
  -Dtests.es.xpack.inference.elastic.authorization_request_interval="5s" \
  -Dtests.es.xpack.inference.elastic.max_authorization_request_jitter="1s"
```

Retrieving all endpoints from the inference API should now return some EIS endpoints:

```
GET _inference/_all
```

The task list should include a task named `eis-authorization-poller[c]`:

```
GET _tasks
```
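The `authorization_request_interval` and `max_authorization_request_jitter` settings in the `run-es` command above suggest that each poll is scheduled after a base interval plus a random jitter. The following is a minimal sketch of that calculation. It assumes the jitter is drawn (approximately) uniformly from 0 to the maximum, which is an assumption because the PR's exact formula isn't shown here. `PollDelaySketch` and `nextPollDelay` are hypothetical names.

```java
import java.time.Duration;
import java.util.Random;

// Hypothetical helper: the PR's actual scheduling code may compute the delay differently.
final class PollDelaySketch {

    private PollDelaySketch() {}

    /** Returns the base interval plus a random jitter between 0 and maxJitter (inclusive, millisecond precision). */
    static Duration nextPollDelay(Duration interval, Duration maxJitter, Random random) {
        if (maxJitter.isNegative()) {
            throw new IllegalArgumentException("maxJitter must not be negative");
        }
        long maxJitterMillis = maxJitter.toMillis();
        // floorMod keeps the jitter in [0, maxJitterMillis] even when nextLong() is negative;
        // the modulo bias is negligible for second-scale jitter values
        long jitterMillis = Math.floorMod(random.nextLong(), maxJitterMillis + 1);
        return interval.plusMillis(jitterMillis);
    }

    public static void main(String[] args) {
        var random = new Random();
        // Same values as the run-es command: interval "5s", max jitter "1s"
        for (int i = 0; i < 3; i++) {
            System.out.println(nextPollDelay(Duration.ofSeconds(5), Duration.ofSeconds(1), random));
        }
    }
}
```

Adding jitter spreads out requests so that many clusters configured with the same interval don't all hit EIS at the same moment.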

elasticsearchmachine · 2025-10-16T18:33:33Z

Hi @jonathan-buttner, I've created a changelog YAML for you.

DaveCTurner

Please don't use the master for admin tasks that don't actually need to run on the master. If you need a task to run approximately once in the cluster, use a persistent task instead.


elasticsearchmachine · 2025-11-04T18:30:08Z

Pinging @elastic/ml-core (Team:ML)

I chatted with Dave offline and changed the implementation based on his feedback. Dave advised against forcing the polling logic to run on the master node and recommended running it within a persistent task instead, which I've now done.

prwhelan · 2025-11-06T14:43:01Z

.../org/elasticsearch/xpack/inference/integration/AuthorizationTaskExecutorMultipleNodesIT.java

```diff
+ */
+public class AuthorizationTaskExecutorMultipleNodesIT extends ESIntegTestCase {
+
+    private static final String AUTH_TASK_ACTION = AuthorizationPoller.TASK_NAME + "[c]";
```


https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/persistent/PersistentTasksNodeService.java#L261
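The integration test finds the poller by its task action name. The linked line in `PersistentTasksNodeService` appears to be where the node executing a persistent task registers it under the task name with a `[c]` suffix. That would explain why the test appends `"[c]"` to `AuthorizationPoller.TASK_NAME` and why `GET _tasks` lists `eis-authorization-poller[c]`. The following is a minimal sketch of that lookup over a plain list of running action names. The `TASK_NAME` value is inferred from the testing notes, and `runsOnExactlyOneNode` is a hypothetical helper.

```java
import java.util.List;

// Hypothetical sketch of locating the polling task by action name.
final class AuthTaskLookupSketch {

    private AuthTaskLookupSketch() {}

    // Assumed value of AuthorizationPoller.TASK_NAME, taken from the task listed by GET _tasks
    static final String TASK_NAME = "eis-authorization-poller";
    // The node executing a persistent task appears to register it as "<task name>[c]"
    static final String AUTH_TASK_ACTION = TASK_NAME + "[c]";

    /** True when exactly one entry in the cluster-wide list of running task actions is the poller. */
    static boolean runsOnExactlyOneNode(List<String> runningTaskActions) {
        long matches = runningTaskActions.stream().filter(AUTH_TASK_ACTION::equals).count();
        return matches == 1;
    }

    public static void main(String[] args) {
        var actions = List.of("some-other-action", AUTH_TASK_ACTION);
        System.out.println(runsOnExactlyOneNode(actions)); // true
    }
}
```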

prwhelan · 2025-11-06T15:06:10Z

...nference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistryMetadata.java

```diff
+        var serviceToInferenceIds = new HashMap<String, Set<String>>();
+        for (var entry : modelMap.entrySet()) {
+            var settings = entry.getValue();
+            var serviceName = settings.service();
```


`settings.service()` can return null; is that a problem? It looks like the map would not throw an error, so I don't know when this would happen or whether it would be a problem 🤷

Yeah, I'm not sure why we allow `service` to be null 🤔 I added a test for it. If it works correctly, the null ones should just be bucketed together.

https://github.com/elastic/elasticsearch/pull/136713/files#diff-d4185ce634ada9ae507c764714a8049806cb0c4fdd576eedc467c46796256572R333-R360

Test case

```java
public void testGetServiceInferenceIds_AcceptsNullKeys() {
    var serviceA = "service_a";
    var endpointId1 = "endpointId1";
    var endpointId2 = "endpointId2";
    var nullEndpoint1 = "nullEndpoint1";
    var nullEndpoint2 = "nullEndpoint2";

    var settings1 = MinimalServiceSettings.chatCompletion(serviceA);
    var settings2 = MinimalServiceSettings.sparseEmbedding(serviceA);
    // I'm not sure why minimal service settings would have a null service name, but testing it anyway
    var nullServiceNameSettings1 = MinimalServiceSettings.sparseEmbedding(null);
    var nullServiceNameSettings2 = MinimalServiceSettings.sparseEmbedding(null);

    var models = Map.of(
        endpointId1, settings1,
        endpointId2, settings2,
        nullEndpoint1, nullServiceNameSettings1,
        nullEndpoint2, nullServiceNameSettings2
    );
    var metadata = new ModelRegistryMetadata(ImmutableOpenMap.builder(models).build());

    var serviceEndpoints = metadata.getServiceInferenceIds(serviceA);
    assertThat(serviceEndpoints, is(Set.of(endpointId1, endpointId2)));
    assertThat(metadata.getServiceInferenceIds(null), is(Set.of(nullEndpoint1, nullEndpoint2)));
}
```
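The null service names bucket together because `java.util.HashMap` permits a single `null` key. The collection type matters here: `Collectors.groupingBy` throws a `NullPointerException` when the classifier returns null, and `Map.of`/`Set.of` reject null elements. The following is a minimal sketch of the bucketing, assuming a plain endpoint-ID-to-service-name map stands in for the `MinimalServiceSettings` entries. `ServiceBucketingSketch` and `groupByService` are hypothetical names, and the PR's loop may be written differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; endpoint IDs map directly to (possibly null) service names.
final class ServiceBucketingSketch {

    private ServiceBucketingSketch() {}

    /** Groups endpoint IDs by service name; a null service name becomes its own bucket. */
    static Map<String, Set<String>> groupByService(Map<String, String> endpointToService) {
        var serviceToInferenceIds = new HashMap<String, Set<String>>();
        for (var entry : endpointToService.entrySet()) {
            // HashMap accepts a null key, so endpoints without a service name share one bucket
            serviceToInferenceIds.computeIfAbsent(entry.getValue(), k -> new HashSet<>()).add(entry.getKey());
        }
        return serviceToInferenceIds;
    }

    public static void main(String[] args) {
        var endpoints = new HashMap<String, String>(); // Map.of would reject the null values
        endpoints.put("endpointId1", "service_a");
        endpoints.put("endpointId2", "service_a");
        endpoints.put("nullEndpoint1", null);
        endpoints.put("nullEndpoint2", null);

        var grouped = groupByService(endpoints);
        System.out.println(grouped.get("service_a")); // both service_a endpoints (set order not guaranteed)
        System.out.println(grouped.get(null));        // both null-service endpoints
    }
}
```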

prwhelan · 2025-11-06T18:07:48Z

...va/org/elasticsearch/xpack/inference/services/elastic/authorization/AuthorizationPoller.java

```diff
+        if (lastAuthTask.get() != null) {
+            lastAuthTask.get().cancel();
+        }
```


I don't think it matters, since `scheduleAndSendAuthorizationRequest` checks the shutdown status, but in theory another thread could set a different `ScheduledCancellable` between lines 145 and 146 (between the null check and the `cancel()` call).

Suggested change

```diff
-        if (lastAuthTask.get() != null) {
-            lastAuthTask.get().cancel();
-        }
+        var authTask = lastAuthTask.get();
+        if (authTask != null) {
+            authTask.cancel();
+        }
```
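With two separate `lastAuthTask.get()` calls, another thread may replace the reference between the null check and `cancel()`. The code could then cancel a different task than the one it checked, or fail with a `NullPointerException` if the reference is ever set back to null. Reading the reference once into a local, as in the suggested change, closes that window.

The following self-contained sketch shows that pattern. It uses a hypothetical `Cancellable` interface in place of `ScheduledCancellable`. The `getAndSet(null)` variant is an alternative not taken from the PR: it also clears the reference, so a repeated call cannot cancel the same task twice.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the read-once pattern; not the PR's AuthorizationPoller.
final class CancelOnceSketch {

    /** Hypothetical stand-in for ScheduledCancellable. */
    interface Cancellable {
        void cancel();
    }

    private final AtomicReference<Cancellable> lastAuthTask = new AtomicReference<>();

    void schedule(Cancellable task) {
        lastAuthTask.set(task);
    }

    /** Suggested pattern: read the reference once, then act only on the local copy. */
    void cancelCurrent() {
        var authTask = lastAuthTask.get();
        if (authTask != null) {
            authTask.cancel();
        }
    }

    /** Alternative (not from the PR): atomically take the task and clear the reference. */
    void cancelAndClear() {
        var authTask = lastAuthTask.getAndSet(null);
        if (authTask != null) {
            authTask.cancel();
        }
    }

    public static void main(String[] args) {
        var cancellations = new AtomicInteger();
        var sketch = new CancelOnceSketch();
        sketch.schedule(cancellations::incrementAndGet);
        sketch.cancelAndClear();
        sketch.cancelAndClear(); // reference is already null, so nothing is cancelled
        System.out.println(cancellations.get()); // 1
    }
}
```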



…lastic#136713)

* Creating new cluster state listener to kick off polling logic
* Update docs/changelog/136713.yaml
* [CI] Auto commit changes from spotless
* Starting persistent tasks
* Switching to a persistent task, need to create the action though
* Adding master action
* Successful task creation
* Starting tests
* More tests
* Even more tests
* [CI] Auto commit changes from spotless
* Starting integration tests
* Adding test stub
* [CI] Auto commit changes from spotless
* Adding integration test
* Fixing relocation test
* [CI] Auto commit changes from spotless
* working test
* Some clean up
* Removing unneeded tests
* [CI] Auto commit changes from spotless
* Refactoring tests
* updating transport version
* [CI] Auto commit changes from spotless
* Fixing transport version
* Fixing check for preconfigured endpoints
* [CI] Auto commit changes from spotless
* Fixing tests
* Fixing text embedding test
* Addressing feedback
* Marking task as failed
* Fixing flaky test

---------

Co-authored-by: elasticsearchmachine <[email protected]>

jonathan-buttner added the >enhancement, :ml, Team:ML, and v9.3.0 labels Oct 16, 2025

DaveCTurner previously requested changes Oct 16, 2025


jonathan-buttner mentioned this pull request Oct 20, 2025

[ML] Integrate calls to EIS for preconfigured inference endpoints for ModelRegistry functionality #136192

Closed

jonathan-buttner added the cloud-deploy label (Publish cloud docker image for Cloud-First-Testing) Oct 21, 2025

jonathan-buttner and others added 15 commits October 29, 2025 12:59

* Creating new cluster state listener to kick off polling logic (c6fc960)
* Update docs/changelog/136713.yaml (ecfe885)
* [CI] Auto commit changes from spotless (b6928b6)
* Starting persistent tasks (59ce7d0)
* Switching to a persistent task, need to create the action though (273d551)
* Adding master action (458664e)
* Successful task creation (a7f0f91)
* Starting tests (2e3246c)
* More tests (9e5ed51)
* Even more tests (36deff5)
* [CI] Auto commit changes from spotless (83f2c65)
* Starting integration tests (d39d7ef)
* Adding test stub (02d4766)
* [CI] Auto commit changes from spotless (b41b1d4)
* Adding integration test (10ad8f2)

jonathan-buttner force-pushed the ml-eis-auth-polling branch from b2cd14a to 10ad8f2 Compare October 29, 2025 17:09

jonathan-buttner and others added 4 commits October 29, 2025 17:12

* Fixing relocation test (9762fc6)
* [CI] Auto commit changes from spotless (0b1551d)
* working test (6f2c27b)
* Merge branch 'ml-eis-auth-polling' of github.com:jonathan-buttner/elasticsearch into ml-eis-auth-polling (9b764a6)

jonathan-buttner changed the title ~~[ML] Transition EIS auth polling to master node~~ [ML] Transition EIS auth polling to persistent task on a single node Oct 30, 2025

* Some clean up (fd0b0cf)

jonathan-buttner and others added 13 commits October 30, 2025 16:03

* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (8c2a40a)
* Fixing transport version (4b2b33f)
* Fixing check for preconfigured endpoints (4b7e6cf)
* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (78c2a5c)
* [CI] Auto commit changes from spotless (909ef5e)
* Fixing tests (f16b912)
* Merge branch 'ml-eis-auth-polling' of github.com:jonathan-buttner/elasticsearch into ml-eis-auth-polling (4d23e3f)
* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (fb2ac58)
* Fixing text embedding test (45d167d)
* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (ad63742)
* Merge branch 'main' into ml-eis-auth-polling (79ce4b1)
* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (bbe6d8b)
* Merge branch 'ml-eis-auth-polling' of github.com:jonathan-buttner/elasticsearch into ml-eis-auth-polling (c3badf1)

jonathan-buttner marked this pull request as ready for review November 4, 2025 18:29

jonathan-buttner requested a review from jimczi November 4, 2025 18:30

prwhelan reviewed Nov 6, 2025


jonathan-buttner added 3 commits November 6, 2025 15:04

* Addressing feedback (a0a07bc)
* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (b83d2dd)
* Marking task as failed (f994811)

prwhelan approved these changes Nov 7, 2025


* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (0fe203a)

jonathan-buttner enabled auto-merge (squash) November 7, 2025 16:31

jonathan-buttner disabled auto-merge November 7, 2025 21:55

jonathan-buttner added 2 commits November 7, 2025 16:55

* Fixing flaky test (415c23b)
* Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-auth-polling (61dcee4)

jonathan-buttner merged commit 26d49b9 into elastic:main Nov 10, 2025
35 checks passed

jonathan-buttner deleted the ml-eis-auth-polling branch November 10, 2025 13:15


4 participants
