@jonathan-buttner jonathan-buttner commented Aug 7, 2025

This PR adds logging to the EIS authorization call to help debug authorization failures. While doing that I found a bug and made some improvements. The bug was that we didn't use the "merged" authorization object when revoking access, even though we did use it to determine whether the service supports streaming and which task types it supports.

After talking with the team, we decided we don't need to merge the previous authorization object with the newly retrieved one. That way, a successful response is always the source of truth. This also means we don't need a node reboot to perform revocation.
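To illustrate why dropping the merge matters, here is a minimal sketch. It assumes the old merge behaved like a union of the previous and new grants; the class, enum, and method names are hypothetical stand-ins, not the real Elasticsearch classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the authorization state; not the real implementation.
final class AuthStateSketch {
    enum TaskType { CHAT_COMPLETION, SPARSE_EMBEDDING, RERANK }

    // Assumed old behavior: union of previous and latest grants, so a revoked grant lingers.
    static Set<TaskType> merge(Set<TaskType> previous, Set<TaskType> latest) {
        EnumSet<TaskType> merged = EnumSet.noneOf(TaskType.class);
        merged.addAll(previous);
        merged.addAll(latest);
        return merged;
    }

    // New behavior: the latest successful response replaces the previous state entirely.
    static Set<TaskType> replace(Set<TaskType> previous, Set<TaskType> latest) {
        EnumSet<TaskType> current = EnumSet.noneOf(TaskType.class);
        current.addAll(latest);
        return current;
    }

    public static void main(String[] args) {
        Set<TaskType> previous = EnumSet.of(TaskType.CHAT_COMPLETION, TaskType.SPARSE_EMBEDDING);
        Set<TaskType> latest = EnumSet.of(TaskType.CHAT_COMPLETION);

        // With merging, SPARSE_EMBEDDING stays authorized even though the gateway dropped it.
        System.out.println("merged:   " + merge(previous, latest));
        // With replacement, only what the gateway returned last remains authorized.
        System.out.println("replaced: " + replace(previous, latest));
    }
}
```

Under that assumption, the merged state can only grow until the process restarts, which is consistent with revocation previously needing a node reboot.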

I'm backporting this to the versions that include the periodic auth call, which was added in #123639.

New logs

Here's an example of what the logs will look like:

[2025-08-07T15:51:45,535][DEBUG][o.e.x.i.s.e.a.ElasticInferenceServiceAuthorizationRequestHandler] [runTask-0] Received authorization information from gateway {modelName='rainbow-sprinkles', taskTypes='[chat_completion]'}
[2025-08-07T15:51:45,536][DEBUG][o.e.x.i.s.e.a.ElasticInferenceServiceAuthorizationHandler] [runTask-0] Received authorization response, {taskTypeToModels={chat_completion=[rainbow-sprinkles]}, authorizedTaskTypes=[chat_completion], authorizedModelIds=[rainbow-sprinkles]}
[2025-08-07T15:51:45,536][DEBUG][o.e.x.i.s.e.a.ElasticInferenceServiceAuthorizationHandler] [runTask-0] Authorization entity limited to service task types, {taskTypeToModels={chat_completion=[rainbow-sprinkles]}, authorizedTaskTypes=[chat_completion], authorizedModelIds=[rainbow-sprinkles]}
[2025-08-07T15:51:45,536][DEBUG][o.e.x.i.s.e.a.ElasticInferenceServiceAuthorizationHandler] [runTask-0] Synchronizing default inference endpoints, attempting to remove ids: [.elser-2-elastic, .multilingual-embed-v1-elastic, .rerank-v1-elastic]
[2025-08-07T15:51:45,543][DEBUG][o.e.x.i.s.e.a.ElasticInferenceServiceAuthorizationHandler] [runTask-0] Successfully revoked access to default inference endpoint IDs: [elser_model_2, rerank-v1, multilingual-embed-v1]

Testing

Modify the ACL that EIS returns in this file to change which models are authorized:

eis_dir/acl/acl.yaml

Execute EIS locally:

make TLS_VERIFY_CLIENT_CERTS=false run

Run ES pointing to EIS:

run-es -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none -Dtests.es.logger.org.elasticsearch.xpack.inference.services.elastic.authorization.ElasticInferenceServiceAuthorizationRequestHandler=DEBUG -Dtests.es.logger.org.elasticsearch.xpack.inference.services.elastic.authorization.ElasticInferenceServiceAuthorizationHandler=DEBUG

Change how often ES polls for authorization:

PUT _cluster/settings
{
  "persistent": {
    "xpack.inference.elastic.authorization_request_interval": "10s",
    "xpack.inference.elastic.max_authorization_request_jitter": "2s",
    "logger.org.elasticsearch.xpack.inference.services.elastic.authorization.ElasticInferenceServiceAuthorizationHandler": "DEBUG"
  }
}

Then you can stop EIS, modify the ACL file, and restart it. ES will pick up the change on its next authorization poll.

Then use GET _inference/_all to retrieve the authorized default endpoints to ensure they match what EIS is returning.

@jonathan-buttner jonathan-buttner added >bug :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v9.2.0 v8.19.2 v9.1.2 v8.18.6 v9.0.6 labels Aug 7, 2025
@elasticsearchmachine
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.

logger.debug(() -> Strings.format("Authorization entity limited to service task types, %s", authorizedTaskTypesAndModels));

// recalculate which default config ids and models are authorized now
var authorizedDefaultModelIds = getAuthorizedDefaultModelIds(auth);
Contributor Author

The bug is here and on the line below, where we reference auth instead of authorizedTaskTypesAndModels. In the fixed version we use the result of auth.newLimitedToTaskTypes instead.
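As a rough sketch of the idea behind the fix, the revocation list should be derived from the same filtered authorization object that drives the other decisions. The class and method below are hypothetical, and the pairing of endpoint IDs to model IDs is an assumption inferred from the names in the example logs, not a confirmed mapping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not the real handler code.
final class RevocationSketch {
    // Returns the default endpoint IDs whose backing model is absent from the authorized set.
    static List<String> endpointsToRevoke(Map<String, String> defaultEndpointToModel, Set<String> authorizedModelIds) {
        return defaultEndpointToModel.entrySet()
            .stream()
            .filter(entry -> authorizedModelIds.contains(entry.getValue()) == false)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed endpoint-to-model pairing, for illustration only.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put(".elser-2-elastic", "elser_model_2");
        defaults.put(".multilingual-embed-v1-elastic", "multilingual-embed-v1");
        defaults.put(".rerank-v1-elastic", "rerank-v1");

        // The authorized model IDs come from the single, task-type-limited response.
        Set<String> authorizedModelIds = Set.of("rainbow-sprinkles");

        System.out.println(endpointsToRevoke(defaults, authorizedModelIds));
    }
}
```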

ElasticInferenceServiceAuthorizationModel.of(
new ElasticInferenceServiceAuthorizationResponseEntity(
List.of(
new ElasticInferenceServiceAuthorizationResponseEntity.AuthorizedModel(
Contributor Author

Previously, the first response was merged with the second auth response. Now the latest successful auth response dictates the authorization state, so the second response in this test has to repeat the model.

listener.onResponse(ElasticInferenceServiceAuthorizationModel.newDisabledService());

logger.warn(errorMessage);
listener.onFailure(new ElasticsearchException(errorMessage));
Contributor Author

We're now returning an error when a failure occurs. The caller will ignore the error, and it will not affect the authorization. This way we can differentiate between a failed authorization call and a successful one. Whenever onResponse is called, we'll use that response object as the source of truth for authorization.
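The contract described above can be sketched as follows. This is a simplified illustration with a hypothetical Listener interface and holder class, not the actual ActionListener wiring in the handler.

```java
import java.util.Set;

// Hypothetical sketch of a caller that treats onResponse as the source of truth.
final class AuthListenerSketch {
    // Minimal stand-in for a listener with success and failure callbacks.
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    private Set<String> authorizedModels = Set.of();

    Listener<Set<String>> listener() {
        return new Listener<>() {
            @Override
            public void onResponse(Set<String> response) {
                // Any successful response fully replaces the current state, even an empty one.
                authorizedModels = Set.copyOf(response);
            }

            @Override
            public void onFailure(Exception e) {
                // A failed call is ignored: the previously authorized models stay as they were.
            }
        };
    }

    Set<String> authorizedModels() {
        return authorizedModels;
    }

    public static void main(String[] args) {
        AuthListenerSketch state = new AuthListenerSketch();
        Listener<Set<String>> listener = state.listener();

        listener.onResponse(Set.of("rainbow-sprinkles"));
        listener.onFailure(new RuntimeException("gateway unavailable"));
        System.out.println(state.authorizedModels()); // still contains rainbow-sprinkles

        listener.onResponse(Set.of());
        System.out.println(state.authorizedModels()); // empty: access revoked
    }
}
```

Separating the two callbacks is what lets a transient gateway failure leave the current authorization untouched while an explicit empty response still revokes access.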

@jonathan-buttner jonathan-buttner marked this pull request as ready for review August 11, 2025 15:44
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

logger.debug("Received authorization response");
var authorizedTaskTypesAndModels = authorizedContent.get().taskTypesAndModels.merge(auth)
.newLimitedToTaskTypes(EnumSet.copyOf(implementedTaskTypes));
logger.debug(() -> Strings.format("Received authorization response, %s", auth));
Member

For future reference, loggers have a formatter, I believe, something like:

logger.debug("Received authorization response, {}", auth);


@Override
public String toString() {
return String.join(", ", authorizedModels.stream().map(AuthorizedModel::toString).toList());
Member

Suggested change
- return String.join(", ", authorizedModels.stream().map(AuthorizedModel::toString).toList());
+ return authorizedModels.stream().map(AuthorizedModel::toString).collect(Collectors.joining(", "));
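For reference, the two forms in this suggestion produce the same string; the collector version just avoids building an intermediate list. A small self-contained sketch, using a hypothetical class name and plain strings in place of AuthorizedModel:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; plain objects stand in for AuthorizedModel.
final class JoiningSketch {
    // Original form: materialize a list of strings, then join it.
    static String viaStringJoin(List<?> models) {
        return String.join(", ", models.stream().map(Object::toString).collect(Collectors.toList()));
    }

    // Suggested form: join directly in the stream pipeline.
    static String viaCollector(List<?> models) {
        return models.stream().map(Object::toString).collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> models = List.of("rainbow-sprinkles", "elser_model_2");
        System.out.println(viaStringJoin(models));
        System.out.println(viaCollector(models));
    }
}
```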

@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.19
9.1
8.18
9.0

jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Aug 11, 2025
* Fixing revoking and adding logs

* Fixing tests

* Update docs/changelog/132546.yaml

* [CI] Auto commit changes from spotless

* Addressing feedback

---------

Co-authored-by: elasticsearchmachine <[email protected]>
@jonathan-buttner jonathan-buttner deleted the inference-log-eis-auth branch August 11, 2025 20:38
elasticsearchmachine pushed a commit that referenced this pull request Aug 11, 2025
…2690)

* Fixing revoking and adding logs

* Fixing tests

* Update docs/changelog/132546.yaml

* [CI] Auto commit changes from spotless

* Addressing feedback

---------

Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Aug 11, 2025
…2691)

* Fixing revoking and adding logs

* Fixing tests

* Update docs/changelog/132546.yaml

* [CI] Auto commit changes from spotless

* Addressing feedback

---------

Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Aug 13, 2025
#132693)

* [ML] Improve EIS auth call logs and fix revocation bug (#132546)

* Fixing revoking and adding logs

* Fixing tests

* Update docs/changelog/132546.yaml

* [CI] Auto commit changes from spotless

* Addressing feedback

---------

Co-authored-by: elasticsearchmachine <[email protected]>

* Fixing mock registry

---------

Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Aug 13, 2025
… (#132692)

* [ML] Improve EIS auth call logs and fix revocation bug (#132546)

* Fixing revoking and adding logs

* Fixing tests

* Update docs/changelog/132546.yaml

* [CI] Auto commit changes from spotless

* Addressing feedback

---------

Co-authored-by: elasticsearchmachine <[email protected]>

* Fixing mock registry

---------

Co-authored-by: elasticsearchmachine <[email protected]>
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 15, 2025
* upstream/8.19: (62 commits)
  Use consistent terminology for transport version resources/references (elastic#132882) (elastic#132898)
  Move inner records out of TransportVersionUtils (elastic#132872) (elastic#132886)
  Forward port release notes for v8.18.5 (elastic#132758)
  Add more transport version files validation (elastic#132373) (elastic#132777)
  Forward port release notes for v8.17.10 (elastic#132760)
  Refactor TransportVersion loading to support external consumers (elastic#132694) (elastic#132862)
  manual backporting| (elastic#132783)
  9.1 docs backports for 8.19 features (elastic#132605)
  Migrate x-pack-deprecation REST tests (elastic#131444) (elastic#132802)
  Update 8.19.1.asciidoc (elastic#132755)
  Update wolfi (versioned) (elastic#132752)
  Update 8.19.0.asciidoc (elastic#132754)
  Prune changelogs after 8.19.2 release
  Bump versions after 8.19.2 release
  Finalize release notes for v8.19.2
  Bump versions after 8.17.10 release
  Prune changelogs after 8.18.5 release
  Bump versions after 8.18.5 release
  Add release notes for v8.19.2 release (elastic#132696)
  [ML] Improve EIS auth call logs and fix revocation bug (elastic#132546) (elastic#132690)
  ...