
Conversation

@timgrein
Contributor

@timgrein timgrein commented Mar 4, 2025

This PR reads the X-elastic-product-use-case header, which contains the product use case EIS is called with (Assistants etc.). I had to use a workaround to propagate the information through the transport layer: I set the header explicitly in the ThreadContext, because we would otherwise lose it when the InferenceActionProxy makes an internal call to InferenceAction or UnifiedCompletionAction (the thread context gets stashed and then reconstructed, losing most headers). Since this is specific to the inference API/EIS, we shouldn't add it to Task.HEADERS_TO_COPY (I had this discussion with some ES devs).

I was hesitant to pass the InferenceContext through the base methods (doInfer, doUnifiedCompletionInfer etc.), as this would imply changes in all integrations, making this PR even larger than it already is, especially considering what it does (passing one value). If we feel that the product use case information is useful for all integrations (which it probably is), we can still follow up on this initial change. For now I want to keep it isolated to EIS.
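The stash-and-restore behavior described above can be modeled with a small self-contained sketch. This is a simplified stand-in, not the real Elasticsearch ThreadContext; the class and method names below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the ThreadContext behavior described above: stashing
// starts a fresh header map for the internal call, so any header not
// explicitly re-put (or listed in Task.HEADERS_TO_COPY) is lost.
class ThreadContextModel {
    private Map<String, String> headers = new HashMap<>();

    void putHeader(String key, String value) {
        headers.put(key, value);
    }

    String getHeader(String key) {
        return headers.get(key);
    }

    // Models ThreadContext.stashContext(): the internal call starts with
    // an empty set of request headers.
    void stash() {
        headers = new HashMap<>();
    }
}
```

Under this model, putHeader must be called again after stash() for the product use case header to survive the internal call, which mirrors the workaround taken in this PR.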

@timgrein timgrein changed the title [Draft] [Inference API] Read and propagate product use case http header [Draft] [Inference API] Read and propagate product use case http header to EIS Mar 4, 2025
@timgrein timgrein changed the title [Draft] [Inference API] Read and propagate product use case http header to EIS [Draft] [Inference API] Propagate product use case http header to EIS Mar 6, 2025
@timgrein timgrein changed the title [Draft] [Inference API] Propagate product use case http header to EIS [Inference API] Propagate product use case http header to EIS Mar 7, 2025
@timgrein timgrein marked this pull request as ready for review March 7, 2025 14:40
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 7, 2025
@prwhelan prwhelan added :ml Machine learning Team:ML Meta label for the ML team labels Mar 7, 2025
@elasticsearchmachine
Collaborator

Hi @timgrein, I've created a changelog YAML for you.

Contributor

@jonathan-buttner jonathan-buttner left a comment


Nice work! Left a couple comments.

this(in.readString());
}

public static InferenceContext empty() {
Contributor

How about we create a static instance so that we don't create multiple empty ones? Something like this:

public static final InferenceContext EMPTY_INSTANCE = new InferenceContext("");

Contributor Author

public static final TransportVersion INCLUDE_INDEX_MODE_IN_GET_DATA_STREAM = def(9_023_0_00);
public static final TransportVersion MAX_OPERATION_SIZE_REJECTIONS_ADDED = def(9_024_0_00);
public static final TransportVersion RETRY_ILM_ASYNC_ACTION_REQUIRE_ERROR = def(9_025_0_00);
public static final TransportVersion INFERENCE_CONTEXT = def(9_026_0_00);
Contributor

Just a reminder, if we do want to backport to 8.19 we'll need a TransportVersion for 8.x

for example: COHERE_BIT_EMBEDDING_TYPE_SUPPORT_ADDED_BACKPORT_8_X

We'll also need to change the onAfter() check. Here's an example:
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/cohere/embeddings/CohereEmbeddingType.java#L131-L132

The code in 8.x will look different too (since the 9.x transport version won't exist): https://github.com/elastic/elasticsearch/blob/8.x/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/cohere/embeddings/CohereEmbeddingType.java#L131
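The dual-branch check can be modeled with plain integers in the style of the 9_026_0_00 ids above. INFERENCE_CONTEXT_8_X and its value are hypothetical here, standing in for whatever backport id gets defined:

```java
// Illustrative model of a backport-aware transport version check.
// On the 9.x line the feature exists from INFERENCE_CONTEXT onward; on the
// 8.x line it only exists from the (hypothetical) backport version onward.
class VersionGate {
    static final int INFERENCE_CONTEXT = 9_026_0_00;      // id from this PR
    static final int INFERENCE_CONTEXT_8_X = 8_841_0_00;  // hypothetical 8.x id
    static final int FIRST_9_X = 9_000_0_00;

    static boolean supportsInferenceContext(int wireVersion) {
        return wireVersion >= INFERENCE_CONTEXT
            || (wireVersion >= INFERENCE_CONTEXT_8_X && wireVersion < FIRST_9_X);
    }
}
```

This is why the check (and not just the version constant) has to change when backporting: without the second clause, 8.x patch releases that do contain the feature would be treated as not supporting it.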

Contributor Author

Thanks for the explanation and the code examples.

Adjusted with "Add TransportVersion for 8_X".

In the backport I would then need to replace TransportVersions.INFERENCE_CONTEXT with TransportVersions.INFERENCE_CONTEXT_8_X, right?


var context = request.getContext();
if (Objects.nonNull(context)) {
threadPool.getThreadContext().putHeader(InferencePlugin.X_ELASTIC_PRODUCT_USE_CASE_HTTP_HEADER, context.productUseCase());
Contributor

Hmm, another option would be to pass this through the infer calls. That'll probably change a ton of files though 🤔. It does seem a bit strange to parse it out of the header and then put it back in a header when we already have it.

Drilling through a bunch of function calls isn't great either though.

@prwhelan @dan-rubinstein @davidkyle what do you think?

I wonder if we should create a context/components object that is like a catch all for these types of changes. That way in the future we just add it to that class's definition and we don't have to drill it through a ton of places.
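A minimal sketch of such a catch-all object, assuming a Java record (the static empty instance follows the earlier suggestion; everything beyond productUseCase would be added later):

```java
// Sketch of a catch-all per-request context: future per-request fields
// become additional record components instead of extra parameters drilled
// through every infer method signature.
record InferenceContext(String productUseCase) {
    static final InferenceContext EMPTY_INSTANCE = new InferenceContext("");
}
```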

Member

I'd prefer that, though I vote we do that in a separate change since this one is already quite large

Contributor Author

Hmm, another option would be to pass this through the infer calls. That'll probably change a ton of files though 🤔 .

Yeah, that was basically the reasoning I put in the PR description: "I was hesitant to pass the InferenceContext through the base methods (doInfer, doUnifiedCompletionInfer etc.) as this would imply changes in all integrations making this PR even larger as it already is, especially considering what it does (passing one value)". But nevertheless I also think it's cleaner to pass it through the methods, as it's then obvious from the signature that there's a context object.

I'd prefer that, though I vote we do that in a separate change since this one is already quite large

I would also prefer this and keep it as is for now 👍

Contributor

I would also prefer this and keep it as is for now 👍

Sounds good Tim!

Contributor

Actually, could you add a TODO above the line as a reminder for us to move it to being passed through the various method calls?


}

// We always get the first value as the header doesn't allow multiple values
return productUseCaseHeaders.getFirst();
Contributor

If we do backport this, it's going to complain about getFirst not being a call in 8.19. Might be worth just leaving it as get(0) to avoid all that lol (I've run into it many times).
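The incompatibility is easy to see: List.getFirst() was only added in Java 21 (as part of SequencedCollection), while get(0) compiles on the older JDK targeted by the 8.x branch as well. A minimal sketch, with an illustrative helper name:

```java
import java.util.List;

class HeaderValues {
    // get(0) instead of getFirst(): getFirst() is a Java 21
    // SequencedCollection method and won't compile on the older JDK
    // used by the 8.x branch.
    static String firstValue(List<String> values) {
        return values.get(0);
    }
}
```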

Contributor Author

Good catch 🎣 I've also run into this in the past; adjusted with "Use .get(0) instead of getFirst to avoid compilation errors in backport".

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes! If you could add a TODO for passing the inference context around that'd be great!



@timgrein timgrein merged commit 0b83425 into elastic:main Mar 12, 2025
16 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 124025

@timgrein
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation


Labels

auto-backport Automatically create backport pull requests when merged backport pending >enhancement :ml Machine learning Team:ML Meta label for the ML team v8.19.0 v9.1.0


4 participants