DonalEvans
Contributor

This commit restores the behaviour introduced in #125837 which was inadvertently undone by changes in #121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed.

In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly.

This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
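As a rough illustration of the two ideas described above (deferring the String copy until chunkText() is called, and holding only the one input a request needs), here is a minimal sketch. It is not the actual EmbeddingRequestChunker.Request code: the class name ChunkRequestSketch, the fixed-size chunk() helper, and the start/end offset fields are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a chunk request; not the real Elasticsearch class.
final class ChunkRequestSketch {
    // Only the single input this chunk belongs to, not the whole input list.
    private final String input;
    private final int start;
    private final int end;

    ChunkRequestSketch(String input, int start, int end) {
        this.input = input;
        this.start = start;
        this.end = end;
    }

    // The substring copy happens here, when the request is executed,
    // rather than when the request object is created.
    String chunkText() {
        return input.substring(start, end);
    }

    // Assumed helper: splits one input into fixed-size chunks by offset only.
    static List<ChunkRequestSketch> chunk(String input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ChunkRequestSketch> requests = new ArrayList<>();
        for (int start = 0; start < input.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.length());
            requests.add(new ChunkRequestSketch(input, start, end));
        }
        return requests;
    }
}
```

Until chunkText() is invoked, each request in this sketch costs two ints and one reference to its own input, so no per-chunk String is allocated up front.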

@DonalEvans DonalEvans added >refactoring :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v9.2.0 v8.19.5 v9.1.5 labels Sep 23, 2025
@DonalEvans DonalEvans force-pushed the restore-delayed-string-copying branch from 311cbb7 to a6609a3 Compare September 23, 2025 15:44
@DonalEvans DonalEvans marked this pull request as ready for review September 24, 2025 16:32
@DonalEvans DonalEvans requested review from davidkyle and jonathan-buttner and removed request for davidkyle September 24, 2025 16:32
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes, I left a few suggestions

return getInputs().stream().map(ChunkInferenceInput::input).collect(Collectors.toList());
public List<String> getInputs() {
// The supplier should only be invoked once
assert inputListSupplier != null;

The assert is removed in production, so if this is called twice I believe it'll produce a null pointer exception. What if we wrapped the call in an atomic boolean and did a compareAndSet, and if it has already been called we throw an exception explaining why that's not allowed?
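A minimal sketch of the compareAndSet guard suggested here might look like the following. The class name SingleUseInputs and the exception message are made up for illustration, and the real class in the PR may be structured differently.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical wrapper; the names here are not taken from the PR.
final class SingleUseInputs {
    private final Supplier<List<String>> inputListSupplier;
    private final AtomicBoolean supplierInvoked = new AtomicBoolean(false);

    SingleUseInputs(Supplier<List<String>> inputListSupplier) {
        this.inputListSupplier = Objects.requireNonNull(inputListSupplier);
    }

    List<String> getInputs() {
        // compareAndSet succeeds for exactly one caller, even under concurrency.
        if (supplierInvoked.compareAndSet(false, true) == false) {
            throw new IllegalStateException(
                "getInputs() may only be called once because the inputs are not cached"
            );
        }
        return inputListSupplier.get();
    }
}
```

Unlike an assert, this check still runs when assertions are disabled. Note that this sketch keeps the supplier reference alive after the first call; clearing it would be a separate decision.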

I assume we don't want to cache the result of inputListSupplier.get() in memory within this class, right? If we were OK with that, then we could do something with an atomic reference and locking to ensure it only gets set once and is just returned after that.
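For comparison, the caching alternative mentioned here could be sketched roughly as below. CachingInputs is a hypothetical name, and this variant deliberately keeps the list in memory, which is the trade-off the comment questions.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical caching variant; not what the PR necessarily does.
final class CachingInputs {
    private final AtomicReference<List<String>> cached = new AtomicReference<>();
    private Supplier<List<String>> supplier; // guarded by "this", cleared after first use

    CachingInputs(Supplier<List<String>> supplier) {
        this.supplier = Objects.requireNonNull(supplier);
    }

    List<String> getInputs() {
        List<String> result = cached.get();
        if (result != null) {
            return result; // fast path once the value has been computed
        }
        synchronized (this) {
            result = cached.get();
            if (result == null) {
                // The supplier runs at most once; later callers see the cached list.
                result = Objects.requireNonNull(supplier.get());
                cached.set(result);
                supplier = null; // allow the supplier itself to be collected
            }
            return result;
        }
    }
}
```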

This is probably overkill, but I think we only care about the situation when the class is initialized with a supplier. If the constructor were passed a List<String>, I don't think we need to guard against the supplier we create being called multiple times. If we did want to allow multiple calls in that scenario, we could use a custom internal class that wraps the List<String> and allows multiple calls. For the supplier scenario we'd have a different internal class that does the atomic boolean check and throws.
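One way the two-internal-class idea could be laid out is sketched below. All names (InputsHolderSketch, fromList, fromSupplier, ReusableList, SingleUse) are hypothetical, and static factories are used instead of overloaded constructors purely to keep the example unambiguous.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical holder showing the two strategies side by side.
final class InputsHolderSketch {
    private final Supplier<List<String>> inputs;

    private InputsHolderSketch(Supplier<List<String>> inputs) {
        this.inputs = inputs;
    }

    static InputsHolderSketch fromList(List<String> list) {
        return new InputsHolderSketch(new ReusableList(list));
    }

    static InputsHolderSketch fromSupplier(Supplier<List<String>> supplier) {
        return new InputsHolderSketch(new SingleUse(supplier));
    }

    List<String> getInputs() {
        return inputs.get();
    }

    // Wraps an already materialized list, so repeated calls are harmless.
    private static final class ReusableList implements Supplier<List<String>> {
        private final List<String> list;

        ReusableList(List<String> list) {
            this.list = Objects.requireNonNull(list);
        }

        @Override
        public List<String> get() {
            return list;
        }
    }

    // Wraps a caller-provided supplier and rejects a second invocation.
    private static final class SingleUse implements Supplier<List<String>> {
        private final Supplier<List<String>> delegate;
        private final AtomicBoolean invoked = new AtomicBoolean(false);

        SingleUse(Supplier<List<String>> delegate) {
            this.delegate = Objects.requireNonNull(delegate);
        }

        @Override
        public List<String> get() {
            if (invoked.compareAndSet(false, true) == false) {
                throw new IllegalStateException("inputs supplier may only be invoked once");
            }
            return delegate.get();
        }
    }
}
```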

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes!

@DonalEvans DonalEvans merged commit f3447d3 into elastic:main Sep 25, 2025
34 checks passed
@DonalEvans DonalEvans deleted the restore-delayed-string-copying branch September 25, 2025 17:28
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 135242

DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 25, 2025
(cherry picked from commit f3447d3)

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ai21/Ai21Service.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/LlamaService.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreator.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralService.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/SenderServiceTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/AlibabaCloudSearchCompletionRequestManagerTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/action/AlibabaCloudSearchActionCreatorTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/action/AzureAiStudioActionAndCreatorTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreatorTests.java
@DonalEvans
Contributor Author

💔 Some backports could not be created

Status Branch Result
9.1
8.19 Conflict resolution was aborted by the user

Manual backport

To create the backport manually run:

backport --pr 135242

Questions ?

Please refer to the Backport tool documentation

DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 25, 2025
(cherry picked from commit f3447d3)
@DonalEvans
Contributor Author

💚 All backports created successfully

Status Branch Result
9.1
8.19

Questions ?

Please refer to the Backport tool documentation

DonalEvans added a commit that referenced this pull request Sep 25, 2025
(cherry picked from commit f3447d3)
DonalEvans added a commit that referenced this pull request Sep 25, 2025
(cherry picked from commit f3447d3)