@davidkyle davidkyle commented Mar 28, 2025

The chunker stores the position of each chunk's text in the original text rather than making copies. However, when an inference request is made the actual chunk text is required, and at that point a copy must be made. The copy is made when String#substring() is called.

This PR reduces the lifetime of the string copies by returning a String Supplier from the chunker and performing the copy closer to where the request will be made. See RequestExecutorService:

.execute(task.getInferenceInputs(), requestSender, task.getRequestCompletedFunction(), task.getListener());
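The idea of deferring the copy can be illustrated with a minimal sketch. It is not the code from this PR: `LazyChunkSketch`, `ChunkOffset`, `chunk` and `chunkText` are hypothetical names, and the fixed-size windowing stands in for whatever chunking strategy is actually used. The point is only that `substring()` runs when the supplier is invoked, not when the chunk is created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical names for illustration only; not the classes changed in this PR.
final class LazyChunkSketch {

    /** Stores only offsets into the original text; no characters are copied here. */
    record ChunkOffset(int start, int end) {}

    /** Splits the text into fixed-size windows, recording offsets only. */
    static List<ChunkOffset> chunk(String text, int chunkSize) {
        List<ChunkOffset> offsets = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            offsets.add(new ChunkOffset(start, Math.min(start + chunkSize, text.length())));
        }
        return offsets;
    }

    /** Returns a supplier; substring() runs only when get() is called. */
    static Supplier<String> chunkText(String text, ChunkOffset offset) {
        return () -> text.substring(offset.start(), offset.end());
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";
        List<Supplier<String>> pending = new ArrayList<>();
        for (ChunkOffset offset : chunk(input, 10)) {
            pending.add(chunkText(input, offset));
        }
        // Each copy is created right before it is "sent" and can be collected afterwards.
        for (Supplier<String> supplier : pending) {
            System.out.println(supplier.get());
        }
    }
}
```

Note that each supplier in this sketch keeps a reference to the original text, so only the per-chunk copies become short-lived, not the source string itself.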

As a follow-up, once #125567 is merged, EmbeddingInputs will be moved to Request#createHttpRequest() so that the string copy is made at the point the HTTP request is constructed, further reducing the lifespan of the copy.

I had to change the logic around InferenceInput#inputSize() to avoid calling the supplier function early just to find out whether there was more than one input.
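One way such a size check can avoid triggering the copy is sketched below. This is an assumption about the general approach, not the actual change to InferenceInput: `InputSizeSketch`, `LazyInputs` and `materialize` are hypothetical names, and the sketch assumes the number of inputs is known when the supplier is created.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical holder for illustration; not the InferenceInput class from this PR.
final class InputSizeSketch {

    static final class LazyInputs {
        private final Supplier<List<String>> textSupplier;
        private final int size;

        /** The count is recorded up front so it can be queried without building the strings. */
        LazyInputs(Supplier<List<String>> textSupplier, int size) {
            this.textSupplier = textSupplier;
            this.size = size;
        }

        /** Answers "is there more than one input?" without invoking the supplier. */
        int inputSize() {
            return size;
        }

        /** Invokes the supplier; this is where the string copies would be made. */
        List<String> materialize() {
            return textSupplier.get();
        }
    }

    public static void main(String[] args) {
        LazyInputs inputs = new LazyInputs(() -> List.of("first chunk", "second chunk"), 2);
        System.out.println("more than one input: " + (inputs.inputSize() > 1));
        System.out.println(inputs.materialize());
    }
}
```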

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Mar 28, 2025
@elasticsearchmachine

Pinging @elastic/ml-core (Team:ML)

@jan-elastic jan-elastic left a comment

LGTM

@davidkyle davidkyle added the auto-backport Automatically create backport pull requests when merged label Mar 31, 2025
@davidkyle davidkyle enabled auto-merge (squash) March 31, 2025 10:02
@davidkyle davidkyle merged commit c521264 into elastic:main Apr 1, 2025
17 checks passed
@elasticsearchmachine

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125837

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Apr 7, 2025
The chunked text is only required when the actual inference request is made.
Using a string supplier means the string can be created much closer to where
the request is made, reducing the lifespan of the copied string.

(cherry picked from commit c521264)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java
@davidkyle

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Apr 7, 2025

(cherry picked from commit c521264)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 22, 2025
This commit restores the behaviour introduced in elastic#125837 which was
inadvertently undone by changes in elastic#121041, specifically, delaying
copying Strings as part of calling Request.chunkText() until the request
is being executed.

In addition to the above change, refactor doChunkedInfer() and its
implementations to take a List<ChunkInferenceInput> rather than
EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer()
was immediately discarded after extracting the ChunkInferenceInput list
from it. This change allowed the EmbeddingsInput class to be refactored
to not know about ChunkInferenceInput, simplifying it significantly.

This commit also simplifies EmbeddingRequestChunker.Request to take only
the input String rather than the entire list of all inputs, since only
one input is actually needed. This change prevents Requests from
retaining a reference to the input list, potentially allowing it to be
GC'd faster.
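The last point of the commit message can be sketched as follows. This is an illustrative assumption, not the real EmbeddingRequestChunker.Request: the fields of `Request`, the `buildRequests` helper and the fixed-size windowing are all hypothetical. Each request holds only its own input string plus offsets, and `chunkText()` copies only when called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shapes for illustration; not the actual EmbeddingRequestChunker.Request.
final class RequestSketch {

    /** Holds a single input string and offsets, rather than the whole list of inputs. */
    record Request(String input, int start, int end) {
        /** The copy is made here, ideally when the request is being executed. */
        String chunkText() {
            return input.substring(start, end);
        }
    }

    /** Builds one request per fixed-size window of each input; no text is copied. */
    static List<Request> buildRequests(List<String> inputs, int chunkSize) {
        List<Request> requests = new ArrayList<>();
        for (String input : inputs) {
            for (int start = 0; start < input.length(); start += chunkSize) {
                requests.add(new Request(input, start, Math.min(start + chunkSize, input.length())));
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        for (Request request : buildRequests(List.of("abcdef", "xy"), 4)) {
            System.out.println(request.chunkText());
        }
    }
}
```

Because no request refers to the list itself, the list can become unreachable once the requests are built, while each input string stays alive only as long as a request still refers to it.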
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 23, 2025
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 23, 2025
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 23, 2025
DonalEvans added a commit that referenced this pull request Sep 25, 2025
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 25, 2025

(cherry picked from commit f3447d3)

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ai21/Ai21Service.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/LlamaService.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreator.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralService.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/SenderServiceTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/AlibabaCloudSearchCompletionRequestManagerTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/action/AlibabaCloudSearchActionCreatorTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/action/AzureAiStudioActionAndCreatorTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreatorTests.java
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Sep 25, 2025
DonalEvans added a commit that referenced this pull request Sep 25, 2025

(cherry picked from commit f3447d3)
DonalEvans added a commit that referenced this pull request Sep 25, 2025

(cherry picked from commit f3447d3)

Labels

auto-backport Automatically create backport pull requests when merged backport pending :ml Machine learning >refactoring Team:ML Meta label for the ML team v8.19.0 v9.1.0


4 participants