[ML] Delay copying chunked input strings #125837

davidkyle · 2025-03-28T12:30:35Z

The chunker stores the position of each chunk's text in the original text rather than making copies. However, when an inference request is made the actual chunk text is required, and at that point a copy must be made. The copy is made when String#substring() is called.

This PR reduces the lifetime of the string copies by returning a string Supplier from the chunker and performing the copy closer to where the request is made. See RequestExecutorService:

elasticsearch/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/http/sender/RequestExecutorService.java

Line 466 in a40370a

    
           .execute(task.getInferenceInputs(), requestSender, task.getRequestCompletedFunction(), task.getListener());
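To make the idea concrete, here is a minimal sketch of deferring the copy behind a `Supplier<String>`. The names `LazyChunks`, `ChunkOffset`, `chunk` and `chunkText` are hypothetical and the fixed-size windowing is only a stand-in for real chunking; this is not the code in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical names throughout; this is not the Elasticsearch chunker API.
final class LazyChunks {
    // A chunk is described by its offsets into the original text; no characters are copied here.
    record ChunkOffset(int start, int end) {}

    // Splits the text into fixed-size windows (chunkSize must be positive) and returns only offsets.
    static List<ChunkOffset> chunk(String text, int chunkSize) {
        List<ChunkOffset> offsets = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            offsets.add(new ChunkOffset(start, Math.min(start + chunkSize, text.length())));
        }
        return offsets;
    }

    // Wraps the copy in a Supplier so String#substring runs only when get() is called.
    static Supplier<String> chunkText(String text, ChunkOffset offset) {
        return () -> text.substring(offset.start(), offset.end());
    }
}

final class LazyChunksDemo {
    public static void main(String[] args) {
        String text = "abcdefghij";
        List<Supplier<String>> pending = new ArrayList<>();
        for (LazyChunks.ChunkOffset offset : LazyChunks.chunk(text, 4)) {
            pending.add(LazyChunks.chunkText(text, offset));
        }
        // The copies are made here, at the point the "request" is built.
        for (Supplier<String> chunk : pending) {
            System.out.println(chunk.get());
        }
    }
}
```

Note that each supplier still holds a reference to the original text; what is shortened is the lifetime of the substring copies, which only exist after `get()` is called.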

As a follow-up, once #125567 is merged, EmbeddingInputs will be moved to Request#createHttpRequest() so that the string copy is made at the point the HTTP request is constructed, further reducing the lifespan of the copy.

I had to change the logic around InferenceInput#inputSize() to avoid calling the supplier function early to find out whether there was more than one input.
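One way to answer size questions without triggering the copy is to record the count when the supplier is created. The sketch below assumes a hypothetical `LazyInputs` holder; the real InferenceInput classes are shaped differently and may solve this another way.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical holder; not the actual InferenceInput or EmbeddingsInput class.
final class LazyInputs {
    private final Supplier<List<String>> inputs;
    private final int size; // known up front, recorded when the supplier is created

    LazyInputs(Supplier<List<String>> inputs, int size) {
        this.inputs = inputs;
        this.size = size;
    }

    // Answers size questions from the stored count; the supplier is not invoked.
    int inputSize() {
        return size;
    }

    boolean isSingleInput() {
        return size == 1;
    }

    // The only method that invokes the supplier and therefore triggers the copies.
    List<String> getInputs() {
        return inputs.get();
    }
}
```

With this shape, a caller that only needs to know whether there is more than one input never forces the strings to be materialised.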

elasticsearchmachine · 2025-03-28T12:30:59Z

Pinging @elastic/ml-core (Team:ML)

jan-elastic

LGTM

elasticsearchmachine · 2025-04-01T13:55:14Z

💔 Backport failed

Status	Branch	Result
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125837

The chunked text is only required when the actual inference request is made. Using a string supplier means the string creation can be done much closer to where the request is made, reducing the lifespan of the copied string.

(cherry picked from commit c521264)

# Conflicts:
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java

davidkyle · 2025-04-07T12:42:21Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions ?

Please refer to the Backport tool documentation


This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed.

In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly.

This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
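The last point, a request that references only its own input rather than the whole input list, can be sketched roughly as follows. `ChunkRequests`, its `Request` record and `build` are hypothetical illustrations that borrow only the `chunkText()` name from the commit message; they are not the actual EmbeddingRequestChunker.Request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the actual EmbeddingRequestChunker code.
final class ChunkRequests {
    // Holds one input string plus offsets into it, not the list of all inputs.
    record Request(String input, int start, int end) {
        // The copy happens here, when the request is about to be executed.
        String chunkText() {
            return input.substring(start, end);
        }
    }

    // Builds requests from fixed-size windows (chunkSize must be positive).
    // Each request keeps a reference to its own input only.
    static List<Request> build(List<String> inputs, int chunkSize) {
        List<Request> requests = new ArrayList<>();
        for (String input : inputs) {
            for (int start = 0; start < input.length(); start += chunkSize) {
                requests.add(new Request(input, start, Math.min(start + chunkSize, input.length())));
            }
        }
        return requests;
    }
}
```

Because no `Request` refers to the list passed to `build`, that list becomes unreachable as soon as the caller drops it, even while requests are still queued.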


This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed.

In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly.

This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.

(cherry picked from commit f3447d3)

# Conflicts:
#   x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ai21/Ai21Service.java
#   x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/LlamaService.java
#   x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreator.java
#   x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralService.java
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/SenderServiceTests.java
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/AlibabaCloudSearchCompletionRequestManagerTests.java
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/action/AlibabaCloudSearchActionCreatorTests.java
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/action/AzureAiStudioActionAndCreatorTests.java
#   x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreatorTests.java


davidkyle added 2 commits March 28, 2025 10:16

Copy inputs later

8473285

Avoid calling suppiler for size

afdba53

davidkyle added >refactoring :ml Machine learning v8.19.0 v9.1.0 labels Mar 28, 2025

elasticsearchmachine added the Team:ML Meta label for the ML team label Mar 28, 2025

prwhelan approved these changes Mar 28, 2025


Merge branch 'main' into late-requests

9fae2e2

jan-elastic approved these changes Mar 31, 2025


davidkyle added the auto-backport Automatically create backport pull requests when merged label Mar 31, 2025

davidkyle enabled auto-merge (squash) March 31, 2025 10:02

Merge branch 'main' into late-requests

d668f0f

davidkyle merged commit c521264 into elastic:main Apr 1, 2025
17 checks passed

elasticsearchmachine added the backport pending label Apr 1, 2025

davidkyle mentioned this pull request Apr 7, 2025

[8.x] [ML] Delay copying chunked input strings (#125837) #126402

Merged

DonalEvans mentioned this pull request Sep 23, 2025

[ML] Restore delayed string copying #135242

Merged
