-
Couldn't load subscription status.
- Fork 25.6k
[ML] Delay copying chunked input strings #125837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/ml-core (Team:ML) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
The chunked text is only required when the actual inference request is made, using a string supplier means the string creation can be done much much closer to where the request is made reducing the lifespan of the copied string. (cherry picked from commit c521264) # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
The chunked text is only required when the actual inference request is made, using a string supplier means the string creation can be done much much closer to where the request is made reducing the lifespan of the copied string. (cherry picked from commit c521264) # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java
This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
This commit restores the behaviour introduced in #125837 which was inadvertently undone by changes in #121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster.
This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster. (cherry picked from commit f3447d3) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ai21/Ai21Service.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/LlamaService.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreator.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralService.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/SenderServiceTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/AlibabaCloudSearchCompletionRequestManagerTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/action/AlibabaCloudSearchActionCreatorTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/action/AzureAiStudioActionAndCreatorTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreatorTests.java
This commit restores the behaviour introduced in elastic#125837 which was inadvertently undone by changes in elastic#121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster. (cherry picked from commit f3447d3) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ai21/Ai21Service.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/LlamaService.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreator.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralService.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/SenderServiceTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/AlibabaCloudSearchCompletionRequestManagerTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/action/AlibabaCloudSearchActionCreatorTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/action/AzureAiStudioActionAndCreatorTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/llama/action/LlamaActionCreatorTests.java
This commit restores the behaviour introduced in #125837 which was inadvertently undone by changes in #121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster. (cherry picked from commit f3447d3)
This commit restores the behaviour introduced in #125837 which was inadvertently undone by changes in #121041, specifically, delaying copying Strings as part of calling Request.chunkText() until the request is being executed. In addition to the above change, refactor doChunkedInfer() and its implementations to take a List<ChunkInferenceInput> rather than EmbeddingsInput, since the EmbeddingsInput passed into doChunkedInfer() was immediately discarded after extracting the ChunkInferenceInput list from it. This change allowed the EmbeddingsInput class to be refactored to not know about ChunkInferenceInput, simplifying it significantly. This commit also simplifies EmbeddingRequestChunker.Request to take only the input String rather than the entire list of all inputs, since only one input is actually needed. This change prevents Requests from retaining a reference to the input list, potentially allowing it to be GC'd faster. (cherry picked from commit f3447d3)
The chunker stores the position of each chunk's text in the original text rather than making copies. However, when an inference request is made the actual chunk text is required, at this point a copy must be made. The copying is done when
String#subString()is called.The PR reduces the lifetime of the string copies but returning a string Supplier in from the chunker and performing the copy closer to where the request will be made. See
RequestExecutorServiceelasticsearch/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/http/sender/RequestExecutorService.java
Line 466 in a40370a
As a follow up once #125567 is merged
EmbeddingInputswill be moved toRequest#createHttpRequest()so that the string copy will be made at the point the http request is constructed further reducing the lifespan of the copy.I had to change the logic around
InferenceInput#inputSize()to avoid calling the supplier function early of find out if there was more than 1 input.