Refactor EmbeddingRequestChunker #122818

Conversation
```java
assertThat(batches, hasSize(1));
assertEquals(batches.get(0).batch().inputs(), inputs);
var subBatches = batches.get(0).batch().subBatches();
EmbeddingRequestChunker.BatchRequest batch = batches.getFirst().batch();
```
Note: the only thing that has changed in the tests is asserting the requests, which are internal and not used outside of this class. (Which makes me wonder whether this is a great test.)
Pinging @elastic/ml-core (Team:ML)
jonathan-buttner
left a comment
Renames and switch-case cleanup look good! I'm not sure about the addToBatches changes; I'm not as familiar with this part of the code, but I'll take another look.
```java
switch (embeddingType) {
    case FLOAT -> floatResults.add(new AtomicArray<>(numberOfSubBatches));
    case BYTE -> byteResults.add(new AtomicArray<>(numberOfSubBatches));
    case SPARSE -> sparseResults.add(new AtomicArray<>(numberOfSubBatches));
```
Just wanted to confirm, but it seems like AtomicArray permits null values. I can't tell whether the previous code was trying to retrieve them though 🤔
The AtomicArray (old code) is just a wrapper around AtomicReferenceArray (new code), whose additional logic wasn't really used (except for setOnce, which is just a set with an extra validation).
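For context, a minimal sketch of that setOnce-with-validation semantics on top of AtomicReferenceArray (class and method names here are hypothetical; this is not the actual Elasticsearch AtomicArray):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of a setOnce-style wrapper around AtomicReferenceArray: a plain set,
// plus a validation that the slot hasn't been written before.
final class SetOnceArray<E> {
    private final AtomicReferenceArray<E> array;

    SetOnceArray(int size) {
        this.array = new AtomicReferenceArray<>(size);
    }

    // Like set(i, value), but fails if the slot was already set.
    void setOnce(int i, E value) {
        if (array.compareAndSet(i, null, value) == false) {
            throw new IllegalStateException("index [" + i + "] was already set");
        }
    }

    // May return null for slots that were never set.
    E get(int i) {
        return array.get(i);
    }
}
```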
```java
    }
}

private int addToBatches(ChunkOffsetsAndInput chunk, int inputIndex)
```
Did all of this logic distill down to the stream group-by call below? Or is this the primary change to begin addressing the OOMs?
Yes, this basically does the same thing with less code. Functionally, this PR doesn't change a thing.
It's just cleanup, so that I can address the OOM issue in a clean kitchen (which I like).
> Did all of this logic distill to the stream group-by call below?
All this logic distilled to the stream group-by 🫢
davidkyle
left a comment
LGTM
Thanks for the clean up
```java
AtomicInteger counter = new AtomicInteger();
this.batchRequests = requests.stream()
    .flatMap(List::stream)
    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / maxNumberOfInputsPerBatch))
```
Nice, you've deleted hundreds of lines of code with this 😁
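For readers skimming the diff, here's a self-contained sketch of that grouping idiom (the method and parameter names are placeholders, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class BatchingSketch {
    // Partition a flat list into batches of at most maxPerBatch elements by
    // grouping each element on a running counter. The stateful classifier is
    // only safe on a sequential stream.
    static <T> List<List<T>> toBatches(List<T> inputs, int maxPerBatch) {
        AtomicInteger counter = new AtomicInteger();
        var grouped = inputs.stream()
            .collect(Collectors.groupingBy(
                it -> counter.getAndIncrement() / maxPerBatch,
                TreeMap::new, // keep batches in input order
                Collectors.toList()
            ));
        return new ArrayList<>(grouped.values());
    }

    public static void main(String[] args) {
        // Prints [[a, b], [c, d], [e]]
        System.out.println(toBatches(List.of("a", "b", "c", "d", "e"), 2));
    }
}
```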
Squashed commits:
* refactor
* inference generics
* more refactor
* unify naming
* remove interface "EmbeddingInt"
* more renaming
* javadoc
* revert accidental changes
* remove unused EmbeddingRequestChunker.EmbeddingType
* polish
* support chunking for text embedding bits
* polish error messages
* fix VoyageAI conflicts
When preparing to implement a fix for OOMs when performing inference on an extremely large document with a semantic_text field, I was annoyed by the amount of code duplication in `EmbeddingRequestChunker` (e.g. the `switch`/`case` by embedding type, different result fields per type, different update/merge methods, ...). I've added generics to avoid all this duplication. This saves almost 400 lines of code, while making the structure clearer (at least to me).
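To illustrate the shape of the change (a hedged sketch, not the PR's actual code): instead of one result field per embedding type plus a switch over the type, a single accumulator parameterized by the embedding type suffices:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Before (roughly): floatResults, byteResults, sparseResults fields, each fed
// by its own arm of a switch (embeddingType) { ... }.
// After (sketch): one generic accumulator covers all embedding types.
final class ResultsAccumulator<E> {
    private final List<AtomicReferenceArray<E>> results = new ArrayList<>();

    void addBatch(int numberOfSubBatches) {
        results.add(new AtomicReferenceArray<>(numberOfSubBatches));
    }

    void set(int batch, int subBatch, E embedding) {
        results.get(batch).set(subBatch, embedding);
    }
}
```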
Furthermore, the naming of classes was pretty inconsistent/confusing, and there were some unclear interfaces. I've unified most of that. Here's the new naming scheme:
- `EmbeddingResults` (new interface): raw results of the inference service, with generics. Also contains `EmbeddingResults.Embedding` and `EmbeddingResults.Chunk`.
- `SparseEmbeddingResults` (similar to before, but with `ChunkedInferenceEmbeddingSparse` merged into it; took this structure as inspiration for the other classes): implementation of `EmbeddingResults` for sparse embeddings.
- `TextEmbeddingByteResults` (previously `InferenceTextEmbeddingByteResults`, `InferenceByteEmbedding`, and `ChunkedInferenceEmbeddingByte`): implementation of `EmbeddingResults` for dense byte embeddings.
- `TextEmbeddingFloatResults` (previously `InferenceTextEmbeddingFloatResults` and `ChunkedInferenceEmbeddingFloat`): implementation of `EmbeddingResults` for dense float embeddings.
- `TextEmbeddingBitResults` (previously `InferenceTextEmbeddingBitResults`): implementation of `EmbeddingResults` for dense bit embeddings. Note that chunking is now supported for this as well, without any added code, even though it wasn't supported previously.
- `ChunkedInferenceEmbedding`: final result class that implements `ChunkedInference`. This is eventually produced by the `EmbeddingRequestChunker`.

In the process, the `EmbeddingInt` interface (which just contains a `getSize` method) got removed too. Apologies for the high number of tests that got updated, but that's a consequence of the ~15-fold code duplication across all inference providers.
Regarding the relations of the classes/interfaces:
- `class TextEmbedding{Bit,Byte,Float}Results` is a `interface TextEmbeddingResults`
- `class SparseEmbeddingResults` and `interface TextEmbeddingResults` are each an `interface EmbeddingResults`
- `interface EmbeddingResults` is a `interface InferenceServiceResults`

Furthermore:
- `class TextEmbedding{Bit,Byte,Float}Results` and `class SparseEmbeddingResults` contain lists of `Embedding`s
- `Embedding`s can be transformed to `Chunk`s
- `Chunk`s can be packed into a `class ChunkedInferenceEmbedding`, which is a `interface ChunkedInference`
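The same relations as a compilable sketch (members and generics elided; a simplified illustration, not the actual source):

```java
// Top of the hierarchy: every results type is an InferenceServiceResults.
interface InferenceServiceResults {}
interface ChunkedInference {}

interface EmbeddingResults<E> extends InferenceServiceResults {}
interface TextEmbeddingResults<E> extends EmbeddingResults<E> {}

final class SparseEmbeddingResults implements EmbeddingResults<Void> {}
final class TextEmbeddingBitResults implements TextEmbeddingResults<Void> {}
final class TextEmbeddingByteResults implements TextEmbeddingResults<Void> {}
final class TextEmbeddingFloatResults implements TextEmbeddingResults<Void> {}

// Embeddings -> Chunks -> ChunkedInferenceEmbedding, the final ChunkedInference.
final class ChunkedInferenceEmbedding implements ChunkedInference {}
```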