Conversation

@iverase (Contributor) commented Aug 6, 2025

The hierarchical k-means algorithm tries to build clusters close to a recommended size, but depending on the input data it can end up building much bigger clusters. As an extreme case, if you add the same vector over and over again, you end up with just one cluster containing all the vectors.

This is dangerous at search time because we read the docIds of a posting list before scoring the vectors, so we would need to read all the docIds in one go. This PR prevents that by writing big posting lists in blocks, so we can read block by block when the posting list is bigger than a threshold. I have arbitrarily chosen 1600 as the threshold for writing blocks.
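
To make the idea concrete, here is a minimal sketch of the write side (hypothetical helper names, not the actual Elasticsearch code; only the 1600 threshold comes from this PR):

import java.io.IOException;
import org.apache.lucene.store.IndexOutput;

// Sketch only: writeDocIds is a hypothetical helper standing in for the
// real doc id encoding logic.
static final int BLOCK_THRESHOLD = 1600;

static void writePostingList(IndexOutput out, int[] docIds) throws IOException {
    if (docIds.length <= BLOCK_THRESHOLD) {
        // Small list: write all doc ids up front, as before.
        writeDocIds(out, docIds, 0, docIds.length);
    } else {
        // Big list: emit fixed-size blocks so the reader can decode
        // one block of doc ids at a time instead of all of them in one go.
        for (int start = 0; start < docIds.length; start += BLOCK_THRESHOLD) {
            int len = Math.min(BLOCK_THRESHOLD, docIds.length - start);
            writeDocIds(out, docIds, start, len);
        }
    }
}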

@john-wagster (Contributor)

The approach makes sense. I was wondering why we might want to solve this here instead of in the hkmeans algo, though. I need to go look again, but I would think that ideally we'd stack a lot of duplicative centroids on top of one another, effectively creating these blocks anyway by arbitrarily breaking the duplicative vectors up into duplicative centroids. I'll go review the code some more.

@benwtrent (Member)

I think the real fix is to not read the entire doc id set in and hold it in memory. If we knew they were in ascending order, can we utilize some off-heap reader?
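
For illustration, such an off-heap reader could look roughly like this (purely hypothetical sketch, assuming the ids are stored as ascending vInt deltas):

import java.io.IOException;
import org.apache.lucene.store.IndexInput;

// Hypothetical: decode one doc id per call straight from the IndexInput
// instead of materializing the whole set on heap.
final class LazyDocIdIterator {
    private final IndexInput in;
    private final int count;
    private int read = 0;
    private int doc = 0;

    LazyDocIdIterator(IndexInput in, int count) {
        this.in = in;
        this.count = count;
    }

    int nextDoc() throws IOException {
        if (read >= count) {
            return -1; // exhausted
        }
        read++;
        doc += in.readVInt(); // purely sequential read, no seeking
        return doc;
    }
}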

@iverase (Contributor, Author) commented Aug 7, 2025

> I think the real fix is to not read the entire doc id set in and hold it in memory

I don't know about this; it seems it would break the disk friendliness of the format if we need to do what feels like random reads on the doc ids.

@benwtrent (Member)

> I don't know about this; it seems it would break the disk friendliness of the format if we need to do what feels like random reads on the doc ids.

@iverase why would it be random?

We always read postings lists in order. Do you mean random in that we need to go back to the "start" of the list to decode the next block of docs?

@iverase (Contributor, Author) commented Aug 7, 2025

> Do you mean random in that we need to go back to the "start" of the list to decode the next block of docs?

yes


@Override
public int visit(KnnCollector knnCollector) throws IOException {
    byte postingListType = indexInput.readByte();
Member:

What if we just always put the block of doc ids at the start of every block?

So every block is:

[blk0, blk1, blk2, ... tail] [[encoded doc ids, vectors], ... [tail encoded doc ids, vectors]]

We know the block size (16), we know the previous base block (if we want to delta encode eventually).

If we ever split SOAR and regular docs, we can delta encode with the "doc base" (just like regular postings lists).

Are we concerned about speed or just size increase?
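
On the write side, the interleaving could look roughly like this (a sketch under the assumptions above: groups of 16, doc ids before vectors; all names hypothetical):

import java.io.IOException;
import org.apache.lucene.store.IndexOutput;

// Sketch: for each group of BLOCK_SIZE vectors, write the group's doc ids
// first, then the encoded vectors, so the reader only ever buffers one
// group of doc ids at a time.
static final int BLOCK_SIZE = 16;

static void writeInterleaved(IndexOutput out, int[] docIds, byte[][] encodedVectors)
        throws IOException {
    for (int start = 0; start < docIds.length; start += BLOCK_SIZE) {
        int len = Math.min(BLOCK_SIZE, docIds.length - start); // last group is the tail
        for (int i = start; i < start + len; i++) {
            out.writeVInt(docIds[i]); // could delta-encode against a "doc base" later
        }
        for (int i = start; i < start + len; i++) {
            out.writeBytes(encodedVectors[i], encodedVectors[i].length);
        }
    }
}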

Contributor Author:

This is fair, I can have a go at it.

Contributor Author:

Introduced this approach in 71e30a1. Much simpler.

Member:

Much simpler indeed. My only concern is index size & performance. I would expect them to be mostly comparable, but you never know.

Contributor Author:

Given the way docIdsWriter works, I would expect better compression of docIds, with the penalty of one byte per 16 vectors, so all in all it should be the same size or even smaller (I am checking). I don't expect, nor do I see, any performance implications.

@iverase (Contributor, Author) commented Aug 7, 2025

Checking the posting list size of 1M vectors with 1024 dims:

main: 302,780,965 bytes
PR: 301,815,585 bytes
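
Back-of-the-envelope check (assuming the one metadata byte per group of 16 doc ids mentioned above): 1,000,000 / 16 = 62,500 bytes of extra per-group overhead, against a net saving of 302,780,965 - 301,815,585 = 965,380 bytes, so the grouped docIdsWriter encoding more than pays for the extra bytes.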

Member:

even smaller?!?!? awesome!

@iverase (Contributor, Author) commented Aug 7, 2025

Flamegraphs show a performance penalty (because of the extra byte).

main: [flamegraph image]

PR: [flamegraph image]

Member:

Ah, it breaks alignment, which is frustrating.

I am testing with sorted doc-ids right now (now that we aren't skipping duplicate vectors).

GroupVarInt also has a "single byte read" to determine the output flag. Having a single byte read for every group of 16 integers does seem weird.

I wonder if we can do something more clever by delta-encoding all the doc ids (we read all the blocks in order anyway, so we can keep the running sum) and picking an encoding that works for all the blocks. Then we can write that encoding byte at the front of the entire list and have uniform encoding for every block.

This might be slightly less disk efficient, but it will likely align better.
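
A sketch of that two-pass idea (hypothetical; writeFixedBits stands in for a bit-packing helper that does not exist in the codebase):

import java.io.IOException;
import org.apache.lucene.store.IndexOutput;

// Hypothetical two-pass scheme: scan all deltas once to pick a single
// encoding for the whole list, write one encoding byte at the front,
// then encode every block uniformly so per-block reads stay aligned.
static void writeUniformEncoding(IndexOutput out, int[] sortedDocIds) throws IOException {
    int maxDelta = 0;
    int prev = 0;
    for (int doc : sortedDocIds) { // pass 1: find the widest delta
        maxDelta = Math.max(maxDelta, doc - prev);
        prev = doc;
    }
    byte bitsPerDelta = (byte) (32 - Integer.numberOfLeadingZeros(Math.max(1, maxDelta)));
    out.writeByte(bitsPerDelta); // single encoding byte for the entire posting list
    prev = 0;
    for (int doc : sortedDocIds) { // pass 2: same width for every block
        writeFixedBits(out, doc - prev, bitsPerDelta); // hypothetical bit packer
        prev = doc;
    }
}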

iverase added 3 commits August 7, 2025 14:41
@iverase closed this Sep 5, 2025