-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Build tiny segments on the CPU rather than the GPU #136387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good!
Some nitpicks and only 1 real comment on the duplication between flush/merge
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsFormat.java
Outdated
Show resolved
Hide resolved
|
||
OnHeapHnswGraph buildGraphWithTheCPU(RandomVectorScorerSupplier scorerSupplier, int numVectors) throws IOException { | ||
assert numVectors > 0; | ||
var hnswGraphBuilder = HnswGraphBuilder.create(scorerSupplier, M, beamWidth, HnswGraphBuilder.randSeed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a follow-up: I saw discussion about adjusting M for CPU vs GPU; should we somehow adjust it?
(Just a reminder, I don't think this should go in this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the graphs here are quite small, so should be fine, but any reference or hints would be gratefully appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let @mayya-sharipova chime in here -- it's way out of my comfort zone :)
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java
Outdated
Show resolved
Hide resolved
writeMeta(fieldInfo, vectorIndexOffset, vectorIndexLength, datasetSize, graph, graphLevelNodeOffsets); | ||
} | ||
|
||
void createGraphWithCPUAndWriteMeta(FieldInfo fieldInfo, IndexInput input, int size) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very close to flushFieldBuildingGraphOnCPU/generateCPUGraphAndWriteMeta (besides supporting BYTE in merge) -- can we simplify? Or reorganize/give these more specific names? I got a bit lost and had to navigate to the caller to understand the differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, they are subtly different. I did try a few different options, but ultimately how we deal with things on the CPU is quite different to the GPU, and I felt that abstracting out things too much hurt readability. Tho, I do agree that it's a bit "windy" to follow! :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking more about how we deal with things in flush and merge.
If I got this correctly, we have
flush -> flushFieldBuildingGraphOnCPU -> generateCPUGraphAndWriteMeta
and
mergeOneField -> createGraphWithCPUAndWriteMeta
Both generateCPUGraphAndWriteMeta
and createGraphWithCPUAndWriteMeta
call buildGraphWithTheCPU
+ writeGraphAndMeta
; createGraphWithCPUAndWriteMeta
looks a lot like generateCPUGraphAndWriteMeta
and flushFieldBuildingGraphOnCPU
combined, but for the BYTE case.
Maybe they can be merged? Or at least renamed, to make clear one is for the flush case and the other for the merge case?
This commit avoids building tiny segments on the GPU, but rather builds them on the CPU.
We pick the threshold of 10k vectors as the default, lower than this threshold the graph will be built on the CPU. Ultimately I'd prefer to just brute-force below this threshold, but the lucene reader does not yet support this. It should do in the yet-to-be-released Lucene 10.4.