ChrisHegarty
Contributor

@ChrisHegarty ChrisHegarty commented Oct 10, 2025

This commit avoids building the graphs for tiny segments on the GPU, and instead builds them on the CPU.

We pick a default threshold of 10k vectors: for segments with fewer vectors than this, the graph is built on the CPU. Ultimately I'd prefer to just brute-force search below this threshold, but the Lucene reader does not yet support this. It should be supported in the yet-to-be-released Lucene 10.4.
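The routing rule described above can be sketched as a simple threshold check. This is a minimal, hypothetical illustration, not the actual Elasticsearch code: the class, enum, and method names (`CpuBuildTarget`, `BuildTarget`, `chooseBuildTarget`) are invented here, and only the 10k default and the "fewer than the threshold goes to the CPU" rule come from the description. Treating a segment of exactly 10,000 vectors as a GPU build is an assumption that follows from that wording.

```java
public class CpuBuildTarget {
    // Default from the PR description: segments with fewer vectors than this are built on the CPU.
    static final int DEFAULT_CPU_BUILD_THRESHOLD = 10_000;

    // Hypothetical: where the graph for a segment gets built.
    enum BuildTarget { CPU, GPU }

    // Hypothetical helper: strictly below the threshold goes to the CPU, everything else to the GPU.
    static BuildTarget chooseBuildTarget(int numVectors, int threshold) {
        if (numVectors < 0) {
            throw new IllegalArgumentException("numVectors must be >= 0");
        }
        return numVectors < threshold ? BuildTarget.CPU : BuildTarget.GPU;
    }

    // Convenience overload that applies the 10k default.
    static BuildTarget chooseBuildTarget(int numVectors) {
        return chooseBuildTarget(numVectors, DEFAULT_CPU_BUILD_THRESHOLD);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 9_999, 10_000, 250_000}) {
            System.out.println(n + " -> " + chooseBuildTarget(n));
        }
    }
}
```

Keeping the threshold as a parameter, with the 10k default as an overload, makes it easy to tune or test the cut-over without touching the flush and merge call sites.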

@ChrisHegarty ChrisHegarty requested a review from ldematte October 10, 2025 13:00
@ChrisHegarty ChrisHegarty added :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.1 v9.3.0 labels Oct 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@ldematte ldematte changed the title Build tiny segments on the CPU rather than the CPU Build tiny segments on the CPU rather than the GPU Oct 10, 2025
@ChrisHegarty ChrisHegarty added the test-gpu Run tests using a GPU label Oct 12, 2025
Contributor

@ldematte ldematte left a comment

Looks very good!
Some nitpicks, and only one real comment, about the duplication between flush and merge.


OnHeapHnswGraph buildGraphWithTheCPU(RandomVectorScorerSupplier scorerSupplier, int numVectors) throws IOException {
assert numVectors > 0;
var hnswGraphBuilder = HnswGraphBuilder.create(scorerSupplier, M, beamWidth, HnswGraphBuilder.randSeed);
Contributor

For a follow-up: I saw discussion about adjusting M for CPU vs GPU; should we somehow adjust it?
(Just a reminder, I don't think this should go in this PR)

Contributor Author

I think the graphs here are quite small, so it should be fine, but any references or hints would be gratefully appreciated.

Contributor

I'll let @mayya-sharipova chime in here -- it's way out of my comfort zone :)

writeMeta(fieldInfo, vectorIndexOffset, vectorIndexLength, datasetSize, graph, graphLevelNodeOffsets);
}

void createGraphWithCPUAndWriteMeta(FieldInfo fieldInfo, IndexInput input, int size) throws IOException {
Contributor

This looks very close to flushFieldBuildingGraphOnCPU/generateCPUGraphAndWriteMeta (besides supporting BYTE in merge) -- can we simplify? Or reorganize/give these more specific names? I got a bit lost and had to navigate to the caller to understand the differences.

Contributor Author

Yeah, they are subtly different. I did try a few different options, but ultimately how we handle things on the CPU is quite different from how we handle them on the GPU, and I felt that abstracting things out too much hurt readability. Though I do agree that it's a bit "windy" to follow! :-(

Contributor

I'm talking more about how we deal with things in flush and merge.
If I got this correctly, we have
flush -> flushFieldBuildingGraphOnCPU -> generateCPUGraphAndWriteMeta
and
mergeOneField -> createGraphWithCPUAndWriteMeta

Both generateCPUGraphAndWriteMeta and createGraphWithCPUAndWriteMeta call buildGraphWithTheCPU + writeGraphAndMeta; createGraphWithCPUAndWriteMeta looks a lot like generateCPUGraphAndWriteMeta and flushFieldBuildingGraphOnCPU combined, except that it also handles the BYTE case.
Maybe they can be merged? Or at least renamed, to make clear one is for the flush case and the other for the merge case?
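One way the suggested consolidation could look is a single shared CPU path that both flush and merge delegate to, parameterized by vector encoding. This is a minimal sketch under stated assumptions, not the PR's code: all names (`CpuGraphPaths`, `buildOnCpuAndWriteMeta`, `flushOnCpu`, `mergeOnCpu`, the `Encoding` enum, and the `Trace` record) are hypothetical. The real methods take Lucene types such as `FieldInfo` and `IndexInput`; here they are replaced by plain values, and the steps are recorded as strings rather than performed, so the sketch compiles on its own. Fixing flush to floats is an assumption based on the note that BYTE support is merge-only.

```java
import java.util.ArrayList;
import java.util.List;

public class CpuGraphPaths {
    // Hypothetical stand-in for Lucene's vector encodings (FLOAT32 / BYTE).
    enum Encoding { FLOAT32, BYTE }

    // Hypothetical: the ordered steps the shared path would run, so flush and merge can be compared.
    record Trace(String field, Encoding encoding, List<String> steps) {}

    // Single shared CPU path: choose a scorer for the encoding, build the graph, write graph + meta.
    static Trace buildOnCpuAndWriteMeta(String field, Encoding encoding, int numVectors) {
        if (numVectors <= 0) {
            throw new IllegalArgumentException("numVectors must be > 0");
        }
        List<String> steps = new ArrayList<>();
        steps.add(encoding == Encoding.BYTE ? "byteScorer" : "floatScorer");
        steps.add("buildGraphWithTheCPU(" + numVectors + ")");
        steps.add("writeGraphAndMeta");
        return new Trace(field, encoding, List.copyOf(steps));
    }

    // Flush entry point: assumed float-only, per the review note that BYTE is handled in merge.
    static Trace flushOnCpu(String field, int numVectors) {
        return buildOnCpuAndWriteMeta(field, Encoding.FLOAT32, numVectors);
    }

    // Merge entry point: differs only in where the vectors come from and in accepting BYTE.
    static Trace mergeOnCpu(String field, Encoding encoding, int numVectors) {
        return buildOnCpuAndWriteMeta(field, encoding, numVectors);
    }

    public static void main(String[] args) {
        System.out.println(flushOnCpu("vec", 500).steps());
        System.out.println(mergeOnCpu("vec", Encoding.BYTE, 500).steps());
    }
}
```

With this shape, the flush-versus-merge difference shrinks to the two thin entry points, and the naming question becomes mostly moot because only one method builds and writes the CPU graph.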

Labels

>refactoring :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-gpu Run tests using a GPU v9.2.1 v9.3.0

3 participants