Contributor

@ldematte ldematte commented Sep 3, 2025

When we build the CAGRA index using a GPU, we need to copy vector data to GPU memory.
We want to optimize this scenario or, more precisely, avoid wasteful use of memory.
Currently, vector data is held in four places:

  1. on-heap
  2. on-disk
  3. in native memory
  4. in GPU memory

Some host-memory copies are probably necessary: when data is transferred from host to GPU memory, it needs to be in a MemorySegment holding the 2D data as contiguous rows.
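To illustrate what "2D data as contiguous rows" means, here is a minimal sketch that packs a list of float[] vectors into a single off-heap buffer in row-major order. It uses a direct `FloatBuffer` from classic NIO as a stand-in for a MemorySegment; the class and method names (`RowMajorLayout`, `flatten`, `get`) are hypothetical and not part of this PR or of cuvs-java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.List;

// Hypothetical helper, for illustration only (not code from this PR).
class RowMajorLayout {

    /** Copies equally sized vectors into one off-heap buffer, one row after another. */
    static FloatBuffer flatten(List<float[]> vectors, int dims) {
        FloatBuffer rows = ByteBuffer
            .allocateDirect(vectors.size() * dims * Float.BYTES)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();
        for (float[] v : vectors) {
            if (v.length != dims) {
                throw new IllegalArgumentException("expected " + dims + " dims, got " + v.length);
            }
            rows.put(v); // row i occupies elements [i * dims, (i + 1) * dims)
        }
        rows.flip();
        return rows;
    }

    /** Reads element (row, col) from the flattened layout. */
    static float get(FloatBuffer rows, int dims, int row, int col) {
        return rows.get(row * dims + col);
    }

    public static void main(String[] args) {
        List<float[]> vectors = List.of(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f});
        FloatBuffer rows = flatten(vectors, 3);
        System.out.println(get(rows, 3, 1, 2)); // last element of the second vector
    }
}
```

With this layout, the whole dataset can be handed over as one pointer plus a (rows, dims) shape, which is why a List of separate float[] arrays needs an extra copy first.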

To improve on the current situation we can:

a. get rid of the on-heap copy: instead of using a FlatFieldVectorsWriter<float[]>, we use something that stores data to a MemorySegment directly (instead of a List of float[])
b. get rid of the native memory copy: we memory-map the on-disk data file, so that the copy happens between the mmapped file and GPU memory. Since we have just flushed the vector data, it is likely that the mmapped data is still resident in memory
c. get rid of the native memory copy: we copy data directly from on-heap memory to GPU memory; this is technically possible, but it is currently a non-optimized and unsupported scenario in cuvs-java (even if we plan to cover it), so it will be work for a follow-up PR.

This PR implements (b).
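As a rough sketch of the idea behind (b), the snippet below writes vectors to a file as contiguous rows and then memory-maps that file read-only, so the rows can be read without an intermediate native copy. It is not the code in this PR: it uses classic NIO (`FileChannel.map` and `FloatBuffer`) rather than a MemorySegment, it stops short of the transfer to GPU memory, and the little-endian raw-float file layout and the names `MappedVectors`, `write` and `map` are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not code from this PR).
class MappedVectors {

    /** Writes vectors as raw little-endian floats, row after row (assumed layout). */
    static void write(Path file, float[][] vectors) throws IOException {
        int dims = vectors[0].length;
        ByteBuffer buf = ByteBuffer
            .allocate(vectors.length * dims * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (float[] v : vectors) {
            for (float f : v) {
                buf.putFloat(f);
            }
        }
        Files.write(file, buf.array());
    }

    /** Maps the file read-only; pages are served from the OS cache when still resident. */
    static FloatBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                .order(ByteOrder.LITTLE_ENDIAN)
                .asFloatBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vectors", ".bin");
        file.toFile().deleteOnExit();
        write(file, new float[][] {{1f, 2f, 3f}, {4f, 5f, 6f}});
        FloatBuffer rows = map(file);
        System.out.println(rows.get(1 * 3 + 2)); // element (row 1, col 2)
    }
}
```

In the real writer the mapped region would be the vector data that was just flushed, and it would be passed on as the source of the host-to-GPU copy.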

# Conflicts:
#	x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ESGpuHnswVectorsWriter.java
@ldematte added the ":Search Relevance/Vectors" (Vector search) and "test-gpu" (Run tests using a GPU) labels Sep 3, 2025
}
}

public static IndexOutput getVectorDataIndexOutput(FlatVectorsWriter flatVectorWriter) {
Contributor

Should this be called getQuantizedVectorDataIndexOutput, and the method below getVectorDataIndexOutput?

