Description
Spin-off from the discussion in #14708. One of the concerns with full precision (FP) re-ranking (for quantized vectors) is that if we use the off-heap vector reader, it will page in the FP vector data, which then competes for page cache with the quantized vector data used for HNSW graph search. Since HNSW search performance degrades greatly when the vectors are not in memory, for instance under limited memory, can we support a mode that loads the FP vectors with direct I/O? (Or is this already possible?)
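For context on the mechanism such a mode would build on: the JDK can open a file with `O_DIRECT` via `com.sun.nio.file.ExtendedOpenOption.DIRECT` (JDK 10+), provided the buffer, position, and length are aligned to the filesystem block size. A minimal, self-contained sketch (the file layout and all names here are hypothetical, just for illustration): it writes one float vector padded to a block, then reads it back through an aligned direct buffer, falling back to buffered I/O where the filesystem rejects `O_DIRECT` (e.g. tmpfs).

```java
import com.sun.nio.file.ExtendedOpenOption;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectIoVectorRead {

  // Writes one 4-float vector padded to the block size, then reads it back,
  // preferring O_DIRECT and falling back to a buffered read if unsupported.
  public static float[] demo() throws IOException {
    Path file = Files.createTempFile("vectors", ".vec");
    int block = Math.toIntExact(Files.getFileStore(file).getBlockSize());

    // Write the vector [1, 2, 3, 4] padded to exactly one block.
    ByteBuffer out = ByteBuffer.allocate(block).order(ByteOrder.LITTLE_ENDIAN);
    for (float v : new float[] {1f, 2f, 3f, 4f}) {
      out.putFloat(v);
    }
    out.clear();
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
      ch.write(out);
    }

    // O_DIRECT needs the buffer address, file position, and length aligned to
    // the block size; alignedSlice gives us a suitably aligned direct buffer.
    ByteBuffer buf =
        ByteBuffer.allocateDirect(block * 2).alignedSlice(block).order(ByteOrder.LITTLE_ENDIAN);
    buf.limit(block);

    FileChannel ch;
    try {
      ch = FileChannel.open(file, StandardOpenOption.READ, ExtendedOpenOption.DIRECT);
    } catch (IOException | UnsupportedOperationException e) {
      // Some filesystems (e.g. tmpfs) reject O_DIRECT; fall back to buffered I/O.
      ch = FileChannel.open(file, StandardOpenOption.READ);
    }
    try (FileChannel c = ch) {
      c.read(buf, 0);
    }
    buf.flip();

    float[] vec = new float[4];
    for (int i = 0; i < 4; i++) {
      vec[i] = buf.getFloat();
    }
    Files.delete(file);
    return vec;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(java.util.Arrays.toString(demo()));
  }
}
```

Running it prints `[1.0, 2.0, 3.0, 4.0]` on either path; a real reader would of course keep the channel open and read block-aligned regions per vector rather than re-opening per read.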
For integrating with the existing quantized vectors codec, is my understanding correct that we would need to create a new codec/vector reader that extends the existing reader and uses a different raw vector format?
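A possibly lower-touch alternative to a new codec might be routing only the raw-vector files through direct I/O at the `Directory` level, e.g. via `DirectIODirectory` from lucene-misc. A rough, untested sketch of the idea (the class name and the `.vec` suffix check are my assumptions; also note that `DirectIODirectory` by default only engages direct I/O for merge-like `IOContext`s, so its protected `useDirectIO` hook would likely need overriding):

```java
// Untested sketch against Lucene's Directory API (class name hypothetical).
class RawVectorDirectIODirectory extends FilterDirectory {
  private final Directory directIoDelegate; // e.g. a DirectIODirectory over the same path

  RawVectorDirectIODirectory(Directory mmapDelegate, Directory directIoDelegate) {
    super(mmapDelegate);
    this.directIoDelegate = directIoDelegate;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    if (name.endsWith(".vec")) { // raw FP vectors, read only during re-ranking
      return directIoDelegate.openInput(name, context);
    }
    return in.openInput(name, context); // quantized vectors + HNSW graph stay as-is
  }
}
```

This would avoid touching the codec at all, at the cost of keying off file extensions rather than format-level knowledge.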
I can try this, but I'm wondering what the community thinks about it. Are there other use cases that would need an on-heap direct I/O vector reader as well?