Merged
@@ -59,7 +59,11 @@
 import org.apache.lucene.util.hnsw.UpdateableRandomVectorScorer;
 import org.apache.lucene.util.quantization.OptimizedScalarQuantizer;

-/** Copied from Lucene, replace with Lucene's implementation sometime after Lucene 10 */
+/**
+ * Writes quantized vector values and metadata to index segments in the format for Lucene 10.4.
+ *
+ * @lucene.experimental
+ */
 public class Lucene104ScalarQuantizedVectorsWriter extends FlatVectorsWriter {
   private static final long SHALLOW_RAM_BYTES_USED =
       shallowSizeOfInstance(Lucene104ScalarQuantizedVectorsWriter.class);
@@ -72,12 +76,8 @@ public class Lucene104ScalarQuantizedVectorsWriter extends FlatVectorsWriter {
   private final Lucene104ScalarQuantizedVectorScorer vectorsScorer;
   private boolean finished;

-  /**
-   * Sole constructor
-   *
-   * @param vectorsScorer the scorer to use for scoring vectors
-   */
-  protected Lucene104ScalarQuantizedVectorsWriter(
+  /** Sole constructor */
+  public Lucene104ScalarQuantizedVectorsWriter(
Contributor

Seems OK, but why make it public?

Contributor Author

So the writer (and so the format it represents) can be created and used by other classes outside of Lucene104ScalarQuantizedVectorsFormat. The reader is already public.

Contributor

OK, that is kind of a tautology. I will be like the 5-year-old here and ask: so why do we want the writer to be created and used by other classes outside of Lucene104ScalarQuantizedVectorsFormat? :) I really have no objection, just curious what motivated this.

Contributor Author

See elastic/elasticsearch#136627 (comment): we want to use the new quantizer in some updated formats in Elasticsearch.

Contributor

Got it, thanks. Would love to see the bfloat16 support contributed here at some point!

       SegmentWriteState state,
       ScalarEncoding encoding,
       FlatVectorsWriter rawVectorDelegate,