diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 17 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 4 additions & 4 deletions
diff --git a/UPGRADING.md b/UPGRADING.md
Lines changed: 9 additions & 2 deletions
diff --git a/benchmarks-jmh/src/main/java/io/github/jbellis/jvector/bench/ParallelWriteBenchmark.java b/benchmarks-jmh/src/main/java/io/github/jbellis/jvector/bench/ParallelWriteBenchmark.java
Lines changed: 5 additions & 5 deletions
diff --git a/benchmarks-jmh/src/main/java/io/github/jbellis/jvector/bench/RecallWithRandomVectorsBenchmark.java b/benchmarks-jmh/src/main/java/io/github/jbellis/jvector/bench/RecallWithRandomVectorsBenchmark.java
Lines changed: 2 additions & 5 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/disk/BufferedRandomAccessWriter.java b/jvector-base/src/main/java/io/github/jbellis/jvector/disk/BufferedRandomAccessWriter.java
Lines changed: 8 additions & 4 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/disk/ByteBufferReader.java b/jvector-base/src/main/java/io/github/jbellis/jvector/disk/ByteBufferReader.java
Lines changed: 5 additions & 0 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/disk/IndexWriter.java b/jvector-base/src/main/java/io/github/jbellis/jvector/disk/IndexWriter.java
Lines changed: 5 additions & 0 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/disk/RandomAccessReader.java b/jvector-base/src/main/java/io/github/jbellis/jvector/disk/RandomAccessReader.java
Lines changed: 68 additions & 1 deletion
@@ -1,4 +1,5 @@
 target/
+local/
 .mvn/wrapper/maven-wrapper.jar
 .java-version
 
 
@@ -4,8 +4,25 @@ All notable changes to this project will be documented in this file. Dates are d
 
 Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog).
 
+#### [4.0.0-rc.6](https://github.com/datastax/jvector/compare/4.0.0-rc.5...4.0.0-rc.6)
+
+- release 4.0.0-rc.6 [`#571`](https://github.com/datastax/jvector/pull/571)
+- fix javadoc error [`#570`](https://github.com/datastax/jvector/pull/570)
+- Ignoring testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping and adding a TODO in buildAndMergeNewNodes [`#569`](https://github.com/datastax/jvector/pull/569)
+- Computation of reconstruction errors for vector compressors [`#567`](https://github.com/datastax/jvector/pull/567)
+- Add NVQ paper in README [`#560`](https://github.com/datastax/jvector/pull/560)
+- Add ImmutableGraphIndex.isHierarchical [`#563`](https://github.com/datastax/jvector/pull/563)
+- Harden tests for heap graph reconstruction [`#543`](https://github.com/datastax/jvector/pull/543)
+- Make the thresholds in TestLowCardinalityFiltering tighter [`#559`](https://github.com/datastax/jvector/pull/559)
+- Begin development on 4.0.0-rc.6 [`#558`](https://github.com/datastax/jvector/pull/558)
+- Revert "Start development on 4.0.0-rc.6-SNAPSHOT" [`4f661d9`](https://github.com/datastax/jvector/commit/4f661d99ca840f22e5b700ae4e28653d2316bc21)
+- Start development on 4.0.0-rc.6-SNAPSHOT [`fdee577`](https://github.com/datastax/jvector/commit/fdee5779a27b46934a2770e260ea5d9e5335a505)
+
 #### [4.0.0-rc.5](https://github.com/datastax/jvector/compare/4.0.0-rc.4...4.0.0-rc.5)
 
+> 22 October 2025
+
+- chore: update changelog for 4.0.0-rc.5 [`#557`](https://github.com/datastax/jvector/pull/557)
 - Release 4.0.0-rc.5 [`#556`](https://github.com/datastax/jvector/pull/556)
 - Add RemappedRandomAccessVectorValues to fix BuildScoreProvider::randomAccessScoreProvider [`#555`](https://github.com/datastax/jvector/pull/555)
 - Fix ScoreTracker initialization and reset methods [`#551`](https://github.com/datastax/jvector/pull/551)
 
@@ -25,7 +25,7 @@ The upper layers of the hierarchy are represented by an in-memory adjacency list
 The bottom layer of the graph is represented by an on-disk adjacency list per node. JVector uses additional data stored inline to support two-pass searches, with the first pass powered by lossily compressed representations of the vectors kept in memory, and the second by a more accurate representation read from disk.  The first pass can be performed with
 * Product quantization (PQ), optionally with [anisotropic weighting](https://arxiv.org/abs/1908.10396)
 * [Binary quantization](https://huggingface.co/blog/embedding-quantization) (BQ)
-* Fused ADC, where PQ codebooks are transposed and written inline with the graph adjacency list
+* Fused PQ, where PQ codebooks are written inline with the graph adjacency list
 
 The second pass can be performed with
 * Full resolution float32 vectors
@@ -265,13 +265,13 @@ Commentary:
 
 * Embeddings models produce output from a consistent distribution of vectors. This means that you can save and re-use ProductQuantization codebooks, even for a different set of vectors, as long as you had a sufficiently large training set to build it the first time around. ProductQuantization.MAX_PQ_TRAINING_SET_SIZE (128,000 vectors) has proven to be sufficiently large.
 * JDK ThreadLocal objects cannot be referenced except from the thread that created them.  This is a difficult design into which to fit caching of Closeable objects like GraphSearcher.  JVector provides the ExplicitThreadLocal class to solve this.
-* Fused ADC is only compatible with Product Quantization, not Binary Quantization.  This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/).  That said, BQ continues to be supported with non-Fused indexes.
+* Fused PQ is only compatible with Product Quantization, not Binary Quantization.  This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/).  That said, BQ continues to be supported with non-Fused indexes.
 * JVector heavily utilizes the Panama Vector API(SIMD) for ANN indexing and search.  We have seen cases where the memory bandwidth is saturated during indexing and product quantization and can cause the process to slow down. To avoid this, the batch methods for index and PQ builds use a [PhysicalCoreExecutor](https://javadoc.io/doc/io.github.jbellis/jvector/latest/io/github/jbellis/jvector/util/PhysicalCoreExecutor.html) to limit the amount of operations to the physical core count. The default value is 1/2 the processor count seen by Java. This may not be correct in all setups (e.g. no hyperthreading or hybrid architectures) so if you wish to override the default use the `-Djvector.physical_core_count` property, or pass in your own ForkJoinPool instance.
 
 
 ### Advanced features
 
-* Fused ADC is represented as a Feature that is supported during incremental index construction, like InlineVectors above.  [See the Grid class for sample code](https://github.com/jbellis/jvector/blob/main/jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java).
+* Fused PQ is represented as a Feature that is supported during incremental index construction, like InlineVectors above.  [See the Grid class for sample code](https://github.com/jbellis/jvector/blob/main/jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java).
 * Anisotropic PQ is built into the ProductQuantization class and can improve recall, but nobody knows how to tune it (with the T/threshold parameter) except experimentally on a per-model basis, and choosing the wrong setting can make things worse.  From Figure 3 in the paper: 
 ![APQ performnce on Glove first improves and then degrades as T increases](https://github.com/jbellis/jvector/assets/42158/fd459222-6929-43ca-a405-ac34dbaf6646)
 
@@ -284,7 +284,7 @@ Commentary:
 * Foundational work: [HNSW](https://ieeexplore.ieee.org/abstract/document/8594636) and [DiskANN](https://suhasjs.github.io/files/diskann_neurips19.pdf) papers, and [a higher level explainer](https://www.datastax.com/guides/hierarchical-navigable-small-worlds)
 * [Anisotropic PQ paper](https://arxiv.org/abs/1908.10396)
 * [Quicker ADC paper](https://arxiv.org/abs/1812.09162)
-
+* [NVQ paper](https://arxiv.org/abs/2509.18471)
 
 ## Developing and Testing
 This project is organized as a [multimodule Maven build](https://maven.apache.org/guides/mini/guide-multiple-modules.html). The intent is to produce a multirelease jar suitable for use as
 
@@ -8,7 +8,15 @@
 - Support for hierarchical graph indices. This new type of index blends HNSW and DiskANN in a novel way. An
   HNSW-like hierarchy resides in memory for quickly seeding the search. This also reduces the need for caching the
   DiskANN graph near the entrypoint. The base layer of the hierarchy is a DiskANN-like index and inherits its
-  properties. This hierarchical structure can be disabled, ending up with just the base DiskANN layer.  
+  properties. This hierarchical structure can be disabled, ending up with just the base DiskANN layer.
+- The feature previously known as Fused ADC has been renamed to Fused PQ. This feature allows to offload the PQ
+  codebooks from memory during search, storing them within the graph in a way that does not slow down the search.
+  Implementation notes: The implementation of this feature has been overhauled to not require native code acceleration.
+  This explores a design space allowing for packed representations of vectors fused into the graph in shapes optimal 
+  for approximate score calculation. This new feature of graph indexes is opt-in but fully functional now. Any graph
+  degree limitations have been lifted. At this time, only 256-cluster ProductQuantization can use fused PQ.
+  Version 6 or greater of the file disk format is required to use this feature.
+
 
 ## API changes
 - MemorySegmentReader.Supplier and SimpleMappedReader.Supplier must now be explicitly closed, instead of being
@@ -20,7 +28,6 @@
   we do early termination of the search. In certain cases, this can accelerate the search at the potential cost of some
   accuracy. It is set to false by default.
 - The constructors of GraphIndexBuilder allow to specify different maximum out-degrees for the graphs in each layer.
-  However, this feature does not work with FusedADC in this version.
 
 ### API changes in 3.0.6
 
 
@@ -26,7 +26,7 @@
 import io.github.jbellis.jvector.graph.disk.OrdinalMapper;
 import io.github.jbellis.jvector.graph.disk.feature.Feature;
 import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
-import io.github.jbellis.jvector.graph.disk.feature.FusedADC;
+import io.github.jbellis.jvector.graph.disk.feature.FusedPQ;
 import io.github.jbellis.jvector.graph.disk.feature.NVQ;
 import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
 import io.github.jbellis.jvector.quantization.NVQuantization;
@@ -85,7 +85,7 @@ public class ParallelWriteBenchmark {
 
     // Feature state reused between iterations
     private NVQ nvqFeature;
-    private FusedADC fusedAdcFeature;
+    private FusedPQ fusedPQFeature;
     private OrdinalMapper identityMapper;
     private Map<FeatureId, IntFunction<Feature.State>> inlineSuppliers;
 
@@ -119,7 +119,7 @@ public void setup() throws IOException {
         int nSubVectors = floatVectors.dimension() == 2 ? 1 : 2;
         var nvq = NVQuantization.compute(floatVectors, nSubVectors);
         nvqFeature = new NVQ(nvq);
-        fusedAdcFeature = new FusedADC(graph.maxDegree(), pqVectors.getCompressor());
+        fusedPQFeature = new FusedPQ(graph.maxDegree(), pqVectors.getCompressor());
 
         inlineSuppliers = new EnumMap<>(FeatureId.class);
         inlineSuppliers.put(FeatureId.NVQ_VECTORS, ordinal -> new NVQ.State(nvq.encode(floatVectors.getVector(ordinal))));
@@ -189,13 +189,13 @@ private void writeGraph(ImmutableGraphIndex graph,
         try (var writer = new OnDiskGraphIndexWriter.Builder(graph, path)
                 .withParallelWrites(parallel)
                 .with(nvqFeature)
-                .with(fusedAdcFeature)
+                .with(fusedPQFeature)
                 .withMapper(identityMapper)
                 .build()) {
             var view = graph.getView();
             Map<FeatureId, IntFunction<Feature.State>> writeSuppliers = new EnumMap<>(FeatureId.class);
             writeSuppliers.put(FeatureId.NVQ_VECTORS, inlineSuppliers.get(FeatureId.NVQ_VECTORS));
-            writeSuppliers.put(FeatureId.FUSED_ADC, ordinal -> new FusedADC.State(view, pqVectors, ordinal));
+            writeSuppliers.put(FeatureId.FUSED_PQ, ordinal -> new FusedPQ.State(view, pqVectors, ordinal));
 
             writer.write(writeSuppliers);
             view.close();
 
@@ -247,11 +247,8 @@ private double calculateRecall(Set<Integer> predicted, int[] groundTruth, int k)
         int actualK = Math.min(k, Math.min(predicted.size(), groundTruth.length));
 
         for (int i = 0; i < actualK; i++) {
-            for (int j = 0; j < actualK; j++) {
-                if (predicted.contains(groundTruth[j])) {
-                    hits++;
-                    break;
-                }
+            if (predicted.contains(groundTruth[i])) {
+                hits++;
             }
         }
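
Note on this hunk: the removed inner loop restarted at `j = 0` on every outer iteration and stopped at the first match, so `hits` could only end up as `0` or `actualK`, and the benchmark reported recall as all-or-nothing. The sketch below contrasts the two counting strategies. The class and method names are hypothetical, and dividing by `actualK` is an assumption, since the hunk does not show how the benchmark converts `hits` into a recall value.

```java
import java.util.Set;

// Hypothetical helper, not part of jvector: illustrates the counting fix only.
final class RecallSketch {

    /** Counts hits the way the removed code did: the inner scan restarts at 0 and breaks on the first match. */
    static int legacyHits(Set<Integer> predicted, int[] groundTruth, int k) {
        int actualK = Math.min(k, Math.min(predicted.size(), groundTruth.length));
        int hits = 0;
        for (int i = 0; i < actualK; i++) {
            for (int j = 0; j < actualK; j++) {
                if (predicted.contains(groundTruth[j])) {
                    hits++;   // adds 1 per outer iteration whenever ANY ground-truth id matches
                    break;
                }
            }
        }
        return hits;
    }

    /** Counts each of the first actualK ground-truth ids at most once, as in the added lines. */
    static int fixedHits(Set<Integer> predicted, int[] groundTruth, int k) {
        int actualK = Math.min(k, Math.min(predicted.size(), groundTruth.length));
        int hits = 0;
        for (int i = 0; i < actualK; i++) {
            if (predicted.contains(groundTruth[i])) {
                hits++;
            }
        }
        return hits;
    }

    /** Assumed normalization: hits divided by actualK, with 0.0 when there is nothing to compare. */
    static double recallAtK(Set<Integer> predicted, int[] groundTruth, int k) {
        int actualK = Math.min(k, Math.min(predicted.size(), groundTruth.length));
        return actualK == 0 ? 0.0 : (double) fixedHits(predicted, groundTruth, k) / actualK;
    }
}
```

With `predicted = {1, 2, 3}`, `groundTruth = [1, 9, 8]`, and `k = 3`, only one ground-truth id is present in the result set, yet the legacy counting credits all three positions.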
 
 
@@ -35,6 +35,11 @@ public class BufferedRandomAccessWriter implements RandomAccessWriter {
     private final RandomAccessFile raf;
     private final DataOutputStream stream;
 
+    /**
+     * Creates a buffered random access writer for the given path.
+     * @param path the path to write to
+     * @throws FileNotFoundException if the file cannot be created
+     */
     public BufferedRandomAccessWriter(Path path) throws FileNotFoundException {
         raf = new RandomAccessFile(path.toFile(), "rw");
         stream = new DataOutputStream(new BufferedOutputStream(new RandomAccessOutputStream(raf)));
@@ -88,10 +93,9 @@ public void flush() throws IOException {
     }
 
     /**
-     * return the CRC32 checksum for the range [startOffset .. endOffset)
-     * <p>
-     * the file pointer will be left at endOffset.
-     * <p>
+     * Returns the CRC32 checksum for the range [startOffset .. endOffset)
+     *
+     * The file pointer will be left at endOffset.
      */
     @Override
     public long checksum(long startOffset, long endOffset) throws IOException {
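
Note on the `checksum` contract documented above: a half-open range `[startOffset .. endOffset)` plus "file pointer left at endOffset" can be sketched with only `java.util.zip.CRC32` and `java.io.RandomAccessFile`. This is an illustration under assumptions, not the body of `BufferedRandomAccessWriter.checksum`: the class name is hypothetical, it works on a plain `RandomAccessFile`, and it does not deal with flushing buffered output, which a buffered writer would presumably need to do before reading back.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

// Hypothetical helper, not part of jvector.
final class RangeChecksumSketch {

    /**
     * Returns the CRC32 of the bytes in [startOffset, endOffset).
     * On normal return the file pointer is at endOffset.
     */
    static long checksum(RandomAccessFile raf, long startOffset, long endOffset) throws IOException {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException("invalid range: " + startOffset + ".." + endOffset);
        }
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        raf.seek(startOffset);
        long remaining = endOffset - startOffset;
        while (remaining > 0) {
            int toRead = (int) Math.min(buffer.length, remaining);
            int n = raf.read(buffer, 0, toRead);
            if (n < 0) {
                throw new EOFException("range extends past end of file");
            }
            crc.update(buffer, 0, n);
            remaining -= n;
        }
        // Exactly (endOffset - startOffset) bytes were consumed after seeking to startOffset,
        // so the pointer now sits at endOffset.
        return crc.getValue();
    }
}
```

Because the range is half-open, the checksum of `[a .. b)` followed by `[b .. c)` covers each byte exactly once, and the second call can rely on the pointer position left by the first.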
 
@@ -23,8 +23,13 @@
  * RandomAccessReader that reads from a ByteBuffer
  */
 public class ByteBufferReader implements RandomAccessReader {
+    /** The underlying ByteBuffer. */
     protected final ByteBuffer bb;
 
+    /**
+     * Creates a ByteBufferReader from the given ByteBuffer.
+     * @param sourceBB the source ByteBuffer
+     */
     public ByteBufferReader(ByteBuffer sourceBB) {
         bb = sourceBB;
     }
 
@@ -20,9 +20,14 @@
 import java.io.DataOutput;
 import java.io.IOException;
 
+/**
+ * Interface for writing index data.
+ */
 public interface IndexWriter extends DataOutput, Closeable {
     /**
+     * Returns the current position in the output.
      * @return the current position in the output
+     * @throws IOException if an I/O error occurs
      */
     long position() throws IOException;
 }
@@ -32,32 +32,99 @@
  * uses the ReaderSupplier API to create a RandomAccessReader per thread, as needed.
  */
 public interface RandomAccessReader extends AutoCloseable {
+    /**
+     * Seeks to the specified offset.
+     * @param offset the offset to seek to
+     * @throws IOException if an I/O error occurs
+     */
     void seek(long offset) throws IOException;
 
+    /**
+     * Returns the current position.
+     * @return the current position
+     * @throws IOException if an I/O error occurs
+     */
     long getPosition() throws IOException;
 
+    /**
+     * Reads an integer.
+     * @return the integer value
+     * @throws IOException if an I/O error occurs
+     */
     int readInt() throws IOException;
 
+    /**
+     * Reads a float.
+     * @return the float value
+     * @throws IOException if an I/O error occurs
+     */
     float readFloat() throws IOException;
 
+    /**
+     * Reads a long.
+     * @return the long value
+     * @throws IOException if an I/O error occurs
+     */
     long readLong() throws IOException;
 
+    /**
+     * Reads bytes into the array.
+     * @param bytes the byte array to read into
+     * @throws IOException if an I/O error occurs
+     */
     void readFully(byte[] bytes) throws IOException;
 
+    /**
+     * Reads bytes into the buffer.
+     * @param buffer the ByteBuffer to read into
+     * @throws IOException if an I/O error occurs
+     */
     void readFully(ByteBuffer buffer) throws IOException;
 
+    /**
+     * Reads floats into the array.
+     * @param floats the float array to read into
+     * @throws IOException if an I/O error occurs
+     */
     default void readFully(float[] floats) throws IOException {
         read(floats, 0, floats.length);
     }
 
+    /**
+     * Reads longs into the array.
+     * @param vector the long array to read into
+     * @throws IOException if an I/O error occurs
+     */
     void readFully(long[] vector) throws IOException;
 
+    /**
+     * Reads integers into the array.
+     * @param ints the int array to read into
+     * @param offset the offset in the array
+     * @param count the number of integers to read
+     * @throws IOException if an I/O error occurs
+     */
     void read(int[] ints, int offset, int count) throws IOException;
 
+    /**
+     * Reads floats into the array.
+     * @param floats the float array to read into
+     * @param offset the offset in the array
+     * @param count the number of floats to read
+     * @throws IOException if an I/O error occurs
+     */
     void read(float[] floats, int offset, int count) throws IOException;
 
+    /**
+     * Closes this reader.
+     * @throws IOException if an I/O error occurs
+     */
     void close() throws IOException;
 
-    // Length of the reader slice
+    /**
+     * Returns the length of the reader slice.
+     * @return the length
+     * @throws IOException if an I/O error occurs
+     */
     long length() throws IOException;
 }