datastax
diff --git a/README.md b/README.md
Lines changed: 3 additions & 4 deletions
diff --git a/UPGRADING.md b/UPGRADING.md
Lines changed: 9 additions & 2 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java b/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java
Lines changed: 5 additions & 19 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/graph/ImmutableGraphIndex.java b/jvector-base/src/main/java/io/github/jbellis/jvector/graph/ImmutableGraphIndex.java
Lines changed: 25 additions & 0 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/graph/OnHeapGraphIndex.java b/jvector-base/src/main/java/io/github/jbellis/jvector/graph/OnHeapGraphIndex.java
Lines changed: 13 additions & 0 deletions
diff --git a/jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java b/jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java
Lines changed: 83 additions & 12 deletions
@@ -25,7 +25,7 @@ The upper layers of the hierarchy are represented by an in-memory adjacency list
 The bottom layer of the graph is represented by an on-disk adjacency list per node. JVector uses additional data stored inline to support two-pass searches, with the first pass powered by lossily compressed representations of the vectors kept in memory, and the second by a more accurate representation read from disk.  The first pass can be performed with
 * Product quantization (PQ), optionally with [anisotropic weighting](https://arxiv.org/abs/1908.10396)
 * [Binary quantization](https://huggingface.co/blog/embedding-quantization) (BQ)
-* Fused ADC, where PQ codebooks are transposed and written inline with the graph adjacency list
+* Fused PQ, where PQ codebooks are written inline with the graph adjacency list
 
 The second pass can be performed with
 * Full resolution float32 vectors
@@ -265,13 +265,13 @@ Commentary:
 
 * Embeddings models produce output from a consistent distribution of vectors. This means that you can save and re-use ProductQuantization codebooks, even for a different set of vectors, as long as you had a sufficiently large training set to build it the first time around. ProductQuantization.MAX_PQ_TRAINING_SET_SIZE (128,000 vectors) has proven to be sufficiently large.
 * JDK ThreadLocal objects cannot be referenced except from the thread that created them.  This is a difficult design into which to fit caching of Closeable objects like GraphSearcher.  JVector provides the ExplicitThreadLocal class to solve this.
-* Fused ADC is only compatible with Product Quantization, not Binary Quantization.  This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/).  That said, BQ continues to be supported with non-Fused indexes.
+* Fused PQ is only compatible with Product Quantization, not Binary Quantization.  This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/).  That said, BQ continues to be supported with non-Fused indexes.
 * JVector heavily utilizes the Panama Vector API(SIMD) for ANN indexing and search.  We have seen cases where the memory bandwidth is saturated during indexing and product quantization and can cause the process to slow down. To avoid this, the batch methods for index and PQ builds use a [PhysicalCoreExecutor](https://javadoc.io/doc/io.github.jbellis/jvector/latest/io/github/jbellis/jvector/util/PhysicalCoreExecutor.html) to limit the amount of operations to the physical core count. The default value is 1/2 the processor count seen by Java. This may not be correct in all setups (e.g. no hyperthreading or hybrid architectures) so if you wish to override the default use the `-Djvector.physical_core_count` property, or pass in your own ForkJoinPool instance.
 
 
 ### Advanced features
 
-* Fused ADC is represented as a Feature that is supported during incremental index construction, like InlineVectors above.  [See the Grid class for sample code](https://github.com/jbellis/jvector/blob/main/jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java).
+* Fused PQ is represented as a Feature that is supported during incremental index construction, like InlineVectors above.  [See the Grid class for sample code](https://github.com/jbellis/jvector/blob/main/jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java).
 * Anisotropic PQ is built into the ProductQuantization class and can improve recall, but nobody knows how to tune it (with the T/threshold parameter) except experimentally on a per-model basis, and choosing the wrong setting can make things worse.  From Figure 3 in the paper: 
 ![APQ performnce on Glove first improves and then degrades as T increases](https://github.com/jbellis/jvector/assets/42158/fd459222-6929-43ca-a405-ac34dbaf6646)
 
@@ -286,7 +286,6 @@ Commentary:
 * [Quicker ADC paper](https://arxiv.org/abs/1812.09162)
 * [NVQ paper](https://arxiv.org/abs/2509.18471)
 
-
 ## Developing and Testing
 This project is organized as a [multimodule Maven build](https://maven.apache.org/guides/mini/guide-multiple-modules.html). The intent is to produce a multirelease jar suitable for use as
 a dependency from any Java 11 code. When run on a Java 20+ JVM with the Vector module enabled, optimized vector
 
@@ -8,7 +8,15 @@
 - Support for hierarchical graph indices. This new type of index blends HNSW and DiskANN in a novel way. An
   HNSW-like hierarchy resides in memory for quickly seeding the search. This also reduces the need for caching the
   DiskANN graph near the entrypoint. The base layer of the hierarchy is a DiskANN-like index and inherits its
-  properties. This hierarchical structure can be disabled, ending up with just the base DiskANN layer.  
+  properties. This hierarchical structure can be disabled, ending up with just the base DiskANN layer.
+- The feature previously known as Fused ADC has been renamed to Fused PQ. This feature offloads the PQ
+  codebooks from memory during search, storing them within the graph in a way that does not slow down the search.
+  Implementation notes: the implementation has been overhauled so that it no longer requires native code acceleration.
+  It explores a design space that allows packed representations of vectors to be fused into the graph in shapes
+  optimal for approximate score calculation. This graph index feature is opt-in and is now fully functional. All graph
+  degree limitations have been lifted. At this time, only 256-cluster ProductQuantization can use Fused PQ.
+  Version 6 or greater of the on-disk file format is required to use this feature.
+
 
 ## API changes
 - MemorySegmentReader.Supplier and SimpleMappedReader.Supplier must now be explicitly closed, instead of being
@@ -20,7 +28,6 @@
   we do early termination of the search. In certain cases, this can accelerate the search at the potential cost of some
   accuracy. It is set to false by default.
 - The constructors of GraphIndexBuilder allow to specify different maximum out-degrees for the graphs in each layer.
-  However, this feature does not work with FusedADC in this version.
 
 ### API changes in 3.0.6
 
 
@@ -396,7 +396,6 @@ void searchOneLayer(SearchScoreProvider scoreProvider,
 
             // track scores to predict when we are done with threshold queries
             var scoreTracker = scoreTrackerFactory.getScoreTracker(pruneSearch, rerankK, threshold);
-            VectorFloat<?> similarities = null;
 
             // the main search loop
             while (candidates.size() > 0) {
@@ -423,25 +422,12 @@ void searchOneLayer(SearchScoreProvider scoreProvider,
 
                 // score the neighbors of the top candidate and add them to the queue
                 var scoreFunction = scoreProvider.scoreFunction();
-                var useEdgeLoading = scoreFunction.supportsEdgeLoadingSimilarity();
-                if (useEdgeLoading) {
-                    similarities = scoreFunction.edgeLoadingSimilarityTo(topCandidateNode);
-                }
-                int i = 0;
-                for (var it = view.getNeighborsIterator(level, topCandidateNode); it.hasNext(); ) {
-                    var friendOrd = it.nextInt();
-                    if (!visited.add(friendOrd)) {
-                        continue;
-                    }
+                ImmutableGraphIndex.NeighborProcessor neighborProcessor = (node2, score) -> {
+                    scoreTracker.track(score);
+                    candidates.push(node2, score);
                     visitedCount++;
-
-                    float friendSimilarity = useEdgeLoading
-                            ? similarities.get(i)
-                            : scoreFunction.similarityTo(friendOrd);
-                    scoreTracker.track(friendSimilarity);
-                    candidates.push(friendOrd, friendSimilarity);
-                    i++;
-                }
+                };
+                view.processNeighbors(level, topCandidateNode, scoreFunction, visited::add, neighborProcessor);
             }
         } catch (Throwable t) {
             // clear scratch structures if terminated via throwable, as they may not have been drained
 
@@ -35,6 +35,7 @@
 
 import java.io.Closeable;
 import java.io.IOException;
+import java.util.function.Function;
 
 /**
  * Represents a graph-based vector index.  Nodes are represented as ints, and edges are
@@ -140,6 +141,24 @@ default boolean containsNode(int nodeId) {
      */
     int size(int level);
 
+    /**
+     * The steps needed to process a neighbor during a search. That is, adding it to the priority queue, etc.
+     */
+    interface NeighborProcessor {
+        void process(int friendOrd, float similarity);
+    }
+
+    /**
+     * Serves as an abstract interface for marking nodes as visited
+     */
+    @FunctionalInterface
+    interface IntMarker {
+        /**
+         * Marks the node and returns true if it was not marked previously. Returns false otherwise
+         */
+        boolean mark(int value);
+    }
+
     /**
      * Encapsulates the state of a graph for searching.  Re-usable across search calls,
      * but each thread needs its own.
@@ -151,6 +170,12 @@ interface View extends Closeable {
          */
         NodesIterator getNeighborsIterator(int level, int node);
 
+        /**
+         * Iterates over the neighbors of a given node if they have not been visited yet.
+         * For each non-visited neighbor, it computes its similarity and processes it using the given processor.
+         */
+        void processNeighbors(int level, int node, ScoreFunction scoreFunction, IntMarker visited, NeighborProcessor neighborProcessor);
+
         /**
          * This method is deprecated as most View usages should not need size.
          * Where they do, they could access the graph.
 
@@ -28,6 +28,7 @@
 import io.github.jbellis.jvector.disk.RandomAccessReader;
 import io.github.jbellis.jvector.graph.ConcurrentNeighborMap.Neighbors;
 import io.github.jbellis.jvector.graph.diversity.DiversityProvider;
+import io.github.jbellis.jvector.graph.similarity.ScoreFunction;
 import io.github.jbellis.jvector.util.Accountable;
 import io.github.jbellis.jvector.util.BitSet;
 import io.github.jbellis.jvector.util.Bits;
@@ -48,6 +49,7 @@
 import java.util.concurrent.atomic.AtomicIntegerArray;
 import java.util.concurrent.atomic.AtomicReference;
 import java.util.concurrent.locks.StampedLock;
+import java.util.function.Function;
 import java.util.stream.IntStream;
 
 /**
@@ -470,6 +472,17 @@ public NodesIterator getNeighborsIterator(int level, int node) {
 
         }
 
+        @Override
+        public void processNeighbors(int level, int node, ScoreFunction scoreFunction, IntMarker visited, NeighborProcessor neighborProcessor) {
+            for (var it = getNeighborsIterator(level, node); it.hasNext(); ) {
+                var friendOrd = it.nextInt();
+                if (visited.mark(friendOrd)) {
+                    float friendSimilarity = scoreFunction.similarityTo(friendOrd);
+                    neighborProcessor.process(friendOrd, friendSimilarity);
+                }
+            }
+        }
+
         @Override
         public int size() {
             return OnHeapGraphIndex.this.size(0);
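
The hunks above move the neighbor loop out of `GraphSearcher` and behind `View.processNeighbors`, which takes a visited marker and a per-neighbor callback. The following is a minimal, self-contained sketch of that callback pattern, not JVector code: `IntMarker` and `NeighborProcessor` are local stand-ins that mirror the interfaces added in the diff, while `Scorer`, `bitSetMarker`, the `int[]` adjacency list, and the toy scoring formula are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class NeighborCallbackSketch {
    /** Local stand-in mirroring ImmutableGraphIndex.IntMarker from the diff. */
    @FunctionalInterface
    public interface IntMarker {
        /** Marks the value; returns true only if it was not marked before. */
        boolean mark(int value);
    }

    /** Local stand-in mirroring ImmutableGraphIndex.NeighborProcessor from the diff. */
    @FunctionalInterface
    public interface NeighborProcessor {
        void process(int friendOrd, float similarity);
    }

    /** Hypothetical, simplified replacement for ScoreFunction. */
    @FunctionalInterface
    public interface Scorer {
        float similarityTo(int node);
    }

    /** Same shape as the on-heap implementation: score and hand over only unvisited neighbors. */
    static void processNeighbors(int[] neighbors, Scorer scorer, IntMarker visited, NeighborProcessor processor) {
        for (int friendOrd : neighbors) {
            if (visited.mark(friendOrd)) {
                processor.process(friendOrd, scorer.similarityTo(friendOrd));
            }
        }
    }

    /** Hypothetical marker backed by a BitSet (the diff passes visited::add instead). */
    static IntMarker bitSetMarker(BitSet bits) {
        return value -> {
            if (bits.get(value)) {
                return false;
            }
            bits.set(value);
            return true;
        };
    }

    /** Returns the neighbors that reached the callback, in visiting order. */
    static List<Integer> demo() {
        BitSet bits = new BitSet();
        bits.set(2); // pretend node 2 was visited by an earlier step
        List<Integer> pushed = new ArrayList<>();
        processNeighbors(new int[] {1, 2, 3, 1},
                         node -> 1.0f / (1 + node), // toy similarity, for illustration only
                         bitSetMarker(bits),
                         (node, score) -> pushed.add(node));
        return pushed;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 3]
    }
}
```

Node 2 is skipped because it was already marked, and the repeated node 1 is skipped on its second appearance. Keeping the loop inside the view presumably lets a disk-backed view score a whole neighborhood in one pass, which is what the removed edge-loading branch did by hand in the searcher.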
 
@@ -18,11 +18,20 @@
 
 import io.github.jbellis.jvector.disk.IndexWriter;
 import io.github.jbellis.jvector.graph.ImmutableGraphIndex;
-import io.github.jbellis.jvector.graph.disk.feature.*;
+import io.github.jbellis.jvector.graph.disk.feature.Feature;
+import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
+import io.github.jbellis.jvector.graph.disk.feature.FusedFeature;
+import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
+import io.github.jbellis.jvector.graph.disk.feature.NVQ;
+import io.github.jbellis.jvector.graph.disk.feature.SeparatedFeature;
+import io.github.jbellis.jvector.graph.disk.feature.SeparatedNVQ;
+import io.github.jbellis.jvector.graph.disk.feature.SeparatedVectors;
+
 import org.agrona.collections.Int2IntHashMap;
 
 import java.io.IOException;
 import java.util.EnumMap;
+import java.util.LinkedHashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
@@ -38,20 +47,18 @@ public abstract class AbstractGraphIndexWriter<T extends IndexWriter> implements
     final ImmutableGraphIndex graph;
     final OrdinalMapper ordinalMapper;
     final int dimension;
-    // we don't use Map features but EnumMap is the best way to make sure we don't
-    // accidentally introduce an ordering bug in the future
-    final EnumMap<FeatureId, Feature> featureMap;
+    final Map<FeatureId, Feature> featureMap;
     final T out; /* output for graph nodes and inline features */
     final int headerSize;
     volatile int maxOrdinalWritten = -1;
     final List<Feature> inlineFeatures;
 
     AbstractGraphIndexWriter(T out,
-                                     int version,
-                                     ImmutableGraphIndex graph,
-                                     OrdinalMapper oldToNewOrdinals,
-                                     int dimension,
-                                     EnumMap<FeatureId, Feature> features)
+                             int version,
+                             ImmutableGraphIndex graph,
+                             OrdinalMapper oldToNewOrdinals,
+                             int dimension,
+                             EnumMap<FeatureId, Feature> features)
     {
         if (graph.getMaxLevel() > 0 && version < 4) {
             throw new IllegalArgumentException("Multilayer graphs must be written with version 4 or higher");
@@ -60,8 +67,28 @@ public abstract class AbstractGraphIndexWriter<T extends IndexWriter> implements
         this.graph = graph;
         this.ordinalMapper = oldToNewOrdinals;
         this.dimension = dimension;
-        this.featureMap = features;
-        this.inlineFeatures = features.values().stream().filter(f -> !(f instanceof SeparatedFeature)).collect(Collectors.toList());
+
+        if (version <= 5) {
+            // Versions <= 5 use the old feature ordering, simply provided by the FeatureId
+            this.featureMap = features;
+            this.inlineFeatures = features.values().stream().filter(f -> !(f instanceof SeparatedFeature)).collect(Collectors.toList());
+        } else {
+            // Version 6 uses the new feature ordering to place fused features last in the list
+            var sortedFeatures = features.values().stream().sorted().collect(Collectors.toList());
+            this.featureMap = new LinkedHashMap<>();
+            for (var feature : sortedFeatures) {
+                this.featureMap.put(feature.id(), feature);
+            }
+            this.inlineFeatures = sortedFeatures.stream().filter(f -> !(f instanceof SeparatedFeature)).sorted().collect(Collectors.toList());
+        }
+
+        long fusedFeaturesCount = this.inlineFeatures.stream().filter(Feature::isFused).count();
+        if (fusedFeaturesCount > 1) {
+            throw new IllegalArgumentException("At most one fused feature is allowed");
+        }
+        if (fusedFeaturesCount == 1 && version < 6) {
+            throw new IllegalArgumentException("Fused features require version 6 or higher");
+        }
         this.out = out;
 
         // create a mock Header to determine the correct size
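
The constructor change above replaces the `EnumMap` field with a plain `Map` so that version 6 can place fused features last while older versions keep the enum order. A small sketch of that idea follows. The `Id` enum, `isFused`, and the comparator are hypothetical: the diff relies on `Feature` being sortable via `sorted()`, and the actual comparison logic is not shown there.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureOrderingSketch {
    /** Hypothetical feature ids; the real FeatureId enum may declare different constants. */
    public enum Id { FUSED_PQ, INLINE_VECTORS, NVQ }

    static boolean isFused(Id id) {
        return id == Id.FUSED_PQ;
    }

    /** Returns the features in write order: enum order up to version 5, fused last from version 6. */
    static Map<Id, String> writeOrder(EnumMap<Id, String> features, int version) {
        long fusedCount = features.keySet().stream().filter(FeatureOrderingSketch::isFused).count();
        if (fusedCount > 1) {
            throw new IllegalArgumentException("At most one fused feature is allowed");
        }
        if (fusedCount == 1 && version < 6) {
            throw new IllegalArgumentException("Fused features require version 6 or higher");
        }

        List<Id> ids = new ArrayList<>(features.keySet()); // EnumMap iterates in declaration order
        if (version >= 6) {
            // Assumed comparator: non-fused first, then enum order as a tie-breaker
            Comparator<Id> fusedLast = Comparator.comparing(FeatureOrderingSketch::isFused);
            ids.sort(fusedLast.thenComparing(Comparator.naturalOrder()));
        }

        // LinkedHashMap preserves insertion order, so iteration follows the sorted list
        Map<Id, String> ordered = new LinkedHashMap<>();
        for (Id id : ids) {
            ordered.put(id, features.get(id));
        }
        return ordered;
    }

    /** Convenience wrapper for experimenting with different feature sets. */
    static List<Id> order(int version, Id... present) {
        EnumMap<Id, String> features = new EnumMap<>(Id.class);
        for (Id id : present) {
            features.put(id, id.name().toLowerCase());
        }
        return new ArrayList<>(writeOrder(features, version).keySet());
    }

    public static void main(String[] args) {
        System.out.println(order(6, Id.FUSED_PQ, Id.NVQ, Id.INLINE_VECTORS)); // prints [INLINE_VECTORS, NVQ, FUSED_PQ]
    }
}
```

An `EnumMap` can only iterate in declaration order, which is why a map with insertion-order iteration is needed once the write order stops matching the enum.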
@@ -164,7 +191,7 @@ public synchronized void writeHeader(ImmutableGraphIndex.View view, long startOf
         assert out.position() == startOffset + headerSize : String.format("%d != %d", out.position(), startOffset + headerSize);
     }
 
-    void writeSparseLevels(ImmutableGraphIndex.View view) throws IOException {
+    void writeSparseLevels(ImmutableGraphIndex.View view, Map<FeatureId, IntFunction<Feature.State>> featureStateSuppliers) throws IOException {
         // write sparse levels
         for (int level = 1; level <= graph.getMaxLevel(); level++) {
             int layerSize = graph.size(level);
@@ -193,6 +220,50 @@ void writeSparseLevels(ImmutableGraphIndex.View view) throws IOException {
                 throw new IllegalStateException("Mismatch between layer size and nodes written");
             }
         }
+
+        // In V6, fused features for the in-memory hierarchy are written in a block after the top layers of the graph.
+        // Since every node in the higher levels is also contained in level 1, we only need to write the fused features for level 1.
+        if (version == 6) {
+            // There should be only one fused feature per node. This is checked in the class constructor.
+            // This is the only place where we explicitly need the fused feature. If there are more places in the
+            // future, it may be worth having fusedFeature as class member.
+            FusedFeature fusedFeature = null;
+            for (var feature : inlineFeatures) {
+                if (feature.isFused()) {
+                    fusedFeature = (FusedFeature) feature;
+                }
+            }
+            if (fusedFeature != null) {
+                var supplier = featureStateSuppliers.get(fusedFeature.id());
+                if (supplier == null) {
+                    throw new IllegalStateException("Supplier for feature " + fusedFeature.id() + " not found");
+                }
+
+                if (graph.getMaxLevel() >= 1) {
+                    int level = 1;
+                    int layerSize = graph.size(level);
+                    int nodesWritten = 0;
+                    for (var it = graph.getNodes(level); it.hasNext(); ) {
+                        int originalOrdinal = it.nextInt();
+
+                        // We write the ordinal (node id) so that we can map it to the corresponding feature
+                        final int newOrdinal = ordinalMapper.oldToNew(originalOrdinal);
+                        out.writeInt(newOrdinal);
+                        fusedFeature.writeSourceFeature(out, supplier.apply(originalOrdinal));
+                        nodesWritten++;
+                    }
+                    if (nodesWritten != layerSize) {
+                        throw new IllegalStateException("Mismatch between layer 1 size and features written");
+                    }
+                } else {
+                    // Write the source feature of the entry node
+                    final int originalEntryNode = view.entryNode().node;
+                    final int entryNode = ordinalMapper.oldToNew(originalEntryNode);
+                    out.writeInt(entryNode);
+                    fusedFeature.writeSourceFeature(out, supplier.apply(originalEntryNode));
+                }
+            }
+        }
     }
 
     void writeSeparatedFeatures(Map<FeatureId, IntFunction<Feature.State>> featureStateSuppliers) throws IOException {
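
The block added to `writeSparseLevels` writes one record per level-1 node: the remapped ordinal followed by that node's source feature. The sketch below imitates that record shape with standard Java I/O. It is an illustration under assumptions, not the on-disk format: the fixed-size `byte[]` payload, the `int[]` ordinal mapping, and `readBlock` are hypothetical, and the bytes that `writeSourceFeature` actually emits are not visible in this diff.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FusedBlockSketch {
    /** Writes (new ordinal, payload) records for each level-1 node, keyed by the remapped ordinal. */
    static byte[] writeBlock(int[] levelOneNodes, int[] oldToNew, byte[][] payloadByOldOrdinal) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int oldOrdinal : levelOneNodes) {
                out.writeInt(oldToNew[oldOrdinal]);        // ordinal first, so a reader can map the payload
                out.write(payloadByOldOrdinal[oldOrdinal]); // hypothetical fixed-size payload
            }
        }
        return bytes.toByteArray();
    }

    /** Hypothetical reader: rebuilds the ordinal-to-payload mapping from the block. */
    static Map<Integer, byte[]> readBlock(byte[] block, int payloadSize) throws IOException {
        Map<Integer, byte[]> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(block))) {
            while (in.available() > 0) {
                int ordinal = in.readInt();
                byte[] payload = new byte[payloadSize];
                in.readFully(payload);
                result.put(ordinal, payload);
            }
        }
        return result;
    }

    /** Round-trips a tiny block and renders it as text. */
    static String demo() {
        try {
            int[] levelOneNodes = {0, 2};  // original ordinals present in level 1
            int[] oldToNew = {5, 9, 0};    // toy ordinal remapping
            byte[][] payloads = {{1, 2}, {7, 7}, {3, 4}};
            byte[] block = writeBlock(levelOneNodes, oldToNew, payloads);
            List<String> parts = new ArrayList<>();
            for (Map.Entry<Integer, byte[]> e : readBlock(block, 2).entrySet()) {
                parts.add(e.getKey() + "=" + Arrays.toString(e.getValue()));
            }
            return String.join(", ", parts);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 5=[1, 2], 0=[3, 4]
    }
}
```

Storing the ordinal with each record means the reader does not need to know which nodes belong to level 1 before loading the block; node 1, which is absent from level 1 in this toy data, never appears in the output.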