Commit 30e8932

Enable the fused graph index (#561)
This PR does extensive work to bring back the Fused Graph Index (FGI)
1 parent d8848fc commit 30e8932

53 files changed, +1489 -1039 lines changed

README.md

Lines changed: 3 additions & 4 deletions
@@ -25,7 +25,7 @@ The upper layers of the hierarchy are represented by an in-memory adjacency list
 The bottom layer of the graph is represented by an on-disk adjacency list per node. JVector uses additional data stored inline to support two-pass searches, with the first pass powered by lossily compressed representations of the vectors kept in memory, and the second by a more accurate representation read from disk. The first pass can be performed with
 * Product quantization (PQ), optionally with [anisotropic weighting](https://arxiv.org/abs/1908.10396)
 * [Binary quantization](https://huggingface.co/blog/embedding-quantization) (BQ)
-* Fused ADC, where PQ codebooks are transposed and written inline with the graph adjacency list
+* Fused PQ, where PQ codebooks are written inline with the graph adjacency list

 The second pass can be performed with
 * Full resolution float32 vectors
@@ -265,13 +265,13 @@ Commentary:

 * Embeddings models produce output from a consistent distribution of vectors. This means that you can save and re-use ProductQuantization codebooks, even for a different set of vectors, as long as you had a sufficiently large training set to build it the first time around. ProductQuantization.MAX_PQ_TRAINING_SET_SIZE (128,000 vectors) has proven to be sufficiently large.
 * JDK ThreadLocal objects cannot be referenced except from the thread that created them. This is a difficult design into which to fit caching of Closeable objects like GraphSearcher. JVector provides the ExplicitThreadLocal class to solve this.
-* Fused ADC is only compatible with Product Quantization, not Binary Quantization. This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/). That said, BQ continues to be supported with non-Fused indexes.
+* Fused PQ is only compatible with Product Quantization, not Binary Quantization. This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/). That said, BQ continues to be supported with non-Fused indexes.
 * JVector heavily utilizes the Panama Vector API(SIMD) for ANN indexing and search. We have seen cases where the memory bandwidth is saturated during indexing and product quantization and can cause the process to slow down. To avoid this, the batch methods for index and PQ builds use a [PhysicalCoreExecutor](https://javadoc.io/doc/io.github.jbellis/jvector/latest/io/github/jbellis/jvector/util/PhysicalCoreExecutor.html) to limit the amount of operations to the physical core count. The default value is 1/2 the processor count seen by Java. This may not be correct in all setups (e.g. no hyperthreading or hybrid architectures) so if you wish to override the default use the `-Djvector.physical_core_count` property, or pass in your own ForkJoinPool instance.


 ### Advanced features

-* Fused ADC is represented as a Feature that is supported during incremental index construction, like InlineVectors above. [See the Grid class for sample code](https://github.com/jbellis/jvector/blob/main/jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java).
+* Fused PQ is represented as a Feature that is supported during incremental index construction, like InlineVectors above. [See the Grid class for sample code](https://github.com/jbellis/jvector/blob/main/jvector-examples/src/main/java/io/github/jbellis/jvector/example/Grid.java).
 * Anisotropic PQ is built into the ProductQuantization class and can improve recall, but nobody knows how to tune it (with the T/threshold parameter) except experimentally on a per-model basis, and choosing the wrong setting can make things worse. From Figure 3 in the paper:
 ![APQ performnce on Glove first improves and then degrades as T increases](https://github.com/jbellis/jvector/assets/42158/fd459222-6929-43ca-a405-ac34dbaf6646)

@@ -286,7 +286,6 @@ Commentary:
 * [Quicker ADC paper](https://arxiv.org/abs/1812.09162)
 * [NVQ paper](https://arxiv.org/abs/2509.18471)

-
 ## Developing and Testing
 This project is organized as a [multimodule Maven build](https://maven.apache.org/guides/mini/guide-multiple-modules.html). The intent is to produce a multirelease jar suitable for use as
 a dependency from any Java 11 code. When run on a Java 20+ JVM with the Vector module enabled, optimized vector

UPGRADING.md

Lines changed: 9 additions & 2 deletions
@@ -8,7 +8,15 @@
 - Support for hierarchical graph indices. This new type of index blends HNSW and DiskANN in a novel way. An
 HNSW-like hierarchy resides in memory for quickly seeding the search. This also reduces the need for caching the
 DiskANN graph near the entrypoint. The base layer of the hierarchy is a DiskANN-like index and inherits its
-properties. This hierarchical structure can be disabled, ending up with just the base DiskANN layer.
+properties. This hierarchical structure can be disabled, ending up with just the base DiskANN layer.
+- The feature previously known as Fused ADC has been renamed to Fused PQ. This feature allows to offload the PQ
+codebooks from memory during search, storing them within the graph in a way that does not slow down the search.
+Implementation notes: The implementation of this feature has been overhauled to not require native code acceleration.
+This explores a design space allowing for packed representations of vectors fused into the graph in shapes optimal
+for approximate score calculation. This new feature of graph indexes is opt-in but fully functional now. Any graph
+degree limitations have been lifted. At this time, only 256-cluster ProductQuantization can use fused PQ.
+Version 6 or greater of the file disk format is required to use this feature.
+

 ## API changes
 - MemorySegmentReader.Supplier and SimpleMappedReader.Supplier must now be explicitly closed, instead of being
@@ -20,7 +28,6 @@
 we do early termination of the search. In certain cases, this can accelerate the search at the potential cost of some
 accuracy. It is set to false by default.
 - The constructors of GraphIndexBuilder allow to specify different maximum out-degrees for the graphs in each layer.
-However, this feature does not work with FusedADC in this version.

 ### API changes in 3.0.6
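The Fused PQ entry in UPGRADING.md above mentions packed representations stored with the graph "in shapes optimal for approximate score calculation" and a restriction to 256-cluster PQ. As a rough illustration of why 256 clusters is convenient (each subspace code fits in one byte), here is a minimal sketch of table-based approximate dot-product scoring over packed neighbor codes. It is not JVector's implementation: the class `FusedPqSketch`, its method names, and the array layout are all assumptions made for this example.

```java
final class FusedPqSketch {
    /**
     * Builds one lookup table per subspace: tables[m][c] is the dot product between
     * the m-th slice of the query and centroid c of subspace m.
     * codebooks[m][c] is the centroid vector c of subspace m (hypothetical layout).
     */
    public static float[][] buildDotProductTables(float[] query, float[][][] codebooks) {
        float[][] tables = new float[codebooks.length][];
        int offset = 0;
        for (int m = 0; m < codebooks.length; m++) {
            int clusters = codebooks[m].length;
            int subDim = codebooks[m][0].length;
            tables[m] = new float[clusters];
            for (int c = 0; c < clusters; c++) {
                float dot = 0f;
                for (int d = 0; d < subDim; d++) {
                    dot += query[offset + d] * codebooks[m][c][d];
                }
                tables[m][c] = dot;
            }
            offset += subDim;
        }
        return tables;
    }

    /**
     * Scores every neighbor of a node. The neighbors' codes are assumed to be packed
     * back to back (one byte per subspace), as they might be when stored next to the
     * adjacency list. Each score costs one table lookup per subspace.
     */
    public static float[] scoreNeighbors(float[][] tables, byte[] packedCodes, int neighborCount) {
        int subspaces = tables.length;
        float[] scores = new float[neighborCount];
        for (int n = 0; n < neighborCount; n++) {
            float score = 0f;
            for (int m = 0; m < subspaces; m++) {
                // Java bytes are signed; mask to get a centroid index in 0..255
                score += tables[m][packedCodes[n * subspaces + m] & 0xFF];
            }
            scores[n] = score;
        }
        return scores;
    }
}
```

The tables are built once per query, so the per-neighbor cost does not depend on the vector dimension, only on the number of subspaces.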

jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java

Lines changed: 5 additions & 19 deletions
@@ -396,7 +396,6 @@ void searchOneLayer(SearchScoreProvider scoreProvider,

             // track scores to predict when we are done with threshold queries
             var scoreTracker = scoreTrackerFactory.getScoreTracker(pruneSearch, rerankK, threshold);
-            VectorFloat<?> similarities = null;

             // the main search loop
             while (candidates.size() > 0) {
@@ -423,25 +422,12 @@ void searchOneLayer(SearchScoreProvider scoreProvider,

                 // score the neighbors of the top candidate and add them to the queue
                 var scoreFunction = scoreProvider.scoreFunction();
-                var useEdgeLoading = scoreFunction.supportsEdgeLoadingSimilarity();
-                if (useEdgeLoading) {
-                    similarities = scoreFunction.edgeLoadingSimilarityTo(topCandidateNode);
-                }
-                int i = 0;
-                for (var it = view.getNeighborsIterator(level, topCandidateNode); it.hasNext(); ) {
-                    var friendOrd = it.nextInt();
-                    if (!visited.add(friendOrd)) {
-                        continue;
-                    }
+                ImmutableGraphIndex.NeighborProcessor neighborProcessor = (node2, score) -> {
+                    scoreTracker.track(score);
+                    candidates.push(node2, score);
                     visitedCount++;
-
-                    float friendSimilarity = useEdgeLoading
-                            ? similarities.get(i)
-                            : scoreFunction.similarityTo(friendOrd);
-                    scoreTracker.track(friendSimilarity);
-                    candidates.push(friendOrd, friendSimilarity);
-                    i++;
-                }
+                };
+                view.processNeighbors(level, topCandidateNode, scoreFunction, visited::add, neighborProcessor);
             }
         } catch (Throwable t) {
             // clear scratch structures if terminated via throwable, as they may not have been drained

jvector-base/src/main/java/io/github/jbellis/jvector/graph/ImmutableGraphIndex.java

Lines changed: 25 additions & 0 deletions
@@ -35,6 +35,7 @@

 import java.io.Closeable;
 import java.io.IOException;
+import java.util.function.Function;

 /**
  * Represents a graph-based vector index. Nodes are represented as ints, and edges are
@@ -140,6 +141,24 @@ default boolean containsNode(int nodeId) {
      */
     int size(int level);

+    /**
+     * The steps needed to process a neighbor during a search. That is, adding it to the priority queue, etc.
+     */
+    interface NeighborProcessor {
+        void process(int friendOrd, float similarity);
+    }
+
+    /**
+     * Serves as an abstract interface for marking nodes as visited
+     */
+    @FunctionalInterface
+    interface IntMarker {
+        /**
+         * Marks the node and returns true if it was not marked previously. Returns false otherwise
+         */
+        boolean mark(int value);
+    }
+
     /**
      * Encapsulates the state of a graph for searching. Re-usable across search calls,
      * but each thread needs its own.
@@ -151,6 +170,12 @@ interface View extends Closeable {
          */
         NodesIterator getNeighborsIterator(int level, int node);

+        /**
+         * Iterates over the neighbors of a given node if they have not been visited yet.
+         * For each non-visited neighbor, it computes its similarity and processes it using the given processor.
+         */
+        void processNeighbors(int level, int node, ScoreFunction scoreFunction, IntMarker visited, NeighborProcessor neighborProcessor);
+
         /**
          * This method is deprecated as most View usages should not need size.
          * Where they do, they could access the graph.

jvector-base/src/main/java/io/github/jbellis/jvector/graph/OnHeapGraphIndex.java

Lines changed: 13 additions & 0 deletions
@@ -28,6 +28,7 @@
 import io.github.jbellis.jvector.disk.RandomAccessReader;
 import io.github.jbellis.jvector.graph.ConcurrentNeighborMap.Neighbors;
 import io.github.jbellis.jvector.graph.diversity.DiversityProvider;
+import io.github.jbellis.jvector.graph.similarity.ScoreFunction;
 import io.github.jbellis.jvector.util.Accountable;
 import io.github.jbellis.jvector.util.BitSet;
 import io.github.jbellis.jvector.util.Bits;
@@ -48,6 +49,7 @@
 import java.util.concurrent.atomic.AtomicIntegerArray;
 import java.util.concurrent.atomic.AtomicReference;
 import java.util.concurrent.locks.StampedLock;
+import java.util.function.Function;
 import java.util.stream.IntStream;

 /**
@@ -470,6 +472,17 @@ public NodesIterator getNeighborsIterator(int level, int node) {

         }

+        @Override
+        public void processNeighbors(int level, int node, ScoreFunction scoreFunction, IntMarker visited, NeighborProcessor neighborProcessor) {
+            for (var it = getNeighborsIterator(level, node); it.hasNext(); ) {
+                var friendOrd = it.nextInt();
+                if (visited.mark(friendOrd)) {
+                    float friendSimilarity = scoreFunction.similarityTo(friendOrd);
+                    neighborProcessor.process(friendOrd, friendSimilarity);
+                }
+            }
+        }
+
         @Override
         public int size() {
             return OnHeapGraphIndex.this.size(0);
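The GraphSearcher, ImmutableGraphIndex, and OnHeapGraphIndex changes above move neighbor iteration out of the search loop and behind `View.processNeighbors`, so each view can decide how its neighbors are scored. The sketch below mimics that shape with simplified stand-ins. Only the method shapes of `IntMarker` and `NeighborProcessor` are taken from the diff; `NeighborCallbackSketch`, `Scorer`, and the plain `int[]` adjacency list are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class NeighborCallbackSketch {
    /** Same shape as ImmutableGraphIndex.NeighborProcessor in the diff. */
    public interface NeighborProcessor {
        void process(int friendOrd, float similarity);
    }

    /** Same shape as ImmutableGraphIndex.IntMarker: returns true only the first time a value is marked. */
    @FunctionalInterface
    public interface IntMarker {
        boolean mark(int value);
    }

    /** Hypothetical stand-in for ScoreFunction. */
    public interface Scorer {
        float similarityTo(int node);
    }

    /** Follows the on-heap implementation: score and hand off only the unvisited neighbors. */
    public static void processNeighbors(int[] neighbors, Scorer scorer, IntMarker visited, NeighborProcessor processor) {
        for (int friendOrd : neighbors) {
            if (visited.mark(friendOrd)) {
                processor.process(friendOrd, scorer.similarityTo(friendOrd));
            }
        }
    }

    /** Expands one node and returns the newly discovered neighbor ordinals in visit order. */
    public static List<Integer> expand(int[] neighbors, Set<Integer> visited, Scorer scorer) {
        List<Integer> discovered = new ArrayList<>();
        // Set.add returns true only for new elements, so it can be passed directly as an IntMarker
        processNeighbors(neighbors, scorer, visited::add, (node, score) -> discovered.add(node));
        return discovered;
    }
}
```

With this split, the caller no longer needs to know whether similarities come from one call per neighbor or from a bulk computation over data stored with the adjacency list; it only supplies the visited marker and the per-neighbor callback.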

jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/AbstractGraphIndexWriter.java

Lines changed: 83 additions & 12 deletions
@@ -18,11 +18,20 @@

 import io.github.jbellis.jvector.disk.IndexWriter;
 import io.github.jbellis.jvector.graph.ImmutableGraphIndex;
-import io.github.jbellis.jvector.graph.disk.feature.*;
+import io.github.jbellis.jvector.graph.disk.feature.Feature;
+import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
+import io.github.jbellis.jvector.graph.disk.feature.FusedFeature;
+import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
+import io.github.jbellis.jvector.graph.disk.feature.NVQ;
+import io.github.jbellis.jvector.graph.disk.feature.SeparatedFeature;
+import io.github.jbellis.jvector.graph.disk.feature.SeparatedNVQ;
+import io.github.jbellis.jvector.graph.disk.feature.SeparatedVectors;
+
 import org.agrona.collections.Int2IntHashMap;

 import java.io.IOException;
 import java.util.EnumMap;
+import java.util.LinkedHashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
@@ -38,20 +47,18 @@ public abstract class AbstractGraphIndexWriter<T extends IndexWriter> implements
     final ImmutableGraphIndex graph;
     final OrdinalMapper ordinalMapper;
     final int dimension;
-    // we don't use Map features but EnumMap is the best way to make sure we don't
-    // accidentally introduce an ordering bug in the future
-    final EnumMap<FeatureId, Feature> featureMap;
+    final Map<FeatureId, Feature> featureMap;
     final T out; /* output for graph nodes and inline features */
     final int headerSize;
     volatile int maxOrdinalWritten = -1;
     final List<Feature> inlineFeatures;

     AbstractGraphIndexWriter(T out,
-                                 int version,
-                                 ImmutableGraphIndex graph,
-                                 OrdinalMapper oldToNewOrdinals,
-                                 int dimension,
-                                 EnumMap<FeatureId, Feature> features)
+                             int version,
+                             ImmutableGraphIndex graph,
+                             OrdinalMapper oldToNewOrdinals,
+                             int dimension,
+                             EnumMap<FeatureId, Feature> features)
     {
         if (graph.getMaxLevel() > 0 && version < 4) {
             throw new IllegalArgumentException("Multilayer graphs must be written with version 4 or higher");
@@ -60,8 +67,28 @@ public abstract class AbstractGraphIndexWriter<T extends IndexWriter> implements
         this.graph = graph;
         this.ordinalMapper = oldToNewOrdinals;
         this.dimension = dimension;
-        this.featureMap = features;
-        this.inlineFeatures = features.values().stream().filter(f -> !(f instanceof SeparatedFeature)).collect(Collectors.toList());
+
+        if (version <= 5) {
+            // Versions <= 5 use the old feature ordering, simply provided by the FeatureId
+            this.featureMap = features;
+            this.inlineFeatures = features.values().stream().filter(f -> !(f instanceof SeparatedFeature)).collect(Collectors.toList());
+        } else {
+            // Version 6 uses the new feature ordering to place fused features last in the list
+            var sortedFeatures = features.values().stream().sorted().collect(Collectors.toList());
+            this.featureMap = new LinkedHashMap<>();
+            for (var feature : sortedFeatures) {
+                this.featureMap.put(feature.id(), feature);
+            }
+            this.inlineFeatures = sortedFeatures.stream().filter(f -> !(f instanceof SeparatedFeature)).sorted().collect(Collectors.toList());
+        }
+
+        long fusedFeaturesCount = this.inlineFeatures.stream().filter(Feature::isFused).count();
+        if (fusedFeaturesCount > 1) {
+            throw new IllegalArgumentException("At most one fused feature is allowed");
+        }
+        if (fusedFeaturesCount == 1 && version < 6) {
+            throw new IllegalArgumentException("Fused features require version 6 or higher");
+        }
         this.out = out;

         // create a mock Header to determine the correct size
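The constructor change above keeps the caller's feature order for versions up to 5, sorts features so that fused ones come last for version 6, and rejects more than one fused feature or a fused feature on an older format version. The diff does not show how `Feature` defines its ordering, so the sketch below assumes a comparator that puts non-fused features first and breaks ties by id. `FeatureOrderingSketch`, `SketchFeature`, and that comparator are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FeatureOrderingSketch {
    /** Hypothetical stand-in for a Feature: just an id and a fused flag. */
    public static final class SketchFeature {
        final String id;
        final boolean fused;

        public SketchFeature(String id, boolean fused) {
            this.id = id;
            this.fused = fused;
        }
    }

    /** Returns the features keyed by id, in the order they would be written. */
    public static Map<String, SketchFeature> orderFeatures(List<SketchFeature> features, int version) {
        long fusedCount = features.stream().filter(f -> f.fused).count();
        if (fusedCount > 1) {
            throw new IllegalArgumentException("At most one fused feature is allowed");
        }
        if (fusedCount == 1 && version < 6) {
            throw new IllegalArgumentException("Fused features require version 6 or higher");
        }

        List<SketchFeature> ordered = new ArrayList<>(features);
        if (version >= 6) {
            // Assumed ordering: non-fused first (false sorts before true), then by id
            ordered.sort(Comparator.comparing((SketchFeature f) -> f.fused).thenComparing(f -> f.id));
        }

        // LinkedHashMap preserves insertion order, so iteration follows the sorted order
        Map<String, SketchFeature> byId = new LinkedHashMap<>();
        for (SketchFeature feature : ordered) {
            byId.put(feature.id, feature);
        }
        return byId;
    }
}
```

An `EnumMap` always iterates in enum declaration order, which is why a map that keeps insertion order is needed once the write order no longer follows the `FeatureId` declaration order.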
@@ -164,7 +191,7 @@ public synchronized void writeHeader(ImmutableGraphIndex.View view, long startOf
         assert out.position() == startOffset + headerSize : String.format("%d != %d", out.position(), startOffset + headerSize);
     }

-    void writeSparseLevels(ImmutableGraphIndex.View view) throws IOException {
+    void writeSparseLevels(ImmutableGraphIndex.View view, Map<FeatureId, IntFunction<Feature.State>> featureStateSuppliers) throws IOException {
         // write sparse levels
         for (int level = 1; level <= graph.getMaxLevel(); level++) {
             int layerSize = graph.size(level);
@@ -193,6 +220,50 @@ void writeSparseLevels(ImmutableGraphIndex.View view) throws IOException {
                 throw new IllegalStateException("Mismatch between layer size and nodes written");
             }
         }
+
+        // In V6, fused features for the in-memory hierarchy are written in a block after the top layers of the graph.
+        // Since everything in level 1 is also contained in the higher levels, we only need to write the fused features for level 1.
+        if (version == 6) {
+            // There should be only one fused feature per node. This is checked in the class constructor.
+            // This is the only place where we explicitly need the fused feature. If there are more places in the
+            // future, it may be worth having fusedFeature as class member.
+            FusedFeature fusedFeature = null;
+            for (var feature : inlineFeatures) {
+                if (feature.isFused()) {
+                    fusedFeature = (FusedFeature) feature;
+                }
+            }
+            if (fusedFeature != null) {
+                var supplier = featureStateSuppliers.get(fusedFeature.id());
+                if (supplier == null) {
+                    throw new IllegalStateException("Supplier for feature " + fusedFeature.id() + " not found");
+                }
+
+                if (graph.getMaxLevel() >= 1) {
+                    int level = 1;
+                    int layerSize = graph.size(level);
+                    int nodesWritten = 0;
+                    for (var it = graph.getNodes(level); it.hasNext(); ) {
+                        int originalOrdinal = it.nextInt();
+
+                        // We write the ordinal (node id) so that we can map it to the corresponding feature
+                        final int newOrdinal = ordinalMapper.oldToNew(originalOrdinal);
+                        out.writeInt(newOrdinal);
+                        fusedFeature.writeSourceFeature(out, supplier.apply(originalOrdinal));
+                        nodesWritten++;
+                    }
+                    if (nodesWritten != layerSize) {
+                        throw new IllegalStateException("Mismatch between layer 1 size and features written");
+                    }
+                } else {
+                    // Write the source feature of the entry node
+                    final int originalEntryNode = view.entryNode().node;
+                    final int entryNode = ordinalMapper.oldToNew(originalEntryNode);
+                    out.writeInt(entryNode);
+                    fusedFeature.writeSourceFeature(out, supplier.apply(originalEntryNode));
+                }
+            }
+        }
     }

     void writeSeparatedFeatures(Map<FeatureId, IntFunction<Feature.State>> featureStateSuppliers) throws IOException {
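The block added to `writeSparseLevels` above writes, for each level-1 node, the remapped ordinal followed by that node's fused source feature. The reading side is not part of this excerpt, so the following is only a sketch of how such (ordinal, payload) records could round-trip. It assumes a fixed payload size per node and uses plain `java.io` streams rather than JVector's `IndexWriter`; `FusedBlockSketch` and both of its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class FusedBlockSketch {
    /** Writes (ordinal, payload) records back to back. Payloads are assumed to share one length. */
    public static byte[] writeBlock(Map<Integer, byte[]> payloadByOrdinal) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<Integer, byte[]> entry : payloadByOrdinal.entrySet()) {
                out.writeInt(entry.getKey()); // node ordinal, so a reader can map the payload back to its node
                out.write(entry.getValue());  // fixed-size feature payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads recordCount records of (4-byte ordinal, payloadSize bytes) into a map keyed by ordinal. */
    public static Map<Integer, byte[]> readBlock(byte[] block, int recordCount, int payloadSize) {
        Map<Integer, byte[]> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(block))) {
            for (int i = 0; i < recordCount; i++) {
                int ordinal = in.readInt();
                byte[] payload = new byte[payloadSize];
                in.readFully(payload);
                result.put(ordinal, payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Storing the ordinal with each record lets a reader build an ordinal-to-feature lookup without assuming that the sparse nodes are contiguous or sorted.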
