All notable changes to this project will be documented in this file. Dates are displayed in UTC.
Generated by auto-changelog.
19 September 2025
- Regression enhancements
#526 - chore: update changelog for 4.0.0-rc.3
#527 - streamline PR checklist
#525 - add checklist template and initial CONTRIBUTIONS.md guide
#523 - GraphIndexBuilder::addGraphNode must iterate all graph levels to estimate used bytes
#521 - GitHub actions regression test
#499 - Remove unused construction batch member from OnHeapGraphIndex
#510 - Switch from synchronized to concurrent map for pq codebook
#518 - Enable specifying the benchmarks in the yaml file
#515 - Create partial sums for PQ codebook for use during diversity checks
#511 - PQ ranging bugfix and refactoring
#508 - Reducing the number of allocations in GraphSearcher
#501 - SimdOps and NativeSimd ops refactored, VectorUtilSupport simplified
#498 - Add specific BuildScoreProvider for diversity to avoid extra encoding…
#503 - Release 4.0.0-rc.3
f3d235c - Start development on 4.0.0-rc.3-SNAPSHOT
631515d
22 July 2025
- chore: update changelog for 4.0.0-rc.2
#505 - Improvements to throughput benchmark
#502 - Fix dataset naming when using default.yml
#500 - Fix bad assert hit on CC cluster
#491 - AUX counters and correlated benchmarks
#492 - Enable Generate Changelog GHA to use label
#495 - Update permissions so bot can push branch.
#493 - New changelog automation via manual GHA
#488 - Revert "Release 4.0.0 rc.1"
#489 - Release 4.0.0-rc.1
117f127 - Release 4.0.0-rc.2
2f7d54e - start development on 4.0.0-rc.2-SNAPSHOT
7b36335
2 July 2025
- CHANGELOG.md for 4.0.0-rc.1 and earlier releases.
#487 - Fix issue when calling cleanup while concurrently executing searches
#483 - Improve the efficiency and memory usage of GraphIndexBuilder.cleanup
#477 - add PQ training benchmark
#482 - Remove extraneous character from datasets.yml
#484 - Upgrade YAML files to v5 after the format was introduced in the last update
#478 - New chunked memory-mapped reader that supports >2GB files
61bffbe - release 4.0.0-rc.1
1602706 - Fix comparison in TestADCGraphIndex
b637f65
13 June 2025
- Add a new graph node using a search score
#473 - chore(release): Bump tag version and update changelog
#471 - Sequential disk writer (#475). Upgrades file format from 4 to 5
d0ccb32 - Allow empty sections in datasets.yml & add colbert-1M.yml
2bf5f9a - chore (release): Start release version 4.0.0-beta.6
9a453a3
23 May 2025
- Bench improvements with YAML config files
#462 - avx512 test runners, jobs, and assertions
#469 - Find insertion point before performing copy in ConcurrentNeighborMap
#468 - Factorize the computation of diverse edges
#466 - Random vector index build jmh + setup scripts
#424 - Better control over setting string formats for the benchmark metrics
#461 - Check values of clusterCount in PQ
#464 - Perf metrics improvement v1.1
#460 - Simplify OnDiskGraphIndex.View to avoid code duplication
#458 - Improve perf metrics measurement and summarization
#459 - limit concurrency on single test node
3ce5dfd - testing avx512 on branch
e1ef819 - add better diagnostic titles
e2ed6ad
15 April 2025
- Creating starting point for changelog tracking.
#456 - chore(release) adjust changelog generation steps
#437 - Workflow/tag release update
#452 - Workflow/tag release update
#450 - Workflow/tag release update
#448 - Workflow/tag release update
#446 - Workflow/tag release update
#444 - Workflow/tag release update
#442 - Workflow/tag release update
#440 - Fixes for GHA tag-release workflow
#438 - Fix minor bug in getNodes.
#434 - Fix/refactor NodeScoreIterator, BoundedLongHeap, and GrowableLongHeap bulk addition implementations
#433 - Remove extra prefix of v from tag version
#432 - Update only the root level pom.xml as part of the GHA workflow
#431 - New GHA workflow to create new tag and update changelog
#428 - bugfix for 429 - eliminate maven-resources-plugin warning
#430 - Release 4.0.0-beta.4
ae85838 - Start development on 4.0.0-beta.4-SNAPSHOT
b5d9b85
9 April 2025
- Update test resume
#422 - Fix calls to deprecated GraphIndex.size()
#426 - Fix NPE in GraphIndexBuilder.load
#425 - Reduce the number of vector allocations in BuildScoreProvider.pqBuilderScoreProvider
#419 - Improve the computation of accuracy
#408 - Merge latest commits from hnsw-3
#423 - Fix native implementations of PQ assembleAndSum and pqDecodedCosineSimilarity
#420 - Fix FusedADC.writeInline
#417 - Implement NodeQueue#pushAll and AbstractLongHeap#addAll
#415 - Release 4.0.0-beta.3
ee56efc - Start development on 4.0.0-beta.3-SNAPSHOT
41ce85d
2 April 2025
- Count expanded nodes
#406 - Search pruning & fix the reported number of visited nodes
#405 - Fix flaky tests and eliminate console output
#404 - Remove query-time usage of ByteSequence::slice to reduce object allocations
#403 - add index construction benchmark
#398 - Add jmh benchmarks
#396 - Fix MutableBQVectors parameterization. Add basic test coverage.
#395 - make examples use index view
#392 - Update Test2DThreshold to control for averages instead of worst-case statistics
#391 - Change variable names to improve readability
#388 - Fix NVQ distance computations in Native provider
#389 - Improved use of ScoreTracker to avoid wasteful searching for very large k
#387 - Use ScoreTracker to avoid wasteful searching for very large k
#384 - Squashed merge of PR #402: Add hierarchical structure to the graph index
00a13a8 - SimpleMappedReader no longer closes its ReaderSupplier
9613109 - add jmh skeleton
c9aa09d
9 January 2025
- Fix CI on Windows due to missing posix_madvise support
#383 - add MADV_RANDOM
#382 - Make ravv usage thread-safe
#381 - Non-uniform vector quantization
#374 - Hand-unroll the SIMD dot product loop
#380 - Fix regression in assembleAndSum PQ decoder performance
#379 - MutableBQVectors grows incrementally like MutablePQVectors
0a25715 - make vectorCount atomic in MutablePQ
72044bf - ada2-1M
431538e
24 December 2024
- replace test that allocated multiple GB of PQVectors with calculateChunkParameters, this makes JUnit's small VMs happy
01ec971 - MutablePQVectors grows dynamically, this is a better fit for CompactionGraph
b76d9c3 - fix math in PQVectors.load
11722d7
23 December 2024
- split PQVectors and BQVectors into Mutable and Immutable implementations; extract MutableCompressedVectors
3fe81be - add GraphIndexBuilder.rescore()
6078177 - Add missing licenses
ceba0da
3 December 2024
- Store compressed vectors in dense ByteSequence for PQVectors
#370 - Reenable SimdOps.assembleAndSum; implement Panama/Native equivalent for CosineDecoder acceleration
#368 - Use fma in SimdOps.cosineSimilarity sum vector
#363 - Remove max JDK version check
3c18670 - Don't use segment hashCode in MemorySegmentVectorFloat, as it depends on segment base/offset in the heap rather than contents. This breaks testing around PQVectors hashcodes.
90e84a9 - Use fma in VectorSimdOps.cosineSimilarity
8f115d7
30 September 2024
- Improve performance of reconnectOrphanedNodes
#359 - Use float in cosine metric final calculation in default vectorization provider
#358 - approximateMediod returns a random node when the graph is too disconnected to search for the centroid
#356 - Use float in cosine metric final calculation in default vectorization provider (#358)
#357 - @SuppressWarnings("StatementWithEmptyBody")
80f9e40 - rename insertOne to insertEdge
ca8538d - simplify
d65aa9a
13 August 2024
- Remove check for VBMI on CPU. With Fused ADC using shorts rather than bytes, we no longer need vpermi2b.
#352 - Set IdentityMapper maxOrdinal correctly in Grid/SiftSmall.
#351 - Release 3.0.0
542cb3c - Start development on 3.0.0-beta.17-SNAPSHOT
1dc1b88
13 August 2024
- add support for non-sequential remapped ordinals
#349 - fix global centering and add test that raw computation equals precomputed
0f78056 - remove duplicate vectors from Bench datasets
0ddead5 - add sanity check for M
355c93a
3 July 2024
- make OrdinalMapper top-level and make MapMapper public
b518038 - renumber the entry point when writing the graph
fa97330 - Release 3.0.0-beta.15
b919e3d
2 July 2024
- cache reranked scores
#341 - extract RandomAccessWriter interface from BRAW
#340 - Release 3.0.0-beta.14
0619242 - Start development on 3.0.0-beta.14-SNAPSHOT
13512e7 - Override buildCompression for 2dgrid in Bench
f352a2a
7 June 2024
- Clear scratch structures if search terminates exceptionally
#337 - Reduce tendency of reconnectOrphanedNodes to leave orphaned nodes
#335 - Release 3.0.0-beta.13
9126f0e - throw an error if the user (me) is dumb and asks to write a Feature that doesn't exist
b7da318 - javadoc
0f139e3
29 May 2024
- add writeHeader and getPath methods to OnDiskGraphIndexWriter
80ac365 - update javadoc
5292005 - Release 3.0.0-beta.12
3205bf9
28 May 2024
- Encapsulate NodeArray internals
#328 - Remove on-disk reranking
#327 - standardize ReaderSupplier implementations as inner classes of their respective RandomAccessReaders, and add a Supplier for SimpleReader
#323 - Migrate ConcurrentNeighborSet to ConcurrentNeighborMap + CNM.Neighbors
084eb50 - Implement support for COSINE in fused ADC
d8a2b49 - More Neighbors memory savings:
a53c92c
17 May 2024
- Fix InlineVectorValues.size/LvqVectorValues.size
#320 - Release 3.0.0-beta.10
0d8c51c - Start development on 3.0.0-beta.10-SNAPSHOT
c918354
13 May 2024
- improve ramBytesUsed estimates
005e202 - copy nodes during insertDiverse to keep array size within expected bounds
624e4e5 - rename maxConnections -> maxDegree
bfb4085
8 May 2024
- Fix writing jvector2-compatible indexes incrementally
#313 - remove BQ centering
50c4fa8 - test demonstrating that centering is harmful to bq since it changes the angles between vectors
fa5c050 - Release 3.0.0-beta.7
e24eb09
6 May 2024
- add rerankK to GraphSearcher::search, and worstApproximateScoreInTopK to SearchResult
7e3ace9 - cleanup
7d4cd51 - Release 3.0.0-beta.6
39f98e7
3 May 2024
- rewrite readme, step 1
411ec29 - rewrite readme, step 2
3ed199c - add ability to write old versions of PQ and ODGI. current version standardized as 3 to avoid confusion
a9b694b
2 May 2024
- Switch Fused ADC from 32-cluster to 256-cluster PQ, maxDegree 32 graphs. Implement support in default/Panama SIMD, but these will have degraded performance. Native support is required for performance improvement.
8cac3ad - turning SiftSmall into a tutorial
9bdf319 - more tutorial code
4e09bf8
22 April 2024
- OrdinalMapper
#299 - Add a memory-mapped RandomAccessReader using MemorySegment api
#296 - support building larger-than-memory datasets via writeInline incremental construction and new FeatureSource interface
dc4e89d - extract InlineVectorValues, LvqVectorValues, CachingVectorValues
ade8b6a - extract ByteBufferReader from SimpleMappedReader
b70f2bc
22 April 2024
- tweaks to make life easier for upgraders
#290 - Restructure BRAW to implement DataOutput directly to allow buffering across multiple write operations
48c0366 - add VectorCompressor::encodeAll(RandomAccessVectorValues ravv)
ce215db - add MapRandomAccessVectorValues
97e523c
12 April 2024
- remove (broken) concurrency support from removeDeletedNodes
#273 - Reduce allocation by pooling GraphSearcher objects
#270 - track visited nodes using IntHashSet instead of different BitSets
#269 - make Test2DThreshold less fragile
#266 - Generify new abstracted ODGI, such that the caching hierarchy doesn't require casting back to view/cachednode types. Fix any leakage of return types outside of visibility scope.
#262 - Support building indexes using compressed vectors
#244 - Enable JDK 22 CI
#258 - Use on-heap MemorySegments for native vectors/sequences
#246 - Merge 3.0-alpha
#256 - Replace NormalDistributionTracker with TwoPhaseTracker
#247 - Anisotropic PQ
#201 - add a diverseBefore marker to avoid recomputing diversity that hasn't changed
#242 - only compute aMagnitude for cosines once, since it is independent of the query vector
#220 - Introduce VectorFloat/ByteSequence abstractions to allow alternative implementations of indexed vectors/auxiliary byte structures. Remove support for graph indexes over byte vectors. Introduce experimental native vectorization provider. Introduce experimental fused graphs.
52a395e - Introduce OnDiskGraphIndex feature abstraction to embed additional content in graph (e.g., fused ADC or LVQ). Add the ability to incrementally write inline content to a graph index. Caching index only implements caching of edges.
09f6850 - Implement single-level 8-bit Turbo LVQ. This includes on-disk format and dot product support.
f45239d
19 March 2024
- Replace NormalDistributionTracker with TwoPhaseTracker
#247 - only compute aMagnitude for cosines once, since it is independent of the query vector
#220 - Release 3.0.0-alpha.7
a7b41f2 - add missing join() to parallel reconnect code
ef79c01 - Start development on 3.0.0-alpha.7
5af058f
26 February 2024
- replace optimistic locking with pessimistic to prevent size() inconsistencies
#218 - use euclidean similarity for centroid search if the centroid is zero and the index uses cosine similarity
733fac3 - increase contention in testConcurrency to highlight the consistency issue between put, remove, and size
b089be5 - de-pessimize extractSubvectors
fef6cf1
9 February 2024
- Fix GraphIndexBuilder.reconnectOrphanedNodes
#214 - Release 3.0.0-alpha.5
751d30b - fix dataset url
e4c314e - Start development on 3.0.0-alpha.5-SNAPSHOT
4bd7ad5
8 February 2024
- Misc cleanup of KMPPClusterer after recent change. Expose MAX_PQ_TRAINING_SET_SIZE.
1b944ba - add getCompressor to CompressedVectors
335a24d - getCompressor returns specific instance type
b4f6914
8 February 2024
- Improve GraphIndexBuilder#cleanup doc for concurrent searches
#188 - remove PoolingSupport in favor of direct usage of ExplicitThreadLocal
240770c - add PQ.refine
926fc7f - cleanup VectorUtil:
78eb7c5
11 January 2024
- Cleanup for 3.0 release
#182 - implement resume()
0b7ce1f - only rerank candidates whose approximate score is greater than rerankFloor (experimental)
6f8dcc1 - parameter threshold no longer experimental, and clarify javadoc for search()
94c8d64
11 January 2024
- Release 2.0.5
#171 - Make GraphIndexBuilder.markNodeDeleted thread-safe
#174 - Fix DenseIntMap size
a8422d6 - Remove caching of vectors encountered during search, update reranker interface accordingly
afa761c - remove BQ as a default option and move 2D grid to end of the run (run is now ordered ~most -> ~least interesting results)
4d18363
20 December 2023
- Remove ThreadPooling self reference in Pooled object to prevent memory leak
#169 - PhysicalCoreExecutor can exit gracefully; CachingGraphIndex's cacheDistance can be customized
#160 - Add optional FJP args for indexing and quantization
#162 - Fix some bugs
#156 - Use fma in SIMD Euclidean/cosine
#153 - Fix usage of null acceptOrds in SiftSmall example
#152 - add FJP args to GraphIndexBuilder , ProductQuantization and BinaryQuantization
6dec317 - Scrub both fvec/ivec and hdf5 dot product datasets
9c79fa4 - rm code about FJP approach
bb505c6
10 November 2023
- vectorsEncountered not always in sync with resultQueue, causing NPE when breaking out of loop due to threshold probability
#150 - Run verify phase in CI (which includes license checks)
#149 - add e5-v2-base, e5-v2-large, and gecko datasets
43d0e35 - integrate 2dgrid with Bench regex
733b74c - minimum nodes to start checking probability -> 300
0b93065
8 November 2023
- add upgrade guide
#147 - Split LongHeap into Growable and Bounded flavors
#146 - Revert "cap initial neighborqueue size at 1024"
316f43f - cap initial neighborqueue size at 1024
3bccb44 - Release 2.0.3
15c2516
7 November 2023
- Release 2.0.1
#144 - Release 2.0.2
0835401 - seek offset needs to be long
9cc3408 - Start development on 2.0.2
fce770a
7 November 2023
- add getOriginalSize and getCompressedSize to CompressedVectors interface
#143 - Cherry-pick various bench improvements from PR #76.
#133 - Cherry-pick various bench improvements from PR #76. Incorporate other miscellaneous Bench fixes/improvements.
6d614fa - Swap maybeDownloadFvecs to single arg, as we're now doing per-dataset downloads
1e2865c - Release 2.0.1
5dc06ec
6 November 2023
- Fix running single test using Maven commandline
#141 - Reconnect orphaned nodes in cleanup()
#138 - Add binary quantization
#135 - Updated downloadhelper
#134 - downloads wikipedia fvec files for 100k, switched to squad based query vectors
#130 - Addresses issue #36 by adding license header checks. Added headers on…
#129 - Ipcexample
#127 - Deletes
#117 - CI improvement
#131 - Adds DenseIntMap for building graph with much less contention. back to zero dependency
#128 - reformat
393bfac - add minimum similarity threshold parameters to search
bcbd65e - Addresses issue #36 by adding license header checks. Added headers on missing files, added exclusion list for files that don't necessarily need them
6ced572
31 October 2023
- fix mergeNeighbors to not add duplicate nodes, and fix test to check for duplicates
#119 - Mt index build fixes
#113 - Fork test VM per core
#111 - Add improved test coverage for on-disk graph caching
#109 - Add pooling and rm thread locals
2c6c758 - WIP
d96ba05 - Cleanup and docs
1cdb8b1
6 October 2023
- add test (currently failing) that exercises decoded similarity functions
78b76fc - turn testEncodings into an assert
31ac3f3 - Release 1.0.1
71e269a
6 October 2023
- Fix SimpleMappedReader to respect offset
#106 - KMeansPlusPlusClusterer optimizations
#100 - Add simd approach for summing the cached PQ products of each encoded vector
#104 - README tweaks
#101 - Add simd approach for summing the cached PQ query factors of each encoded vector. Remove some more collections. Add memory size of PQ and compressed vectors
051e099 - read graph neighbors all at once so that it is not possible to mangle state between invocations of nextInt()
f08048b - rename maxEdgesPerNode -> maxDegree
672eb07
6 October 2023
- Remove triangle inequality from k means plus plus
#98 - Clean up all build warnings related to multimodule versioning.
#97 - Fragment cache
#94 - wikipedia datasets in readme
#95 - Add decodedCosine fast path
#91 - Implement optimized decoded square distance
#89 - Fix code coverage in IntelliJ
#88 - Fix recall regression for centered PQ with non-dot product metrics
#84 - Refactor tests into jvector-tests module. Set up configurations to be able to run tests with JDK11 features and JDK20 features.
#75 - move sharing from annotation to method; use that in PQ
#83 - rename BFS_DISTANCE to CACHE_distance; fixes #96
#96 - clean out unused desc=false mode from NeighborArray
dcc3dad - cleanup
9755de3 - add wikipedia dataset using SiftLoader
82a9697
6 October 2023
- Minor documentation updates
#77 - make View extend AutoCloseable. Fixes #78
#78 - update PQ to take RAVV. fixes #74
#74 - Release 0.9.2.
a023467 - call validateEntryNode in Builder.complete
ca6a9ed - Merge #78 and regularize exception handling in close()
d2162aa
6 October 2023
- Change package from com.github.jbellis to io.github.jbellis
#71 - Release 0.9.0
#69 - Update README.md
ccc08c5 - Release 0.9.1
021e356 - Update README.md
1fef413
6 October 2023
- Build rework for release. Split examples out into separate project.
#68 - Build updates to prep for release
#66 - Javadoc fixes and produce javadoc as part of the build
#64 - Revert changes to hdf5 path in Bench, set up mvn exec to use parent basedir
#65 - Migrate to Maven multi-module project and produce multirelease jar
#61 - Swap Maven jdk11 build to use -release 11, which also checks for too-new APIs
#59 - Remove MacOS runner
#58 - Fix Windows tests
#57 - Switch to maven profiles, add gh workflow for tests and add package f…
#55 - Switch to ThreadLocal cached searcher in GraphIndexBuilder
#56 - Fix npe
#53 - Track numVisited when performing graph searches
#51 - Replace NBHM with CHM, remove NBHM dependency
#49 - Perf improvements
#47 - JVECTOR-31
#46 - JVECTOR-8
#42 - Fix offset math in testing MappedRandomAccessReader readFloatsAt
#44 - Issue 38 multiple jdk builds
#40 - JVECTOR-20
#41 - Faster on-disk index flush in Bench
#37 - JVector 24: DiskANN
#28 - Integrate PQ to recall Bench
#14 - Add PanamaVectorSupport for Java 20+
#16 - Added details for running both example classes via maven exec plugin
#18 - Integrate recall benchmark functionality
#10 - Add mvn wrapper
#7 - Initial clean up of pom.xml for packaging and release
#5 - fix dotProduct-with-offsets in Default provider. fixes #60
#60 - apply alpha-diversity incrementally in enforceMaxConnLimit, like we do in insertDiverse. This significantly improves the quality of connections retained, vs just starting at the back with max alpha. Closes #33
#33 - Created initial README.md
#6 - import hnsw + pq
9f5b9c6 - tests wip
9370807 - tests, and Sift works again (fixed GraphSearcher)
7dd96f5