Changelog

All notable changes to this project will be documented in this file. Dates are displayed in UTC.

Generated by auto-changelog.

19 September 2025

  • Regression enhancements #526
  • chore: update changelog for 4.0.0-rc.3 #527
  • streamline PR checklist #525
  • add checklist template and initial CONTRIBUTIONS.md guide #523
  • GraphIndexBuilder::addGraphNode must iterate all graph levels to estimate used bytes #521
  • GitHub actions regression test #499
  • Remove unused construction batch member from OnHeapGraphIndex #510
  • Switch from synchronized to concurrent map for pq codebook #518
  • Enable specifying the benchmarks in the yaml file #515
  • Create partial sums for PQ codebook for use during diversity checks #511
  • PQ ranging bugfix and refactoring #508
  • Reducing the number of allocations in GraphSearcher #501
  • SimdOps and NativeSimd ops refactored, VectorUtilSupport simplified #498
  • Add specific BuildScoreProvider for diversity to avoid extra encoding… #503
  • Release 4.0.0-rc.3 f3d235c
  • Start development on 4.0.0-rc.3-SNAPSHOT 631515d

22 July 2025

  • chore: update changelog for 4.0.0-rc.2 #505
  • Improvements to throughput benchmark #502
  • Fix dataset naming when using default.yml #500
  • Fix bad assert hit on CC cluster #491
  • AUX counters and correlated benchmarks #492
  • Enable Generate Changelog GHA to use label #495
  • Update permissions so bot can push branch. #493
  • New changelog automation via manual GHA #488
  • Revert "Release 4.0.0 rc.1" #489
  • Release 4.0.0-rc.1 117f127
  • Release 4.0.0-rc.2 2f7d54e
  • start development on 4.0.0-rc.2-SNAPSHOT 7b36335

2 July 2025

  • CHANGELOG.md for 4.0.0-rc.1 and earlier releases. #487
  • Fix issue when calling cleanup while concurrently executing searches #483
  • Improve the efficiency and memory usage of GraphIndexBuilder.cleanup #477
  • add PQ training benchmark #482
  • Remove extraneous character from datasets.yml #484
  • Upgrade YAML files to v5 after the format was introduced in the last update #478
  • New chunked memory-mapped reader that supports >2GB files 61bffbe
  • release 4.0.0-rc.1 1602706
  • Fix comparison in TestADCGraphIndex b637f65

13 June 2025

  • Add a new graph node using a search score #473
  • chore(release): Bump tag version and update changelog #471
  • Sequential disk writer (#475). Upgrades file format from 4 to 5 d0ccb32
  • Allow empty sections in datasets.yml & add colbert-1M.yml 2bf5f9a
  • chore (release): Start release version 4.0.0-beta.6 9a453a3

23 May 2025

  • Bench improvements with YAML config files #462
  • avx512 test runners, jobs, and assertions #469
  • Find insertion point before performing copy in ConcurrentNeighborMap #468
  • Factorize the computation of diverse edges #466
  • Random vector index build jmh + setup scripts #424
  • Better control over setting string formats for the benchmark metrics #461
  • Check values of clusterCount in PQ #464
  • Perf metrics improvement v1.1 #460
  • Simplify OnDiskGraphIndex.View to avoid code duplication #458
  • Improve perf metrics measurement and summarization #459
  • limit concurrency on single test node 3ce5dfd
  • testing avx512 on branch e1ef819
  • add better diagnostic titles e2ed6ad

15 April 2025

  • Creating starting point for changelog tracking. #456
  • chore(release) adjust changelog generation steps #437
  • Workflow/tag release update #452
  • Workflow/tag release update #450
  • Workflow/tag release update #448
  • Workflow/tag release update #446
  • Workflow/tag release update #444
  • Workflow/tag release update #442
  • Workflow/tag release update #440
  • Fixes for GHA tag-release workflow #438
  • Fix minor bug in getNodes. #434
  • Fix/refactor NodeScoreIterator, BoundedLongHeap, and GrowableLongHeap bulk addition implementations #433
  • Remove extra prefix of v from tag version #432
  • Update only the root level pom.xml as part of the GHA workflow #431
  • New GHA workflow to create new tag and update changelog #428
  • bugfix for 429 - eliminate maven-resources-plugin warning #430
  • Release 4.0.0-beta.4 ae85838
  • Start development on 4.0.0-beta.4-SNAPSHOT b5d9b85

9 April 2025

  • Update test resume #422
  • Fix calls to deprecated GraphIndex.size() #426
  • Fix NPE in GraphIndexBuilder.load #425
  • Reduce the number of vector allocations in BuildScoreProvider.pqBuilderScoreProvider #419
  • Improve the computation of accuracy #408
  • Merge latest commits from hnsw-3 #423
  • Fix native implementations of PQ assembleAndSum and pqDecodedCosineSimilarity #420
  • Fix FusedADC.writeInline #417
  • Implement NodeQueue#pushAll and AbstractLongHeap#addAll #415
  • Release 4.0.0-beta.3 ee56efc
  • Start development on 4.0.0-beta.3-SNAPSHOT 41ce85d

2 April 2025

  • Count expanded nodes #406
  • Search pruning & fix the reported number of visited nodes #405
  • Fix flaky tests and eliminate console output #404
  • Remove query-time usage of ByteSequence::slice to reduce object allocations #403
  • add index construction benchmark #398
  • Add jmh benchmarks #396
  • Fix MutableBQVectors parameterization. Add basic test coverage. #395
  • make examples use index view #392
  • Update Test2DThreshold to control for averages instead of worst-case statistics #391
  • Change variable names to improve readability #388
  • Fix NVQ distance computations in Native provider #389
  • Improved use ScoreTracker to avoid wasteful searching for very large k #387
  • Use ScoreTracker to avoid wasteful searching for very large k #384
  • Squashed merge of PR #402: Add hierarchical structure to the graph index 00a13a8
  • SimpleMappedReader no longer closes its ReaderSupplier 9613109
  • add jmh skeleton c9aa09d

9 January 2025

  • Fix CI on Windows due to missing posix_madvise support #383
  • add MADV_RANDOM #382
  • Make ravv usage thread-safe #381
  • Non-uniform vector quantization #374
  • Hand-unroll the SIMD dot product loop #380
  • Fix regression in assembleAndSum PQ decoder performance #379
  • MutableBQVectors grows incrementally like MutablePQVectors 0a25715
  • make vectorCount atomic in MutablePQ 72044bf
  • ada2-1M 431538e

24 December 2024

  • replace test that allocated multiple GB of PQVectors with calculateChunkParameters, this makes JUnit's small VMs happy 01ec971
  • MutablePQVectors grows dynamically, this is a better fit for CompactionGraph b76d9c3
  • fix math in PQVectors.load 11722d7

23 December 2024

  • split PQVectors and BQVectors into Mutable and Immutable implementations; extract MutableCompressedVectors 3fe81be
  • add GraphIndexBuilder.rescore() 6078177
  • Add missing licenses ceba0da

3 December 2024

  • Store compressed vectors in dense ByteSequence for PQVectors #370
  • Reenable SimdOps.assembleAndSum; implement Panama/Native equivalent for CosineDecoder acceleration #368
  • Use fma in SimdOps.cosineSimilarity sum vector #363
  • Remove max JDK version check 3c18670
  • Don't use segment hashCode in MemorySegmentVectorFloat, as it depends on segment base/offset in the heap rather than contents. This breaks testing around PQVectors hashcodes. 90e84a9
  • Use fma in VectorSimdOps.cosineSimilarity 8f115d7

30 September 2024

  • Improve performance of reconnectOrphanedNodes #359
  • Use float in cosine metric final calculation in default vectorization provider #358
  • approximateMediod returns a random node when the graph is too disconnected to search for the centroid #356
  • Use float in cosine metric final calculation in default vectorization provider (#358) #357
  • @SuppressWarnings("StatementWithEmptyBody") 80f9e40
  • rename insertOne to insertEdge ca8538d
  • simplify d65aa9a

13 August 2024

  • Remove check for VBMI on CPU. With Fused ADC using shorts rather than bytes, we no longer need vpermi2b. #352
  • Set IdentityMapper maxOrdinal correctly in Grid/SiftSmall. #351
  • Release 3.0.0 542cb3c
  • Start development on 3.0.0-beta.17-SNAPSHOT 1dc1b88

13 August 2024

  • add support for non-sequential remapped ordinals #349
  • fix global centering and add test that raw computation equals precomputed 0f78056
  • remove duplicate vectors from Bench datasets 0ddead5
  • add sanity check for M 355c93a

3 July 2024

  • make OrdinalMapper top-level and make MapMapper public b518038
  • renumber the entry point when writing the graph fa97330
  • Release 3.0.0-beta.15 b919e3d

2 July 2024

  • cache reranked scores #341
  • extract RandomAccessWriter interface from BRAW #340
  • Release 3.0.0-beta.14 0619242
  • Start development on 3.0.0-beta.14-SNAPSHOT 13512e7
  • Override buildCompression for 2dgrid in Bench f352a2a

7 June 2024

  • Clear scratch structures if search terminates exceptionally #337
  • Reduce tendency of reconnectOrphanedNodes to leave orphaned nodes #335
  • Release 3.0.0-beta.13 9126f0e
  • throw an error if the user (me) is dumb and asks to write a Feature that doesn't exist b7da318
  • javadoc 0f139e3

29 May 2024

  • add writeHeader and getPath methods to OnDiskGraphIndexWriter 80ac365
  • update javadoc 5292005
  • Release 3.0.0-beta.12 3205bf9

28 May 2024

  • Encapsulate NodeArray internals #328
  • Remove on-disk reranking #327
  • standardize ReaderSupplier implementations as inner classes of their respective RandomAccessReaders, and add a Supplier for SimpleReader #323
  • Migrate ConcurrentNeighborSet to ConcurrentNeighborMap + CNM.Neighbors 084eb50
  • Implement support for COSINE in fused ADC d8a2b49
  • More Neighbors memory savings: a53c92c

17 May 2024

  • Fix InlineVectorValues.size/LvqVectorValues.size #320
  • Release 3.0.0-beta.10 0d8c51c
  • Start development on 3.0.0-beta.10-SNAPSHOT c918354

13 May 2024

  • improve ramBytesUsed estimates 005e202
  • copy nodes during insertDiverse to keep array size within expected bounds 624e4e5
  • rename maxConnections -> maxDegree bfb4085

8 May 2024

  • Fix writing jvector2-compatible indexes incrementally #313
  • remove BQ centering 50c4fa8
  • test demonstrating that centering is harmful to bq since it changes the angles between vectors fa5c050
  • Release 3.0.0-beta.7 e24eb09

6 May 2024

  • add rerankK to GraphSearcher::search, and worstApproximateScoreInTopK to SearchResult 7e3ace9
  • cleanup 7d4cd51
  • Release 3.0.0-beta.6 39f98e7

3 May 2024

  • rewrite readme, step 1 411ec29
  • rewrite readme, step 2 3ed199c
  • add ability to write old versions of PQ and ODGI. current version standardized as 3 to avoid confusion a9b694b

2 May 2024

  • Switch Fused ADC from 32-cluster to 256-cluster PQ, maxDegree 32 graphs. Implement support in default/Panama SIMD, but these will degrade performance. Native support is required for performance improvement. 8cac3ad
  • turning SiftSmall into a tutorial 9bdf319
  • more tutorial code 4e09bf8

22 April 2024

  • OrdinalMapper #299
  • Add a memory-mapped RandomAccessReader using MemorySegment api #296
  • support building larger-than-memory datasets via writeInline incremental construction and new FeatureSource interface dc4e89d
  • extract InlineVectorValues, LvqVectorValues, CachingVectorValues ade8b6a
  • extract ByteBufferReader from SimpleMappedReader b70f2bc

22 April 2024

  • tweaks to make life easier for upgraders #290
  • Restructure BRAW to implement DataOutput directly to allow buffering across multiple write operations 48c0366
  • add VectorCompressor::encodeAll(RandomAccessVectorValues ravv) ce215db
  • add MapRandomAccessVectorValues 97e523c

12 April 2024

  • remove (broken) concurrency support from removeDeletedNodes #273
  • Reduce allocation by pooling GraphSearcher objects #270
  • track visited nodes using IntHashSet instead of different BitSets #269
  • make Test2DThreshold less fragile #266
  • Generify new abstracted ODGI, such that the caching hierarchy doesn't require casting back to view/cachednode types. Fix any leakage of return types outside of visibility scope. #262
  • Support building indexes using compressed vectors #244
  • Enable JDK 22 CI #258
  • Use on-heap MemorySegments for native vectors/sequences #246
  • Merge 3.0-alpha #256
  • Replace NormalDistributionTracker with TwoPhaseTracker #247
  • Anisotropic PQ #201
  • add a diverseBefore marker to avoid recomputing diversity that hasn't changed #242
  • only compute aMagnitude for cosines once, since it is independent of the query vector #220
  • Introduce VectorFloat/ByteSequence abstractions to allow alternative implementations of indexed vectors/auxiliary byte structures. Remove support for graph indexes over byte vectors. Introduce experimental native vectorization provider. Introduce experimental fused graphs. 52a395e
  • Introduce OnDiskGraphIndex feature abstraction to embed additional content in graph (e.g., fused ADC or LVQ). Add the ability to incrementally write inline content to a graph index. Caching index only implements caching of edges. 09f6850
  • Implement single-level 8-bit Turbo LVQ. This includes on-disk format and dot product support. f45239d

19 March 2024

  • Replace NormalDistributionTracker with TwoPhaseTracker #247
  • only compute aMagnitude for cosines once, since it is independent of the query vector #220
  • Release 3.0.0-alpha.7 a7b41f2
  • add missing join() to parallel reconnect code ef79c01
  • Start development on 3.0.0-alpha.7 5af058f

26 February 2024

  • replace optimistic locking with pessimistic to prevent size() inconsistencies #218
  • use euclidean similarity for centroid search if the centroid is zero and the index uses cosine similarity 733fac3
  • increase contention in testConcurrency to highlight the consistency issue between put, remove, and size b089be5
  • de-pessimize extractSubvectors fef6cf1

9 February 2024

  • Fix GraphIndexBuilder.reconnectOrphanedNodes #214
  • Release 3.0.0-alpha.5 751d30b
  • fix dataset url e4c314e
  • Start development on 3.0.0-alpha.5-SNAPSHOT 4bd7ad5

8 February 2024

  • Misc cleanup of KMPPClusterer after recent change. Expose MAX_PQ_TRAINING_SET_SIZE. 1b944ba
  • add getCompressor to CompressedVectors 335a24d
  • getCompressor returns specific instance type b4f6914

8 February 2024

  • Improve GraphIndexBuilder#cleanup doc for concurrent searches #188
  • remove PoolingSupport in favor of direct usage of ExplicitThreadLocal 240770c
  • add PQ.refine 926fc7f
  • cleanup VectorUtil: 78eb7c5

11 January 2024

  • Cleanup for 3.0 release #182
  • implement resume() 0b7ce1f
  • only rerank candidates whose approximate score is greater than rerankFloor (experimental) 6f8dcc1
  • parameter threshold no longer experimental, and clarify javadoc for search() 94c8d64

11 January 2024

  • Release 2.0.5 #171
  • Make GraphIndexBuilder.markNodeDeleted thread-safe #174
  • Fix DenseIntMap size a8422d6
  • Remove caching of vectors encountered during search, update reranker interface accordingly afa761c
  • remove BQ as a default option and move 2D grid to end of the run (run is now ordered ~most -> ~least interesting results) 4d18363

20 December 2023

  • Remove ThreadPooling self reference in Pooled object to prevent memory leak #169
  • PhysicalCoreExecutor can exit gracefully; CachingGraphIndex's cacheDistance can be customized #160
  • Add optional FJP args for indexing and quantization #162
  • Fix some bugs #156
  • Use fma in SIMD Euclidean/cosine #153
  • Fix usage of null acceptOrds in SiftSmall example #152
  • add FJP args to GraphIndexBuilder, ProductQuantization and BinaryQuantization 6dec317
  • Scrub both fvec/ivec and hdf5 dot product datasets 9c79fa4
  • rm code about FJP approach bb505c6

10 November 2023

  • vectorsEncountered not always in sync with resultQueue, causing NPE when breaking out of loop due to threshold probability #150
  • Run verify phase in CI (which includes license checks) #149
  • add e5-v2-base, e5-v2-large, and gecko datasets 43d0e35
  • integrate 2dgrid with Bench regex 733b74c
  • minimum nodes to start checking probability -> 300 0b93065

8 November 2023

  • add upgrade guide #147
  • Split LongHeap into Growable and Bounded flavors #146
  • Revert "cap initial neighborqueue size at 1024" 316f43f
  • cap initial neighborqueue size at 1024 3bccb44
  • Release 2.0.3 15c2516

7 November 2023

  • add getOriginalSize and getCompressedSize to CompressedVectors interface #143
  • Cherry-pick various bench improvements from PR #76. #133
  • Cherry-pick various bench improvements from PR #76. Incorporate other miscellaneous Bench fixes/improvements. 6d614fa
  • Swap maybeDownloadFvecs to single arg, as we're now doing per-dataset downloads 1e2865c
  • Release 2.0.1 5dc06ec

6 November 2023

  • Fix running single test using Maven commandline #141
  • Reconnect orphaned nodes in cleanup() #138
  • Add binary quantization #135
  • Updated downloadhelper #134
  • downloads wikipedia fvec files for 100k, switched to squad based query vectors #130
  • Addresses issue #36 by adding license header checks. Added headers on… #129
  • Ipcexample #127
  • Deletes #117
  • CI improvement #131
  • Adds DenseIntMap for building graph with much less contention. back to zero dependency #128
  • reformat 393bfac
  • add minimum similarity threshold parameters to search bcbd65e
  • Addresses issue #36 by adding license header checks. Added headers on missing files, added exclusion list for files that don't necessarily need them 6ced572

31 October 2023

  • fix mergeNeighbors to not add duplicate nodes, and fix test to check for duplicates #119
  • Mt index build fixes #113
  • Fork test VM per core #111
  • Add improved test coverage for on-disk graph caching #109
  • Add pooling and rm thread locals 2c6c758
  • WIP d96ba05
  • Cleanup and docs 1cdb8b1

6 October 2023

  • add test (currently failing) that exercises decoded similarity functions 78b76fc
  • turn testEncodings into an assert 31ac3f3
  • Release 1.0.1 71e269a

6 October 2023

  • Fix SimpleMappedReader to respect offset #106
  • KMeansPlusPlusClusterer optimizations #100
  • Add simd approach for summing the cached PQ products of each encoded vector #104
  • README tweaks #101
  • Add simd approach for summing the cached PQ query factors of each encoded vector. Remove some more collections. Add memory size of PQ and compressed vectors 051e099
  • read graph neighbors all at once so that it is not possible to mangle state between invocations of nextInt() f08048b
  • rename maxEdgesPerNode -> maxDegree 672eb07

6 October 2023

  • Remove triangle inequality from k means plus plus #98
  • Clean up all build warnings related to multimodule versioning. #97
  • Fragment cache #94
  • wikipedia datasets in readme #95
  • Add decodedCosine fast path #91
  • Implement optimized decoded square distance #89
  • Fix code coverage in IntelliJ #88
  • Fix recall regression for centered PQ with non-dot product metrics #84
  • Refactor tests into jvector-tests module. Set up configurations to be able to run tests with JDK11 features and JDK20 features. #75
  • move sharing from annotation to method; use that in PQ #83
  • rename BFS_DISTANCE to CACHE_distance; fixes #96 #96
  • clean out unused desc=false mode from NeighborArray dcc3dad
  • cleanup 9755de3
  • add wikipedia dataset using SiftLoader 82a9697

6 October 2023

  • Minor documentation updates #77
  • make View extend AutoCloseable. Fixes #78 #78
  • update PQ to take RAVV. fixes #74 #74
  • Release 0.9.2. a023467
  • call validateEntryNode in Builder.complete ca6a9ed
  • Merge #78 and regularize exception handling in close() d2162aa

6 October 2023

  • Change package from com.github.jbellis to io.github.jbellis #71
  • Release 0.9.0 #69
  • Update README.md ccc08c5
  • Release 0.9.1 021e356
  • Update README.md 1fef413

0.9.0

6 October 2023

  • Build rework for release. Split examples out into separate project. #68
  • Build updates to prep for release #66
  • Javadoc fixes and produce javadoc as part of the build #64
  • Revert changes to hdf5 path in Bench, set up mvn exec to use parent basedir #65
  • Migrate to Maven multi-module project and produce multirelease jar #61
  • Swap Maven jdk11 build to use -release 11, which also checks for too-new APIs #59
  • Remove MacOS runner #58
  • Fix Windows tests #57
  • Switch to maven profiles, add gh workflow for tests and add package f… #55
  • Switch to ThreadLocal cached searcher in GraphIndexBuilder #56
  • Fix npe #53
  • Track numVisited when performing graph searches #51
  • Replace NBHM with CHM, remove NBHM dependency #49
  • Perf improvements #47
  • JVECTOR-31 #46
  • JVECTOR-8 #42
  • Fix offset math in testing MappedRandomAccessReader readFloatsAt #44
  • Issue 38 multiple jdk builds #40
  • JVECTOR-20 #41
  • Faster on-disk index flush in Bench #37
  • JVector 24: DiskANN #28
  • Integrate PQ to recall Bench #14
  • Add PanamaVectorSupport for Java 20+ #16
  • Added details for running both example classes via maven exec plugin #18
  • Integrate recall benchmark functionality #10
  • Add mvn wrapper #7
  • Initial clean up of pom.xml for packaging and release #5
  • fix dotProduct-with-offsets in Default provider. fixes #60 #60
  • apply alpha-diversity incrementally in enforceMaxConnLimit, like we do in insertDiverse. This significantly improves the quality of connections retained, vs just starting at the back with max alpha. Closes #33 #33
  • Created initial README.md #6
  • import hnsw + pq 9f5b9c6
  • tests wip 9370807
  • tests, and Sift works again (fixed GraphSearcher) 7dd96f5