Releases: TileDB-Inc/TileDB-Vector-Search
0.8.1
What's Changed
- Update local-benchmarks script to save results to a new directory during each run by @jparismorgan in #464
- [automated] Update backwards-compatibility-data for release 0.8.0 by @github-actions in #467
- Fix C++ warnings across codebase by @jparismorgan in #461
- Enable more benchmarks in ann-benchmarks script by @jparismorgan in #470
- Optimize distance computations for IVFPQ by @NikolaosPapailiou in #468
- Fix C++ incorrect or ambiguous primitive types by @jparismorgan in #455
- Add C++ sanitizers by @jparismorgan in #471
- Use single shared seed for rng generation in C++ by @jparismorgan in #469
- Make logging utils thread-safe by @jparismorgan in #474
- Add nightly CI which runs sanitizers by @jparismorgan in #476
- Add new test and cleanup C++ code by @jparismorgan in #475
- Fix `counting_sum_of_squares_distance` thread sanitizer error by @jparismorgan in #472
- Parallelize Vamana query by @jparismorgan in #463
- Fix address sanitizer error when storing metadata in destructor by @jparismorgan in #473
- Fix mutex lock error in logging utils by @jparismorgan in #478
- Add option to ann-benchmarks.py to skip benchmarks and leave instance running by @jparismorgan in #477
- Add dump() methods to logging classes by @jparismorgan in #479
- Add more testing for Vamana helpers, add option to skip top_k to improve ingestion speed by @jparismorgan in #481
- Avoid copies in query code by @jparismorgan in #483
- Add Vamana storage spec by @jparismorgan in #484
- Add more comments about IVF parameters `partitions` and `nprobe` by @jparismorgan in #487
- IVF_FLAT Distance metric by @cainamisir in #451
- Update matrix to take in a `std::vector` instead of `std::initializer_list` by @jparismorgan in #492
- Fix flaky IVF_PQ Python test and add more C++ testing around pq-encoding by @jparismorgan in #494
- Fix infinite loop in kmeans by @jparismorgan in #491
- Distance metric integration for vamana, and refactoring of distance metrics by @cainamisir in #460
- Add out-of-core query() support to IVF PQ by @jparismorgan in #485
- IVF_PQ: Remove unused pq_ivf_centroids, remove extra call to train_ivf(), add comments, cleanup code by @jparismorgan in #489
- Refactor logging helpers by @jparismorgan in #497
- Name each benchmark within `local-benchmarks.py` by @jparismorgan in #498
- Add scGPT and scvi embeddings and improve SOMA reader by @NikolaosPapailiou in #501
- Vlad/l2 sumofsquares by @cainamisir in #486
- Update ann-benchmarks.py to connect to running instance by @jparismorgan in #500
- Fix some TODOs after the cloud release by @NikolaosPapailiou in #505
- Various small tdb matrix cleanups by @jparismorgan in #510
- Distance metric small fixes: uninitialized value in IVF_PQ and pass setting during `consolidate_updates()` by @jparismorgan in #508
- Add `fixed_min_triplet_heap` by @jparismorgan in #507
- Update IVF_PQ array names by @jparismorgan in #511
- Check finite and infinite IVF_PQ queries return the same ids and distances, fix `count_intersections()` to not modify inputs, update `read_index_finite()` to return data by @jparismorgan in #509
- Add new plot to local-benchmarks.py showing results from all indexes by @jparismorgan in #512
- DirectoryReader: set text/plain as default mime type if it is not found by @NikolaosPapailiou in #513
- Add option to run local-benchmarks.py with index at a tiledb uri by @jparismorgan in #514
- IVF_PQ re-ranking by @jparismorgan in #502
- Support AWS index URI in `local-benchmarks.py` by @jparismorgan in #516
- Add `k_factor` to `local-benchmarks.py` by @jparismorgan in #517
Full Changelog: 0.8.0...0.8.1
0.8.0
What's Changed
- Remove use of `set_coords_filter_list` from dense array creation by @jparismorgan in #439
- [automated] Update backwards-compatibility-data for release 0.7.0 by @github-actions in #438
- Tune default ingestion configuration to avoid OOM errors by @NikolaosPapailiou in #440
- Add ids to Python `FeatureVectorArray` by @jparismorgan in #442
- Add `Optional` to Python code that was missing it by @jparismorgan in #443
- For type-erased Python indexes, 1) Don't consolidate parts and ids Arrays 2) Avoid extra Schema open in constructor by @jparismorgan in #444
- Cleanups to `ivf_index()` C++ code, and small cleanups in Python by @jparismorgan in #430
- Support IVF PQ consolidation by storing raw feature vectors and external IDs by @jparismorgan in #447
- `tdbPartitionedMatrix` will automatically close Arrays when done reading by @jparismorgan in #448
- Re-enable IVF PQ tests by @jparismorgan in #450
- Save kmeans settings to IVF PQ metadata by @jparismorgan in #452
- Allow setting IVF PQ partitions when re-ingesting, fix IVF PQ object index tests by @jparismorgan in #453
- Avoid creating one temp array for each ingestion work item by @NikolaosPapailiou in #449
- Add Vector Search storage format spec by @NikolaosPapailiou in #456
- Fix markdown format for storage spec by @NikolaosPapailiou in #457
- Update dimensions to be uint64_t in C++ by @jparismorgan in #454
- Configure memory budget for distributed OOC queries by @NikolaosPapailiou in #462
- Add local benchmarking script by @jparismorgan in #459
- Distance metrics integration by @cainamisir in #422
- Close tdbMatrix and tdbMatrixWithIds Arrays when we have nothing left to read by @jparismorgan in #466
- Update to TileDB Core 2.25.0 by @jparismorgan in #465
Full Changelog: 0.7.0...0.8.0
0.7.0
What's Changed
- [automated] Update backwards-compatibility-data for release 0.6.0 by @github-actions in #432
- Remove `apis/python/requirements-py.txt` by @jparismorgan in #433
- Fix bug where we did not set compression filters when creating TileDB Arrays in C++ by @jparismorgan in #436
- Update to TileDB Core 2.24.2 by @jparismorgan in #437
Full Changelog: 0.6.0...0.7.0
0.6.0
What's Changed
- Pin numpy to fix Python CI failures by @NikolaosPapailiou in #419
- Enable OOC processing for IVF_FLAT distributed query execution by @NikolaosPapailiou in #418
- Cleanup IVF PQ C++ index code by @jparismorgan in #421
- Add debug info and remove stripping by @dudoslav in #424
- Fix type-erased indexes writing fragments at timestamp=0, thus fixing IVF PQ time travel by @jparismorgan in #425
- Improve benchmark script - add other vector search libraries and download full results by @jparismorgan in #415
- Remove `b_backtrack` from Vamana index by @jparismorgan in #428
- Expose Vamana graph building params by @jparismorgan in #423
- Update to TileDB Core 2.24.1 by @jparismorgan in #431
Full Changelog: 0.5.1...0.6.0
0.5.1
What's Changed
- Add script to create an EC2 instance and run ann-benchmarks by @jparismorgan in #412
- [automated] Update backwards-compatibility-data for release 0.5.0 by @github-actions in #414
- Replace `assert`s with exceptions in C++ code by @jparismorgan in #410
- Add debug symbols in release build by @NikolaosPapailiou in #416
- Fix crash in tdb_matrix_with_ids when there was an empty partition by @jparismorgan in #389
Full Changelog: 0.5.0...0.5.1
0.5.0
What's Changed
- Fix CLion UI issue with long test names, cleanup C++ unit tests by @jparismorgan in #392
- Add missing `temporal_policy` to `open_array()` calls by @jparismorgan in #393
- Fix Vamana index not setting metadata when loaded by URI by @jparismorgan in #394
- Native code refactors and cleanups by @jparismorgan in #395
- Share max int / float values and index lists in Python by @jparismorgan in #396
- Update l2 distance to support `int8_t` by @jparismorgan in #400
- Update `feature_vector_array_with_ids` to require `id()` accessor by @jparismorgan in #401
- Update `add()` for IVF Flat and IVF PQ to take in IDs by @jparismorgan in #402
- Add query with driver implementation by @NikolaosPapailiou in #398
- Initial work to get IVF PQ working like Vamana does by @jparismorgan in #390
- Add Python IVF PQ Index by @jparismorgan in #404
- Add option to install libtiledb.so by @dudoslav in #361
- Fix bug with `adjacency_row_index_uri()` array type being incorrect, remove `adjacency_row_index_type` from API by @jparismorgan in #405
- Cleanup IVF PQ test and add better logging on failure by @jparismorgan in #406
- Cleanup tdb_partitioned_matrix - move state to local variables and remove unused code by @jparismorgan in #408
- Add Vamana backwards compatibility data generation and tests by @jparismorgan in #409
- Pin mdspan version by @NikolaosPapailiou in #413
- Rename `opt_l` to `l_search` by @jparismorgan in #411
- Update TileDB core 2.24.0 by @NikolaosPapailiou in #399
- Initial scaling strategy by @cainamisir in #407
New Contributors
- @cainamisir made their first contribution in #407
Full Changelog: 0.4.1...0.5.0
0.4.1
What's Changed
- Update `matrix_with_ids` to hold a `unique_ptr` with an ids array instead of a `std::vector` by @jparismorgan in #364
- [automated] Update backwards-compatibility-data for release 0.4.0 by @github-actions in #370
- Add clear_history() to C++, refactor group to allow static function to open a group by @jparismorgan in #371
- Use multiple workers for Python test execution by @NikolaosPapailiou in #367
- Add `clear_history()` implementation and Python API by @jparismorgan in #372
- Optimized graph algorithms for vamana by @lums658 in #282
- Test pytest parallelism by @jparismorgan in #377
- Add `TemporalPolicy` to Python, use `std::optional` instead of `timestamp_end = 0` to mark not set by @jparismorgan in #360
- Speed up `test_ingestion.py` tests by running on less data, mix up dimensions size more between tests by @jparismorgan in #376
- Rename `dimension` to `dimensions` in C++ code to match Python by @jparismorgan in #374
- Enable C++ Vamana test when there are no vectors in index so we should return defaults by @jparismorgan in #379
- Update Vamana metadata keys to use full names of values by @jparismorgan in #378
- Simplify and improve matrix debug helpers by @jparismorgan in #380
- Hide logs on noisy C++ tests by @jparismorgan in #387
- Restructure Python Documentation by @NikolaosPapailiou in #382
- Improve pytest latency by @NikolaosPapailiou in #383
- Rename `ivf_flat_index_group` to `ivf_flat_group` by @jparismorgan in #385
- Add initial Vamana docstrings to Python by @jparismorgan in #386
- Add partioned_matrix and tdb_partitioned_matrix unit tests & improve error handling by @jparismorgan in #381
- Lums/sc 42599/implement ivf pq index by @lums658 in #279
- Expose `build_config()` in Python by @jparismorgan in #388
Full Changelog: 0.4.0...0.4.1
0.4.0
What's Changed
- [automated] Update backwards-compatibility-data for release 0.3.0 by @github-actions in #336
- Fix tiledb direct URIs in C++ groups by @jparismorgan in #332
- Remove `std::reference_wrapper` around `tiledb::Context` by @jparismorgan in #340
- Run Python ingestion tests on all indexes by @jparismorgan in #338
- Add support for int8 type indexes by @NikolaosPapailiou in #342
- Add support for `ingest()` with `index_timestamp` for type-erased indexes by @jparismorgan in #341
- Fix bug with relative Vamana indexes in Python by @jparismorgan in #344
- Remove unused code previously used to define arrays for testing by @jparismorgan in #346
- Update type-erased code to keep Ctx alive when calling into C++ to fix Vamana ingestion crash by @jparismorgan in #347
- Fix `Index.delete_index()` so that it uses recursive=True and deletes the full index by @jparismorgan in #348
- Do not create default `tiledb::Config` objects in C++ by @jparismorgan in #350
- Pass namespace correctly from ObjectAPI embeddings ingestion to vector ingestion by @NikolaosPapailiou in #352
- Fix pickle error `tiledb.FilterList` during `BATCH` ingestion for Vamana by @jparismorgan in #349
- Fix BATCH execution error by @NikolaosPapailiou in #351
- Fix non-contiguous queries when using Vamana by @jparismorgan in #343
- Update `FeatureVectorArray` to be able to read from URIs at a specified timestamp by @jparismorgan in #353
- Add some more instructions on how to build and setup CLion by @jparismorgan in #337
- Add int8 support to Vamana type-erased index by @jparismorgan in #355
- Remove unneeded clang-format CI job by @jparismorgan in #345
- Vamana type-erased index can be opened at a specific timestamp by @jparismorgan in #354
- Use unique path for ingestion `temp_data` group by @NikolaosPapailiou in #357
- Check releases.csv hash after download by @dudoslav in #339
- Update C++ code to use TemporalPolicy instead of a timestamp by @jparismorgan in #356
- Expose C++ TemporalPolicy in Python by @jparismorgan in #362
- Add more C++ and Python temporal policy unit tests by @jparismorgan in #359
- Refactor C++ group code and add more Vamana unit tests by @jparismorgan in #363
- Fix bug in Python where, if an index was loaded before the earliest ingested data, we'd still load and query data from the future by @jparismorgan in #365
- Support Vamana in the ObjectIndex by @jparismorgan in #366
- Fix bug in tdb matrix where we would read even when the array is empty and get nan values by @jparismorgan in #368
- Update TileDB core version by @NikolaosPapailiou in #369
Full Changelog: 0.3.0...0.4.0
0.3.0
What's Changed
- Fix type-erased Vamana `query()` by @jparismorgan in #317
- Add support for external ids with Vamana type-erased by @jparismorgan in #311
- [automated] Update backwards-compatibility-data for release 0.2.2 by @github-actions in #318
- Remove `id_type` and `adjacency_row_index_type` in Vamana type-erased index by @jparismorgan in #320
- Remove filter for query results with id = 0 and distance = 0 by @jparismorgan in #319
- Code cleanup: rename l_build_ & r_max_degree & b_backtrack, remove TODOs, add CLion build instructions by @jparismorgan in #323
- Add needs dependency on backwards_compatibility_data by @dudoslav in #322
- Update `generate_data.py` to return whether we should upload the generated indexes by @jparismorgan in #324
- Remove `overwrite` from `write_index()` by @jparismorgan in #325
- Remove temp data in consolidate_and_vacuum by @NikolaosPapailiou in #321
- Only pass TILEDB_REST_TOKEN to test_cloud during CI by @NikolaosPapailiou in #327
- Support writing C++ type-erased indexes with a storage_version by @jparismorgan in #326
- Fix TileDB URIs with type-erased indexes, add Vamana test to test_cloud.py by @jparismorgan in #330
- Fix `consolidate_updates` error for `tiledb://` URIs. by @NikolaosPapailiou in #329
- Add option to import OpenAIEmbeddings from langchain_openai by @NikolaosPapailiou in #328
- Cleanup C++ group code by @jparismorgan in #331
- Refactor and cleanup `index_group.h` by @jparismorgan in #333
- Add metadata tests for type-erased indexes in Python by @jparismorgan in #334
- Increment TileDB core version to 2.22.0 by @jparismorgan in #335
Full Changelog: 0.2.2...0.3.0
0.2.2
What's Changed
- Fix local ingestion by passing a dummy default namespace by @NikolaosPapailiou in #316
Full Changelog: 0.2.1...0.2.2