Commit 22577e5

Small improvements to README.md files
1 parent 7ee2a8f commit 22577e5

2 files changed

+24
-27
lines changed

README.md

Lines changed: 4 additions & 5 deletions
@@ -11,8 +11,7 @@ Please see the following blog posts for background:
1111
* [Why TileDB as a Vector Database](https://tiledb.com/blog/why-tiledb-as-a-vector-database)
1212
* [TileDB Vector Search 101](https://tiledb.com/blog/tiledb-101-vector-search/)
1313

14-
We are actively working on LangChain integration, with others to come soon:
15-
* https://github.com/TileDB-Inc/langchain/pull/1 (WIP)
14+
We have released a [LangChain integration](https://python.langchain.com/docs/integrations/vectorstores/tiledb), with others to come soon.
1615

1716
# Quick Links
1817

@@ -22,7 +21,7 @@ We are actively working on LangChain integration, with others to come soon:
2221

2322
# Quick Installation
2423

25-
Pre-built packages are available from PyPI using pip:
24+
Pre-built packages are available from [PyPI](https://pypi.org/project/tiledb-vector-search) using pip:
2625

2726
```
2827
pip install tiledb-vector-search
@@ -37,11 +36,11 @@ conda install -c tiledb -c conda-forge tiledb-vector-search
3736

3837
# Contributing
3938

40-
We welcome contributions. Please see [`Building`](Building.md) for
39+
We welcome contributions. Please see [`Building`](./documentation/Building.md) for
4140
development-build instructions. For large new
4241
features, please open an issue to discuss goals and approach in order
4342
to ensure a smooth PR integration and review process. All contributions
44-
must be licensed under the repository's [MIT License](../LICENSE).
43+
must be licensed under the repository's [MIT License](./LICENSE).
4544

4645
# Testing
4746

src/src/README.md

Lines changed: 20 additions & 22 deletions
@@ -106,14 +106,13 @@ The full set of usage options for `ivf_flat` are the following:
106106
ivf_flat: demo CLI program for performing feature vector search with kmeans index.
107107
Usage:
108108
ivf_flat (-h | --help)
109-
ivf_flat --db_uri URI --centroids_uri URI (--index_uri URI | --sizes_uri URI)
110-
--parts_uri URI --ids_uri URI --query_uri URI [--groundtruth_uri URI] [--output_uri URI]
111-
[--k NN][--nprobe NN] [--nqueries NN] [--alg ALGO] [--finite] [--blocksize NN] [--nth]
112-
[--nthreads NN] [--region REGION] [--log FILE] [-d] [-v]
109+
ivf_flat --centroids_uri URI --parts_uri URI (--index_uri URI | --sizes_uri URI)
110+
--ids_uri URI --query_uri URI [--groundtruth_uri URI] [--output_uri URI]
111+
[--k NN][--nprobe NN] [--nqueries NN] [--alg ALGO] [--infinite] [--finite] [--blocksize NN]
112+
[--nthreads NN] [--ppt NN] [--vpt NN] [--nodes NN] [--region REGION] [--stats] [--log FILE] [-d] [-v]
113113
114114
Options:
115115
-h, --help show this screen
116-
--db_uri URI database URI with feature vectors
117116
--centroids_uri URI URI with centroid vectors
118117
--index_uri URI URI with the partitioning index
119118
--sizes_uri URI URI with the partition sizes
@@ -129,10 +128,13 @@ Options:
129128
--infinite Load the entire array into RAM for the search [default: false]
130129
--finite For backward compatibility, load only required partitions into memory [default: true]
131130
--blocksize NN number of vectors to process in an out of core block (0 = all) [default: 0]
132-
--nth use nth_element for top k [default: false]
133-
--nthreads NN number of threads to use (0 = all) [default: 0]
131+
--nthreads NN number of threads to use (0 = hardware concurrency) [default: 0]
132+
--ppt NN minimum number of partitions to assign to a thread (0 = no min) [default: 0]
133+
--vpt NN minimum number of vectors to assign to a thread (0 = no min) [default: 0]
134+
--nodes NN number of nodes to use for (emulated) distributed query [default: 1]
134135
--region REGION AWS S3 region [default: us-east-1]
135136
--log FILE log info to FILE (- for stdout)
137+
--stats log TileDB stats [default: false]
136138
-d, --debug run in debug mode [default: false]
137139
-v, --verbose run in verbose mode [default: false]
138140
```
@@ -152,20 +154,18 @@ The inverted file index consists of data stored in multiple TileDB arrays, which
152154

153155
The user can also optionally specify
154156
* An array containing ground truth vectors (`--groundtruth_uri`), i.e., the nearest neighbors that would be returned from an exact (`flat L2`) search, and/or
155-
* An array for saving the results of the query.
157+
* An array for saving the results of the query (`--output_uri`).
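To make the role of the ground truth array concrete, here is a minimal sketch of one common way to score approximate results against exact nearest neighbors (recall at k). This is an illustration only: the function name `recall_at_k` and the in-memory list-of-lists layout are assumptions, and the source does not state how `ivf_flat` itself performs this comparison.

```python
from typing import Sequence


def recall_at_k(
    result_ids: Sequence[Sequence[int]],
    groundtruth_ids: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Fraction of the exact top-k neighbors that the search also returned.

    Hypothetical helper: each inner sequence holds the neighbor ids for one
    query, ordered from nearest to farthest.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not result_ids or len(result_ids) != len(groundtruth_ids):
        raise ValueError("need one non-empty result row per ground truth row")

    hits = 0
    for found, truth in zip(result_ids, groundtruth_ids):
        # Order within the top k does not matter for recall, only membership.
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(result_ids))
```

A recall of 1.0 would mean every query returned exactly the same id set as the exact search; lower values indicate neighbors missed by the approximate index.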
156158

157159
Example
158160
```txt
159161
ivf_flat \
160-
--db_uri s3://tiledb-nikos/vector-search/datasets/arrays/sift-1b-col-major \
161162
--centroids_uri s3://tiledb-nikos/vector-search/andrew/sift-base-1b-10000p/centroids.tdb \
162163
--parts_uri s3://tiledb-nikos/vector-search/andrew/sift-base-1b-10000p/parts.tdb \
163164
--index_uri s3://tiledb-nikos/vector-search/andrew/sift-base-1b-10000p/index.tdb \
164-
--sizes_uri s3://tiledb-nikos/vector-search/andrew/sift-base-1b-10000p/index_size.tdb \
165165
--ids_uri s3://tiledb-nikos/vector-search/andrew/sift-base-1b-10000p/ids.tdb \
166166
--query_uri s3://tiledb-andrew/kmeans/benchmark/query_public_10k \
167167
--groundtruth_uri s3://tiledb-andrew/kmeans/benchmark/bigann_1B_GT_nnids \
168-
--output_array file://vector_search/results/output.tdb
168+
--output_uri file://vector_search/results/output.tdb
169169
```
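Because the invocation takes many URIs, it can help to assemble the argument list in a script. The sketch below builds an argument vector that follows the updated usage string, including the `(--index_uri URI | --sizes_uri URI)` choice. The helper name `build_ivf_flat_argv` is hypothetical and not part of the repository; the sketch only builds the list and does not run the program.

```python
from typing import List, Optional


def build_ivf_flat_argv(
    centroids_uri: str,
    parts_uri: str,
    ids_uri: str,
    query_uri: str,
    index_uri: Optional[str] = None,
    sizes_uri: Optional[str] = None,
    groundtruth_uri: Optional[str] = None,
    output_uri: Optional[str] = None,
) -> List[str]:
    """Build an argv list for ivf_flat (hypothetical helper, not a repo API)."""
    # The usage string groups these as alternatives, so require exactly one.
    if (index_uri is None) == (sizes_uri is None):
        raise ValueError("pass exactly one of index_uri or sizes_uri")

    argv = [
        "ivf_flat",
        "--centroids_uri", centroids_uri,
        "--parts_uri", parts_uri,
    ]
    if index_uri is not None:
        argv += ["--index_uri", index_uri]
    else:
        argv += ["--sizes_uri", str(sizes_uri)]
    argv += ["--ids_uri", ids_uri, "--query_uri", query_uri]

    # Optional arrays: ground truth for checking, output for saving results.
    if groundtruth_uri is not None:
        argv += ["--groundtruth_uri", groundtruth_uri]
    if output_uri is not None:
        argv += ["--output_uri", output_uri]
    return argv
```

The resulting list can be passed to `subprocess.run` once the `ivf_flat` binary is built and on the `PATH`.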
170170

171171
#### Search Options
@@ -178,8 +178,6 @@ The default is to use all the queries in the query array, which can also be spec
178178
* Which search algorithm in the C++ library to use for performing the search (`--alg`). It is recommended to use the default (other algorithms are currently WIP).
179179
* Whether to load the entire partitioned array into memory when performing the search (if the `--infinite` option is given) or to load only the necessary partitions, given the specified query (the default). It is generally recommended to use the default, except when `nqueries` and `nprobe` are large and there is sufficient RAM to hold the entire partitioned array. (For backward compatibility, there is also a `--finite` flag, which has the complementary behavior to `--infinite`.) If `--blocksize` is specified with the finite-memory option, `ivf_flat` also operates in out-of-core fashion, loading subsets of partitions into memory in the order they appear in the partitioned vector array.
180180
* An upper bound on the number of vectors to be loaded during each batch in the finite-memory case (`--blocksize`). `ivf_flat` loads complete partitions on each out-of-core iteration, so the number of vectors loaded will generally be fewer than the specified upper bound. For the same reason, the specified upper bound must be larger than the largest partition in the partitioned array. Out-of-core operation is necessary if available RAM cannot hold all the index data (in general due to the size of the vector data to be searched). Even if available memory can accommodate the entire partitioned array, out-of-core operation can be useful for making more efficient use of hierarchical memory.
181-
* Whether to use the `nth_element` C++ standard library algorithm for ranking top-k vectors (`--nth`). The default value is `false` and the default should always be used This option was used for performance experiments and should be considered deprecated.
182-
* How many threads to use when executing the parallelized sections of the search (`--nthreads`). The default is `std::thread::hardware_concurrency`, i.e., the number of available cores. In general the default value should be used.
183181
* The AWS region to use when accessing TileDB arrays stored in S3 (`--region`). The example array URIs provided with TileDB-Vector-Search are located in the `us-east-1` region, which is the default value.
184182
* The name of a file to write logging information to (`--log`). The default is nil, meaning no logs will be written. If the value `-` is specified, the output will be written to `std::cout`.
185183
* Whether to run in debug mode (`-d` or `--debug`). This will print copious information that is useful only to the library developers. End users should always use the default.
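The `--blocksize` behavior described in the list above (whole partitions per batch, taken in array order, never exceeding the upper bound) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `plan_batches` is a hypothetical name, partition sizes are given as a plain list, and a partition exactly equal to the bound is accepted.

```python
from typing import List, Sequence, Tuple


def plan_batches(partition_sizes: Sequence[int], blocksize: int) -> List[Tuple[int, int]]:
    """Group consecutive partitions into batches of at most `blocksize` vectors.

    Returns half-open (first, last) partition index ranges. Hypothetical
    helper; blocksize 0 means "all", mirroring the option's help text.
    """
    if not partition_sizes:
        return []
    if blocksize == 0:
        return [(0, len(partition_sizes))]

    batches: List[Tuple[int, int]] = []
    start = 0   # first partition of the batch being filled
    loaded = 0  # vectors already assigned to that batch
    for i, size in enumerate(partition_sizes):
        if size > blocksize:
            # A partition is never split, so it must fit inside one batch.
            raise ValueError(f"partition {i} ({size} vectors) exceeds blocksize")
        if loaded + size > blocksize:
            # Close the current batch before this partition would overflow it.
            batches.append((start, i))
            start, loaded = i, 0
        loaded += size
    batches.append((start, len(partition_sizes)))
    return batches
```

With partition sizes `[400, 300, 500, 200]` and a bound of 1000, the first batch stops at 700 vectors because adding the third partition would overflow, which is why the loaded count typically falls short of the bound.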
@@ -195,8 +193,8 @@ Example:
195193
--ids_uri s3://tiledb-nikos/vector-search/andrew/sift-base-1b-10000p/ids.tdb \
196194
--query_uri s3://tiledb-andrew/kmeans/benchmark/query_public_10k \
197195
--groundtruth_uri s3://tiledb-andrew/kmeans/benchmark/bigann_1B_GT_nnids \
198-
--output_array file://vector_search/results/output.tdb \
199-
--blocksize 1000000 --nqueries 1000 --nprobe 128 --log - -v
196+
--output_uri file://vector_search/results/output.tdb \
197+
--blocksize 1000000 --nqueries 1000 --nprobe 128 --log -v
200198
```
201199
Since there are a large number of options, particularly the long set of array URIs, it is recommended that
202200
you use the setup scripts in the `src/benchmarks` subdirectory. The setup script defines bash functions that
@@ -222,7 +220,7 @@ If the `--kmeans` flag is specified, `index` will generate a centroids array usi
222220

223221
The options used by `index` are
224222
* The name of the database to be indexed (`--db_uri`)
225-
* The name of the centroids array to be written, if `--kmeans` is specified (`--out_centroids_uri`)
223+
* The name of the centroids array to be written, if `--kmeans` is specified (`--centroids_uri`)
226224
* The name of the centroids array to be used for indexing if `--kmeans` is not specified (`--centroids`)
227225
* The name of the array of vectors specified by `--db_uri`, partitioned according to the generated (or provided) centroids
228226
* The name of the index array to be written (`--index_uri`)
@@ -235,10 +233,10 @@ Example:
235233
```
236234
ivf_index --kmeans \
237235
--db_uri s3://tiledb-lums/sift/sift_base \
238-
--id_uri s3://tiledb-lums/kmeans/ivf_flat/ids \
236+
--ids_uri s3://tiledb-lums/kmeans/ivf_flat/ids \
239237
--index_uri s3://tiledb-lums/kmeans/ivf_flat/index \
240238
--part_uri s3://tiledb-lums/kmeans/ivf_flat/parts \
241-
--out_centroids_uri s3://tiledb-lums/kmeans/ivf_flat/centroids
239+
--centroids_uri s3://tiledb-lums/kmeans/ivf_flat/centroids
242240
243241
```
244242

@@ -268,8 +266,8 @@ program will check its results against the given set of ground truth vectors.
268266
Usage:
269267
flat_l2 (-h | --help)
270268
flat_l2 --db_uri URI --query_uri URI [--groundtruth_uri URI] [--output_uri URI]
271-
[--k NN] [--nqueries NN] [--alg ALGO] [--finite] [--blocksize NN] [--nth]
272-
[--nthreads N] [--region REGION] [--log FILE] [-d] [-v]
269+
[--k NN] [--nqueries NN] [--alg ALGO] [--finite] [--blocksize NN]
270+
[--nthreads N] [--region REGION] [--log FILE] [--stats] [-d] [-v]
273271
274272
Options:
275273
-h, --help show this screen
@@ -282,10 +280,10 @@ program will check its results against the given set of ground truth vectors.
282280
--alg ALGO which algorithm to use for comparisons [default: vq_heap]
283281
--finite use finite RAM (out of core) algorithm [default: false]
284282
--blocksize NN number of vectors to process in an out of core block (0 = all) [default: 0]
285-
--nth use nth_element for top k [default: false]
286283
--nthreads N number of threads to use in parallel loops (0 = all) [default: 0]
287284
--region REGION AWS region [default: us-east-1]
288285
--log FILE log info to FILE (- for stdout)
286+
--stats log TileDB stats [default: false]
289287
-d, --debug run in debug mode [default: false]
290288
-v, --verbose run in verbose mode [default: false]
291289
```
@@ -309,7 +307,7 @@ Example:
309307
--db_uri s3://tiledb-nikos/vector-search/datasets/arrays/sift-1b-col-major \
310308
--query_uri s3://tiledb-andrew/kmeans/benchmark/query_public_10k \
311309
--groundtruth_uri s3://tiledb-andrew/kmeans/benchmark/bigann_1B_GT_nnids \
312-
--output_array file://vector_search/results/output.tdb \
310+
--output_uri file://vector_search/results/output.tdb \
313311
--blocksize 1000000 --nqueries 1000 --nprobe 128 --log - -v
314312
```
315313
As with `ivf_flat`, it is recommended that you run `flat_l2` using the setup scripts in `src/benchmarks` (or that you use your own scripts).
