
Commit cdd2a1e

towfeeqfayaz11 and Fayaz, Towfeeq authored
fixing README.md file to update pip install command (#622)
Co-authored-by: Fayaz, Towfeeq <[email protected]>
1 parent 6fa46c8 commit cdd2a1e


1 file changed

+45
-45
lines changed


README.md

Lines changed: 45 additions & 45 deletions
@@ -11,7 +11,7 @@ To add more relevance and practicality, we provide cost-effectiveness reports pa
1111

1212
Closely mimicking real-world production environments, we've set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, we've included public datasets from actual production scenarios, such as [SIFT](http://corpus-texmex.irisa.fr/), [GIST](http://corpus-texmex.irisa.fr/), [Cohere](https://huggingface.co/datasets/Cohere/wikipedia-22-12/tree/main/en), and a dataset generated by OpenAI from an opensource [raw dataset](https://huggingface.co/datasets/allenai/c4). It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
1313

14-
Prepare to delve into the world of VDBBench, and let it guide you in uncovering your perfect vector database match.
14+
Prepare to delve into the world of VDBBench, and let it guide you in uncovering your perfect vector database match.
1515

1616
VDBBench is sponsored by Zilliz,the leading opensource vectorDB company behind Milvus. Choose smarter with VDBBench - start your free test on [zilliz cloud](https://zilliz.com/) today!
1717

@@ -30,35 +30,35 @@ pip install vectordb-bench
3030
**Install all database clients**
3131

3232
``` shell
33-
pip install vectordb-bench[all]
33+
pip install 'vectordb-bench[all]'
3434
```
3535
**Install the specific database client**
3636

3737
```shell
38-
pip install vectordb-bench[pinecone]
38+
pip install 'vectordb-bench[pinecone]'
3939
```
4040
All the database client supported
4141

4242
| Optional database client | install command |
4343
|--------------------------|---------------------------------------------|
4444
| pymilvus, zilliz_cloud (*default*) | `pip install vectordb-bench` |
45-
| all (*clients requirements might be conflict with each other*) | `pip install vectordb-bench[all]` |
46-
| qdrant | `pip install vectordb-bench[qdrant]` |
47-
| pinecone | `pip install vectordb-bench[pinecone]` |
48-
| weaviate | `pip install vectordb-bench[weaviate]` |
49-
| elastic, aliyun_elasticsearch| `pip install vectordb-bench[elastic]` |
50-
| pgvector, pgvectorscale, pgdiskann, alloydb | `pip install vectordb-bench[pgvector]` |
51-
| pgvecto.rs | `pip install vectordb-bench[pgvecto_rs]` |
52-
| redis | `pip install vectordb-bench[redis]` |
53-
| memorydb | `pip install vectordb-bench[memorydb]` |
54-
| chromadb | `pip install vectordb-bench[chromadb]` |
55-
| awsopensearch | `pip install vectordb-bench[opensearch]` |
56-
| aliyun_opensearch | `pip install vectordb-bench[aliyun_opensearch]` |
57-
| mongodb | `pip install vectordb-bench[mongodb]` |
58-
| tidb | `pip install vectordb-bench[tidb]` |
59-
| vespa | `pip install vectordb-bench[vespa]` |
60-
| oceanbase | `pip install vectordb-bench[oceanbase]` |
61-
| hologres | `pip install vectordb-bench[hologres]` |
45+
| all (*clients requirements might be conflict with each other*) | `pip install 'vectordb-bench[all]'` |
46+
| qdrant | `pip install 'vectordb-bench[qdrant]'` |
47+
| pinecone | `pip install 'vectordb-bench[pinecone]'` |
48+
| weaviate | `pip install 'vectordb-bench[weaviate]'` |
49+
| elastic, aliyun_elasticsearch| `pip install 'vectordb-bench[elastic]'` |
50+
| pgvector, pgvectorscale, pgdiskann, alloydb | `pip install 'vectordb-bench[pgvector]'` |
51+
| pgvecto.rs | `pip install 'vectordb-bench[pgvecto_rs]'` |
52+
| redis | `pip install 'vectordb-bench[redis]'` |
53+
| memorydb | `pip install 'vectordb-bench[memorydb]'` |
54+
| chromadb | `pip install 'vectordb-bench[chromadb]'` |
55+
| awsopensearch | `pip install 'vectordb-bench[opensearch]'` |
56+
| aliyun_opensearch | `pip install 'vectordb-bench[aliyun_opensearch]'` |
57+
| mongodb | `pip install 'vectordb-bench[mongodb]'` |
58+
| tidb | `pip install 'vectordb-bench[tidb]'` |
59+
| vespa | `pip install 'vectordb-bench[vespa]'` |
60+
| oceanbase | `pip install 'vectordb-bench[oceanbase]'` |
61+
| hologres | `pip install 'vectordb-bench[hologres]'` |
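
The quoting added in this change matters because some shells treat `[...]` as a glob character class. zsh, for example, aborts with a "no matches found" error when an unquoted pattern matches no file, while bash in its default configuration passes an unmatched pattern through unchanged. The sketch below is only an illustration and not part of VectorDBBench: it uses Python's standard `fnmatch` and `shlex` modules to show how the unquoted requirement reads as a pattern and how the quoted form reaches `pip` literally.

```python
import fnmatch
import shlex

requirement = "vectordb-bench[all]"

# Read as a glob pattern, [all] is a character class that matches exactly one
# character, 'a' or 'l', so the pattern matches names such as these:
print(fnmatch.fnmatchcase("vectordb-bencha", requirement))
print(fnmatch.fnmatchcase("vectordb-benchl", requirement))

# The pattern does not match the literal requirement string itself.
print(fnmatch.fnmatchcase(requirement, requirement))

# With single quotes, POSIX-style word splitting keeps the brackets intact,
# so pip receives the extras specifier unchanged.
argv = shlex.split("pip install 'vectordb-bench[all]'")
print(argv)
```

The same reasoning applies to every extras name in the table above, which is why each install command is quoted.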
6262

6363
### Run
6464

@@ -190,7 +190,7 @@ Options:
190190
--number-of-shards INTEGER Number of primary shards for the index
191191
--number-of-replicas INTEGER Number of replica copies for each primary
192192
shard
193-
# Indexing Performance
193+
# Indexing Performance
194194
--index-thread-qty INTEGER Thread count for native engine indexing
195195
--index-thread-qty-during-force-merge INTEGER
196196
Thread count during force merge operations
@@ -206,7 +206,7 @@ Options:
206206
--engine TEXT type of engine to use valid values [faiss, lucene, s3vector]
207207
# Memory Management
208208
--cb-threshold TEXT k-NN Memory circuit breaker threshold
209-
209+
210210
# Quantization Type
211211
--quantization-type TEXT which type of quantization to use valid values [fp32, fp16]
212212
--help Show this message and exit.
@@ -282,7 +282,7 @@ Options:
282282

283283
It is recommended to use the following code for installation.
284284
```shell
285-
pip install vectordb-bench[hologres] "psycopg[binary]" pgvector
285+
pip install 'vectordb-bench[hologres]' 'psycopg[binary]' pgvector
286286
```
287287

288288
Execute tests for the index types: HGraph.
@@ -319,8 +319,8 @@ Options:
319319

320320
The vectordbbench command can optionally read some or all the options from a yaml formatted configuration file.
321321

322-
By default, configuration files are expected to be in vectordb_bench/config-files/, this can be overridden by setting
323-
the environment variable CONFIG_LOCAL_DIR or by passing the full path to the file.
322+
By default, configuration files are expected to be in vectordb_bench/config-files/, this can be overridden by setting
323+
the environment variable CONFIG_LOCAL_DIR or by passing the full path to the file.
324324

325325
The required format is:
326326
```yaml
@@ -349,16 +349,16 @@ milvushnsw:
349349
drop_old: False
350350
load: False
351351
```
352-
> Notes:
352+
> Notes:
353353
> - Options passed on the command line will override the configuration file*
354354
> - Parameter names use an _ not -
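
The two notes above can be sketched as a small merge step. This is a minimal illustration under stated assumptions, not the actual vectordbbench implementation: `merge_options` is a hypothetical helper, the flag spellings are illustrative, and an option that was not given on the command line is assumed to be represented as `None`.

```python
def merge_options(file_options: dict, cli_options: dict) -> dict:
    """Return the file options, overridden by any CLI options that were set.

    Hypothetical helper for illustration only.
    """
    merged = dict(file_options)
    for flag, value in cli_options.items():
        if value is None:
            # Option not given on the command line: keep the file value.
            continue
        # Configuration keys use "_" where command-line flags use "-".
        key = flag.lstrip("-").replace("-", "_")
        merged[key] = value
    return merged


# Values as they might appear in a configuration file.
file_options = {"drop_old": False, "load": False}
# Illustrative command-line input: only the first option was passed.
cli_options = {"--drop-old": True, "--load": None}

merged = merge_options(file_options, cli_options)
print(merged)
```

Here `drop_old` takes the command-line value and `load` keeps the value from the file.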
355355
356356
#### Using a batch configuration file.
357357
358358
The vectordbbench command can read a batch configuration file to run all the test cases in the yaml formatted configuration file.
359359
360-
By default, configuration files are expected to be in vectordb_bench/config-files/, this can be overridden by setting
361-
the environment variable CONFIG_LOCAL_DIR or by passing the full path to the file.
360+
By default, configuration files are expected to be in vectordb_bench/config-files/, this can be overridden by setting
361+
the environment variable CONFIG_LOCAL_DIR or by passing the full path to the file.
362362
363363
The required format is:
364364
```yaml
@@ -387,7 +387,7 @@ milvushnsw:
387387
drop_old: False
388388
load: False
389389
```
390-
> Notes:
390+
> Notes:
391391
> - Options can only be passed through configuration files
392392
> - Parameter names use an _ not -
393393
@@ -402,11 +402,11 @@ To facilitate the presentation of test results and provide a comprehensive perfo
402402

403403
### Scoring Rules
404404

405-
1. For each case, select a base value and score each system based on relative values.
406-
- For QPS and QP$, we use the highest value as the reference, denoted as `base_QPS` or `base_QP$`, and the score of each system is `(QPS/base_QPS) * 100` or `(QP$/base_QP$) * 100`.
407-
- For Latency, we use the lowest value as the reference, that is, `base_Latency`, and the score of each system is `(base_Latency + 10ms)/(Latency + 10ms) * 100`.
405+
1. For each case, select a base value and score each system based on relative values.
406+
- For QPS and QP$, we use the highest value as the reference, denoted as `base_QPS` or `base_QP$`, and the score of each system is `(QPS/base_QPS) * 100` or `(QP$/base_QP$) * 100`.
407+
- For Latency, we use the lowest value as the reference, that is, `base_Latency`, and the score of each system is `(base_Latency + 10ms)/(Latency + 10ms) * 100`.
408408

409-
We want to give equal weight to different cases, and not let a case with high absolute result values become the sole reason for the overall scoring. Therefore, when scoring different systems in each case, we need to use relative values.
409+
We want to give equal weight to different cases, and not let a case with high absolute result values become the sole reason for the overall scoring. Therefore, when scoring different systems in each case, we need to use relative values.
410410

411411
Also, for Latency, we add 10ms to the numerator and denominator to ensure that if every system performs particularly well in a case, its advantage will not be infinitely magnified when latency tends to 0.
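
The formulas above can be sketched in a few lines of Python. The system names and measurements are made up for illustration, and the function names are assumptions rather than part of the VDBBench code base.

```python
def throughput_score(value: float, base: float) -> float:
    """Score for QPS or QP$, where base is the highest value in the case."""
    return value / base * 100


def latency_score(latency_ms: float, base_latency_ms: float) -> float:
    """Score for latency, where base is the lowest latency in the case.

    10 ms is added to the numerator and denominator, as described above.
    """
    return (base_latency_ms + 10) / (latency_ms + 10) * 100


# Hypothetical results for a single case.
qps = {"system_a": 2000.0, "system_b": 500.0}
latency_ms = {"system_a": 2.0, "system_b": 5.0}

base_qps = max(qps.values())
base_latency = min(latency_ms.values())

qps_scores = {name: throughput_score(v, base_qps) for name, v in qps.items()}
latency_scores = {
    name: latency_score(v, base_latency) for name, v in latency_ms.items()
}

print(qps_scores)
print(latency_scores)
```

In this made-up case, `system_b` has 2.5 times the latency of `system_a`. Without the 10 ms offset its latency score would be 40. With the offset it is about 80, which shows how the offset dampens differences between systems that are all very fast.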
412412

@@ -467,7 +467,7 @@ All standard benchmark results are generated by a client running on an 8 core, 3
467467
1. Initially, you select the systems to be tested - multiple selections are allowed. Once selected, corresponding forms will pop up to gather necessary information for using the chosen databases. The db_label is used to differentiate different instances of the same system. We recommend filling in the host size or instance type here (as we do in our standard results).
468468
2. The next step is to select the test cases you want to perform. You can select multiple cases at once, and a form to collect corresponding parameters will appear.
469469
3. Finally, you'll need to provide a task label to distinguish different test results. Using the same label for different tests will result in the previous results being overwritten.
470-
Now we can only run one task at the same time.
470+
Now we can only run one task at the same time.
471471
![image](vectordb_bench/fig/run_test_select_db.png)
472472
![image](vectordb_bench/fig/run_test_select_case.png)
473473
![image](vectordb_bench/fig/run_test_submit.png)
@@ -508,11 +508,11 @@ We have strict requirements for the data set format, please follow them.
508508
- Vectors data files: The file must be named `train.parquet` and should have two columns: `id` as an incrementing `int` and `emb` as an array of `float32`.
509509
- Query test vectors: The file must be named `test.parquet` and should have two columns: `id` as an incrementing `int` and `emb` as an array of `float32`.
510510
- We recommend limiting the number of test query vectors, like 1,000.
511-
When conducting concurrent query tests, Vdbbench creates a large number of processes.
512-
To minimize additional communication overhead during testing,
511+
When conducting concurrent query tests, Vdbbench creates a large number of processes.
512+
To minimize additional communication overhead during testing,
513513
we prepare a complete set of test queries for each process, allowing them to run independently.
514-
However, this means that as the number of concurrent processes increases,
515-
the number of copied query vectors also increases significantly,
514+
However, this means that as the number of concurrent processes increases,
515+
the number of copied query vectors also increases significantly,
516516
which can place substantial pressure on memory resources.
517517
- Ground truth file: The file must be named `neighbors.parquet` and should have two columns: `id` corresponding to query vectors and `neighbors_id` as an array of `int`.
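
To see why a small query set is recommended, the memory held by the per-process copies can be estimated roughly. The sketch below assumes each process holds a full copy of the `float32` query vectors and ignores ids and any per-object overhead. The dimension and process count are hypothetical.

```python
def query_copy_bytes(num_queries: int, dim: int, num_processes: int) -> int:
    """Rough size in bytes of the query vectors copied into every process."""
    bytes_per_vector = dim * 4  # float32 takes 4 bytes per component
    return num_queries * bytes_per_vector * num_processes


# Hypothetical setup: 768-dimensional vectors, 80 concurrent processes.
small = query_copy_bytes(num_queries=1_000, dim=768, num_processes=80)
large = query_copy_bytes(num_queries=100_000, dim=768, num_processes=80)

print(f"1,000 queries:   {small / 2**20:.1f} MiB")
print(f"100,000 queries: {large / 2**30:.1f} GiB")
```

Under these assumptions, 1,000 queries cost a few hundred MiB in total, while 100,000 queries would need more than 20 GiB for the copies alone.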
518518

@@ -542,10 +542,10 @@ VDBBench aims to provide a more comprehensive, multi-faceted testing environment
542542

543543
**Step 2: Implement new_client.py and config.py**
544544

545-
1. Open new_client.py and define the NewClient class, which should inherit from the clients/api.py file's VectorDB abstract class. The VectorDB class serves as the API for benchmarking, and all DB clients must implement this abstract class.
545+
1. Open new_client.py and define the NewClient class, which should inherit from the clients/api.py file's VectorDB abstract class. The VectorDB class serves as the API for benchmarking, and all DB clients must implement this abstract class.
546546
Example implementation in new_client.py:
547547
new_client.py
548-
```python
548+
```python
549549
from ..api import VectorDB
550550
class NewClient(VectorDB):
551551
# Implement the abstract methods defined in the VectorDB class
@@ -574,7 +574,7 @@ class NewDBCaseConfig(DBCaseConfig):
574574

575575
In this final step, you will import your DB client into clients/__init__.py and update the initialization process.
576576
1. Open clients/__init__.py and import your NewClient from new_client.py.
577-
2. Add your NewClient to the DB enum.
577+
2. Add your NewClient to the DB enum.
578578
3. Update the db2client dictionary by adding an entry for your NewClient.
579579
Example implementation in clients/__init__.py:
580580

@@ -672,14 +672,14 @@ def ZillizAutoIndex(**parameters: Unpack[ZillizTypedDict]):
672672
)
673673
```
674674
3. Update cli by adding:
675-
1. Add database specific options as an Annotated TypedDict, see ZillizTypedDict above.
675+
1. Add database specific options as an Annotated TypedDict, see ZillizTypedDict above.
676676
2. Add index configuration specific options as an Annotated TypedDict. (example: vectordb_bench/backend/clients/pgvector/cli.py)
677677
1. May not be needed if there is only one index config.
678-
2. Repeat for each index configuration, nesting them if possible.
678+
2. Repeat for each index configuration, nesting them if possible.
679679
2. Add a index config specific function for each index type, see Zilliz above. The function name, in lowercase, will be the command name passed to the vectordbbench command.
680680
3. Update db_config and db_case_config to match client requirements
681681
4. Continue to add new functions for each index config.
682-
5. Import the client cli module and command to vectordb_bench/cli/vectordbbench.py (for databases with multiple commands (index configs), this only needs to be done for one command)
682+
5. Import the client cli module and command to vectordb_bench/cli/vectordbbench.py (for databases with multiple commands (index configs), this only needs to be done for one command)
683683
6. Import the `get_custom_case_config` function from `vectordb_bench/cli/cli.py` and use it to add a new key `custom_case` to the `parameters` variable within the command.
684684

685685

@@ -697,7 +697,7 @@ For the system under test, we use the default server-side configuration to maint
697697
For the Client, we welcome any parameter tuning to obtain better results.
698698
### Incomplete Results
699699
Many databases may not be able to complete all test cases due to issues such as Out of Memory (OOM), crashes, or timeouts. In these scenarios, we will clearly state these occurrences in the test results.
700-
### Mistake Or Misrepresentation
700+
### Mistake Or Misrepresentation
701701
We strive for accuracy in learning and supporting various vector databases, yet there might be oversights or misapplications. For any such occurrences, feel free to [raise an issue](https://github.com/zilliztech/VectorDBBench/issues/new) or make amendments on our GitHub page.
702702
## Timeout
703703
In our pursuit to ensure that our benchmark reflects the reality of a production environment while guaranteeing the practicality of the system, we have implemented a timeout plan based on our experiences for various tests.
