Commit baa36df

kolchfa-aws authored and huibishe committed
Incl language
Signed-off-by: Fanit Kolchina <[email protected]>
1 parent 43ee306 commit baa36df

1 file changed: +2 -2 lines changed

_posts/2025-02-28-a-practical-guide-for-selecting-HNSW-hyperparameters.md

Lines changed: 2 additions & 2 deletions
@@ -76,7 +76,7 @@ To achieve this, we use 15 vector search datasets spanning various modalities, e
 | [Word2bits](https://github.com/agnusmaximus/Word2BitsD199572657/) | 800 | 399,000 | 1,000 | 100 | Hamming | Word2Vec with quantized parameter | Language, English Wikipedia |
 | [GIST](http://corpus-texmex.irisa.fr/) | 960 | 1,000,000 | 1,000 | 100 | Euclidean | GIST descriptors, INRIA C implementation | Image |
 | [MS MARCO](https://microsoft.github.io/msmarco/) | 384 | 1,000,000 | 50,000 | 100 | Euclidean | MiniLLM | Language, Question answering |
-| [openai-dbpedia](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M) | 1,536 | 950,000 | 50,000 | 100 | Euclidean | text-embedding-3-large | Language, DBpedia |
+| [openai-dbpedia](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M) | 1,536 | 950,000 | 50,000 | 100 | Euclidean | text-embedding-3-large | Language, DBPedia |
 
 
 For each dataset, we evaluated a grid of 80 HNSW configurations based on the following search space:
@@ -89,7 +89,7 @@ search_space = {
 }
 ```
 
-For these experiments, we used an OpenSearch 2.15 cluster with three master nodes and six data nodes, each running on an `r6g.4xlarge.search` instance. We evaluated test vectors in batches of 100 and recorded query throughput and recall@10 for each HNSW configuration. In the next section, we'll introduce the algorithm used to learn the portfolio.
+For these experiments, we used an OpenSearch 2.15 cluster with three cluster manager nodes and six data nodes, each running on an `r6g.4xlarge.search` instance. We evaluated test vectors in batches of 100 and recorded query throughput and recall@10 for each HNSW configuration. In the next section, we'll introduce the algorithm used to learn the portfolio.
 
 ### Method
 