Commit de29e7d
Merge cockroachdb#155998
155998: Create a vector indexing roachtest r=mw5h a=mw5h
#### workload/vecann: add shared utility functions for vector workloads
Add DeriveDistanceMetric and CalculateRecall functions to the vecann
package to be shared between vecbench and the upcoming vector index
roachtest. These functions extract the distance metric from dataset
naming conventions and compute recall for search result validation.
Refactor vecbench to use these shared implementations, removing
duplicate code.
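
As an illustration of what such shared helpers could look like, here is a minimal Go sketch. The names `metricFromName` and `recallSketch`, their signatures, the metric labels, and the `-angular` / `-dot` suffix convention are all assumptions made for this example; they are not the actual signatures of `DeriveDistanceMetric` and `CalculateRecall` in the vecann package.

```go
package main

import (
	"fmt"
	"strings"
)

// metricFromName guesses a distance metric from a dataset name suffix.
// The suffixes and metric labels here are assumed for illustration only.
func metricFromName(dataset string) string {
	name := strings.ToLower(dataset)
	switch {
	case strings.HasSuffix(name, "-angular"):
		return "cosine"
	case strings.HasSuffix(name, "-dot"):
		return "inner_product"
	default:
		// Assumed fallback: squared Euclidean distance.
		return "l2_squared"
	}
}

// recallSketch returns the fraction of ground-truth neighbors that appear
// in the search results. Each ground-truth key is counted at most once.
func recallSketch(results, truth []int64) float64 {
	if len(truth) == 0 {
		return 0
	}
	want := make(map[int64]struct{}, len(truth))
	for _, k := range truth {
		want[k] = struct{}{}
	}
	hits := 0
	for _, k := range results {
		if _, ok := want[k]; ok {
			hits++
			delete(want, k) // avoid double counting duplicate results
		}
	}
	return float64(hits) / float64(len(truth))
}

func main() {
	fmt.Println(metricFromName("example-angular"))
	fmt.Println(recallSketch([]int64{1, 2, 3, 4}, []int64{1, 2, 5, 6}))
}
```

Keeping both helpers free of any database dependency is what makes them easy to share between a benchmark tool and a test.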
Informs: cockroachdb#154590
Release note: None
#### roachtest/vecindex: add a vector index test stub
Add minimal test infrastructure for vector index roachtests:
- Create vecindex.go with vecIndexOptions struct and empty
registerVectorIndex function
- Register the test in registry.go
- Add helper functions for test naming, key generation, and distance
operator mapping
- Stub out the phases of the test
Add test configurations and registration for 5 vector index test
variants:
- vecindex/dbpedia-100k/nodes=3 (standard, no prefix)
- vecindex/dbpedia-100k/nodes=3/prefix=3 (with prefix columns)
- vecindex/dbpedia-1m/nodes=6 (large-scale)
- vecindex/random-s/nodes=1 (local development)
- vecindex/random-s/nodes=1/prefix=2 (local with prefix)
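
The naming pattern in the variant list and the distance operator mapping can be sketched as follows. This is a hypothetical illustration: `testName`, `distanceOperator`, and the metric labels are invented for this example and are not the helpers added in vecindex.go. The operators shown are the pgvector-style ones (`<->`, `<=>`, `<#>`).

```go
package main

import "fmt"

// testName builds a name following the pattern visible in the variant list:
// vecindex/<dataset>/nodes=<n>, plus /prefix=<p> when prefix columns are used.
func testName(dataset string, nodes, prefixCount int) string {
	name := fmt.Sprintf("vecindex/%s/nodes=%d", dataset, nodes)
	if prefixCount > 0 {
		name += fmt.Sprintf("/prefix=%d", prefixCount)
	}
	return name
}

// distanceOperator maps a metric label (hypothetical labels) to a
// pgvector-style SQL distance operator.
func distanceOperator(metric string) (string, error) {
	switch metric {
	case "l2":
		return "<->", nil
	case "cosine":
		return "<=>", nil
	case "inner_product":
		return "<#>", nil
	}
	return "", fmt.Errorf("unknown metric %q", metric)
}

func main() {
	fmt.Println(testName("dbpedia-100k", 3, 3))
	op, err := distanceOperator("cosine")
	fmt.Println(op, err)
}
```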
Informs: cockroachdb#154590
Release note: None
#### roachtest/vecindex: implement a test of backfill and merge
Add a test phase that loads a dataset using a pool of workers and, at a
test-specified percentage of table population, kicks off a CREATE VECTOR
INDEX on the data. This allows us to test both backfill (rows loaded
before index creation) and merge (rows loaded after index creation
starts). Times are reported for both but are not used as a pass
criterion (yet).
Informs: cockroachdb#154590
Release note: None
#### roachtest/vecindex: test recall of vector ann data
This addition to the vecindex roachtest tests the recall of nearest
neighbors from the test data provided with each data set. Each test has
a configurable set of beam sizes to test and a minimum acceptable recall
for each beam size. Tests that load
multiple prefixes test each prefix.
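
A per-beam-size recall check might look like the sketch below. The `beamTarget` type, the `failedTargets` function, and the numbers in `main` are illustrative assumptions; the actual test configuration and thresholds are not shown in this commit message.

```go
package main

import "fmt"

// beamTarget pairs a beam size with the minimum recall accepted for it.
type beamTarget struct {
	beamSize  int
	minRecall float64
}

// failedTargets returns one message per beam size whose measured recall is
// missing or falls below the configured minimum.
func failedTargets(targets []beamTarget, measured map[int]float64) []string {
	var failures []string
	for _, t := range targets {
		got, ok := measured[t.beamSize]
		if !ok {
			failures = append(failures,
				fmt.Sprintf("beam size %d: no recall measured", t.beamSize))
			continue
		}
		if got < t.minRecall {
			failures = append(failures,
				fmt.Sprintf("beam size %d: recall %.3f below minimum %.3f",
					t.beamSize, got, t.minRecall))
		}
	}
	return failures
}

func main() {
	// Made-up values, for illustration only.
	targets := []beamTarget{{beamSize: 8, minRecall: 0.80}, {beamSize: 32, minRecall: 0.95}}
	measured := map[int]float64{8: 0.85, 32: 0.90}
	for _, f := range failedTargets(targets, measured) {
		fmt.Println(f)
	}
}
```

For a multi-prefix test, the same check would simply be repeated with the measurements for each prefix.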
Informs: cockroachdb#154590
Release note: None
#### roachtest/vecindex: add a concurrent reader/writer subtest
This subtest spins up a configurable number of readers and writers to
drive vector search load to the database.
Each writer inserts rows from the first train data file in single
row batches until it has inserted all of the rows in that file, at
which point it switches into delete mode and starts deleting rows
in 10 row batches. When all rows have been deleted, the writer once
again becomes an inserter and the process repeats.
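
The writer's insert/delete cycle can be modeled as a small state machine. The sketch below only tracks row counts and is an assumption about the control flow described above; `writerState` and `step` are hypothetical names, and no SQL is issued.

```go
package main

import "fmt"

// deleteBatchSize matches the 10-row delete batches described in the text.
const deleteBatchSize = 10

// writerState tracks where one writer is in its insert/delete cycle.
type writerState struct {
	total    int  // rows in the train data file (assumed > 0)
	present  int  // rows this writer currently has in the table
	deleting bool // true once every row has been inserted
}

// step performs one batch and reports the operation and its row count.
func (w *writerState) step() (op string, rows int) {
	if !w.deleting {
		// Insert mode: single-row batches until the file is exhausted.
		w.present++
		if w.present >= w.total {
			w.deleting = true
		}
		return "insert", 1
	}
	// Delete mode: batches of up to deleteBatchSize rows until empty.
	rows = deleteBatchSize
	if rows > w.present {
		rows = w.present
	}
	w.present -= rows
	if w.present == 0 {
		w.deleting = false
	}
	return "delete", rows
}

func main() {
	w := &writerState{total: 25}
	for i := 0; i < 30; i++ {
		op, rows := w.step()
		fmt.Println(op, rows)
	}
}
```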
Each reader randomly selects a beam size from the sizes configured for
the test and then runs searches for random vectors in the test data for
the dataset. The reader ignores rows inserted by the writer threads so
that they do not skew the recall results. To do this, it searches for more
vectors than called for and then filters the output to remove vectors
written by the insert workers. When the read worker exits, it validates
its recall rate against the expected rate for the number of searches it
performed.
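
The over-fetch-and-filter step could be sketched as below. How the real test distinguishes writer rows is not stated here, so the `writtenByWriter` predicate and the `w-` key prefix used in `main` are assumptions for illustration; `filterTopK` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// filterTopK walks search results in ranked order, drops keys written by the
// insert workers, and keeps at most k of the remaining keys. The caller is
// expected to have searched for more than k results so that enough survive.
func filterTopK(results []string, k int, writtenByWriter func(key string) bool) []string {
	kept := make([]string, 0, len(results))
	for _, key := range results {
		if len(kept) >= k {
			break
		}
		if writtenByWriter(key) {
			continue
		}
		kept = append(kept, key)
	}
	return kept
}

func main() {
	// Assumed convention for this example: writer rows carry a "w-" key prefix.
	isWriterRow := func(key string) bool { return strings.HasPrefix(key, "w-") }
	results := []string{"w-1", "a", "b", "w-2", "c"}
	fmt.Println(filterTopK(results, 2, isWriterRow))
}
```

The filtered keys can then be compared against the dataset's ground-truth neighbors to compute the reader's recall.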
For multi-prefix tests, this subtest only reads and writes to the first
prefix to ensure the maximum amount of contention.
Fixes: cockroachdb#154590
Release note: None
Co-authored-by: Matt White <[email protected]>

File tree (10 files changed, +851 -33 lines):
- build/teamcity/cockroach/nightlies
- pkg
  - cmd
    - roachtest
      - registry
      - testdata/filter
      - tests
    - vecbench
  - workload/vecann