CANDOR-Bench

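CANDOR-Bench is a benchmarking framework for streaming vector indexes, i.e. approximate nearest-neighbor indexes that handle inserts and deletes alongside queries.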

1. One-click deployment

# Clone the repo (with submodules)
git clone --recursive https://github.com/intellistream/CANDOR-Bench.git
cd CANDOR-Bench

# Run the deployment script
./deploy.sh

# Activate the virtual environment
source sage-db-bench/bin/activate

Deployment options:

./deploy.sh --skip-system-deps  # Skip system dependency installation (when deps are already installed)
./deploy.sh --skip-build        # Skip build (setup environment only)
./deploy.sh --help              # Show help

2. Datasets

Supported datasets

Dataset	Dim	Size	Description
sift	128	1M	SIFT feature vectors
glove	100	1.2M	GloVe word vectors
random-xs	32	10K	Random data (testing)
random-s	64	100K	Random data (small scale)
random-m	128	1M	Random data (mid scale)

Download datasets

python prepare_dataset.py --dataset sift
python prepare_dataset.py --dataset glove

Add a new dataset

Define a Dataset subclass in datasets/registry.py and register it in DATASETS:

class MyDataset(Dataset):
    def __init__(self):
        self.nb = 100000      # number of vectors
        self.nq = 10000       # number of queries
        self.d = 128          # vector dimension
        self.dtype = 'float32'
        self.basedir = 'raw_data/mydataset'

    def prepare(self):
        # Download or generate data
        pass

    def get_data_in_range(self, start, end):
        # Return data in [start, end)
        pass

    def get_queries(self):
        # Return query vectors
        pass

    def distance(self):
        return 'euclidean'  # or 'ip'

# Register dataset
DATASETS['mydataset'] = lambda: MyDataset()
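
As a rough illustration of how the template's methods fit together, here is a minimal sketch of a synthetic dataset and a batch loop over get_data_in_range. ToyRandomDataset and iter_batches are hypothetical names, not part of the repository. The sketch assumes plain Python lists to stay dependency-free, whereas a real dataset would subclass Dataset and most likely return NumPy arrays.

```python
import random


class ToyRandomDataset:
    """Stand-in that mirrors the method names of the template above.

    Assumption: a real class would subclass the framework's Dataset and
    most likely return NumPy arrays; plain lists keep this sketch
    dependency-free.
    """

    def __init__(self, nb=1000, nq=10, d=8, seed=0):
        self.nb = nb          # number of base vectors
        self.nq = nq          # number of queries
        self.d = d            # vector dimension
        self.dtype = "float32"
        self._seed = seed
        self._base = None
        self._queries = None

    def prepare(self):
        # Generate the data once; a real dataset would download files here.
        rng = random.Random(self._seed)
        self._base = [[rng.random() for _ in range(self.d)] for _ in range(self.nb)]
        self._queries = [[rng.random() for _ in range(self.d)] for _ in range(self.nq)]

    def get_data_in_range(self, start, end):
        # Half-open range [start, end), matching the comment in the template.
        return self._base[start:end]

    def get_queries(self):
        return self._queries

    def distance(self):
        return "euclidean"


def iter_batches(dataset, batch_size):
    """Yield (start, end, vectors) chunks, as a streaming insert loop might."""
    for start in range(0, dataset.nb, batch_size):
        end = min(start + batch_size, dataset.nb)
        yield start, end, dataset.get_data_in_range(start, end)
```

Because the range is half-open, consecutive batches cover the base vectors without overlap, and the final batch is shorter when nb is not a multiple of the batch size.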

3. Algorithms

Supported algorithms

Algorithm	Type	Description
faiss_HNSW	Graph index	Faiss HNSW implementation
faiss_HNSW_Optimized	Graph index	HNSW with Gorder optimization
faiss_IVFPQ	Quantization	Inverted file + product quantization
diskann	Graph index	DiskANN
vsag_hnsw	Graph index	VSAG HNSW

Add a new algorithm

Create a directory under bench/algorithms/:

bench/algorithms/my_algo/
├── __init__.py
├── my_algo.py
└── config.yaml

Implement the algorithm interface (my_algo.py):

from ..base import BaseStreamingANN

class MyAlgorithm(BaseStreamingANN):
    def __init__(self, metric, index_params):
        self.metric = metric
        self.name = "my_algo"
        # parse index_params

    def setup(self, dtype, max_pts, ndim):
        # Initialize index
        pass

    def insert(self, X, ids):
        # Insert vectors
        pass

    def delete(self, ids):
        # Delete vectors
        pass

    def query(self, X, k):
        # Query, return (ids, distances)
        pass

    def set_query_arguments(self, query_args):
        # Set query parameters (e.g., ef)
        pass
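
To make the contract of each method concrete, the following sketch fills in the interface with exact (brute-force) search. BruteForceANN is a hypothetical stand-in and not one of the supported algorithms. It assumes plain Python lists and does not subclass BaseStreamingANN, so it runs without the framework installed; a real implementation would inherit from the base class and typically operate on NumPy arrays.

```python
import heapq
import math


class BruteForceANN:
    """Exact search exposing the same method names as the interface above.

    Assumption: a real implementation would subclass BaseStreamingANN and
    work on NumPy arrays; this stand-in uses plain lists.
    """

    def __init__(self, metric, index_params):
        self.metric = metric            # "euclidean" or "ip"
        self.name = "brute_force"
        self.index_params = index_params
        self.vectors = {}               # id -> vector

    def setup(self, dtype, max_pts, ndim):
        # Record the index shape and start from an empty index.
        self.ndim = ndim
        self.max_pts = max_pts
        self.vectors.clear()

    def insert(self, X, ids):
        for vec, vid in zip(X, ids):
            self.vectors[vid] = list(vec)

    def delete(self, ids):
        for vid in ids:
            self.vectors.pop(vid, None)  # ignore ids that are not present

    def _score(self, a, b):
        # Smaller score means closer, so the inner product is negated.
        if self.metric == "ip":
            return -sum(x * y for x, y in zip(a, b))
        return math.dist(a, b)

    def query(self, X, k):
        all_ids, all_dists = [], []
        for q in X:
            best = heapq.nsmallest(
                k, ((self._score(q, v), vid) for vid, v in self.vectors.items())
            )
            all_ids.append([vid for _, vid in best])
            all_dists.append([score for score, _ in best])
        return all_ids, all_dists

    def set_query_arguments(self, query_args):
        # Exact search has no tunable query parameters; stored for symmetry.
        self.query_args = query_args
```

A stand-in like this can also serve as a correctness baseline when testing a new index, since its results are exact by construction.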

Create the config file (config.yaml):

sift:
  my_algo:
    module: benchmark_anns.bench.algorithms.my_algo.my_algo
    constructor: MyAlgorithm
    base-args: ["@metric"]
    run-groups:
      base:
        args: |
          [{"param1": 32, "param2": 100}]
        query-args: |
          [{"ef": 40}]
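
The args and query-args fields hold JSON lists inside YAML block scalars. The sketch below shows one plausible way such an entry could be expanded into individual runs. expand_runs is a hypothetical helper, and the treatment of "@metric" as a placeholder for the dataset's distance metric is an assumption; the framework's actual loader may differ.

```python
import json

# The sift/my_algo entry from config.yaml, written as the dict a YAML
# loader would produce (block scalars become strings ending in a newline).
entry = {
    "module": "benchmark_anns.bench.algorithms.my_algo.my_algo",
    "constructor": "MyAlgorithm",
    "base-args": ["@metric"],
    "run-groups": {
        "base": {
            "args": '[{"param1": 32, "param2": 100}]\n',
            "query-args": '[{"ef": 40}]\n',
        }
    },
}


def expand_runs(entry, metric):
    """Hypothetical helper: list (constructor args, query args) pairs."""
    # Assumption: "@metric" stands for the dataset's distance metric.
    base_args = [metric if a == "@metric" else a for a in entry["base-args"]]
    runs = []
    for group in entry["run-groups"].values():
        # Every index-parameter set is paired with every query-parameter set.
        for index_params in json.loads(group["args"]):
            for query_args in json.loads(group["query-args"]):
                runs.append((base_args + [index_params], query_args))
    return runs
```

Under these assumptions, each pair lines up with MyAlgorithm(metric, index_params) followed by set_query_arguments(query_args).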

Export it in __init__.py:

from .my_algo import MyAlgorithm
__all__ = ['MyAlgorithm']

4. Benchmark workflow

4.1 Compute ground truth

python compute_gt.py \
    --dataset sift \
    --runbook_file runbooks/simple.yaml \
    --gt_cmdline_tool ./DiskANN/build/apps/utils/compute_groundtruth

Ground-truth files are saved in raw_data/{dataset}/{size}/{runbook}.yaml/.

4.2 Run benchmarks

# Basic usage
python run_benchmark.py \
    --algorithm faiss_HNSW_Optimized \
    --dataset sift \
    --runbook runbooks/simple.yaml

# Enable cache miss profiling
python run_benchmark.py \
    --algorithm faiss_HNSW_Optimized \
    --dataset sift \
    --runbook runbooks/simple.yaml \
    --enable-cache-profiling

4.3 Export results

python export_results.py \
    --dataset sift \
    --algorithm faiss_HNSW_Optimized \
    --runbook simple

Exported results include:

recall: recall per batch
query_qps: query throughput
query_latency_ms: query latency
cache_misses: cache miss counts (if enabled)

Result files are saved in results/{dataset}/{algorithm}/.
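
For readers unfamiliar with the exported metrics, this sketch shows how recall, throughput, and mean latency are commonly computed for one query batch. The function names are hypothetical, and the definitions are the standard ones rather than a copy of the framework's code, which may differ in detail (for example, per-query rather than per-batch latency).

```python
def recall_at_k(found_ids, true_ids, k):
    """Fraction of the true top-k neighbors returned, averaged over queries."""
    hits = 0
    for found, truth in zip(found_ids, true_ids):
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(true_ids))


def throughput_and_latency(num_queries, elapsed_seconds):
    """Return (queries per second, mean latency in milliseconds) for one batch."""
    qps = num_queries / elapsed_seconds
    latency_ms = 1000.0 * elapsed_seconds / num_queries
    return qps, latency_ms
```

Here true_ids would come from the ground-truth step and found_ids from the algorithm's query results for the same batch.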
