
Conversation

Contributor

@julianmi julianmi commented Dec 2, 2025

This PR adds automatic partition count derivation for ACE (Augmented Core Extraction) graph builds based on available system memory.

Previously, users had to manually calculate and specify the number of partitions based on their dataset size and available memory. This was error-prone and required understanding the internal memory requirements of the ACE algorithm. With auto-derivation, users get optimal partitioning out of the box while still having the option to override if needed.

Changes

  • When npartitions is set to 0 (the new default), ACE automatically derives the number of partitions from the available host and GPU memory.
  • When npartitions is set to a positive value, the specified count is used, but it may be increased automatically if it would exceed memory limits (see the sketch after this list).
  • Added max_host_memory_gb and max_gpu_memory_gb parameters to allow users to constrain memory usage (useful for shared systems or testing).
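A rough sketch of that behavior (the helper below and its simplified per-partition cost model are illustrative assumptions, not the PR's actual heuristic, which also accounts for host memory and the build workspace):

def derive_npartitions(n_rows, dim, dtype_bytes, graph_degree,
                       max_gpu_memory_gb, requested=0, imbalance_factor=2.0):
    """Pick the smallest partition count whose per-partition working set
    (sub-dataset + sub-graph) fits the GPU budget; a user-provided count
    is only ever increased, never decreased."""
    budget_bytes = max_gpu_memory_gb * 1e9
    npartitions = max(requested, 1)
    while True:
        # Largest expected partition, allowing for imbalance between partitions.
        rows_per_partition = imbalance_factor * n_rows / npartitions
        sub_dataset_bytes = rows_per_partition * dim * dtype_bytes
        sub_graph_bytes = rows_per_partition * graph_degree * 4  # 4-byte neighbor ids
        if sub_dataset_bytes + sub_graph_bytes <= budget_bytes:
            return npartitions
        npartitions += 1

# e.g. 100M float32 vectors, dim=128, graph_degree=64, 16 GB GPU budget:
# derive_npartitions(100_000_000, 128, 4, 64, 16)  ->  10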

This builds on top of PR #1597, which should be merged first.

- Added `cuvsHnswAceParams` structure for ACE configuration.
- Implemented `cuvsHnswBuild` function to facilitate index construction using ACE.
- Updated HNSW index parameters to include ACE settings.
- Created new tests for HNSW index building and searching using ACE.
- Updated documentation to reflect the new ACE parameters and usage.
- Added a heuristic to automatically derive the number of partitions from the host and device memory requirements.
- Increased the user-provided `npartitions` automatically if it does not fit in memory.
- Introduced `max_host_memory_gb` and `max_gpu_memory_gb` fields to `cuvsAceParams` and `cuvsHnswAceParams` structures for controlling memory usage during ACE builds.
- Added tests to verify that small memory limits trigger disk mode correctly for both CAGRA and HNSW index builds.

copy-pr-bot bot commented Dec 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


- Renamed parameter `m` to `M` in HNSW structures and related functions for consistency.
- Removed `ef_construction` from `cuvsHnswAceParams` and related classes, as it is no longer needed.
- Load the HNSW index from file before search if needed.
@julianmi julianmi marked this pull request as ready for review December 3, 2025 15:37
@julianmi julianmi requested review from a team as code owners December 3, 2025 15:37
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment


Approved CMake changes

@cjnolet cjnolet added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Jan 5, 2026
@cjnolet cjnolet moved this from Todo to In Progress in Vector Search, ML, & Data Mining Release Board Jan 5, 2026
Contributor

@tfeher tfeher left a comment


Thanks Julian for the PR; it is great to have the n_partition parameter determined automatically. I have reviewed the C/C++/Python changes. I have suggestions for the formulas used for memory estimation, but in general the PR is in good shape.

size_t gpu_sub_graph_size = imbalance_factor * 2 * (dataset_size / n_partitions) *
                            (intermediate_degree + graph_degree) * sizeof(IdxT);
size_t gpu_workspace_size = gpu_sub_dataset_size;
size_t disk_mode_gpu_required = gpu_sub_dataset_size + gpu_sub_graph_size + gpu_workspace_size;
Contributor

Why do we need a workspace that is equivalent to sub_dataset_size?

We should also keep in mind that the memory requirements depend on the build algorithm we use. Having gpu_sub_dataset_size + gpu_sub_graph_size is a good upper limit for now. Just for reference, I expect the following actual memory usage (sketched as code after this list):

  • IVF-PQ: max(pq_compressed_sub_dataset_size, gpu_sub_graph_size)
  • NN descent: gpu_sub_dataset_size_fp16 + gpu_sub_graph_size
  • Iterative solver: max(gpu_sub_dataset_size, gpu_sub_graph_size)
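The same estimates as a hypothetical Python helper (the function and argument names are illustrative assumptions; the PR itself keeps the conservative gpu_sub_dataset_size + gpu_sub_graph_size bound):

def expected_gpu_usage(build_algo,
                       gpu_sub_dataset_size,
                       gpu_sub_graph_size,
                       pq_compressed_sub_dataset_size,
                       gpu_sub_dataset_size_fp16):
    """Rough expected GPU memory per partition for each CAGRA build path."""
    if build_algo == "ivf_pq":
        return max(pq_compressed_sub_dataset_size, gpu_sub_graph_size)
    if build_algo == "nn_descent":
        return gpu_sub_dataset_size_fp16 + gpu_sub_graph_size
    if build_algo == "iterative":
        return max(gpu_sub_dataset_size, gpu_sub_graph_size)
    raise ValueError(f"unknown build algorithm: {build_algo}")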

Contributor Author

Please let me know if you have a better estimate of gpu_workspace_size. The limiting factor I have found seems to be build_knn_graph calling sort_knn_graph, which creates a copy of the dataset (graph_core.cuh#L532-L537).

Thanks for these limits. This is very helpful.

Contributor

Indeed, sorting needs the dataset, but that happens after KNN building, so we reuse the space that was reserved for the dataset during that phase. Therefore I would not include the dataset size in the workspace.

I went through optimize, and here is its workspace memory usage. Could you define a helper function and include this in the memory estimate?

def optimize_workspace_size(N, deg, ideg, S, mst_opt=False):
    """Calculates CAGRA optimize memory usage.

    This is the working memory on top of the input/output host memory
    usage (N * (deg + ideg) * S).

    N    - number of rows in the dataset
    deg  - graph degree
    ideg - intermediate graph degree
    S    - graph type size (in bytes)
    """

    mst_host = N * S                       # mst_graph_num_edges
    if mst_opt:
        mst_host += N * deg * S            # mst_graph allocated in optimize
        mst_host += N * deg * S            # mst_graph allocated in mst_optimize
        mst_host += N * S * 7              # vectors with _max_edges suffix
        mst_host += (deg - 1) * (deg - 1) * S  # iB_candidates

    prune_host = N * ideg * 1  # detour count

    prune_dev = N * ideg * 1   # detour count
    prune_dev += N * 4         # d_num_detour_edges
    prune_dev += N * ideg * S  # d_input_graph
    # We neglect 8 bytes (both on host and device) for stats

    rev_host = N * deg * S     # rev_graph
    rev_host += N * 4          # rev_graph_count
    rev_host += N * S          # dest_nodes

    rev_dev = N * deg * S      # d_rev_graph
    rev_dev += N * 4           # d_rev_graph_count
    rev_dev += N * 4           # d_dest_nodes

    # Memory for merging graphs
    combine_host = N * 4 + deg * 4  # in_edge_count + hist

    # Convert everything to GB
    mst_host /= 1e9
    prune_host /= 1e9
    prune_dev /= 1e9
    rev_host /= 1e9
    rev_dev /= 1e9
    combine_host /= 1e9

    print("Prune host {:4.2f} GB, dev {:4.2f} GB".format(prune_host, prune_dev))
    print("Rev   host {:4.2f} GB, dev {:4.2f} GB".format(rev_host, rev_dev))
    print("MST   host {:4.2f} GB".format(mst_host))

    total_host = mst_host + max(prune_host, rev_host, combine_host)
    total_dev = max(prune_dev, rev_dev)
    print("Total host {:4.2f} GB, dev {:4.2f} GB".format(total_host, total_dev))

    return total_host, total_dev
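For example, with purely illustrative parameters (a 100M-row sub-graph, graph_degree=64, intermediate_degree=128, 4-byte indices; not values taken from the PR), the helper reports:

optimize_workspace_size(100_000_000, 64, 128, 4)
# Prune host 12.80 GB, dev 64.40 GB
# Rev   host 26.40 GB, dev 26.40 GB
# MST   host 0.40 GB
# Total host 26.80 GB, dev 64.40 GB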

Contributor Author

Thank you, this is very helpful. I have added a helper that is now used in both the host and device memory calculations.

Contributor

tfeher commented Jan 16, 2026

Thanks Julian for the update. I had one more comment on the workspace size estimate; otherwise the code looks good.

