@julianmi julianmi commented Dec 2, 2025

This PR removes the optional in-memory path for ACE (Augmented Core Extraction) builds, making disk-based storage the only supported mode. When partitioning is required because device memory is limited but host memory is sufficient, the all-neighbors API can be used instead.

This simplifies the ACE implementation and reduces the complexity of maintaining two code paths.

Changes

  • Removed the `use_disk` parameter from `ace_params`.
  • Eliminated the conditional memory vs. disk code paths in the core ACE build logic.
  • Removed `ace_gather_partition_dataset()` (in-memory only).
  • Consolidated `ace_adjust_sub_graph_ids_disk()` into `ace_adjust_sub_graph_ids()`.
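With the disk path as the only path, the consolidated ID adjustment has to rewrite each partition's sub-graph from partition-local row indices to global dataset IDs without holding the whole graph in memory. The sketch below illustrates that idea under stated assumptions: the on-disk layout (a flat file of unsigned ints, `degree` entries per row), the `local_to_global` mapping, the chunking, and this Python signature are hypothetical and are not the cuVS implementation of `ace_adjust_sub_graph_ids()`.

```python
import array
import os
import tempfile
from typing import Sequence


def adjust_sub_graph_ids(in_path: str, out_path: str, degree: int,
                         local_to_global: Sequence[int],
                         rows_per_chunk: int = 1024) -> None:
    """Rewrite an on-disk sub-graph from partition-local to global row ids.

    Hypothetical layout (not the cuVS format): a flat file of unsigned ints,
    `degree` entries per row, where row i lists the neighbors of the
    partition's i-th vector as local indices. Rows are streamed in chunks so
    the full graph never has to be resident in host memory.
    """
    chunk_bytes = rows_per_chunk * degree * array.array("I").itemsize
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            buf = src.read(chunk_bytes)
            if not buf:
                break
            local_ids = array.array("I")
            local_ids.frombytes(buf)
            # Replace each local neighbor index with its global dataset id.
            global_ids = array.array("I", (local_to_global[j] for j in local_ids))
            dst.write(global_ids.tobytes())


# Usage: a 3-vector partition whose local rows 0, 1, 2 are global rows 7, 2, 9.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "sub_graph_local.bin")
    dst_path = os.path.join(tmp, "sub_graph_global.bin")
    with open(src_path, "wb") as f:
        f.write(array.array("I", [1, 2, 0, 2, 0, 1]).tobytes())
    adjust_sub_graph_ids(src_path, dst_path, degree=2, local_to_global=[7, 2, 9])
    result = array.array("I")
    with open(dst_path, "rb") as f:
        result.frombytes(f.read())

print(list(result))  # [2, 9, 7, 9, 7, 2]
```

Streaming fixed-size chunks keeps peak host memory bounded by `rows_per_chunk`, which is the property a disk-only design relies on.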

- Added `cuvsHnswAceParams` structure for ACE configuration.
- Implemented `cuvsHnswBuild` function to facilitate index construction using ACE.
- Updated HNSW index parameters to include ACE settings.
- Created new tests for HNSW index building and searching using ACE.
- Updated documentation to reflect the new ACE parameters and usage.
- Add a heuristic to automatically derive the number of partitions based on host and device memory requirements.
- Increase the user-provided `npartitions` if the resulting partitions do not fit in memory.
- Introduced `max_host_memory_gb` and `max_gpu_memory_gb` fields to `cuvsAceParams` and `cuvsHnswAceParams` structures for controlling memory usage during ACE builds.
- Added tests to verify that small memory limits trigger disk mode correctly for both CAGRA and HNSW index builds.
- ACE (Augmented Core Extraction) now always uses disk-based storage for consistent behavior and memory efficiency when building large indices.
- Remove the `use_disk` parameter from the `ace_params` struct in C++, C, Python, and Java
- Remove in-memory code path from build_ace() in cagra_build.cuh
- Remove ace_gather_partition_dataset() (no longer needed)
- Consolidate ace_adjust_sub_graph_ids_disk() into ace_adjust_sub_graph_ids()
- Remove conditional memory vs disk branches throughout ACE build
- Update documentation to reflect disk-only behavior
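The partition-count heuristic can be sketched as follows. Only two behaviors come from the commit list above: the count is derived from host and device memory limits (`max_host_memory_gb`, `max_gpu_memory_gb`), and a user-provided `npartitions` is only ever increased, never reduced. The cost model (vector bytes plus one `uint32` graph row per vector, inflated by a `headroom` factor for augmented rows and scratch space), the function name, and every constant are assumptions for illustration, not the cuVS formula.

```python
import math


def derive_npartitions(n_rows: int, dim: int,
                       max_host_memory_gb: float, max_gpu_memory_gb: float,
                       requested: int = 0, bytes_per_elem: int = 4,
                       graph_degree: int = 64, headroom: float = 2.0) -> int:
    """Hypothetical heuristic: the smallest partition count whose
    per-partition working set fits both memory budgets, never below the
    user-requested count. All constants are illustrative assumptions."""
    # Per-row cost: the vector itself plus one graph row of uint32 neighbor ids.
    row_bytes = dim * bytes_per_elem + graph_degree * 4
    # Inflate for augmented (non-core) rows and build scratch space.
    total_bytes = n_rows * row_bytes * headroom
    # A partition must fit the tighter of the two limits.
    budget_bytes = min(max_host_memory_gb, max_gpu_memory_gb) * 2**30
    needed = math.ceil(total_bytes / budget_bytes)
    # Never shrink a user-provided npartitions; only increase it if needed.
    return max(1, requested, needed)


# 100M float32 vectors of dim 128 with a 256 GB host / 40 GB GPU budget.
print(derive_npartitions(100_000_000, 128, 256, 40))               # 4
print(derive_npartitions(100_000_000, 128, 256, 40, requested=8))  # 8
print(derive_npartitions(100_000_000, 128, 256, 40, requested=2))  # 4
```

Taking the minimum of the two budgets is a deliberately conservative simplification; host and device footprints could instead be modeled separately.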

copy-pr-bot bot commented Dec 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@tfeher tfeher mentioned this pull request Dec 3, 2025
- Renamed parameter `m` to `M` in HNSW structures and related functions for consistency.
- Removed `ef_construction` from `cuvsHnswAceParams` and related classes, as it is no longer needed.
- Load the HNSW index from file before search if needed.
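Loading the HNSW index from file before search "if needed" amounts to a lazy-load pattern: the index written to disk by the ACE build is deserialized only on the first search. A minimal sketch, assuming some `loader` callable can deserialize the file; `LazyHnswIndex`, the stub index, and the file path are hypothetical and are not part of the cuVS API.

```python
from typing import Any, Callable, Optional


class LazyHnswIndex:
    """Hypothetical wrapper (not the cuVS API): holds the path of an HNSW
    index that an ACE build serialized to disk and loads it on first search."""

    def __init__(self, path: str, loader: Callable[[str], Any]):
        self.path = path
        self._loader = loader          # e.g. a deserialize function
        self._index: Optional[Any] = None
        self.loads = 0                 # how many times the file was read

    def search(self, queries, k: int):
        if self._index is None:        # not resident yet: load it once
            self._index = self._loader(self.path)
            self.loads += 1
        return self._index.search(queries, k)


# Usage with a stand-in loader; a real one would deserialize the file.
class _StubIndex:
    def search(self, queries, k):
        return [[0] * k for _ in queries]


idx = LazyHnswIndex("build_dir/hnsw_index.bin", lambda path: _StubIndex())
idx.search([[0.1, 0.2]], k=3)
idx.search([[0.3, 0.4]], k=3)
print(idx.loads)  # 1
```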
@cjnolet cjnolet added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Jan 5, 2026
@cjnolet cjnolet moved this from Todo to In Progress in Vector Search, ML, & Data Mining Release Board Jan 5, 2026