@jinsolp jinsolp commented Dec 19, 2025

Closes #1577

Reduces binary size by deduplicating calc_chunk_indices_kernel. This PR cuts the number of instantiations from 62 to 1 for each template value of BlockDim (32, 64, ..., 1024).
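For readers unfamiliar with this kind of deduplication, here is a minimal host-side C++ sketch of the general idea. It assumes the duplication came from the helper being instantiated once per combination of outer template parameters it never uses. The names `calc_chunk_offsets` and `Search` and the chunking logic are hypothetical stand-ins, not the actual cuvs code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel body. It depends only on BlockDim,
// so it is templated on BlockDim alone and compiled once per BlockDim value.
template <int BlockDim>
std::vector<std::size_t> calc_chunk_offsets(std::size_t n_rows)
{
  static_assert(BlockDim > 0, "BlockDim must be positive");
  std::vector<std::size_t> offsets;
  for (std::size_t start = 0; start < n_rows; start += BlockDim) {
    offsets.push_back(start);
  }
  return offsets;
}

using chunk_fn = std::vector<std::size_t> (*)(std::size_t);

// Hypothetical outer algorithm, templated on types the helper never uses.
// Every (T, IdxT) combination reuses the same calc_chunk_offsets<BlockDim>.
template <typename T, typename IdxT, int BlockDim>
struct Search {
  static chunk_fn helper() { return &calc_chunk_offsets<BlockDim>; }
  static std::size_t n_chunks(std::size_t n_rows) { return helper()(n_rows).size(); }
};

// Two different outer instantiations resolve to the same helper function
// when BlockDim matches, so only one copy ends up in the binary.
inline void check_shared_instantiation()
{
  assert((Search<float, int, 64>::helper() == Search<double, long, 64>::helper()));
}
```

Had the helper been templated on `T` and `IdxT` as well, each outer combination would have produced its own copy of identical code.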

Binary Size Changes

CUDA 12.9: 1096.15 MB ->
CUDA 13: 432.98 MB ->

@jinsolp jinsolp self-assigned this Dec 19, 2025
@jinsolp jinsolp requested review from a team as code owners December 19, 2025 00:07
@jinsolp jinsolp added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Dec 19, 2025
@jinsolp jinsolp requested a review from divyegala December 19, 2025 00:07
@divyegala

@robertmaynard can you review this PR as well? I was under the impression that launching a kernel across TUs would not work in CUDA whole compilation mode but here it seems to be working. Aren't kernels supposed to have their symbols hidden too?

@robertmaynard robertmaynard left a comment

This approach is invalid and goes against the guidance in https://developer.nvidia.com/blog/cuda-c-compiler-updates-impacting-elf-visibility-and-linkage/

This currently only works in cuvs because we have not yet removed: https://github.com/rapidsai/cuvs/blob/main/cpp/cmake/modules/ConfigureCUDA.cmake#L38

jinsolp commented Dec 19, 2025

Ohh I see okay 🥲

@divyegala divyegala Jan 5, 2026

The kernel needs to be launched in the same TU in which it is defined. We can pass the pointer around to other TUs (though ideally we should avoid it), but they shouldn't attempt to launch the kernel.
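A rough sketch of the layout this implies, written in plain C++ so it runs without a GPU. An ordinary function in an unnamed namespace stands in for the `__global__` kernel, and a direct call stands in for the `<<<...>>>` launch. The file names, `launch_chunk_starts`, and the prefix-sum body are all hypothetical and only illustrate the structure, not the code merged in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- would live in a header, e.g. chunk_starts.hpp (hypothetical) ----
// Other TUs see only the launcher; the kernel is never declared here.
void launch_chunk_starts(const std::vector<std::size_t>& chunk_sizes,
                         std::vector<std::size_t>& chunk_starts);

// ---- would live in one source file, e.g. chunk_starts.cu (hypothetical) ----
namespace {
// Stand-in for the kernel. The unnamed namespace gives it internal linkage,
// so no other translation unit can name it or try to launch it.
void chunk_starts_body(const std::size_t* sizes, std::size_t* starts, std::size_t n)
{
  std::size_t running = 0;
  for (std::size_t i = 0; i < n; ++i) {
    starts[i] = running;  // exclusive prefix sum of the chunk sizes
    running += sizes[i];
  }
}
}  // namespace

// The launcher is defined in the same TU as the kernel. In CUDA, this is
// where the <<<grid, block, 0, stream>>> launch would sit.
void launch_chunk_starts(const std::vector<std::size_t>& chunk_sizes,
                         std::vector<std::size_t>& chunk_starts)
{
  // The caller provides the output buffer, as it would for device memory.
  assert(chunk_starts.size() == chunk_sizes.size());
  chunk_starts_body(chunk_sizes.data(), chunk_starts.data(), chunk_sizes.size());
}
```

Callers in other TUs include the header and call `launch_chunk_starts`; they never need the kernel symbol itself to be visible.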

@cjnolet cjnolet moved this from Todo to In Progress in Vector Search, ML, & Data Mining Release Board Jan 5, 2026

jinsolp commented Jan 12, 2026

Hi @robertmaynard, can you check this PR? I've added changes to follow the guidelines.

jinsolp commented Jan 16, 2026

/merge

@rapids-bot rapids-bot bot merged commit 3138284 into rapidsai:main Jan 16, 2026
191 of 193 checks passed

Development

Successfully merging this pull request may close these issues.

Reduce Binary Size by deduplicating calc_chunk_indices_kernel kernel

3 participants