-
Notifications
You must be signed in to change notification settings - Fork 150
prefer CUDA 13.1 devcontainers, react to some cutlass removals in RAFT #1686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
This isn't working yet, but taking it out of draft so I don't have to |
|
I suspect that in CI we'll see the same thing I now see locally in a rapidsai/raft#2916 removed that header from RAFT, but it's still used here:
|
|
Here's the next error (reproducible locally in a Does bringing over these utilities from RAFT require declaring some other dependency? Or do we maybe need to change the flags cutlass is compiled with to avoid some codepaths? Or maybe these patches that RAFT had (https://github.com/rapidsai/raft/pull/2916/files#diff-5c9671513a52b5c647b9db97761ff01bb6d9a5d15ced9741a1760cec9d7dc9b7) need to be ported over here? |
bdice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s fix all the raft names to cuvs since it’s defined in cuvs now.
cpp/CMakeLists.txt
Outdated
| "$<INSTALL_INTERFACE:include>" | ||
| ) | ||
| target_link_libraries(cuvs_cpp_headers INTERFACE raft::raft rmm::rmm) | ||
| target_link_libraries(cuvs_cpp_headers INTERFACE nvidia::cutlass::cutlass raft::raft rmm::rmm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we need cutlass here as it isn't part of any public header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It is part of this file but I'll move it over to source. https://github.com/rapidsai/cuvs/pull/1686/files#diff-b220f246b1f8a4db2836dc1c87f1f30e2cee2d0f534f564e6f4cfd48e8d1a285R10
dantegd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a couple of questions
Also there are now two cutlass_utils.hpp files:
cpp/include/cuvs/util/cutlass_utils.hppwith public header with just cutlass_error exceptioncpp/src/util/cutlass_utils.hppwith internal header with CUVS_CUTLASS_TRY macro
This split makes sense (exception is public API, macro is internal), but might confuse future contributors. Consider adding a comment in the internal header noting that it's separate from the public one.
| if (CUDA_STATIC_RUNTIME) | ||
| set(CUDART_LIBRARY "${CUDA_cudart_static_LIBRARY}" CACHE FILEPATH "fixing cutlass cmake code" FORCE) | ||
| endif() | ||
| set(CUDART_LIBRARY "${CUDA_cudart_static_LIBRARY}" CACHE FILEPATH "fixing cutlass cmake code" FORCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change to always static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the other way round? The conditional is removed now. This option just tells so to CUTLASS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change was part of raft and not mirrored to cuvs.
The always static cudart is part of this effort: rapidsai/build-planning#235
|
|
||
| rapids_export_package( | ||
| BUILD NvidiaCutlass cuvs-exports GLOBAL_TARGETS nvidia::cutlass::cutlass | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jobs downstream of libcuvs builds are failing like this:
│ │ -- Found rmm: $PREFIX/lib/cmake/rmm/rmm-config.cmake (found version "26.02.0")
│ │ -- Found raft: $PREFIX/lib/cmake/raft/raft-config.cmake (found version "26.02.0")
│ │ CMake Error at $BUILD_PREFIX/share/cmake-4.2/Modules/CMakeFindDependencyMacro.cmake:93 (find_package):
│ │ By not providing "FindNvidiaCutlass.cmake" in CMAKE_MODULE_PATH this
│ │ project has asked CMake to find a package configuration file provided by
│ │ "NvidiaCutlass", but CMake did not find one.
│ │
│ │ Could not find a package configuration file provided by "NvidiaCutlass"
│ │ with any of the following names:
│ │
│ │ NvidiaCutlassConfig.cmake
│ │ nvidiacutlass-config.cmake
│ │
I think I know what's happening, based on my read of https://docs.rapids.ai/api/rapids-cmake/legacy/dependency_tracking/
Since nvidia::cutlass::cutlass is a PRIVATE dependency of cuvs, we shouldn't be including it in the INSTALL export set and therefore generating a find_dependency(NvidiaCutlass).
I just pushed 9d102ce attempting to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like one more similar change for cuco (also a PRIVATE dependency) is needed:
│ │ CMake Error at $BUILD_PREFIX/share/cmake-4.2/Modules/CMakeFindDependencyMacro.cmake:93 (find_package):
│ │ By not providing "Findcuco.cmake" in CMAKE_MODULE_PATH this project has
│ │ asked CMake to find a package configuration file provided by "cuco", but
│ │ CMake did not find one.
│ │
│ │ Could not find a package configuration file provided by "cuco" with any of
│ │ the following names:
│ │
│ │ cucoConfig.cmake
│ │ cuco-config.cmake
Pushed 523e320
| include("${rapids-cmake-dir}/export/find_package_root.cmake") | ||
| rapids_export_find_package_root( | ||
| BUILD NvidiaCutlass [=[${CMAKE_CURRENT_LIST_DIR}]=] | ||
| EXPORT_SET cuvs-exports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be in the cuvs-static export set and not the general export set.
PRIVATE static dependencies are still exported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok I did not realize that. Fixed in 59fd826
|
/merge |
use CUDA 13.1 devcontainers
Follow-up to #1677
There, I forgot to switch devcontainer testing here back to CUDA 13.1 (I'd temporarily kept it at 13.0 because there weren't yet NCCL packages with 13.1 support). This fixes that.
react to cutlass removals in RAFT
rapidsai/raft#2916 removed headers used by cuVS and stopped exporting cutlass from RAFT.
This brings those headers and some related patches over here to cuVS.
Related: rapidsai/cuml#7658