Use mudnn::Unary::IDENTITY op to accelerate D2D memory copy #12
Signed-off-by: Xiaodong Ye <[email protected]>
Pull Request Overview
This PR integrates MUSA’s mudnn::Unary IDENTITY operation to accelerate device-to-device memory copies for FP16/FP32 tensors, offering ~40% performance improvements.
- Adds `mudnnMemcpyAsync` API declaration and implementation using `mudnn::Unary::IDENTITY`
- Updates `cpy.cu` to route contiguous F32/F16 copies through MUSA when enabled
- Extends CMake configurations to include new sources and link against the mudnn library
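A minimal sketch of the routing rule described above, assuming simplified stand-in types. `TensorType`, `Tensor`, `CopyPath`, and `choose_copy_path` are hypothetical names for illustration only, not ggml or MUSA APIs; the real check in `cpy.cu` operates on `ggml_tensor` and may differ in detail.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ggml types involved. The real ggml_tensor has
// many more fields, and ggml computes contiguity with ggml_is_contiguous().
enum class TensorType { F32, F16, Q8_0 };

struct Tensor {
    TensorType type;
    bool contiguous;  // assumed precomputed for this sketch
};

enum class CopyPath { MudnnIdentity, Existing };

// Routing rule as described in the overview: contiguous copies between tensors
// of the same F32 or F16 type go through the muDNN identity op; everything else
// keeps using the existing copy path.
CopyPath choose_copy_path(const Tensor* src, const Tensor* dst) {
    assert(src != nullptr && dst != nullptr);
    const bool same_type = src->type == dst->type;
    const bool is_float = src->type == TensorType::F32 || src->type == TensorType::F16;
    if (same_type && is_float && src->contiguous && dst->contiguous) {
        return CopyPath::MudnnIdentity;
    }
    return CopyPath::Existing;
}
```

Keeping the predicate narrow means quantized, mixed-type, or strided copies never reach the identity op, so those cases keep the behavior of the existing path.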
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ml/backend/ggml/ggml/src/ggml-musa/mudnn.cuh | Declares mudnnMemcpyAsync |
| ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu | Implements mudnnMemcpyAsync with MUSA DNN identity op |
| ml/backend/ggml/ggml/src/ggml-musa/CMakeLists.txt | Adds mudnn headers/sources and links mudnn library |
| ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu | Routes contiguous FP16/FP32 copies through mudnnMemcpyAsync |
| CMakeLists.txt | Includes mudnn in runtime dependency regex and link steps |
Comments suppressed due to low confidence (3)
ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu:88
- There are no unit tests covering mudnnMemcpyAsync; adding tests for both FLOAT and HALF copy paths would help ensure correctness and prevent regressions.
```cpp
musaError_t mudnnMemcpyAsync(ggml_backend_cuda_context& ctx, const ggml_tensor* dst, const ggml_tensor* src) {
```
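The suggested tests could take roughly the following shape, sketched on the host so it compiles without a MUSA toolchain. `identity_copy`, `check_copy_roundtrip`, `test_float_path`, and `test_half_path` are hypothetical names; a real test would run `mudnnMemcpyAsync` on device buffers, synchronize the stream, and read the result back before comparing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for mudnnMemcpyAsync: an elementwise
// IDENTITY op, where each output element equals its input element.
template <typename T>
void identity_copy(const T* src, T* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Fills a buffer with a non-trivial pattern, copies it, and checks that the
// destination is byte-for-byte identical to the source.
template <typename T>
bool check_copy_roundtrip(std::size_t n) {
    std::vector<T> src(n), dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = static_cast<T>(i * 7 + 3);
    }
    identity_copy(src.data(), dst.data(), n);
    return std::memcmp(src.data(), dst.data(), n * sizeof(T)) == 0;
}

// FLOAT path: 32-bit floats. HALF path: 16-bit storage, modeled here as raw
// uint16_t bit patterns because standard C++ has no portable half type.
bool test_float_path() { return check_copy_roundtrip<float>(1024); }
bool test_half_path() { return check_copy_roundtrip<std::uint16_t>(1024); }
```

Comparing raw bytes rather than values is deliberate: an identity copy should preserve every bit, including NaN payloads in the FP16 case.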
ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu:1
- The file uses `std::vector` and `std::unordered_map` but does not include `<vector>` or `<unordered_map>`, which leads to compilation failures.
```cpp
#include <mutex>
```
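A sketch of the include block the comment asks for. The mutex-guarded per-device cache below is a hypothetical illustration of code that needs all three headers; `Handle` and `HandleCache` are placeholder names, not muDNN types, and the actual contents of `mudnn.cu` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>  // std::unordered_map, missing in the reviewed file
#include <vector>         // std::vector, missing in the reviewed file

// Placeholder for a per-device library handle (hypothetical).
struct Handle {
    int device;
    std::vector<int> dims;  // placeholder use of std::vector
};

// Lazily creates one handle per device id; the mutex makes lookups safe when
// several threads issue copies concurrently.
class HandleCache {
public:
    Handle& get(int device) {
        assert(device >= 0);
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = handles_.find(device);
        if (it == handles_.end()) {
            it = handles_.emplace(device, std::make_unique<Handle>(Handle{device, {}})).first;
        }
        // Safe to return after unlocking: the Handle lives behind a unique_ptr,
        // so rehashing the map never moves it.
        return *it->second;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return handles_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<int, std::unique_ptr<Handle>> handles_;
};
```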
ml/backend/ggml/ggml/src/ggml-musa/CMakeLists.txt:101
- Static builds are not linking against the mudnn library, which will cause unresolved symbol errors when mudnn code is compiled; consider linking mudnn for static configuration or disabling mudnn support in static mode.
```cmake
target_link_libraries(ggml-musa PRIVATE MUSA::musart_static MUSA::mublas_static)
```
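For the second option in that comment (disabling mudnn support in static mode), one approach is a preprocessor guard, sketched below under the assumption that the build system defines a macro only when the mudnn library is actually linked. `HYPOTHETICAL_ENABLE_MUDNN_COPY`, `mudnn_copy`, and `copy_contiguous` are illustrative names, and `std::memcpy` stands in for the existing device-to-device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// HYPOTHETICAL_ENABLE_MUDNN_COPY is an illustrative macro, assumed to be
// defined only for build configurations that link the mudnn library.
enum class CopyBackend { Mudnn, Fallback };

#if defined(HYPOTHETICAL_ENABLE_MUDNN_COPY)
// Declared only when mudnn is linked; the definition would live in mudnn.cu.
void mudnn_copy(void* dst, const void* src, std::size_t nbytes);
#endif

// Copies nbytes and reports which backend handled the copy.
CopyBackend copy_contiguous(void* dst, const void* src, std::size_t nbytes) {
    assert(nbytes == 0 || (dst != nullptr && src != nullptr));
#if defined(HYPOTHETICAL_ENABLE_MUDNN_COPY)
    mudnn_copy(dst, src, nbytes);
    return CopyBackend::Mudnn;
#else
    // Stand-in for the existing copy path; skip memcpy for empty copies.
    if (nbytes > 0) {
        std::memcpy(dst, src, nbytes);
    }
    return CopyBackend::Fallback;
#endif
}
```

With the macro left undefined, the translation unit references no mudnn symbol at all, which is what avoids unresolved-symbol errors in a static link.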
This is a manual merge of ggml-org/llama.cpp#13647.
Testing Done
The previous eval rate was around 7 tokens/s — this update improves it by approximately 40%.
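Taking both approximate figures at face value, the implied eval rate after this change is roughly:

$$
7\ \text{tokens/s} \times (1 + 0.40) \approx 9.8\ \text{tokens/s}
$$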