
Conversation

@yeahdongcn
Collaborator

This is a manual merge of ggml-org/llama.cpp#13647.

Testing Done

❯ ollama run gemma3n:e4b
total duration:       7.828684474s
load duration:        58.973429ms
prompt eval count:    11 token(s)
prompt eval duration: 2.363928229s
prompt eval rate:     4.65 tokens/s
eval count:           53 token(s)
eval duration:        5.405308512s
eval rate:            9.81 tokens/s

The eval rate was previously around 7 tokens/s, so this change improves decode throughput by roughly 40%.

@yeahdongcn yeahdongcn requested review from Copilot and fishingfly July 4, 2025 06:44
@yeahdongcn yeahdongcn self-assigned this Jul 4, 2025

Copilot AI left a comment


Pull Request Overview

This PR integrates MUSA’s mudnn::Unary IDENTITY operation to accelerate device-to-device memory copies for FP16/FP32 tensors, yielding a ~40% performance improvement.

  • Adds mudnnMemcpyAsync API declaration and implementation using mudnn::Unary::IDENTITY
  • Updates cpy.cu to route contiguous F32/F16 copies through MUSA when enabled
  • Extends CMake configurations to include new sources and link against the mudnn library

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Summary per file:

  • ml/backend/ggml/ggml/src/ggml-musa/mudnn.cuh: declares mudnnMemcpyAsync
  • ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu: implements mudnnMemcpyAsync with the MUSA DNN identity op
  • ml/backend/ggml/ggml/src/ggml-musa/CMakeLists.txt: adds mudnn headers/sources and links the mudnn library
  • ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu: routes contiguous FP16/FP32 copies through mudnnMemcpyAsync
  • CMakeLists.txt: includes mudnn in the runtime dependency regex and link steps

Comments suppressed due to low confidence (3)

ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu:88

  • There are no unit tests covering mudnnMemcpyAsync; adding tests for both FLOAT and HALF copy paths would help ensure correctness and prevent regressions.
musaError_t mudnnMemcpyAsync(ggml_backend_cuda_context& ctx, const ggml_tensor* dst, const ggml_tensor* src) {

ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu:1

  • The file uses std::vector and std::unordered_map but does not include <vector> or <unordered_map>, leading to compilation failures.
#include <mutex>

ml/backend/ggml/ggml/src/ggml-musa/CMakeLists.txt:101

  • Static builds are not linking against the mudnn library, which will cause unresolved symbol errors when mudnn code is compiled; consider linking mudnn for static configuration or disabling mudnn support in static mode.
        target_link_libraries(ggml-musa PRIVATE MUSA::musart_static MUSA::mublas_static)
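One possible direction for the static-build gap flagged above is sketched below. Note this is an assumption, not the PR's code: the target name `mudnn` mirrors the shared-library link described in this PR, and whether MUSA ships a static mudnn archive at all is unverified here; if it does not, gating mudnn support off for static builds would be the alternative.

```cmake
if (GGML_STATIC)
    # Hypothetical: also link mudnn in static builds so mudnnMemcpyAsync
    # resolves; falls back to the shared mudnn if no static archive exists.
    target_link_libraries(ggml-musa PRIVATE MUSA::musart_static MUSA::mublas_static mudnn)
endif()
```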

@yeahdongcn yeahdongcn merged commit 5b24e02 into main Jul 7, 2025
@yeahdongcn yeahdongcn deleted the xd/mudnn branch July 7, 2025 07:52
yeahdongcn added commits that referenced this pull request between Jul 10, 2025 and Oct 13, 2025.