@hipudding hipudding commented Aug 29, 2025

./bin/test-backend-ops test -b CANN0 -o MUL_MAT_ID
Testing 5 devices

Backend 1/5: CANN0
Device description: Ascend910B4
Device memory: 30196 MB (29803 MB free)

new_pool_for_device: device 0 use vmm pool
11837/11837 tests passed
Backend CANN0: OK
Backend 2/5: CANN1
Skipping
5/5 backends passed
OK

Performance result for qwen3:30b-a3b-fp16

Before

llama_perf_context_print:        load time =   32008.05 ms
llama_perf_context_print: prompt eval time =     465.31 ms /    16 tokens (   29.08 ms per token,    34.39 tokens per second)
llama_perf_context_print:        eval time =    3771.73 ms /    12 runs   (  314.31 ms per token,     3.18 tokens per second)
llama_perf_context_print:       total time =    4766.22 ms /    28 tokens
llama_perf_context_print:    graphs reused =         12

After

llama_perf_sampler_print:    sampling time =      54.81 ms /   224 runs   (    0.24 ms per token,  4086.62 tokens per second)
llama_perf_context_print:        load time =   30221.19 ms
llama_perf_context_print: prompt eval time =     298.75 ms /    16 tokens (   18.67 ms per token,    53.56 tokens per second)
llama_perf_context_print:        eval time =    5157.83 ms /   207 runs   (   24.92 ms per token,    40.13 tokens per second)
llama_perf_context_print:       total time =    6018.31 ms /   223 tokens
llama_perf_context_print:    graphs reused =        207
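The speedup implied by the two logs above can be worked out directly from the reported tokens-per-second figures. The following is a minimal sketch, not part of the PR: the helper names `parse_tps` and `speedup` are hypothetical, and the regex only assumes the `llama_perf_context_print` line layout shown in this comment. Note that the two runs generated different numbers of tokens (12 vs. 207 runs), so the ratio is a rough indication rather than a controlled benchmark.

```python
import re

# Matches the "prompt eval time" and "eval time" lines as they appear in the logs above.
PERF_RE = re.compile(
    r"llama_perf_context_print:\s+(?P<stage>prompt eval|eval) time =\s*[\d.]+ ms /\s*\d+ (?:tokens|runs)\s*"
    r"\(\s*[\d.]+ ms per token,\s*(?P<tps>[\d.]+) tokens per second\)"
)

def parse_tps(log: str) -> dict[str, float]:
    """Return tokens-per-second per stage ('prompt eval', 'eval') found in a log."""
    return {m.group("stage"): float(m.group("tps")) for m in PERF_RE.finditer(log)}

def speedup(before: str, after: str) -> dict[str, float]:
    """Ratio of after/before throughput for every stage present in both logs."""
    b, a = parse_tps(before), parse_tps(after)
    return {stage: a[stage] / b[stage] for stage in b if stage in a}

# Lines copied verbatim from the "Before" and "After" sections of this comment.
BEFORE = """
llama_perf_context_print: prompt eval time =     465.31 ms /    16 tokens (   29.08 ms per token,    34.39 tokens per second)
llama_perf_context_print:        eval time =    3771.73 ms /    12 runs   (  314.31 ms per token,     3.18 tokens per second)
"""
AFTER = """
llama_perf_context_print: prompt eval time =     298.75 ms /    16 tokens (   18.67 ms per token,    53.56 tokens per second)
llama_perf_context_print:        eval time =    5157.83 ms /   207 runs   (   24.92 ms per token,    40.13 tokens per second)
"""

for stage, ratio in speedup(BEFORE, AFTER).items():
    print(f"{stage}: {ratio:.2f}x")
```

With the numbers reported here, this gives roughly 1.56x for prompt processing and roughly 12.6x for token generation.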


@hipudding hipudding added the Ascend NPU and ggml labels Aug 29, 2025
@hipudding hipudding requested review from ggerganov and slaren August 30, 2025 02:21
@hipudding hipudding merged commit b9382c3 into ggml-org:master Sep 1, 2025
48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025