AMX Improved Performance #2
trilog-inc started this conversation in General
Replies: 2 comments
- Thanks! Nice to see such an improvement from just running the MoE on the CPU while keeping everything else on the GPU. Thanks again for testing this out.
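For readers skimming the thread, that split comes down to two of the flags used in the full command further down. A minimal sketch (placeholder model path; the flags themselves are the ones used later in this thread) would be:
./build/bin/llama-server --model /path/to/moe-model.gguf --n-gpu-layers 99 --cpu-moe
Here --n-gpu-layers 99 offloads all layers to the GPU, and --cpu-moe then keeps the MoE expert weights in system RAM, so only the attention/dense tensors occupy VRAM.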
- @Gadflyii Hi! Can you sync up with mainline llama.cpp? I would like to test the new GLM4.6 model.
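For anyone wanting to try that sync themselves while waiting, the usual fork-update flow is roughly the following (the URL is mainline llama.cpp; the remote name and target branch are assumptions about this fork's layout, not details from the thread):
git remote add upstream https://github.com/ggml-org/llama.cpp.git
git fetch upstream
git merge upstream/master
followed by rebuilding with the cmake flags shown below.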
- Hi, I went ahead and rebuilt the project with the build parameters you listed:
cmake -B build -DGGML_NATIVE=ON -DGGML_CUDA=ON -DGGML_AMX_TILE=ON -DGGML_AMX_INT8=ON -DGGML_AMX_BF16=ON
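As a quick sanity check that the CPU actually exposes AMX before crediting these build flags, the instruction flags Linux reports on AMX-capable Xeons (amx_tile, amx_int8, amx_bf16) can be listed with something like:
lscpu | grep -o 'amx[^ ]*' | sort -u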
The TG (token generation) rate increased significantly, from 7.5 to 10.74 t/s (~43% increase). Prompt eval didn't really budge. (See the llama-bench sketch at the end of this post for one way to reproduce these numbers.)
Using this command:
./build/bin/llama-server --model /mnt/home_extend/models/unsloth_DeepSeek-V3.1-GGUF/UD-Q4_K_XL/DeepSeek-V3.1-UD-Q4_K_XL-00001-of-00008.gguf --alias ds3.1 --threads 44 --ctx-size 100000 --n-gpu-layers 99 --cpu-moe --temp 0.6 --top-p 0.95 -fa 1 --host 0.0.0.0 --jinja --port 8099 --threads 44 --amx -ub 8192 -b 8192
Very interesting! Going to keep testing. Maybe this can be merged with ik_llama to further increase performance of MoE models.
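To reproduce the prompt-eval/TG figures above in a more controlled way, llama-bench from the same build is the usual tool. This is only a sketch, since it assumes this fork's llama-bench accepts --cpu-moe the same way llama-server does (if it does not, the timing summary that llama-server prints per request gives comparable numbers):
./build/bin/llama-bench -m /mnt/home_extend/models/unsloth_DeepSeek-V3.1-GGUF/UD-Q4_K_XL/DeepSeek-V3.1-UD-Q4_K_XL-00001-of-00008.gguf -ngl 99 -t 44 -fa 1 -p 8192 -n 128 --cpu-moe
llama-bench reports pp (prompt processing) and tg (token generation) in tokens per second, which map directly onto the prompt-eval and TG numbers quoted above.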