AMX high-performance operators (BF16, Int8; an int4 variant is coming soon), #1368
voipmonitor started this conversation in General
-
Hi, I have a question about AMX support for Qwen3-235B-A22B-FP8: is AMX support even planned for this model? (FP16 is too large for my 256 GB server.) Also, regarding the int4 variant: does it mean that AMX will also support Qwen3-235B-A22B-GGUF Q4_K_M?
-
Hello, actually the AMX support already works with Qwen3-235B-A22B; it is not specific to FP16 or FP8. BF16 and int8 are supported (int8 may look similar to FP8), but the quantization does not come from a pre-quantized GGUF. Instead, the model is loaded from an FP16 GGUF and quantized to int8 at load time (if you configure that in the YAML). That means if you want int8, or int4 (not supported yet), with AMX, you should also load the model from an FP16 GGUF.
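To make the "configure it in the YAML" part concrete, here is a minimal sketch of what such a rule could look like. The shape follows ktransformers-style optimize-rule YAML, but the specific keys and values shown (the match pattern, `KExpertsCPU`, and especially `backend: "AMXInt8"`) are assumptions for illustration, not verified against the project's shipped configs; check the example YAML files in the repository for the exact names.

```yaml
# Hypothetical sketch of an optimize rule selecting the AMX int8 path.
# Key names and values below are assumptions based on this thread.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"   # route the MoE expert weights
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      backend: "AMXInt8"   # assumed switch: quantize FP16 GGUF weights to int8 for the AMX kernel
```

With a rule like this, the FP16 GGUF weights would be quantized to int8 as they are loaded, rather than being read from a pre-quantized (e.g. Q4_K_M) file, which matches the loading behavior described above.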