AMX high-performance operators (BF16, Int8; an int4 variant is coming soon), #1368
voipmonitor started this conversation in General
-
Hi, I have a question about AMX support for Qwen3-235B-A22B-FP8: is AMX support even planned for this model? (FP16 is too large for my 256 GB server.) Also, regarding the int4 variant: does it mean that AMX will also support Qwen3-235B-A22B-GGUF Q4_K_M?
-
Hello, actually the AMX support already works with Qwen3-235B-A22B; it is not specific to FP16 or FP8. BF16 and int8 are supported (int8 may look similar to FP8), but the quantization does not come from a pre-quantized GGUF. Instead, the model is loaded from an FP16 GGUF and quantized to int8 at load time (if you configure that in the YAML). That means if you want int8, or int4 (not supported yet), with AMX, you should also load the model from an FP16 GGUF.
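To make the "configure it in the YAML" part concrete, here is a minimal sketch of what such a rule could look like. The shape follows ktransformers-style optimize-rule YAML, but the specific keys and values shown (the match pattern, `KExpertsCPU`, and especially `backend: "AMXInt8"`) are assumptions for illustration, not verified against the project's shipped configs; check the example YAML files in the repository for the exact names.

```yaml
# Hypothetical sketch of an optimize rule selecting the AMX int8 path.
# Key names and values below are assumptions based on this thread.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"   # route the MoE expert weights
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      backend: "AMXInt8"   # assumed switch: quantize FP16 GGUF weights to int8 for the AMX kernel
```

With a rule like this, the FP16 GGUF weights would be quantized to int8 as they are loaded, rather than being read from a pre-quantized (e.g. Q4_K_M) file, which matches the loading behavior described above.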