AMX QWEN support #1356
Unanswered

voipmonitor asked this question in Q&A
            Replies: 1 comment 3 replies
-
Hey @voipmonitor, I have been struggling with this as well; I have been unable to find this information.
-
Hello,
Here: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/AMX.md there is a note saying "At present, Qwen3MoE running with AMX can only read BF16 GGUF; support for loading from safetensor will be added later.", which confuses me. Does it mean that we cannot run 4-bit quantized versions of Qwen using the AMX feature? Is it possible to use Qwen3-235B-A22B-GGUF with AMX? The BF16 version is around 450 GB. Can anyone point me to a repo with a GGUF that I can run with AMX via --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml, please?
edit: I have figured it out. At the moment the AMX backend can only read the BF16 model variant, but the backend engine can be switched to "AMXInt8" or "AMXBF16" in the file ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml.
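For anyone else landing here: the backend switch described above amounts to changing one value in the optimize-rules YAML. The fragment below is an illustrative sketch, not the actual contents of Qwen3Moe-serve-amx.yaml; the match pattern and class path are assumptions based on the general shape of ktransformers injection rules, and only the `backend` value is the change being described.

```yaml
# Illustrative sketch of a ktransformers optimize-rule entry.
# The match regex and class path below are assumptions for illustration;
# check the real Qwen3Moe-serve-amx.yaml for the exact values.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"   # MoE expert modules (assumed pattern)
  replace:
    class: ktransformers.operators.experts.KTransformersExperts  # assumed class path
    kwargs:
      backend: "AMXInt8"   # switch to "AMXBF16" to use the BF16 AMX kernel instead
```

Note that this selects which AMX compute kernel is used at runtime; per the linked doc, the weights loaded from disk still have to be the BF16 GGUF.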