Fused Qwen3 MoE layer with LoRA #2622
-
Nice, thanks for the info. Out of curiosity, what part of the MoE layer requires special handling?
-
The Qwen3 MoE model (and all other MoE models) in HF Transformers is notoriously slow, because the forward pass runs a Python for loop over the experts, launching one small matmul per expert. The critical part of my repo is to implement the fused MoE linear layer, which computes all the experts together.
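For reference, the HF-style pattern looks roughly like this (a schematic paraphrase of the loop, not the actual Transformers code):

```python
import torch

def moe_block_loop(hidden, router_logits, experts, top_k):
    """Schematic HF-style MoE forward: one small matmul per expert inside a Python loop."""
    # hidden: (num_tokens, hidden_dim); router_logits: (num_tokens, num_experts)
    weights, indices = torch.topk(torch.softmax(router_logits, dim=-1), top_k, dim=-1)
    out = torch.zeros_like(hidden)
    for e, expert in enumerate(experts):  # this loop is the bottleneck
        token_idx, k_idx = torch.where(indices == e)
        if token_idx.numel() == 0:
            continue
        out[token_idx] += weights[token_idx, k_idx].unsqueeze(-1) * expert(hidden[token_idx])
    return out
```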
Currently I've only written a PyTorch implementation, which allocates an intermediate array. It should be possible to write a Triton kernel and not allocate any intermediate memory. An AI can quickly write the kernel, but I still need to optimize a few things like the layout arrangements. (Update: this is done!) The rest is some boilerplate code to make it compatible with the Qwen3 MoE model in HF Transformers, and I've written a custom LoRA layer for it via the dynamic dispatch API.
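To make the fused idea concrete, here is a rough PyTorch sketch (my own illustration of the technique, not the repo's actual code; the stacked weight layouts `w_gate_up: (E, H, 2*I)` and `w_down: (E, I, H)` are assumptions). Tokens are sorted by expert so each expert's matmul runs over one contiguous slice, at the cost of the gathered intermediate buffer; a Triton grouped-GEMM kernel can replace the remaining loop and the buffer entirely:

```python
import torch
import torch.nn.functional as F

def moe_block_fused(hidden, router_logits, w_gate_up, w_down, top_k):
    """Schematic fused MoE forward: tokens sorted by expert, contiguous per-expert GEMMs."""
    # hidden: (T, H); w_gate_up: (E, H, 2*I); w_down: (E, I, H) -- assumed stacked layouts
    T, H = hidden.shape
    E = w_gate_up.shape[0]
    weights, indices = torch.topk(torch.softmax(router_logits, dim=-1), top_k, dim=-1)
    flat_expert = indices.reshape(-1)                      # (T*top_k,)
    order = torch.argsort(flat_expert)                     # group token copies by expert
    token_idx = torch.arange(T, device=hidden.device).repeat_interleave(top_k)[order]
    x = hidden[token_idx]                                  # the gathered intermediate buffer
    w = weights.reshape(-1)[order].unsqueeze(-1)           # routing weights, same order
    counts = torch.bincount(flat_expert, minlength=E).tolist()
    out = torch.zeros_like(hidden)
    start = 0
    for e, n in enumerate(counts):                         # contiguous slices, one GEMM each
        if n == 0:
            continue
        seg = slice(start, start + n)
        gate, up = (x[seg] @ w_gate_up[e]).chunk(2, dim=-1)
        y = (F.silu(gate) * up) @ w_down[e]
        out.index_add_(0, token_idx[seg], w[seg] * y)
        start += n
    return out
```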
-
Oh hi hi! Again, nice work on this! Also thanks for utilizing Unsloth kernels - we haven't released or announced them yet, so it's always cool to see community members utilizing them! Just a small note: the MoE kernels are licensed as AGPLv3 - we decided to make Unsloth dual licensed, so all code under the kernels folder is AGPLv3. The main reason is that many other packages and companies copy and paste kernels from our repos without any credit (i.e. no acknowledgements or license copyright mentions), and we tried doing linking via LGPL with no success, since people would sneakily fork the LGPL package and link to their fork. More details here: unslothai/unsloth#2890 (reply in thread)
-
https://github.com/woct0rdho/transformers-qwen3-moe-fused
I'm working on implementing a fused Qwen3 MoE layer, which focuses on fine-tuning on a single GPU, while being compatible with the HF Transformers ecosystem. Just want to let you know that it's a use case of the LoRA dynamic dispatch API.
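For context, a minimal sketch of how such a fused module can be hooked into PEFT's LoRA dynamic dispatch. Everything here is hypothetical stand-in code, not the repo's actual implementation: `FusedMoeLinear`, `LoraFusedMoeLinear`, and the `"experts"` module name are made up, and `_register_custom_module` is PEFT's experimental registration hook for custom LoRA layers.

```python
import torch
from peft import LoraConfig
from peft.tuners.lora.layer import LoraLayer

class FusedMoeLinear(torch.nn.Module):
    """Hypothetical fused expert layer: all expert weights stacked in one (E, out, in) tensor."""
    def __init__(self, num_experts, in_features, out_features):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.weight = torch.nn.Parameter(torch.empty(num_experts, out_features, in_features))

class LoraFusedMoeLinear(torch.nn.Module, LoraLayer):
    """Hypothetical LoRA wrapper: per-expert low-rank factors stacked the same way."""
    def __init__(self, base_layer, adapter_name, r=8, lora_alpha=16, **kwargs):
        super().__init__()
        LoraLayer.__init__(self, base_layer)
        num_experts, out_f, in_f = base_layer.weight.shape
        # Stacked LoRA factors: A is (E, r, in), B is (E, out, r); forward/merge omitted here.
        self.lora_A_stacked = torch.nn.ParameterDict(
            {adapter_name: torch.nn.Parameter(torch.zeros(num_experts, r, in_f))})
        self.lora_B_stacked = torch.nn.ParameterDict(
            {adapter_name: torch.nn.Parameter(torch.zeros(num_experts, out_f, r))})
        self.scaling[adapter_name] = lora_alpha / r

# Tell PEFT to dispatch FusedMoeLinear targets to the custom wrapper.
config = LoraConfig(target_modules=["experts"], r=8, lora_alpha=16)  # module name assumed
config._register_custom_module({FusedMoeLinear: LoraFusedMoeLinear})
```

With the mapping registered, `get_peft_model(model, config)` should wrap the matching fused modules with the custom LoRA layer instead of falling back to the built-in ones.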