we have a working implementation internally that supports Qwen3.5-35B-A3B for text input only:
- bitwise alignment in the forward pass vs `transformers==5.2.0`.
- tested on 8 GPUs with the following configurations: (TP, CP) = (1, 1), (2, 1), (1, 2), (2, 2).
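The tested layouts above could be expressed in a torchtitan-style TOML config; the section and key names below are assumptions based on torchtitan's usual parallelism options, not taken from this implementation:

```toml
# One of the tested 8-GPU layouts, (TP, CP) = (2, 2); remaining GPUs
# would be consumed by data-parallel replicas.
[parallelism]
tensor_parallel_degree = 2
context_parallel_degree = 2
```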
Three points I am not sure about:
- multimodal encoder module and MTP module: left out for now.
- code organization: `torchtitan/models` or `torchtitan/experiments`?
- dependency on flash-linear-attention: it is used for the linear attention layer, and tested on both `0.4.0` and `0.4.1`.
  - Alternative one is to remove this import. This library is used by `slime` (ver `0.4.1`) and `miles` (ver `0.4.0`) to support Qwen3.5.
  - Alternative two is to guard this import, same treatment as `deepep`. A PyTorch-native fallback would still be needed, though.
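Alternative two could look like the following minimal sketch of a guarded import with dispatch; the module name `fla` is what flash-linear-attention installs, but `HAS_FLA`, `linear_attention`, and both helper bodies are hypothetical placeholders, not torchtitan's actual API:

```python
# Guard the optional flash-linear-attention import, similar in spirit to
# how deepep is treated: try the fused library, fall back otherwise.
try:
    import fla  # flash-linear-attention, optional dependency
    HAS_FLA = True
except ImportError:
    fla = None
    HAS_FLA = False

def _fused_linear_attention(q, k, v):
    # Placeholder: would call into fla's fused kernels when installed.
    return "fused"

def _native_linear_attention(q, k, v):
    # Placeholder for the PyTorch-native fallback the issue says is needed.
    return "native"

def linear_attention(q, k, v):
    """Dispatch to the fused kernel if available, else the native path."""
    impl = _fused_linear_attention if HAS_FLA else _native_linear_attention
    return impl(q, k, v)
```

The real cost of this alternative is the native fallback itself, which still has to match the fused kernel numerically.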