[RFC] Support new models Qwen3.5-35B-A3B (text input only) #2544

@gali-leilei

Description

We have a working implementation internally that supports Qwen3.5-35B-A3B for text input only:

  1. Bitwise alignment in the forward pass against transformers==5.2.0.
  2. Tested on 8 GPUs with the following configurations: (TP, CP) = (1, 1), (2, 1), (1, 2), (2, 2).
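For context, "bitwise alignment" is a stricter bar than numerical closeness: every output element must have an identical bit pattern, not merely agree within a tolerance. A minimal pure-Python sketch of such a check (the helper names are illustrative, not part of the internal test suite):

```python
import struct

def float_bits(x: float) -> int:
    """Raw IEEE-754 bit pattern of a float64."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bitwise_equal(ours, reference) -> bool:
    """True iff every element pair has an identical bit pattern.

    Stricter than math.isclose: e.g. 0.0 and -0.0 compare equal
    numerically but differ in the sign bit, so they fail this check.
    """
    return len(ours) == len(reference) and all(
        float_bits(a) == float_bits(b) for a, b in zip(ours, reference)
    )
```

The same idea applied to tensors would compare same-dtype outputs with an exact-equality check rather than an allclose-style tolerance.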

Three points I am not sure about:

  • Multimodal encoder module and MTP module: both are left out for now.
  • code organization: torchtitan/models or torchtitan/experiments?
  • Dependency on flash-linear-attention: it is used for the linear attention layers, and was tested on both 0.4.0 and 0.4.1.
    • Alternative one is to keep this import as-is. This library is already used by slime (ver 0.4.1) and miles (ver 0.4.0) to support Qwen3.5.
    • Alternative two is to guard this import, the same treatment as deepep. A PyTorch-native fallback would still be needed, though.
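A minimal sketch of alternative two, guarding the import the way deepep is handled (the module and function names below are placeholders, not the actual fla API, and the PyTorch-native fallback is only stubbed):

```python
# Guarded optional dependency, mirroring the deepep treatment.
try:
    import fla  # flash-linear-attention; optional dependency  # noqa: F401
    HAS_FLA = True
except ImportError:
    HAS_FLA = False

def linear_attention_backend() -> str:
    """Select the linear-attention backend based on what is installed."""
    if HAS_FLA:
        return "fla"  # fused flash-linear-attention kernels
    # A PyTorch-native fallback implementation would be dispatched here.
    return "torch-native"
```

With this pattern, model code imports nothing from fla at module scope, so torchtitan stays importable without the library installed; only the fallback path still needs to be written.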
