we have a working implementation internally that supports Qwen3.5-35B-A3B for text input only:
- bitwise alignment in the forward pass vs `transformers==5.2.0`.
- tested on 8 GPUs with the following configurations: (TP, CP) = (1, 1), (2, 1), (1, 2), (2, 2).
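The tested layouts above could be expressed in a torchtitan-style TOML config; the section and key names below are assumptions based on torchtitan's usual parallelism options, not taken from this implementation:

```toml
# One of the tested 8-GPU layouts, (TP, CP) = (2, 2); remaining GPUs
# would be consumed by data-parallel replicas.
[parallelism]
tensor_parallel_degree = 2
context_parallel_degree = 2
```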
Three points I am not sure about:
- multimodal encoder module and MTP module: left out for now.
- code organization: `torchtitan/models` or `torchtitan/experiments`?
- dependency on flash-linear-attention: it is used for the linear attention layer, and tested on both `0.4.0` and `0.4.1`.
  - Alternative one is to remove this import. This library is used by `slime` (ver `0.4.1`) and `miles` (ver `0.4.0`) to support Qwen3.5.
  - Alternative two is to guard this import, same treatment as `deepep`. A PyTorch-native fallback would still be needed, though.
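Alternative two could look like the following minimal sketch of a guarded import with dispatch; the module name `fla` is what flash-linear-attention installs, but `HAS_FLA`, `linear_attention`, and both helper bodies are hypothetical placeholders, not torchtitan's actual API:

```python
# Guard the optional flash-linear-attention import, similar in spirit to
# how deepep is treated: try the fused library, fall back otherwise.
try:
    import fla  # flash-linear-attention, optional dependency
    HAS_FLA = True
except ImportError:
    fla = None
    HAS_FLA = False

def _fused_linear_attention(q, k, v):
    # Placeholder: would call into fla's fused kernels when installed.
    return "fused"

def _native_linear_attention(q, k, v):
    # Placeholder for the PyTorch-native fallback the issue says is needed.
    return "native"

def linear_attention(q, k, v):
    """Dispatch to the fused kernel if available, else the native path."""
    impl = _fused_linear_attention if HAS_FLA else _native_linear_attention
    return impl(q, k, v)
```

The real cost of this alternative is the native fallback itself, which still has to match the fused kernel numerically.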