Skip to content

Commit 76ad21d

Browse files
committed
[model] support Qwen3-235B-A22B-Thinking-2507 (#5113)
1 parent 138e40e commit 76ad21d

File tree

3 files changed

+6
-0
lines changed

3 files changed

+6
-0
lines changed

docs/source/Instruction/支持的模型和数据集.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -214,6 +214,8 @@
214214
|[swift/Qwen3-235B-A22B-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-AWQ)|qwen3_moe|qwen3|transformers>=4.51|✘|-|[cognitivecomputations/Qwen3-235B-A22B-AWQ](https://huggingface.co/cognitivecomputations/Qwen3-235B-A22B-AWQ)|
215215
|[Qwen/Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507)|qwen3_moe|qwen3|transformers>=4.51|✔|-|[Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)|
216216
|[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|✘|-|[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|
217+
|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507)|qwen3_moe|qwen3|transformers>=4.51|✔|-|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)|
218+
|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|✘|-|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|
217219
|[swift/Qwen3-235B-A22B-Instruct-2507-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-Instruct-2507-AWQ)|qwen3_moe|qwen3|transformers>=4.51|✘|-|-|
218220
|[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct)|qwen3_moe|qwen3|transformers>=4.51|✔|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)|
219221
|[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|qwen3_moe|qwen3|transformers>=4.51|✘|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|

docs/source_en/Instruction/Supported-models-and-datasets.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -214,6 +214,8 @@ The table below introduces the models integrated with ms-swift:
214214
|[swift/Qwen3-235B-A22B-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-AWQ)|qwen3_moe|qwen3|transformers>=4.51|✘|-|[cognitivecomputations/Qwen3-235B-A22B-AWQ](https://huggingface.co/cognitivecomputations/Qwen3-235B-A22B-AWQ)|
215215
|[Qwen/Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507)|qwen3_moe|qwen3|transformers>=4.51|✔|-|[Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)|
216216
|[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|✘|-|[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|
217+
|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507)|qwen3_moe|qwen3|transformers>=4.51|✔|-|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)|
218+
|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|✘|-|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|
217219
|[swift/Qwen3-235B-A22B-Instruct-2507-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-Instruct-2507-AWQ)|qwen3_moe|qwen3|transformers>=4.51|✘|-|-|
218220
|[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct)|qwen3_moe|qwen3|transformers>=4.51|✔|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)|
219221
|[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|qwen3_moe|qwen3|transformers>=4.51|✘|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|

swift/llm/model/model/qwen.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -555,6 +555,8 @@ def _get_cast_dtype(self) -> torch.dtype:
555555
ModelGroup([
556556
Model('Qwen/Qwen3-235B-A22B-Instruct-2507', 'Qwen/Qwen3-235B-A22B-Instruct-2507'),
557557
Model('Qwen/Qwen3-235B-A22B-Instruct-2507-FP8', 'Qwen/Qwen3-235B-A22B-Instruct-2507-FP8'),
558+
Model('Qwen/Qwen3-235B-A22B-Thinking-2507', 'Qwen/Qwen3-235B-A22B-Thinking-2507'),
559+
Model('Qwen/Qwen3-235B-A22B-Thinking-2507-FP8', 'Qwen/Qwen3-235B-A22B-Thinking-2507-FP8'),
558560
# awq
559561
Model('swift/Qwen3-235B-A22B-Instruct-2507-AWQ'),
560562
]),

0 commit comments

Comments
 (0)