[model] support Qwen3-235B-A22B-Thinking-2507 (#5113)

Jintao-Huang · Jintao-Huang · commit 76ad21db8beb · 2025-07-28T15:19:00.000+08:00
diff --git a/docs/source/Instruction/支持的模型和数据集.md b/docs/source/Instruction/支持的模型和数据集.md
@@ -214,6 +214,8 @@
 |[swift/Qwen3-235B-A22B-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-AWQ)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|[cognitivecomputations/Qwen3-235B-A22B-AWQ](https://huggingface.co/cognitivecomputations/Qwen3-235B-A22B-AWQ)|
 |[Qwen/Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507)|qwen3_moe|qwen3|transformers>=4.51|&#x2714;|-|[Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)|
 |[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|
+|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507)|qwen3_moe|qwen3|transformers>=4.51|&#x2714;|-|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)|
+|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|
 |[swift/Qwen3-235B-A22B-Instruct-2507-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-Instruct-2507-AWQ)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|-|
 |[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct)|qwen3_moe|qwen3|transformers>=4.51|&#x2714;|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)|
 |[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|
diff --git a/docs/source_en/Instruction/Supported-models-and-datasets.md b/docs/source_en/Instruction/Supported-models-and-datasets.md
@@ -214,6 +214,8 @@ The table below introduces the models integrated with ms-swift:
 |[swift/Qwen3-235B-A22B-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-AWQ)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|[cognitivecomputations/Qwen3-235B-A22B-AWQ](https://huggingface.co/cognitivecomputations/Qwen3-235B-A22B-AWQ)|
 |[Qwen/Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507)|qwen3_moe|qwen3|transformers>=4.51|&#x2714;|-|[Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)|
 |[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|[Qwen/Qwen3-235B-A22B-Instruct-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8)|
+|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507)|qwen3_moe|qwen3|transformers>=4.51|&#x2714;|-|[Qwen/Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)|
+|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|[Qwen/Qwen3-235B-A22B-Thinking-2507-FP8](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8)|
 |[swift/Qwen3-235B-A22B-Instruct-2507-AWQ](https://modelscope.cn/models/swift/Qwen3-235B-A22B-Instruct-2507-AWQ)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|-|-|
 |[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct)|qwen3_moe|qwen3|transformers>=4.51|&#x2714;|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)|
 |[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|qwen3_moe|qwen3|transformers>=4.51|&#x2718;|coding|[Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8)|
diff --git a/swift/llm/model/model/qwen.py b/swift/llm/model/model/qwen.py
@@ -555,6 +555,8 @@ def _get_cast_dtype(self) -> torch.dtype:
             ModelGroup([
                 Model('Qwen/Qwen3-235B-A22B-Instruct-2507', 'Qwen/Qwen3-235B-A22B-Instruct-2507'),
                 Model('Qwen/Qwen3-235B-A22B-Instruct-2507-FP8', 'Qwen/Qwen3-235B-A22B-Instruct-2507-FP8'),
+                Model('Qwen/Qwen3-235B-A22B-Thinking-2507', 'Qwen/Qwen3-235B-A22B-Thinking-2507'),
+                Model('Qwen/Qwen3-235B-A22B-Thinking-2507-FP8', 'Qwen/Qwen3-235B-A22B-Thinking-2507-FP8'),
                 # awq
                 Model('swift/Qwen3-235B-A22B-Instruct-2507-AWQ'),
             ]),