Commit 5ca893d

ysjprojects, Borda, and pre-commit-ci[bot] authored

QwQ-32B (#1952)

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Jirka B <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 7789e82 commit 5ca893d

File tree

5 files changed: 33 additions, 5 deletions

- README.md
- litgpt/config.py
- tests/convert/test_lit_checkpoint.py
- tests/test_model.py
- tutorials/download_model_weights.md


README.md (2 additions, 1 deletion)

@@ -143,7 +143,8 @@ Every model is written from scratch to maximize performance and remove layers of
 | Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/) |
 | Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
 | Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
-| QwQ | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
+| QwQ | 32B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwq-32b/) |
+| QwQ-Preview | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
 | R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
 | SmolLM2 | 135M, 360M, 1.7B | Hugging Face | [Hugging Face 2024](https://github.com/huggingface/smollm) |
 | Salamandra | 2B, 7B | Barcelona Supercomputing Centre | [BSC-LTC 2024](https://github.com/BSC-LTC/salamandra) |

litgpt/config.py (22 additions, 1 deletion)

@@ -2267,11 +2267,32 @@ def norm_class(self) -> Type:
     configs.append(copy)

 qwq = [
+    # https://huggingface.co/Qwen/QwQ-32B/blob/main/config.json
+    dict(
+        name="QwQ-32B",
+        hf_config=dict(org="Qwen", name="QwQ-32B"),
+        block_size=131072,
+        vocab_size=151643,
+        padded_vocab_size=152064,
+        n_layer=64,
+        n_head=40,
+        n_embd=5120,
+        n_query_groups=8,
+        rotary_percentage=1.0,
+        parallel_residual=False,
+        bias=False,
+        attn_bias=True,
+        norm_class_name="RMSNorm",
+        mlp_class_name="LLaMAMLP",
+        intermediate_size=27648,
+        norm_eps=1e-5,
+        rope_base=1000000,
+    ),
     # https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/config.json
     dict(
         name="QwQ-32B-Preview",
         hf_config=dict(org="Qwen", name="QwQ-32B-Preview"),
-        block_size=131072,
+        block_size=32768,
         vocab_size=151643,
         padded_vocab_size=152064,
         n_layer=64,
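
For orientation, the new entry is consumed through litgpt's config lookup; the values above imply grouped-query attention with 40 query heads sharing 8 key/value groups and a head size of 5120 / 40 = 128. Below is a minimal sketch (not part of this commit) of reading those derived sizes back out, assuming a litgpt installation that already contains the QwQ-32B entry.

```python
# Hedged sketch: look up the newly registered config by name and derive the
# attention geometry implied by the fields added in this commit.
from litgpt import Config

config = Config.from_name("QwQ-32B")

head_size = config.n_embd // config.n_head                      # 5120 / 40 = 128
queries_per_kv_group = config.n_head // config.n_query_groups   # 40 / 8 = 5

print(head_size, queries_per_kv_group)
```

The Preview entry keeps the same geometry; the diff only lowers its block_size to 32768.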

tests/convert/test_lit_checkpoint.py (3 additions, 1 deletion)

@@ -529,7 +529,9 @@ def test_check_conversion_supported_lora():
 
 
 @torch.inference_mode()
-@pytest.mark.parametrize("model_name", ("Qwen2.5-1.5B", "Qwen2.5-Coder-1.5B", "Qwen2.5-Math-1.5B", "QwQ-32B-Preview"))
+@pytest.mark.parametrize(
+    "model_name", ["Qwen2.5-1.5B", "Qwen2.5-Coder-1.5B", "Qwen2.5-Math-1.5B", "QwQ-32B-Preview", "QwQ-32B"]
+)
 @pytest.mark.parametrize(
     ("device", "dtype"),
     [
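
The widened model_name list multiplies with the (device, dtype) parametrization that follows it, so each added name contributes one collected test per device/dtype pair. A self-contained illustration of that expansion (not the repository's actual test body; the device/dtype pairs here are arbitrary):

```python
# Standalone illustration of stacked parametrize decorators: 5 model names
# x 2 (device, dtype) pairs -> pytest collects 10 test cases.
import pytest
import torch


@pytest.mark.parametrize(
    "model_name", ["Qwen2.5-1.5B", "Qwen2.5-Coder-1.5B", "Qwen2.5-Math-1.5B", "QwQ-32B-Preview", "QwQ-32B"]
)
@pytest.mark.parametrize(
    ("device", "dtype"),
    [(torch.device("cpu"), torch.float32), (torch.device("cpu"), torch.float16)],
)
def test_case_expansion(model_name, device, dtype):
    # Placeholder body; the real test converts a checkpoint and checks the result.
    assert isinstance(model_name, str)
```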

tests/test_model.py (3 additions, 1 deletion)

@@ -800,7 +800,9 @@ def test_against_original_gemma_2(model_name, device, dtype):
 
 
 @torch.inference_mode()
-@pytest.mark.parametrize("model_name", ("Qwen2.5-1.5B", "Qwen2.5-Coder-1.5B", "Qwen2.5-Math-1.5B", "QwQ-32B-Preview"))
+@pytest.mark.parametrize(
+    "model_name", ["Qwen2.5-1.5B", "Qwen2.5-Coder-1.5B", "Qwen2.5-Math-1.5B", "QwQ-32B-Preview", "QwQ-32B"]
+)
 @pytest.mark.parametrize(
     ("device", "dtype"),
     [
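
As with the conversion test, the change only widens the parametrized name list; the test body itself checks the litgpt implementation against the reference Hugging Face model. A rough, scaled-down sketch of the litgpt side only (the dimension overrides are arbitrary, and the weight-copy and logit-comparison steps are omitted):

```python
# Rough sketch, not the repository's test: instantiate a tiny QwQ-32B-shaped
# model and run one forward pass on CPU.  The real test also builds the
# reference Hugging Face model, copies weights across, and compares logits.
import torch
from litgpt import GPT, Config

config = Config.from_name(
    "QwQ-32B",
    block_size=128,        # shrink the config so it runs in seconds on CPU
    n_layer=2,
    n_embd=64,
    n_head=8,
    n_query_groups=2,
    intermediate_size=128,
)
model = GPT(config).eval()

token_ids = torch.randint(0, 1000, (1, 8))
with torch.inference_mode():
    logits = model(token_ids)
print(logits.shape)  # (1, 8, padded_vocab_size)
```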

tutorials/download_model_weights.md (3 additions, 1 deletion)

@@ -41,7 +41,8 @@ LitGPT supports a variety of LLM architectures with publicly available weights.
 | Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/) |
 | Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
 | Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
-| QwQ | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
+| QwQ | 32B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwq-32b/) |
+| QwQ-Preview | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
 | R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
 | RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
 | SmolLM2 | 135M, 360M, 1.7B | Hugging Face | [Hugging Face 2024](https://github.com/huggingface/smollm) |

@@ -223,6 +224,7 @@ Qwen/Qwen2.5-Math-7B
 Qwen/Qwen2.5-Math-7B-Instruct
 Qwen/Qwen2.5-Math-72B
 Qwen/Qwen2.5-Math-72B-Instruct
+Qwen/QwQ-32B
 Qwen/QwQ-32B-Preview
 stabilityai/FreeWilly2
 stabilityai/stable-code-3b
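
With the id listed, the checkpoint can be pulled through litgpt's usual entry points, e.g. `litgpt download Qwen/QwQ-32B` on the command line. Below is a hedged sketch of the equivalent Python API usage (assumes litgpt is installed and that enough disk space and GPU memory are available for a 32B model; the prompt is only a placeholder).

```python
# Hedged sketch: fetch and prompt the newly listed checkpoint via the litgpt
# Python API.  Weights are downloaded on first use.
from litgpt import LLM

llm = LLM.load("Qwen/QwQ-32B")
print(llm.generate("What is 2 * 3?", max_new_tokens=64))
```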
