
Commit c991533

qwen2.5 long context (Lightning-AI#1933)

Authored and committed by: ysjprojects, Borda, pre-commit-ci[bot], KaelanDt, shijie.yu

Co-authored-by: Jirka B <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: KaelanDt <[email protected]>
Co-authored-by: shijie.yu <[email protected]>

1 parent 01edd1b · commit c991533

File tree

4 files changed: +60 -1 lines changed

README.md (1 addition, 0 deletions)

@@ -146,6 +146,7 @@ Every model is written from scratch to maximize performance and remove layers of
 | Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
 | Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/) |
 | Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
+| Qwen2.5 1M (Long Context) | 7B, 14B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwen2.5-1m/) |
 | Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
 | QwQ | 32B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwq-32b/) |
 | QwQ-Preview | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |

litgpt/config.py (47 additions, 0 deletions)

@@ -2330,6 +2330,53 @@ def norm_class(self) -> Type:
     ),
 ]
 
+qwen_2_5_1m = [
+    # https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M/blob/main/config.json
+    dict(
+        name="Qwen2.5-7B-Instruct-1M",
+        hf_config=dict(org="Qwen", name="Qwen2.5-7B-Instruct-1M"),
+        block_size=1010000,
+        vocab_size=151643,
+        padded_vocab_size=152064,
+        n_layer=28,
+        n_head=28,
+        n_embd=3584,
+        n_query_groups=4,
+        rotary_percentage=1.0,
+        parallel_residual=False,
+        bias=False,
+        attn_bias=True,
+        norm_class_name="RMSNorm",
+        mlp_class_name="LLaMAMLP",
+        intermediate_size=18944,
+        norm_eps=1e-5,
+        rope_base=10000000,
+    ),
+    # https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-1M/blob/main/config.json
+    dict(
+        name="Qwen2.5-14B-Instruct-1M",
+        hf_config=dict(org="Qwen", name="Qwen2.5-14B-Instruct-1M"),
+        block_size=1010000,
+        vocab_size=151643,
+        padded_vocab_size=152064,
+        n_layer=48,
+        n_head=40,
+        n_embd=5120,
+        n_query_groups=8,
+        rotary_percentage=1.0,
+        parallel_residual=False,
+        bias=False,
+        attn_bias=True,
+        norm_class_name="RMSNorm",
+        mlp_class_name="LLaMAMLP",
+        intermediate_size=13824,
+        norm_eps=1e-5,
+        rope_base=10000000,
+    ),
+]
+
+qwen_2_5.extend(qwen_2_5_1m)
+
 qwen_2_5_coder = [
     # https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B/blob/main/config.json
     dict(
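For orientation, the dicts added above are consumed through LitGPT's existing name-based config lookup. A minimal sketch, assuming a litgpt installation that includes this commit (the printed values simply echo the fields in the diff):

```python
# Minimal sketch: resolve one of the new 1M-context configs by name.
# Config.from_name is LitGPT's existing lookup, used for all other Qwen2.5 entries.
from litgpt.config import Config

config = Config.from_name("Qwen2.5-7B-Instruct-1M")
print(config.block_size)                              # 1010000 (extended context window)
print(config.n_layer, config.n_head, config.n_embd)   # 28 28 3584
print(config.rope_base)                               # 10000000
```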

tests/convert/test_lit_checkpoint.py (9 additions, 1 deletion)

@@ -605,7 +605,15 @@ def test_check_conversion_supported_lora():
 
 @torch.inference_mode()
 @pytest.mark.parametrize(
-    "model_name", ["Qwen2.5-1.5B", "Qwen2.5-Coder-1.5B", "Qwen2.5-Math-1.5B", "QwQ-32B-Preview", "QwQ-32B"]
+    "model_name",
+    (
+        "Qwen2.5-1.5B",
+        "Qwen2.5-Coder-1.5B",
+        "Qwen2.5-Math-1.5B",
+        "QwQ-32B-Preview",
+        "QwQ-32B",
+        "Qwen2.5-7B-Instruct-1M",
+    ),
 )
 @pytest.mark.parametrize(
     ("device", "dtype"),

tutorials/download_model_weights.md (3 additions, 0 deletions)

@@ -44,6 +44,7 @@ LitGPT supports a variety of LLM architectures with publicly available weights.
 | Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
 | Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/) |
 | Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
+| Qwen2.5 1M (Long Context) | 7B, 14B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwen2.5-1m/) |
 | Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
 | QwQ | 32B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwq-32b/) |
 | QwQ-Preview | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |

@@ -209,8 +210,10 @@ Qwen/Qwen2.5-3B
 Qwen/Qwen2.5-3B-Instruct
 Qwen/Qwen2.5-7B
 Qwen/Qwen2.5-7B-Instruct
+Qwen/Qwen2.5-7B-Instruct-1M
 Qwen/Qwen2.5-14B
 Qwen/Qwen2.5-14B-Instruct
+Qwen/Qwen2.5-14B-Instruct-1M
 Qwen/Qwen2.5-32B
 Qwen/Qwen2.5-32B-Instruct
 Qwen/Qwen2.5-72B
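With the repo ids listed above, downloading and loading should follow LitGPT's usual path. A hedged usage sketch using LitGPT's documented Python API (`LLM.load` fetches the weights on first use; whether your hardware can actually serve a 1M-token context is a separate concern):

```python
# Hedged usage sketch: load one of the newly listed checkpoints by repo id
# and run a short generation to confirm the weights resolve.
from litgpt import LLM

llm = LLM.load("Qwen/Qwen2.5-7B-Instruct-1M")
print(llm.generate("What is the capital of France?", max_new_tokens=32))
```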
