
Commit 82d6c3b

[DSV3] Upgrade to DeepSeek-V3.1 (#1609)
Tested loading weights from https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base (screenshot attached to the PR: "Screenshot 2025-08-20 at 10 28 20 PM").
1 parent 08b8b24 commit 82d6c3b


2 files changed (+2 -2 lines)


torchtitan/models/deepseek_v3/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -8,7 +8,7 @@ DeepSeek-V3 is a Mixture-of-Experts (MoE) transformer model with Multi-head Late
 
 ```bash
 # DeepSeek 671B tokenizer (automatically downloads tokenizer.json and tokenizer_config.json)
-python scripts/download_hf_assets.py --repo_id deepseek-ai/DeepSeek-V3 --assets tokenizer
+python scripts/download_hf_assets.py --repo_id deepseek-ai/DeepSeek-V3.1-Base --assets tokenizer
 ```
 
 ```bash
````

torchtitan/models/deepseek_v3/train_configs/deepseek_v3_671b.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ enable_wandb = false
 [model]
 name = "deepseek_v3"
 flavor = "671B"
-hf_assets_path = "./assets/hf/DeepSeek-V3"
+hf_assets_path = "./assets/hf/DeepSeek-V3.1-Base"
 # converters = ["float8"]
 
 [optimizer]
```
