
Commit 34bb0cd

Lyken17 and Copilot authored

Update deprecated huggingface-cli and fix broken links (#1147)

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Lyken17 <[email protected]>

1 parent 0850fbe · commit 34bb0cd

21 files changed: +65 −65 lines changed
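The documentation edits are mechanical: every `huggingface-cli` invocation becomes an `hf` invocation (the subcommands and flags are unchanged), and relative links that broke when pages moved are rewritten either as repo-absolute GitHub URLs or as corrected relative paths. A minimal before/after sketch of the CLI rename, using a model and dataset that appear in the diffs below:

```bash
# Deprecated entry point, as it appeared in the docs before this commit
huggingface-cli download Qwen/Qwen3-4B --local-dir /root/Qwen3-4B
huggingface-cli download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /root/dapo-math-17k

# Renamed entry point used after this commit; only the command name changes
hf download Qwen/Qwen3-4B --local-dir /root/Qwen3-4B
hf download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /root/dapo-math-17k
```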

docs/README.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # slime Documentation

-We recommend new contributors start from writing documentation, which helps you quickly understand SGLang codebase.
+We recommend new contributors start from writing documentation, which helps you quickly understand slime codebase.
 Most documentation files are located under the `docs/` folder.

 ## Docs Workflow

docs/en/examples/deepseek-r1.md

Lines changed: 3 additions & 3 deletions

@@ -11,12 +11,12 @@ Regarding parallelism, for sglang we will enable EP64, activate dp attention, an

 ## Environment Setup

-For instructions on setting up the environment and downloading data, please refer to [Example: Qwen3-4B](./qwen3-4B.md).
+For instructions on setting up the environment and downloading data, please refer to [Example: Qwen3-4B](qwen3-4B.md).

 To prepare the DeepSeek R1 checkpoint, first you will need to download DeepSeek-R1 to a directory accessible by all machines (hereinafter referred to as `$BASE_DIR`):

 ```bash
-huggingface-cli download deepseek-ai/DeepSeek-R1 --local-dir $BASE_DIR/DeepSeek-R1
+hf download deepseek-ai/DeepSeek-R1 --local-dir $BASE_DIR/DeepSeek-R1
 ```

 The Hugging Face checkpoint for DeepSeek-R1 is in a block-quantized fp8 format. To convert it into a torch_dist format that Megatron can load, you first need to convert it to a bf16 Hugging Face checkpoint:

@@ -85,7 +85,7 @@ SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 source "${SCRIPT_DIR}/models/deepseek-v3.sh"
 ```

-This reads the model's config from [scripts/models/deepseek-v3.sh](../../../scripts/models/deepseek-v3.sh). These configs are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](../../../scripts/models/).
+This reads the model's config from [scripts/models/deepseek-v3.sh](https://github.com/THUDM/slime/blob/main/scripts/models/deepseek-v3.sh). These configs are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](https://github.com/THUDM/slime/tree/main/scripts/models/).

 #### CKPT\_ARGS

docs/en/examples/glm4-9B.md

Lines changed: 5 additions & 5 deletions

@@ -15,14 +15,14 @@ Download the model and data:

 ```bash
 # hf checkpoint
-huggingface-cli download zai-org/GLM-Z1-9B-0414 --local-dir /root/GLM-Z1-9B-0414
+hf download zai-org/GLM-Z1-9B-0414 --local-dir /root/GLM-Z1-9B-0414

 # train data
-huggingface-cli download --repo-type dataset zhuzilin/dapo-math-17k \
+hf download --repo-type dataset zhuzilin/dapo-math-17k \
 --local-dir /root/dapo-math-17k

 # eval data
-huggingface-cli download --repo-type dataset zhuzilin/aime-2024 \
+hf download --repo-type dataset zhuzilin/aime-2024 \
 --local-dir /root/aime-2024
 ```

@@ -49,7 +49,7 @@ bash scripts/run-glm4-9B.sh

 ### Parameter Introduction

-Here, we will briefly introduce the various components of the [run-glm4-9B.sh](../../../scripts/run-glm4-9B.sh) script:
+Here, we will briefly introduce the various components of the [run-glm4-9B.sh](https://github.com/THUDM/slime/blob/main/scripts/run-glm4-9B.sh) script:

 #### MODEL\_ARGS

@@ -58,7 +58,7 @@ SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 source "${SCRIPT_DIR}/models/glm4-9B.sh"
 ```

-Reads the model's config from [scripts/models/glm4-9B.sh](../../../scripts/models/glm4-9B.sh). These configs are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](../../../scripts/models/).
+Reads the model's config from [scripts/models/glm4-9B.sh](https://github.com/THUDM/slime/blob/main/scripts/models/glm4-9B.sh). These configs are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](https://github.com/THUDM/slime/tree/main/scripts/models/).

 ⚠️ Ensure that settings such as `--rotary-base` in the model configuration file match the settings of the model you are currently training. This is because different models, even with the same architecture, might use different values. If needed, you can override these parameters in your script after loading the model weights. For instance:

docs/en/examples/glm4.5-355B-A32B.md

Lines changed: 3 additions & 3 deletions

@@ -5,12 +5,12 @@ This is an example of doing GLM-4.5 RL training using 64xH100 GPUs.

 ## Environment Setup

-For instructions on setting up the environment and downloading data, please refer to [Example: Qwen3-4B](./qwen3-4B.md).
+For instructions on setting up the environment and downloading data, please refer to [Example: Qwen3-4B](qwen3-4B.md).

 First, you will need to download GLM-4.5 to a directory accessible by all machines (hereinafter referred to as `$BASE_DIR`):

 ```bash
-huggingface-cli download zai-org/GLM-4.5 --local-dir $BASE_DIR/GLM-4.5-355B-A32B
+hf download zai-org/GLM-4.5 --local-dir $BASE_DIR/GLM-4.5-355B-A32B
 ```

 Next, we need to convert the huggingface checkpoint into the torch_dist format with 2 nodes, each with 8 GPUs:

@@ -66,7 +66,7 @@ SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 source "${SCRIPT_DIR}/models/glm4.5-355B-A32B.sh"
 ```

-This reads the model's config from [scripts/models/glm4.5-355B-A32B.sh](../../../scripts/models/glm4.5-355B-A32B.sh). These configs are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](../../../scripts/models/).
+This reads the model's config from [scripts/models/glm4.5-355B-A32B.sh](https://github.com/THUDM/slime/blob/main/scripts/models/glm4.5-355B-A32B.sh). These configs are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](https://github.com/THUDM/slime/tree/main/scripts/models/).

 #### PERF\_ARGS

docs/en/examples/qwen3-30B-A3B.md

Lines changed: 3 additions & 3 deletions

@@ -3,7 +3,7 @@

 ## Environment Preparation

-The environment setup, model download, data, and checkpoint conversion are the same as for the Qwen3-4B model. You can refer to [Example: Qwen3-4B Model](./qwen3-4B.md), replacing mentions of Qwen3-4B with Qwen3-30B-A3B.
+The environment setup, model download, data, and checkpoint conversion are the same as for the Qwen3-4B model. You can refer to [Example: Qwen3-4B Model](qwen3-4B.md), replacing mentions of Qwen3-4B with Qwen3-30B-A3B.

 To convert huggingface checkpoint to torch_dist, please try:

@@ -29,7 +29,7 @@ bash scripts/run-qwen3-30B-A3B.sh

 ### Parameter Introduction

-Here, we will briefly introduce the MoE-related parts in the [run-qwen3-30B-A3B.sh](../../../scripts/run-qwen3-30B-A3B.sh) script.
+Here, we will briefly introduce the MoE-related parts in the [run-qwen3-30B-A3B.sh](https://github.com/THUDM/slime/blob/main/scripts/run-qwen3-30B-A3B.sh) script.

 1. To support running Qwen3-30B-A3B in an 8xH800 environment, we need to enable Megatron's CPU Adam to save GPU memory. The corresponding configuration is:

@@ -79,7 +79,7 @@ Here, we will briefly introduce the MoE-related parts in the [run-qwen3-30B-A3B.
 slime also supports BF16 training with FP8 inference. For the Qwen3-30B-A3B model, you just need to download the following model:

 ```bash
-huggingface-cli download Qwen/Qwen3-30B-A3B-FP8 --local-dir /root/Qwen3-30B-A3B-FP8
+hf download Qwen/Qwen3-30B-A3B-FP8 --local-dir /root/Qwen3-30B-A3B-FP8
 ```

 And replace `--hf-checkpoint` with:

docs/en/examples/qwen3-4B.md

Lines changed: 5 additions & 5 deletions

@@ -15,14 +15,14 @@ Download the model and data:

 ```bash
 # hf checkpoint
-huggingface-cli download Qwen/Qwen3-4B --local-dir /root/Qwen3-4B
+hf download Qwen/Qwen3-4B --local-dir /root/Qwen3-4B

 # train data
-huggingface-cli download --repo-type dataset zhuzilin/dapo-math-17k \
+hf download --repo-type dataset zhuzilin/dapo-math-17k \
 --local-dir /root/dapo-math-17k

 # eval data
-huggingface-cli download --repo-type dataset zhuzilin/aime-2024 \
+hf download --repo-type dataset zhuzilin/aime-2024 \
 --local-dir /root/aime-2024
 ```

@@ -49,7 +49,7 @@ bash scripts/run-qwen3-4B.sh

 ### Parameter Introduction

-Here, we will briefly introduce the various components of the [run-qwen3-4B.sh](../../../scripts/run-qwen3-4B.sh) script:
+Here, we will briefly introduce the various components of the [run-qwen3-4B.sh](https://github.com/THUDM/slime/blob/main/scripts/run-qwen3-4B.sh) script:

 #### MODEL\_ARGS

@@ -58,7 +58,7 @@ SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 source "${SCRIPT_DIR}/models/qwen3-4B.sh"
 ```

-This reads the model's configuration from [scripts/models/qwen3-4B.sh](../../../scripts/models/qwen3-4B.sh). These are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](../../../scripts/models/).
+This reads the model's configuration from [scripts/models/qwen3-4B.sh](https://github.com/THUDM/slime/blob/main/scripts/models/qwen3-4B.sh). These are all Megatron parameters. When training with Megatron, it cannot read the model config from the checkpoint, so we need to configure it ourselves. We provide some examples in [scripts/models](https://github.com/THUDM/slime/tree/main/scripts/models/).

 ⚠️ Ensure that settings such as `--rotary-base` in the model configuration file match the settings of the model you are currently training. This is because different models, even with the same architecture, might use different values. If needed, you can override these parameters in your script after loading the model weights. For instance:

docs/en/examples/qwen3-4b-base-openhermes.md

Lines changed: 2 additions & 2 deletions

@@ -3,7 +3,7 @@

 ## Environment Preparation

-First, we need to create a mirror environment and convert the `Qwen3-4B-Base` model by following the [Example: Qwen3-4B Model](./models/qwen3-4B.md).
+First, we need to create a mirror environment and convert the `Qwen3-4B-Base` model by following the [Example: Qwen3-4B Model](qwen3-4B.md).

 After that, we will process the SFT data. Here, we use the classic [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) as an example. First, we process the data into a format suitable for `slime` to load. You can use the following script to add a column that conforms to the OpenAI message format and save it to `/root/openhermes2_5.parquet`.

@@ -50,7 +50,7 @@ bash script/run-qwen3-4B-base-sft.sh

 ### Parameter Introduction

-You can compare [run-qwen3-4B-base-sft.sh](../../scripts/run-qwen3-4B.sh) with [run-qwen3-4B.sh](../../scripts/run-qwen3-4B.sh). You will find that besides changing the model from the instruct version to the base model, the main adjustments are as follows:
+You can compare [run-qwen3-4B-base-sft.sh](https://github.com/THUDM/slime/blob/main/scripts/run-qwen3-4B-base-sft.sh) with [run-qwen3-4B.sh](https://github.com/THUDM/slime/blob/main/scripts/run-qwen3-4B.sh). You will find that besides changing the model from the instruct version to the base model, the main adjustments are as follows:

 1. Removed `SGLANG_ARGS` and `GRPO_ARGS`. This is because it is not necessary to start SGLang or configure GRPO-related settings during the SFT process.

docs/en/get_started/qa.md

Lines changed: 1 addition & 1 deletion

@@ -49,7 +49,7 @@

 9. **My gradient norm is very high and the training crashes. What should I do?**

-First, ensure that your data and model are compatible. For example, if your data already uses a chat template, check if this template matches the one used by the original model. If the data is correct, please refer to our [Debug Guide](./debug.md) for a more in-depth analysis.
+First, ensure that your data and model are compatible. For example, if your data already uses a chat template, check if this template matches the one used by the original model. If the data is correct, please refer to our [Debug Guide](../developer_guide/debug.md) for a more in-depth analysis.

 10. **My sglang generation takes an extremely long time, GPU power is maxed out, and there's no output for a long while. Why?**

docs/en/get_started/quick_start.md

Lines changed: 2 additions & 2 deletions

@@ -571,5 +571,5 @@ ray job submit --address="http://127.0.0.1:8265" \

 slime has been deeply optimized for distributed training of large-scale Mixture of Experts (MoE) models. We provide some end-to-end training cases for reference:

-- [Example: 64xH100 Training GLM-4.5](models/glm4.5-355B-A32B.md)
-- [Example: 128xH100 Training DeepSeek-R1](models/deepseek-r1.md)
+- [Example: 64xH100 Training GLM-4.5](../examples/glm4.5-355B-A32B.md)
+- [Example: 128xH100 Training DeepSeek-R1](../examples/deepseek-r1.md)

docs/en/get_started/usage.md

Lines changed: 7 additions & 7 deletions

@@ -67,7 +67,7 @@ MODEL_ARGS=(
 )
 ```

-We provide configurations for common models in [scripts/models](../../scripts/models), which you can reuse directly. If you are also using Megatron for pre-training/SFT, you can directly reuse the model configurations from your pre-training/SFT setup.
+We provide configurations for common models in [scripts/models](../../../scripts/models), which you can reuse directly. If you are also using Megatron for pre-training/SFT, you can directly reuse the model configurations from your pre-training/SFT setup.

 Note:

@@ -99,7 +99,7 @@ Megatron supports several of its custom checkpoint formats. Here are two of the

 The `torch` format is Megatron's older storage format. Its structure consists of directories like `mp_rank_xxx`, where each directory corresponds to the checkpoint stored by each rank under a specific parallel partitioning. Because of this, when loading a `torch` format checkpoint, you must ensure that the checkpoint's parallelism strategy matches that of the training task.

-We recommend using the `torch_dist` format because it supports automatic parallel sharding, meaning that training tasks with different parallelism settings can share the same checkpoint, which is much more convenient. `torch_dist` is also the default format in the open-source Megatron. A `torch_dist` format checkpoint typically contains a set of `.distcp` files. When using `torch_dist`, you can convert from Hugging Face to `torch_dist` and vice versa using the checkpoint conversion method described in the [README](../../README.md).
+We recommend using the `torch_dist` format because it supports automatic parallel sharding, meaning that training tasks with different parallelism settings can share the same checkpoint, which is much more convenient. `torch_dist` is also the default format in the open-source Megatron. A `torch_dist` format checkpoint typically contains a set of `.distcp` files. When using `torch_dist`, you can convert from Hugging Face to `torch_dist` and vice versa using the checkpoint conversion method described in the [README](../../../README.md).

 In terms of storage structure, a Megatron checkpoint typically looks like this, assuming the storage path is `/ckpt/`:

@@ -183,7 +183,7 @@ Additionally, we provide a `metadata_key`, which defaults to `"metadata"`. When

 slime supports customizing data generation (rollout) to various degrees.

-- By default, it uses the `generate_rollout` function from [slime/rollout/sglang\_example.py](../../slime/rollout/sglang_rollout.py) for data generation. This file implements an asynchronous (asyncio) data generation flow based on SGLang and supports features like dynamic sampling and partial rollout.
+- By default, it uses the `generate_rollout` function from [slime/rollout/sglang_rollout.py](https://github.com/THUDM/slime/blob/main/slime/rollout/sglang_rollout.py) for data generation. This file implements an asynchronous (asyncio) data generation flow based on SGLang and supports features like dynamic sampling and partial rollout.

 - You can completely replace the `generate_rollout` in sglang\_example.py by using the `--rollout-function-path` parameter. You just need to ensure that the function signature passed via `--rollout-function-path` is as follows:

@@ -213,7 +213,7 @@ slime supports customizing data generation (rollout) to various degrees.

 - `evaluation`: A boolean indicating if the rollout is for evaluation. You can configure a separate evaluation function using `--eval-function-path`.

-- The returned `Sample` type is defined in [slime/utils/types.py](../../slime/utils/types.py). When implementing, you need to ensure the following fields are correctly set:
+- The returned `Sample` type is defined in [slime/utils/types.py](https://github.com/THUDM/slime/blob/main/slime/utils/types.py). When implementing, you need to ensure the following fields are correctly set:

 - `tokens`: The tokens for the prompt + response.
 - `response_length`: The total length of the response. For multi-turn tasks, this is the length of the tokens remaining after the first-turn prompt.

@@ -254,7 +254,7 @@ slime supports customizing data generation (rollout) to various degrees.
 return sample
 ```

-For a more complete version, please refer to [slime/rollout/sglang\_example.py](../../slime/rollout/sglang_rollout.py).
+For a more complete version, please refer to [slime/rollout/sglang_rollout.py](https://github.com/THUDM/slime/blob/main/slime/rollout/sglang_rollout.py).

 - Sometimes, you may also need to support a custom reward model. This can be configured by setting `--custom-rm-path`.

@@ -275,7 +275,7 @@ Some parameters related to slime's resource scheduling are configured by slime i
 - `--tp-size` in slime is set using `--rollout-num-gpus-per-engine`.
 - `--model-path` in slime is set using `--hf-checkpoint`.

-The way SGLang parameters are integrated into slime can be found in [slime/backends/sglang\_utils/arguments.py](../../slime/backends/sglang_utils/arguments.py).
+The way SGLang parameters are integrated into slime can be found in [slime/backends/sglang_utils/arguments.py](https://github.com/THUDM/slime/blob/main/slime/backends/sglang_utils/arguments.py).

 ### How to Use the Router

@@ -291,7 +291,7 @@ slime supports different and lightly modified versions of Megatron by reusing co

 ### Parameter Configuration

-slime directly imports all parameters of the Megatron in the current environment by using `from megatron.training.arguments import parse_args`. If the version of Megatron you are using has parameters defined outside of `parse_args`, you can configure them by passing them in, similar to how it's done in [train.py](../../train.py), for example:
+slime directly imports all parameters of the Megatron in the current environment by using `from megatron.training.arguments import parse_args`. If the version of Megatron you are using has parameters defined outside of `parse_args`, you can configure them by passing them in, similar to how it's done in [train.py](https://github.com/THUDM/slime/blob/main/train.py), for example:

 ```python
 if __name__ == "__main__":

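If you follow the updated docs in an older environment, the `hf` entry point may not exist yet; it ships with newer `huggingface_hub` releases. A quick check (the upgrade fallback is only a suggestion and not part of this commit):

```bash
# Probe for the renamed CLI; upgrade huggingface_hub if it is not available yet
hf --help >/dev/null 2>&1 || pip install -U huggingface_hub
```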