Commit 8a9369d

nathon-lee, loadams, and tohtana authored
fix: update Megatron-DeepSpeed tutorial to match current repo structure (#7761)
docs: update Megatron-DeepSpeed tutorial to match current repo structure

- Update outdated file paths and script names in `docs/_tutorials/megatron.md`.
- Replace `scripts/` with `examples/` for training scripts.
- Replace `pretrain_gpt2.py` with `pretrain_gpt.py`.
- Correct locations for `arguments.py` and `utils.py` to `megatron/`.
- Ensure tutorial instructions align with the latest Megatron-DeepSpeed repository layout.

Resolves #7757

---------

Signed-off-by: leejianwoo-collab <leejianwoo@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
1 parent 816e4ae · commit 8a9369d

File tree

1 file changed: +10 −10 lines changed


docs/_tutorials/megatron.md

Lines changed: 10 additions & 10 deletions
````diff
@@ -31,31 +31,31 @@ git submodule update --init --recursive
 ### Running Unmodified Megatron-LM GPT2 model
 
 * For a single GPU run:
-  - change `scripts/pretrain_gpt2.sh`, set its `--train-data` argument as `"webtext"`.
-  - run `bash scripts/pretrain_gpt2.sh`
+  - change `examples/pretrain_gpt.sh`, set its `--train-data` argument as `"webtext"`.
+  - run `bash examples/pretrain_gpt.sh`
 
 * For multiple GPUs and/or nodes run:
-  - change `scripts/pretrain_gpt2_model_parallel.sh`
+  - change `examples/pretrain_gpt_distributed_with_mp.sh`
     - set its `--train-data` argument as `"webtext"`
     - `GPUS_PER_NODE` indicates how many GPUs per node involved in the testing
     - `NNODES` indicates how many nodes involved in the testing
 
-  - run `bash scripts/pretrain_gpt2_model_parallel.sh`
+  - run `bash examples/pretrain_gpt_distributed_with_mp.sh`
 
 
 ## Enabling DeepSpeed
 
 To use DeepSpeed we will modify three files :
 
-* `arguments.py` : Arguments configurations
-* `pretrain_gpt2.py` : Main entry point for training
-* `utils.py` : Checkpoint saving and loading utilities
+* `megatron/arguments.py` : Arguments configurations
+* `pretrain_gpt.py` : Main entry point for training
+* `megatron/utils.py` : Checkpoint saving and loading utilities
 
 
 ### Argument Parsing
 The first step is adding DeepSpeed arguments to
 Megatron-LM GPT2 model, using `deepspeed.add_config_arguments()` in
-`arguments.py`.
+`megatron/arguments.py`.
 
 ```python
 def get_args():
````
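The hunk above ends inside the tutorial's `get_args()` listing. For orientation, here is a minimal sketch of the finished pattern; only the `deepspeed.add_config_arguments()` call is the documented DeepSpeed API, while the parser description and the `--train-data` placeholder are illustrative assumptions, not the actual contents of `megatron/arguments.py`. The diff then resumes at the training commands.

```python
import argparse

import deepspeed


def get_args():
    """Minimal sketch: a Megatron-style parser extended with DeepSpeed flags."""
    # Placeholder parser; the real megatron/arguments.py defines many more options.
    parser = argparse.ArgumentParser(description="Megatron-LM GPT pretraining (sketch)")
    parser.add_argument("--train-data", nargs="+", default=None,
                        help="Training dataset name(s), e.g. webtext (illustrative).")

    # DeepSpeed adds its own CLI options (such as --deepspeed and
    # --deepspeed_config) to the existing parser and returns it.
    parser = deepspeed.add_config_arguments(parser)

    return parser.parse_args()
```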
````diff
@@ -360,9 +360,9 @@ Megatron-LM GPT2 model with DeepSpeed applied, execute the following command to
 start training.
 
 - Single GPU run
-  - run `bash scripts/ds_pretrain_gpt2.sh`
+  - run `bash scripts/ds_pretrain_gpt.sh`
 - Multiple GPUs/Nodes run
-  - run `bash scripts/ds_zero2_pretrain_gpt2_model_parallel.sh`
+  - run `bash scripts/ds_zero2_pretrain_gpt_model_parallel.sh`
 
 ## DeepSpeed Evaluation using GPT-2
 
````

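The first hunk's file list also points at `megatron/utils.py` for checkpoint saving and loading. A hedged sketch of how checkpointing typically routes through DeepSpeed's engine API (`save_checkpoint`/`load_checkpoint`) follows; the helper names, the `model_engine` variable, and the `iteration` bookkeeping are assumptions for illustration, not the file's actual code.

```python
def save_ds_checkpoint(iteration, model_engine, save_dir):
    """Illustrative helper: persist model/optimizer state via DeepSpeed."""
    # The engine owns the model, optimizer, and any ZeRO partitions, so a
    # single call captures all of them; client_state stores extra metadata.
    model_engine.save_checkpoint(save_dir,
                                 tag=f"iter_{iteration:07d}",
                                 client_state={"iteration": iteration})


def load_ds_checkpoint(model_engine, load_dir):
    """Illustrative helper: restore state and recover the saved iteration."""
    load_path, client_state = model_engine.load_checkpoint(load_dir)
    if load_path is None:  # no checkpoint found under load_dir
        return 0
    return client_state.get("iteration", 0)
```

Here `model_engine` is assumed to be the engine object returned as the first element of `deepspeed.initialize()`.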
0 commit comments