Commit 8a9369d

nathon-lee, loadams, and tohtana authored
fix: update Megatron-DeepSpeed tutorial to match current repo structure (#7761)
docs: update Megatron-DeepSpeed tutorial to match current repo structure

- Update outdated file paths and script names in `docs/_tutorials/megatron.md`.
- Replace `scripts/` with `examples/` for training scripts.
- Replace `pretrain_gpt2.py` with `pretrain_gpt.py`.
- Correct locations for `arguments.py` and `utils.py` to `megatron/`.
- Ensure tutorial instructions align with the latest Megatron-DeepSpeed repository layout.

Resolves #7757

---------

Signed-off-by: leejianwoo-collab <leejianwoo@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
1 parent 816e4ae · commit 8a9369d

File tree

1 file changed: +10 −10 lines changed


docs/_tutorials/megatron.md

Lines changed: 10 additions & 10 deletions
````diff
@@ -31,31 +31,31 @@ git submodule update --init --recursive
 ### Running Unmodified Megatron-LM GPT2 model
 
 * For a single GPU run:
-  - change `scripts/pretrain_gpt2.sh`, set its `--train-data` argument as `"webtext"`.
-  - run `bash scripts/pretrain_gpt2.sh`
+  - change `examples/pretrain_gpt.sh`, set its `--train-data` argument as `"webtext"`.
+  - run `bash examples/pretrain_gpt.sh`
 
 * For multiple GPUs and/or nodes run:
-  - change `scripts/pretrain_gpt2_model_parallel.sh`
+  - change `examples/pretrain_gpt_distributed_with_mp.sh`
     - set its `--train-data` argument as `"webtext"`
     - `GPUS_PER_NODE` indicates how many GPUs per node involved in the testing
     - `NNODES` indicates how many nodes involved in the testing
 
-  - run `bash scripts/pretrain_gpt2_model_parallel.sh`
+  - run `bash examples/pretrain_gpt_distributed_with_mp.sh`
 
 
 ## Enabling DeepSpeed
 
 To use DeepSpeed we will modify three files :
 
-* `arguments.py` : Arguments configurations
-* `pretrain_gpt2.py` : Main entry point for training
-* `utils.py` : Checkpoint saving and loading utilities
+* `megatron/arguments.py` : Arguments configurations
+* `pretrain_gpt.py` : Main entry point for training
+* `megatron/utils.py` : Checkpoint saving and loading utilities
 
 
 ### Argument Parsing
 The first step is adding DeepSpeed arguments to
 Megatron-LM GPT2 model, using `deepspeed.add_config_arguments()` in
-`arguments.py`.
+`megatron/arguments.py`.
 
 ```python
 def get_args():
````
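The hunk above ends inside the tutorial's `get_args()` listing. For orientation, here is a minimal sketch of the finished pattern; only the `deepspeed.add_config_arguments()` call is the documented DeepSpeed API, while the parser description and the `--train-data` placeholder are illustrative assumptions, not the actual contents of `megatron/arguments.py`. The diff then resumes at the training commands.

```python
import argparse

import deepspeed


def get_args():
    """Minimal sketch: a Megatron-style parser extended with DeepSpeed flags."""
    # Placeholder parser; the real megatron/arguments.py defines many more options.
    parser = argparse.ArgumentParser(description="Megatron-LM GPT pretraining (sketch)")
    parser.add_argument("--train-data", nargs="+", default=None,
                        help="Training dataset name(s), e.g. webtext (illustrative).")

    # DeepSpeed adds its own CLI options (such as --deepspeed and
    # --deepspeed_config) to the existing parser and returns it.
    parser = deepspeed.add_config_arguments(parser)

    return parser.parse_args()
```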
````diff
@@ -360,9 +360,9 @@ Megatron-LM GPT2 model with DeepSpeed applied, execute the following command to
 start training.
 
 - Single GPU run
-  - run `bash scripts/ds_pretrain_gpt2.sh`
+  - run `bash scripts/ds_pretrain_gpt.sh`
 - Multiple GPUs/Nodes run
-  - run `bash scripts/ds_zero2_pretrain_gpt2_model_parallel.sh`
+  - run `bash scripts/ds_zero2_pretrain_gpt_model_parallel.sh`
 
 ## DeepSpeed Evaluation using GPT-2
 
````

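The first hunk's file list also points at `megatron/utils.py` for checkpoint saving and loading. A hedged sketch of how checkpointing typically routes through DeepSpeed's engine API (`save_checkpoint`/`load_checkpoint`) follows; the helper names, the `model_engine` variable, and the `iteration` bookkeeping are assumptions for illustration, not the file's actual code.

```python
def save_ds_checkpoint(iteration, model_engine, save_dir):
    """Illustrative helper: persist model/optimizer state via DeepSpeed."""
    # The engine owns the model, optimizer, and any ZeRO partitions, so a
    # single call captures all of them; client_state stores extra metadata.
    model_engine.save_checkpoint(save_dir,
                                 tag=f"iter_{iteration:07d}",
                                 client_state={"iteration": iteration})


def load_ds_checkpoint(model_engine, load_dir):
    """Illustrative helper: restore state and recover the saved iteration."""
    load_path, client_state = model_engine.load_checkpoint(load_dir)
    if load_path is None:  # no checkpoint found under load_dir
        return 0
    return client_state.get("iteration", 0)
```

Here `model_engine` is assumed to be the engine object returned as the first element of `deepspeed.initialize()`.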
0 commit comments