diff --git a/docs/_tutorials/megatron.md b/docs/_tutorials/megatron.md
index 286a9a36a926..490abdb60122 100644
--- a/docs/_tutorials/megatron.md
+++ b/docs/_tutorials/megatron.md
@@ -31,31 +31,31 @@ git submodule update --init --recursive
 ### Running Unmodified Megatron-LM GPT2 model
 
 * For a single GPU run:
-  - change `scripts/pretrain_gpt2.sh`, set its `--train-data` argument as `"webtext"`.
-  - run `bash scripts/pretrain_gpt2.sh`
+  - change `examples/pretrain_gpt.sh`, set its `--train-data` argument as `"webtext"`.
+  - run `bash examples/pretrain_gpt.sh`
 
 * For multiple GPUs and/or nodes run:
-  - change `scripts/pretrain_gpt2_model_parallel.sh`
+  - change `examples/pretrain_gpt_distributed_with_mp.sh`
     - set its `--train-data` argument as `"webtext"`
     - `GPUS_PER_NODE` indicates how many GPUs per node involved in the testing
     - `NNODES` indicates how many nodes involved in the testing
-  - run `bash scripts/pretrain_gpt2_model_parallel.sh`
+  - run `bash examples/pretrain_gpt_distributed_with_mp.sh`
 
 ## Enabling DeepSpeed
 
 To use DeepSpeed we will modify three files :
 
-* `arguments.py` : Arguments configurations
-* `pretrain_gpt2.py` : Main entry point for training
-* `utils.py` : Checkpoint saving and loading utilities
+* `megatron/arguments.py` : Arguments configurations
+* `pretrain_gpt.py` : Main entry point for training
+* `megatron/utils.py` : Checkpoint saving and loading utilities
 
 ### Argument Parsing
 
 The first step is adding DeepSpeed arguments to Megatron-LM GPT2 model, using `deepspeed.add_config_arguments()` in
-`arguments.py`.
+`megatron/arguments.py`.
 
 ```python
 def get_args():
@@ -360,9 +360,9 @@ Megatron-LM GPT2 model with DeepSpeed applied, execute the following command to
 start training.
 
 - Single GPU run
-  - run `bash scripts/ds_pretrain_gpt.sh`
+  - run `bash scripts/ds_pretrain_gpt.sh`
 - Multiple GPUs/Nodes run
-  - run `bash scripts/ds_zero2_pretrain_gpt2_model_parallel.sh`
+  - run `bash scripts/ds_zero2_pretrain_gpt_model_parallel.sh`
 
 ## DeepSpeed Evaluation using GPT-2
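
For context on the `### Argument Parsing` step this diff touches, below is a minimal sketch of the pattern the tutorial describes: `deepspeed.add_config_arguments()` takes an existing `argparse` parser and registers DeepSpeed's own configuration flags on it. The parser description and the `--train-data` option here are illustrative stand-ins, not Megatron-LM's actual argument definitions from `megatron/arguments.py`.

```python
# Minimal sketch of the "Argument Parsing" change, assuming only that the
# `deepspeed` package is installed. The model-specific arguments below are
# illustrative placeholders, not Megatron-LM's real argument set.
import argparse

import deepspeed


def get_args():
    """Parse all the args."""
    parser = argparse.ArgumentParser(description='GPT2 pretraining (sketch)')

    # Model-specific arguments would normally be registered here; this one
    # mirrors the --train-data flag referenced in the tutorial.
    parser.add_argument('--train-data', type=str, default='webtext',
                        help='training dataset name (illustrative)')

    # The tutorial's first DeepSpeed modification: let DeepSpeed attach its
    # own configuration arguments (e.g. --deepspeed, --deepspeed_config).
    parser = deepspeed.add_config_arguments(parser)

    return parser.parse_args()


if __name__ == '__main__':
    args = get_args()
    print(args)
```

With this in place, the launch scripts named in the diff can pass a DeepSpeed JSON config via the flags that `add_config_arguments()` registers, without the tutorial's argument parser needing to know about them explicitly.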