`docs/_tutorials/megatron.md`: 20 changes (10 additions & 10 deletions)
@@ -31,31 +31,31 @@ git submodule update --init --recursive
### Running Unmodified Megatron-LM GPT2 model

* For a single GPU run:
-  - change `scripts/pretrain_gpt2.sh`, set its `--train-data` argument as `"webtext"`.
-  - run `bash scripts/pretrain_gpt2.sh`
+  - change `examples/pretrain_gpt.sh`, set its `--train-data` argument as `"webtext"`.
+  - run `bash examples/pretrain_gpt.sh`

* For multiple GPUs and/or nodes run:
-  - change `scripts/pretrain_gpt2_model_parallel.sh`
+  - change `examples/pretrain_gpt_distributed_with_mp.sh`
  - set its `--train-data` argument to `"webtext"`
  - `GPUS_PER_NODE` indicates how many GPUs per node are involved in the run
  - `NNODES` indicates how many nodes are involved in the run

-  - run `bash scripts/pretrain_gpt2_model_parallel.sh`
+  - run `bash examples/pretrain_gpt_distributed_with_mp.sh`
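
In `examples/pretrain_gpt_distributed_with_mp.sh`, these two variables feed the distributed launcher. The sketch below shows the general shape of such a script; everything beyond `GPUS_PER_NODE`, `NNODES`, and `--train-data "webtext"` (all named above) is an illustrative assumption, and the exact launcher flags vary across Megatron-LM versions.

```bash
# Minimal sketch of a distributed launch script; values are illustrative.
GPUS_PER_NODE=8        # GPUs used on each node
NNODES=2               # nodes participating in the run
NODE_RANK=0            # rank of this node (assumption: set per node)
MASTER_ADDR=localhost  # rendezvous address (assumption)
MASTER_PORT=6000       # rendezvous port (assumption)

# Launch one training process per GPU on this node.
python -m torch.distributed.launch \
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    pretrain_gpt.py \
    --train-data "webtext"
    # ... remaining Megatron-LM model and data arguments omitted
```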


## Enabling DeepSpeed

To use DeepSpeed, we will modify three files:

-* `arguments.py` : Arguments configurations
-* `pretrain_gpt2.py` : Main entry point for training
-* `utils.py` : Checkpoint saving and loading utilities
+* `megatron/arguments.py` : Arguments configurations
+* `pretrain_gpt.py` : Main entry point for training
+* `megatron/utils.py` : Checkpoint saving and loading utilities


### Argument Parsing
The first step is adding DeepSpeed arguments to the
Megatron-LM GPT2 model, using `deepspeed.add_config_arguments()` in
-`arguments.py`.
+`megatron/arguments.py`.

```python
def get_args():
    ...
```
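
`deepspeed.add_config_arguments()` attaches DeepSpeed's own options, most importantly `--deepspeed` and `--deepspeed_config`, to the existing Megatron-LM argument parser. As a rough illustration of what the entry point accepts once this is in place (the config file name and the truncated argument list are assumptions, not the tutorial's actual command):

```bash
# Hypothetical invocation: only the DeepSpeed-related flags are shown.
# --deepspeed enables the DeepSpeed engine; --deepspeed_config points at a
# DeepSpeed JSON config (ds_config.json is an assumed name). The usual
# Megatron-LM model and data arguments are omitted.
python pretrain_gpt.py \
    --train-data "webtext" \
    --deepspeed \
    --deepspeed_config ds_config.json
```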
@@ -360,9 +360,9 @@ Megatron-LM GPT2 model with DeepSpeed applied, execute the following command to
start training.

- Single GPU run
-  - run `bash scripts/ds_pretrain_gpt2.sh`
+  - run `bash scripts/ds_pretrain_gpt.sh`
- Multiple GPUs/Nodes run
-  - run `bash scripts/ds_zero2_pretrain_gpt2_model_parallel.sh`
+  - run `bash scripts/ds_zero2_pretrain_gpt_model_parallel.sh`
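
For reference, scripts of this kind typically wrap the same entry point in DeepSpeed's `deepspeed` launcher instead of `torch.distributed.launch`. The sketch below is illustrative only; the node/GPU counts and the config file name are assumptions, not the actual contents of `scripts/ds_pretrain_gpt.sh`.

```bash
# Illustrative sketch, not the real scripts/ds_pretrain_gpt.sh.
# The deepspeed launcher spawns one training process per GPU and forwards
# the remaining flags to pretrain_gpt.py.
deepspeed --num_nodes 1 --num_gpus 8 \
    pretrain_gpt.py \
    --train-data "webtext" \
    --deepspeed \
    --deepspeed_config ds_config.json
    # ... remaining Megatron-LM arguments omitted
```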

## DeepSpeed Evaluation using GPT-2
