|
| 1 | +# Reproducing Figures in SC21 Paper |
| 2 | + |
| 3 | + |
| 4 | +This directory contains some of the scripts that were used to produce the |
| 5 | +results in the [Megatron paper](https://arxiv.org/pdf/2104.04473.pdf) that is |
| 6 | +to appear at [SuperComputing 2021](https://sc21.supercomputing.org/). These |
| 7 | +scripts use [Slurm](https://slurm.schedmd.com/documentation.html) with the |
| 8 | +[pyxis plugin](https://github.com/NVIDIA/pyxis), but can be modified for other |
| 9 | +schedulers as well. |
| 10 | + |
| 11 | + |
| 12 | +## Setup |
| 13 | + |
| 14 | +All the cluster-dependent variables are in [`CONFIG.sh`](./CONFIG.sh). Please |
| 15 | +update the unspecified values (in angle brackets `<...>`) before launching any |
| 16 | +scripts. |
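
Before launching anything, it may help to confirm that no `<...>` placeholder is left in the config file. The sketch below is one possible check, not part of the original scripts: `check_placeholders` is a hypothetical helper name, and the pattern is a heuristic that assumes placeholders look like `<some text>` on a single line (a line using both `<` and `>` redirection would also match).

```shell
# Hypothetical helper (not part of the repository): report lines in a
# config file that still contain an unfilled <...> placeholder.
# Returns 0 if none remain, 1 if any are found, 2 if the file is missing.
check_placeholders() {
    config_file="$1"
    if [ ! -f "$config_file" ]; then
        echo "Config file not found: $config_file" >&2
        return 2
    fi
    # -n prints line numbers; the pattern matches '<', then one or more
    # characters that are not angle brackets, then '>'.
    if grep -nE '<[^<>]+>' "$config_file"; then
        echo "Unfilled placeholders remain in $config_file" >&2
        return 1
    fi
    return 0
}
```

One way to use it is `check_placeholders ./CONFIG.sh` from this directory, launching the scripts only if it returns 0.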
| 17 | + |
| 18 | + |
| 19 | + |
| 20 | +## Scripts |
| 21 | + |
| 22 | +Below is a list of scripts that can be used to reproduce the table and figures in our |
| 23 | +[paper](https://arxiv.org/pdf/2104.04473.pdf): |
| 24 | + |
| 25 | +* [run_table_1.sh](./run_table_1.sh): Table 1 showing weak-scaling throughput |
| 26 | +for GPT models ranging from 1 billion to 1 trillion parameters. |
| 27 | +* [run_figure_11.sh](./run_figure_11.sh): Figure 11 showing the weak-scaling |
| 28 | +performance of pipeline parallelism. |
| 29 | +* [run_figure_12.sh](./run_figure_12.sh): Figure 12 showing the effect of |
| 30 | +the interleaved schedule on a 175B GPT model. |
| 31 | +* [run_figure_13.sh](./run_figure_13.sh): Figure 13 showing the effect of |
| 32 | +different degrees of pipeline and tensor model parallelism on a model with |
| 33 | +162.2 billion parameters. |
| 34 | +* [run_figure_14.sh](./run_figure_14.sh): Figure 14 showing the effect of |
| 35 | +different degrees of data and pipeline model parallelism on a model with |
| 36 | +5.9 billion parameters. |
| 37 | +* [run_figure_15.sh](./run_figure_15.sh): Figure 15 showing the effect of |
| 38 | +different degrees of data and tensor model parallelism on a model with |
| 39 | +5.9 billion parameters. |
| 40 | +* [run_figure_16.sh](./run_figure_16.sh): Figure 16 showing the effect of |
| 41 | +microbatch size. |
| 42 | +* [run_figure_17.sh](./run_figure_17.sh): Figure 17 showing the effect of |
| 43 | +activation recomputation. |
| 44 | +* [run_figure_18.sh](./run_figure_18.sh): Figure 18 showing the effect of |
| 45 | +the scatter-gather communication optimization. |