
Commit 9ca6687

updated README instructions for training on alternate GPUs
1 parent ae379b0 · commit 9ca6687

1 file changed: +15 -11 lines changed

README.md

Lines changed: 15 additions & 11 deletions
@@ -96,36 +96,40 @@ Otherwise, follow the steps above. The 12B param model may not function well in
 A100 instance types are not available in all cloud regions, or can be hard to provision. Training is possible on other GPU instance types,
 for smaller Dolly model sizes, and with small modifications to reduce memory usage. These modifications are not optimal, but are simple to make.
 
-Select your GPU family type from the `gpu_family` widget and then run the rest of the code.
+Select your GPU family type from the `gpu_family` widget, enter the number of GPUs available in the `num_gpus` widget, and then run the rest of the code.
 A number of different options will be set for you to train the model for one of the following GPU types:
 - A100 (default)
+- A10
 - V100
-- A10 (in progress, see below for manual configuration details)
+
+Details of the different configurations are below.
+
+#### A100 GPUs
+
+A100 GPUs are preferred for training all model sizes, and are the only GPUs that can train the 12B param model in a reasonable amount of time.
+As such, this is the default configuration, as set in the `a100_config.json` deepspeed config file.
 
 #### A10 GPUs
 
 Training the 12B param model is not recommended on A10s.
 
-To train the 6.9B param model on A10 instances (ex: `g5.24xlarge`, 4 x A10 24GB; `Standard_NV72ads_A10_v5`, 2 x A10), make the following changes:
+To train the 6.9B param model on A10 instances (ex: `g5.24xlarge`, 4 x A10 24GB; `Standard_NV72ads_A10_v5`, 2 x A10),
+simply select `a10` from the `gpu_family` widget and enter the number of GPUs available in the `num_gpus` widget, then run the rest of the code.
+This will use the `a10_config.json` deepspeed config file, which makes the following changes:
 
-- Set `per-device-train-batch-size` and `per-device-eval-batch-size` to 3 in the `train_dolly.py` invocation of `deepspeed`
-- Modify the deepspeed config file `a10_a100_config.json` to configure optimizer offload. Within the `"zero_optimization"` section, add:
+- `per-device-train-batch-size` and `per-device-eval-batch-size` are set to 3 in the `train_dolly.py` invocation of `deepspeed`
+- Within the `"zero_optimization"` section of the deepspeed config, we have added:
 ```
 "offload_optimizer": {
   "device": "cpu",
   "pin_memory": true
 },
 ```
-- Set the `num_gpus` widget in `train_dolly` to the number of GPUs in your instance, such as 2 or 4, before running
-
-To train the 2.8B param model:
-
-- Instead, only set `per-device-train-batch-size` and `per-device-eval-batch-size` to 3 in the `train_dolly.py` invocation of `deepspeed`
 
 #### V100 GPUs
 
 To run on V100 instances with 32GB of GPU memory (ex: `p3dn.24xlarge` or `Standard_ND40rs_v2`),
-simply select `v100` from the `gpu_family` widget and then run the rest of the code.
+simply select `v100` from the `gpu_family` widget and enter the number of GPUs available in the `num_gpus` widget, and then run the rest of the code.
 This will use the `v100_config.json` deepspeed config file, which makes the following changes:
 
 - It makes the changes described above for A10s
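
To make the A10 changes concrete, here is a minimal, hypothetical sketch (not taken from the repo) of how such a setup fits together: it writes a deepspeed config whose `zero_optimization` section contains the CPU optimizer-offload block quoted in the diff, then builds the kind of reduced-batch-size launch line the README describes. The `stage` and `bf16` keys are illustrative assumptions, not the actual contents of `a10_config.json`, and the flag spellings simply follow the README's wording.

```python
# Hypothetical sketch only; not the repo's actual a10_config.json or launch
# command. Only the offload_optimizer block is quoted from the diff above;
# the stage and bf16 keys are illustrative assumptions.
import json

a10_config = {
    "bf16": {"enabled": True},      # assumption: mixed precision on A10
    "zero_optimization": {
        "stage": 2,                 # assumption: ZeRO stage
        "offload_optimizer": {      # from the diff: move optimizer state
            "device": "cpu",        # to CPU memory...
            "pin_memory": True,     # ...in page-locked (pinned) buffers
        },
    },
}

with open("a10_config.json", "w") as f:
    json.dump(a10_config, f, indent=2)

num_gpus = 4  # e.g. g5.24xlarge exposes 4 x A10
launch = (
    f"deepspeed --num_gpus={num_gpus} train_dolly.py"
    " --per-device-train-batch-size 3"
    " --per-device-eval-batch-size 3"
    " --deepspeed a10_config.json"
)
print(launch)
```

The design trade-off: offloading optimizer state swaps GPU memory for host-device transfers, and pinned host buffers keep those transfers fast. Combined with the smaller per-device batch size, that is what makes the 6.9B model fit on 24GB A10s, at some cost in step time.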
