A100 instance types are not available in all cloud regions, or can be hard to provision. Training is possible on other GPU instance types,
for smaller Dolly model sizes, and with small modifications to reduce memory usage. These modifications are not optimal, but are simple to make.

Select your GPU family type from the `gpu_family` widget, enter the number of GPUs available in the `num_gpus` widget, and then run the rest of the code.
A number of different options will be set for you to train the model on one of the following GPU types:

- A100 (default)
- A10
- V100

Details of the different configurations are below.
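
As a rough illustration, the widget values might be consumed along these lines (a hypothetical sketch, assuming a Databricks notebook environment where `dbutils` is predefined; the actual logic lives in the `train_dolly` notebook, and the config-file mapping below is an assumption based on the file names used in this README):

```
# Hypothetical sketch only -- not the repo's exact code.
# Assumes a Databricks notebook, where `dbutils` is available without import.
gpu_family = dbutils.widgets.get("gpu_family")   # "a100" (default), "a10", or "v100"
num_gpus = int(dbutils.widgets.get("num_gpus"))  # widget values arrive as strings

# Assumed mapping: each GPU family has its own deepspeed config file,
# matching the file names used in this README.
deepspeed_config = f"{gpu_family}_config.json"
```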
#### A100 GPUs
A100 GPUs are preferred for training all model sizes, and are the only GPUs that can train the 12B param model in a reasonable amount of time.
As such, this is the default configuration, as set in the `a100_config.json` deepspeed config file.
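
For orientation, here is a minimal sketch of what a launch under this default could look like. It comes with heavy assumptions: `--num_gpus` is the standard `deepspeed` launcher flag, but the GPU count and the flags accepted by `train_dolly.py` are illustrative only; the real invocation is constructed inside `train_dolly`:

```
# Minimal sketch, not the repo's exact invocation: launch training under
# the default A100 deepspeed config.
import subprocess

subprocess.run(
    [
        "deepspeed",
        "--num_gpus=8",                     # assumed: 8 GPUs on the instance
        "train_dolly.py",
        "--deepspeed", "a100_config.json",  # the default config named above
    ],
    check=True,
)
```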
#### A10 GPUs
Training the 12B param model is not recommended on A10s.
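(For intuition: in 16-bit precision the 12B parameters alone occupy roughly 12B × 2 bytes ≈ 24GB, which already fills a 24GB A10 before gradients, optimizer state, or activations are counted.)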
To train the 6.9B param model on A10 instances (ex: `g5.24xlarge`, 4 x A10 24GB; `Standard_NV72ads_A10_v5`, 2 x A10),
simply select `a10` from the `gpu_family` widget and enter the number of GPUs available in the `num_gpus` widget, then run the rest of the code.
This will use the `a10_config.json` deepspeed config file, which makes the following changes:
- `per-device-train-batch-size` and `per-device-eval-batch-size` are set to 3 in the `train_dolly.py` invocation of `deepspeed`
- Within the `"zero_optimization"` section of the deepspeed config, we have added:
```
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
```
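
For context, `offload_optimizer` moves the optimizer state, which for Adam-style optimizers is several fp32 values per parameter, out of GPU memory into host RAM, and `"pin_memory": true` keeps that host buffer page-locked so CPU-GPU transfers run faster. The trade-off is extra PCIe traffic each step in exchange for the GPU memory headroom the A10s need.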
#### V100 GPUs
To run on V100 instances with 32GB of GPU memory (ex: `p3dn.24xlarge` or `Standard_ND40rs_v2`),
simply select `v100` from the `gpu_family` widget and enter the number of GPUs available in the `num_gpus` widget, and then run the rest of the code.
This will use the `v100_config.json` deepspeed config file, which makes the following changes: