
Commit 3e8c659

Improve batch size guidance for other instance training

1 parent: 0eadcb7


README.md

Lines changed: 4 additions & 1 deletion
````diff
@@ -103,6 +103,7 @@ Training the 12B param model is not recommended on A10s.
 
 To train the 6.9B param model on A10 instances (ex: `g5.24xlarge`, 4 x A10 24GB; `Standard_NV72ads_A10_v5`, 2 x A10), make the following changes:
 
+- Set `per-device-train-batch-size` and `per-device-eval-batch-size` to 3 in the `train_dolly.py` invocation of `deepspeed`
 - Modify the deepspeed config file `ds_z3_bf16_config.json` to configure optimizer offload. Within the `"zero_optimization"` section, add:
 ```
 "offload_optimizer": {
````
````diff
@@ -114,7 +115,7 @@ To train the 6.9B param model on A10 instances (ex: `g5.24xlarge`, 4 x A10 24GB;
 
 To train the 2.8B param model:
 
-- Instead, simply set `per-device-train-batch-size` and `per-device-eval-batch-size` to 2 in the `train_dolly.py` invocation of `deepspeed`
+- Instead, only set `per-device-train-batch-size` and `per-device-eval-batch-size` to 3 in the `train_dolly.py` invocation of `deepspeed`
 
 #### V100 GPUs
 
````
````diff
@@ -127,6 +128,8 @@ To run on V100 instances with 32GB of GPU memory (ex: `p3dn.24xlarge` or `Standa
 bf16=False,
 ...
 ```
+
+You may be able to slightly increase the batch size with 32GB instances, compared to what works above for 24GB A10s.
 
 ## Running Unit Tests Locally
 
````
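The `bf16=False,` context line in the hunk above appears to come from Hugging Face `TrainingArguments` kwargs in the training code, since V100s do not support bfloat16. A minimal sketch of that idea, assuming `transformers.TrainingArguments`; the `fp16=True` fallback, the output path, and the batch sizes are assumptions, not shown in this diff:

```python
from transformers import TrainingArguments

# Sketch under assumptions: V100s lack bfloat16 support, so bf16 is
# disabled and fp16 mixed precision (assumed) is used in its place.
training_args = TrainingArguments(
    output_dir="/tmp/dolly-training",  # hypothetical output path
    per_device_train_batch_size=3,     # assumed, per the A10 guidance above
    per_device_eval_batch_size=3,
    bf16=False,  # V100s cannot use bfloat16
    fp16=True,   # assumption: fall back to fp16 mixed precision
)
```

On 32GB V100s you may be able to raise the per-device batch sizes slightly, as the note added in this commit suggests.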