Commit 6609c85

Set bf16 flags correctly for a10/a100
1 parent fd1a733 commit 6609c85

4 files changed: 18 additions and 3 deletions

config/a100_config.json

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 {
+  "fp16": {
+    "enabled": false
+  },
   "bf16": {
-    "enabled": "auto"
+    "enabled": true
   },
   "optimizer": {
     "type": "AdamW",

config/a10_config.json

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 {
+  "fp16": {
+    "enabled": false
+  },
   "bf16": {
-    "enabled": "auto"
+    "enabled": true
   },
   "optimizer": {
     "type": "AdamW",

config/v100_config.json

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
   "fp16": {
     "enabled": true
   },
+  "bf16": {
+    "enabled": false
+  },
   "optimizer": {
     "type": "AdamW",
     "params": {
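
Review note: the three config changes make each file state both precision modes explicitly, with exactly one enabled. The following is a minimal sketch of a check for that convention, not part of the commit. The function name `precision_mode` is hypothetical, and the inline JSON strings only mirror the `fp16`/`bf16` fragments shown in the hunks above. The check is stricter than the tooling requires: `"auto"` is a valid placeholder in the Hugging Face Trainer DeepSpeed integration, and it is rejected here only because this commit replaces it with explicit booleans.

```python
import json


def precision_mode(config: dict) -> str:
    """Return "bf16", "fp16", or "fp32" for a DeepSpeed-style config dict.

    Hypothetical helper: raises ValueError if a flag is not an explicit
    boolean (for example "auto") or if both modes are enabled together.
    """
    flags = {}
    for name in ("fp16", "bf16"):
        value = config.get(name, {}).get("enabled", False)
        if not isinstance(value, bool):
            raise ValueError(f'{name}.enabled must be true or false, got {value!r}')
        flags[name] = value

    if flags["fp16"] and flags["bf16"]:
        raise ValueError("fp16 and bf16 cannot both be enabled")
    if flags["bf16"]:
        return "bf16"
    if flags["fp16"]:
        return "fp16"
    return "fp32"


# Fragments mirroring the post-commit configs (other keys omitted).
a100_after = json.loads('{"fp16": {"enabled": false}, "bf16": {"enabled": true}}')
v100_after = json.loads('{"fp16": {"enabled": true}, "bf16": {"enabled": false}}')

print(precision_mode(a100_after))  # bf16
print(precision_mode(v100_after))  # fp16
```

Running a check like this over `config/*.json` before launching training would catch a file that still carries `"auto"` or that enables both modes.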

train_dolly.py

Lines changed: 7 additions & 1 deletion
@@ -160,6 +160,11 @@
 num_gpus = int(num_gpus)
 num_gpus_flag = f"--num_gpus={num_gpus}"

+if gpu_family == "v100":
+    bf16_flag = "--bf16 false"
+else:
+    bf16_flag = "--bf16 true"
+
 os.environ["TOKENIZERS_PARALLELISM"] = "false"

 # COMMAND ----------
@@ -184,7 +189,8 @@
     --eval-steps 50 \
     --warmup-steps 50 \
     --test-size 200 \
-    --lr 5e-6
+    --lr 5e-6 \
+    {bf16_flag}

 # COMMAND ----------
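
Review note: the flag selection added in the first hunk can be read as a small pure function. The sketch below restates that logic for illustration only. The function name `bf16_flag` is hypothetical, and the sketch assumes `gpu_family` arrives as one of the lowercase strings `v100`, `a10`, or `a100`, matching the config file names in this commit.

```python
def bf16_flag(gpu_family: str) -> str:
    """Mirror the branch added to train_dolly.py: only "v100" disables bf16."""
    if gpu_family == "v100":
        return "--bf16 false"
    return "--bf16 true"


for family in ("v100", "a10", "a100"):
    print(family, "->", bf16_flag(family))
# v100 -> --bf16 false
# a10 -> --bf16 true
# a100 -> --bf16 true
```

Because the comparison is an exact string match, any value other than `"v100"` (including a differently cased `"V100"` or a GPU family not listed here) falls through to `--bf16 true`. That is the behavior of the diff as written, and it is worth keeping in mind if further GPU families are added.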