
Commit cd67192

tnixon committed
use batch_size of 4 for A10 GPUs
1 parent 8ca36a1 commit cd67192


1 file changed (+27 / -25 lines)


train_dolly.py

Lines changed: 27 additions & 25 deletions
```diff
@@ -34,14 +34,14 @@
 
 # COMMAND ----------
 
-# MAGIC !wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb -O /tmp/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb && \
-# MAGIC wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-3_11.5.1.109-1_amd64.deb -O /tmp/libcublas-dev-11-3_11.5.1.109-1_amd64.deb && \
-# MAGIC wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb -O /tmp/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb && \
-# MAGIC wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-3_10.2.4.109-1_amd64.deb -O /tmp/libcurand-dev-11-3_10.2.4.109-1_amd64.deb && \
-# MAGIC dpkg -i /tmp/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb && \
-# MAGIC dpkg -i /tmp/libcublas-dev-11-3_11.5.1.109-1_amd64.deb && \
-# MAGIC dpkg -i /tmp/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb && \
-# MAGIC dpkg -i /tmp/libcurand-dev-11-3_10.2.4.109-1_amd64.deb
+!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb -O /tmp/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb && \
+wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-3_11.5.1.109-1_amd64.deb -O /tmp/libcublas-dev-11-3_11.5.1.109-1_amd64.deb && \
+wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb -O /tmp/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb && \
+wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-3_10.2.4.109-1_amd64.deb -O /tmp/libcurand-dev-11-3_10.2.4.109-1_amd64.deb && \
+dpkg -i /tmp/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb && \
+dpkg -i /tmp/libcublas-dev-11-3_11.5.1.109-1_amd64.deb && \
+dpkg -i /tmp/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb && \
+dpkg -i /tmp/libcurand-dev-11-3_10.2.4.109-1_amd64.deb
 
 # COMMAND ----------
 
```
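The install cell in this hunk downloads four CUDA 11.3 development packages with `wget` and then installs them with `dpkg -i`, all chained with `&&` so a failed step stops the rest. A minimal Python sketch that regenerates the same eight commands from (package, version) pairs, which may make version bumps easier to review. The package names, versions, and URL prefix are copied from the diff; the helper names (`CUDA_DEV_PACKAGES`, `deb_filename`, `install_commands`) are hypothetical and not part of `train_dolly.py`.

```python
# Hypothetical helper: rebuild the wget/dpkg chain from the install cell.
# Package names, versions, and BASE_URL are taken from the diff; the helper
# functions themselves are illustrative only.
BASE_URL = "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64"

CUDA_DEV_PACKAGES = [
    ("libcusparse-dev-11-3", "11.5.0.58-1"),
    ("libcublas-dev-11-3", "11.5.1.109-1"),
    ("libcusolver-dev-11-3", "11.1.2.109-1"),
    ("libcurand-dev-11-3", "10.2.4.109-1"),
]


def deb_filename(package: str, version: str, arch: str = "amd64") -> str:
    """Debian file name, e.g. libcurand-dev-11-3_10.2.4.109-1_amd64.deb."""
    return f"{package}_{version}_{arch}.deb"


def install_commands(packages, base_url=BASE_URL, tmp_dir="/tmp"):
    """Return all download commands first, then all dpkg installs (same order as the cell)."""
    downloads = []
    installs = []
    for package, version in packages:
        name = deb_filename(package, version)
        downloads.append(f"wget {base_url}/{name} -O {tmp_dir}/{name}")
        installs.append(f"dpkg -i {tmp_dir}/{name}")
    return downloads + installs


if __name__ == "__main__":
    # Join with the same "&& \" line continuation used in the notebook cell.
    print(" && \\\n".join(install_commands(CUDA_DEV_PACKAGES)))
```

Printing the joined list reproduces the eight-command chain in the order the diff uses: all downloads first, then all installs.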
```diff
@@ -148,7 +148,9 @@
 
 # configure the batch_size
 batch_size = 3
-if gpu_family == "a100":
+if gpu_family == "a10":
+    batch_size = 4
+elif gpu_family == "a100":
     batch_size = 6
 
 # configure num_gpus, if specified
```
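The new branch checks for `"a10"` before `"a100"`. Because both tests use exact `==` comparison, their order does not actually matter, and the same selection can be expressed as a lookup table. A minimal sketch, assuming `gpu_family` is a lowercase string as in the diff; `select_batch_size` and the constant names are hypothetical, not part of `train_dolly.py`.

```python
# Hypothetical table-driven version of the batch-size branch in the diff.
# Values mirror the commit: 3 by default, 4 on A10, 6 on A100.
DEFAULT_BATCH_SIZE = 3

BATCH_SIZE_BY_GPU_FAMILY = {
    "a10": 4,
    "a100": 6,
}


def select_batch_size(gpu_family: str) -> int:
    """Per-device batch size for a GPU family string such as "a10" or "a100".

    Exact-key lookup, equivalent to the if/elif chain's == comparisons;
    unknown families fall back to the default.
    """
    return BATCH_SIZE_BY_GPU_FAMILY.get(gpu_family, DEFAULT_BATCH_SIZE)


if __name__ == "__main__":
    for family in ("a10", "a100", "v100"):
        print(family, select_batch_size(family))
```

A table keeps each GPU family's setting on one line, so supporting another family is a one-line change, and any family not listed keeps the original default of 3.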
```diff
@@ -167,22 +169,22 @@
 
 # COMMAND ----------
 
-# MAGIC !deepspeed {num_gpus_flag} \
-# MAGIC --module training.trainer \
-# MAGIC --input-model {input_model} \
-# MAGIC --deepspeed {deepspeed_config} \
-# MAGIC --epochs 2 \
-# MAGIC --local-output-dir {local_output_dir} \
-# MAGIC --dbfs-output-dir {dbfs_output_dir} \
-# MAGIC --per-device-train-batch-size {batch_size} \
-# MAGIC --per-device-eval-batch-size {batch_size} \
-# MAGIC --logging-steps 10 \
-# MAGIC --save-steps 200 \
-# MAGIC --save-total-limit 20 \
-# MAGIC --eval-steps 50 \
-# MAGIC --warmup-steps 50 \
-# MAGIC --test-size 200 \
-# MAGIC --lr 5e-6
+!deepspeed {num_gpus_flag} \
+--module training.trainer \
+--input-model {input_model} \
+--deepspeed {deepspeed_config} \
+--epochs 2 \
+--local-output-dir {local_output_dir} \
+--dbfs-output-dir {dbfs_output_dir} \
+--per-device-train-batch-size {batch_size} \
+--per-device-eval-batch-size {batch_size} \
+--logging-steps 10 \
+--save-steps 200 \
+--save-total-limit 20 \
+--eval-steps 50 \
+--warmup-steps 50 \
+--test-size 200 \
+--lr 5e-6
 
 # COMMAND ----------
 
```
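After this commit the training launch is a plain IPython shell escape (`!deepspeed ...`) rather than `# MAGIC`-prefixed lines. In an IPython `!` command, Python variables written as `{name}` are substituted into the command text before it runs, which is how the `batch_size` chosen earlier reaches both `--per-device-*-batch-size` flags. The sketch below builds the same argument list in plain Python, for example to launch it with `subprocess.run` outside a notebook. The function name and the example argument values are hypothetical; `num_gpus_flag` is configured elsewhere in `train_dolly.py` ("if specified") and its exact format is not shown in this diff, so `"--num_gpus=4"` is only an assumed value.

```python
import shlex
from typing import List


def deepspeed_command(
    input_model: str,
    deepspeed_config: str,
    local_output_dir: str,
    dbfs_output_dir: str,
    batch_size: int,
    num_gpus_flag: str = "",
) -> List[str]:
    """Hypothetical argv equivalent of the `!deepspeed ...` cell in the diff."""
    cmd = ["deepspeed"]
    if num_gpus_flag:
        # The notebook sets this flag only if a GPU count is specified (assumed format).
        cmd += shlex.split(num_gpus_flag)
    cmd += [
        "--module", "training.trainer",
        "--input-model", input_model,
        "--deepspeed", deepspeed_config,
        "--epochs", "2",
        "--local-output-dir", local_output_dir,
        "--dbfs-output-dir", dbfs_output_dir,
        # The same batch size is used for training and evaluation, as in the diff.
        "--per-device-train-batch-size", str(batch_size),
        "--per-device-eval-batch-size", str(batch_size),
        "--logging-steps", "10",
        "--save-steps", "200",
        "--save-total-limit", "20",
        "--eval-steps", "50",
        "--warmup-steps", "50",
        "--test-size", "200",
        "--lr", "5e-6",
    ]
    return cmd


if __name__ == "__main__":
    # Placeholder paths and model name; replace with real values.
    cmd = deepspeed_command(
        "my-model", "ds_config.json", "/tmp/out", "/dbfs/out",
        batch_size=4, num_gpus_flag="--num_gpus=4",
    )
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would start training; not executed here.
```

Passing a list instead of one interpolated string avoids shell-quoting problems if a path ever contains spaces.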