how to run horovod strategy? #13663
-
What I ran is:
python pl_examples/basic_examples/mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.accelerator 'gpu' --trainer.devices 4 --trainer.strategy 'horovod'
The nvidia-smi output is:
Thu Jul 14 14:26:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
| N/A 35C P0 34W / 70W | 1547MiB / 15360MiB | 37% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 31C P8 16W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 31C P8 16W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 32C P8 16W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 39649 C 1545MiB |
+-----------------------------------------------------------------------------+
The command-line output is:
(base) ray@ip-172-31-36-78:~/horovod-gpu/lightning$ python pl_examples/basic_examples/mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.accelerator 'gpu' --trainer.devices 4 --trainer.strategy 'horovod'
[ASCII-art banner printed by the example script]
Global seed set to 42
/home/ray/anaconda3/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py:92: PossibleUserWarning: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
rank_zero_warn(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/ray/anaconda3/lib/python3.8/site-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
| Name | Type | Params
--------------------------------------
0 | model | Net | 1.2 M
1 | test_acc | Accuracy | 0
--------------------------------------
1.2 M Trainable params
0 Non-trainable params
1.2 M Total params
4.800 Total estimated model params size (MB)
/home/ray/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Epoch 0:   0%|
What it should be: training on multiple GPUs, yet only GPU 0 is in use. How should I run this?
-
Try using gpus to specify the number of devices to train on: --trainer.gpus=4
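For the horovod strategy specifically, the Lightning 1.6.x docs launch one worker process per GPU with the Horovod driver rather than relying on the Trainer flag alone. A sketch, assuming the same example script and the documented pattern:

horovodrun -np 4 python pl_examples/basic_examples/mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.accelerator 'gpu' --trainer.devices 1 --trainer.strategy 'horovod'

Here horovodrun spawns 4 workers and each worker drives a single GPU, which is why devices is 1 rather than 4.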
-
Thu Jul 14 17:43:52 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
| N/A 34C P0 34W / 70W | 1547MiB / 15360MiB | 38% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 32C P8 16W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 32C P8 16W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 31C P8 16W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 20520 C 1545MiB |
+-----------------------------------------------------------------------------+
-
I still see this:
(base) ray@ip-172-31-93-242:~/horovod-gpu/lightning$ python pl_examples/basic_examples/mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.accelerator 'gpu' --trainer.gpus 4 --trainer.strategy 'horovod'
[ASCII-art banner printed by the example script]
Global seed set to 42
/home/ray/horovod-gpu/lightning/pytorch_lightning/loops/utilities.py:92: PossibleUserWarning: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
rank_zero_warn(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/train-images-idx3-ubyte.gz
9913344it [00:00, 23040462.18it/s]
Extracting /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/train-images-idx3-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/train-labels-idx1-ubyte.gz
29696it [00:00, 119648464.54it/s]
Extracting /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/train-labels-idx1-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/t10k-images-idx3-ubyte.gz
1649664it [00:00, 10574307.41it/s]
Extracting /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/t10k-images-idx3-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/t10k-labels-idx1-ubyte.gz
5120it [00:00, 48475928.85it/s]
Extracting /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw/t10k-labels-idx1-ubyte.gz to /home/ray/horovod-gpu/lightning/Datasets/MNIST/raw
/home/ray/anaconda3/lib/python3.8/site-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
9913344it [00:00, 35476547.86it/s]
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
29696it [00:00, 118510039.57it/s]
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
1649664it [00:00, 10670010.40it/s]
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
5120it [00:00, 49941480.19it/s]
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
Missing logger folder: /home/ray/horovod-gpu/lightning/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
| Name | Type | Params
--------------------------------------
0 | model | Net | 1.2 M
1 | test_acc | Accuracy | 0
--------------------------------------
1.2 M Trainable params
0 Non-trainable params
1.2 M Total params
4.800 Total estimated model params size (MB)
/home/ray/horovod-gpu/lightning/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Epoch 0: 0%| | 0/1875 [00:00<?, ?it/s]/home/ray/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Epoch 2: 66%|██████████████████████████████████████████████████████████████████████████████████████████████████▊ | 1243/1875 [00:11<00:05, 107.94it/s, loss=0.103, v_num=0]^C/home/ray/horovod-gpu/lightning/pytorch_lightning/trainer/trainer.py:726: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")
Restoring states from the checkpoint path at /home/ray/horovod-gpu/lightning/lightning_logs/version_0/checkpoints/epoch=1-step=3750.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Loaded model weights from checkpoint at /home/ray/horovod-gpu/lightning/lightning_logs/version_0/checkpoints/epoch=1-step=3750.ckpt
/home/ray/horovod-gpu/lightning/pytorch_lightning/trainer/connectors/data_connector.py:330: PossibleUserWarning: Using `DistributedSampler` with the dataloaders. During `trainer.test()`, it is recommended to use `Trainer(devices=1)` to ensure each sample/batch gets evaluated exactly once. Otherwise, multi-device settings use `DistributedSampler` that replicates some samples to make sure all devices have same batch size in case of uneven inputs.
rank_zero_warn(
/home/ray/horovod-gpu/lightning/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Testing DataLoader 0: 0%| | 0/313 [00:00<?, ?it/s]/home/ray/anaconda3/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called `full_state_update` that has
not been set for this class (_ResultMetric). The property determines if `update` by
default needs access to the full metric state. If this is not the case, significant speedups can be
achieved and we recommend setting this to `False`.
We provide an checking function
`from torchmetrics.utilities import check_forward_no_full_state`
that can be used to check if the `full_state_update=True` (old and potential slower behaviour,
default for now) or if `full_state_update=False` can be used safely.
warnings.warn(*args, **kwargs)
Testing DataLoader 0: 5%|████████▋ | 17/313 [00:00<00:02, 143.89it/s]^C^C
-
I am using PyTorch Lightning 1.6.5.
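For 1.6.x, here is a minimal self-contained sketch of the documented Horovod pattern (the model, dataset, and file name are placeholders for illustration, not from this thread):

# minimal_horovod.py (hypothetical file name)
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
    # One device per process: horovodrun supplies the worker count.
    trainer = pl.Trainer(accelerator="gpu", devices=1, strategy="horovod", max_epochs=1)
    trainer.fit(TinyModel(), DataLoader(dataset, batch_size=32))

Launched as horovodrun -np 4 python minimal_horovod.py, each of the 4 workers should then show up in nvidia-smi on its own GPU.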