Describe the bug
I am aiming to use TPUs for training on Google Colab.
After the updates to Google Colab and the PyTorch/XLA wheels, the following code no longer runs. I believe this is because TPUs no longer run as separate nodes; they are now intrinsically tied to the VMs they run on:
To Reproduce
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchtext==0.10.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html
!pip install pyyaml==5.4.1
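(For context, the torch_xla wheel above is built for CPython 3.7 (cp37), while current Colab runtimes ship a much newer interpreter, so pip rejects it. A quick way to confirm the interpreter version on the runtime:)
import sys
print(sys.version)  # current Colab reports 3.11.x, so the cp37 wheel above cannot be installed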
Expected behavior
Could alternative code be provided and the documentation updated? I am currently trying to use:
!pip install torch~=2.6.0 'torch_xla[tpu]~=2.6.0' \
-f https://storage.googleapis.com/libtpu-releases/index.html \
-f https://storage.googleapis.com/libtpu-wheels/index.html
!pip install cloud-tpu-client==0.10
# Install newer PyTorch packages (CUDA 11.8 builds, from the cu118 index below)
!pip install torch==2.6.0 torchvision==0.21.0 torchtext==0.18 \
-f https://download.pytorch.org/whl/cu118/torch_stable.html
# Install the latest version of PyYAML
!pip install pyyaml==6.0
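To confirm the runtime actually sees the TPU, I run a small check like the following (a minimal sketch; I am assuming xm.xla_device() is still the supported way to get the XLA device in torch_xla 2.6):
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()               # should return an XLA device (e.g. xla:0) on a TPU runtime
x = torch.randn(2, 2, device=device)   # confirms a tensor can actually be placed on the TPU
print(device, x)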
While this does show the TPU as available, training is incredibly slow at around 0.18 it/s (when it works at all). When running the code here, it never gets past cell [11]: it gets stuck for unknown reasons and no progress bar ever appears.
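For reference, this is roughly the kind of setup I am trying to get working. It is a minimal sketch rather than the notebook's exact code: the model, the toy data, and the devices value are placeholders, and I am assuming pl_trainer_kwargs is forwarded unchanged to the underlying PyTorch Lightning Trainer:
import numpy as np
from darts import TimeSeries
from darts.models import NBEATSModel

# toy series just to exercise the trainer; the real notebook uses its own dataset
series = TimeSeries.from_values(np.sin(np.linspace(0, 20, 400)).astype(np.float32))

model = NBEATSModel(
    input_chunk_length=24,
    output_chunk_length=12,
    n_epochs=5,
    pl_trainer_kwargs={
        "accelerator": "tpu",  # Lightning's XLA accelerator
        "devices": 1,          # assumption: a single TPU core; 8 would target all cores
    },
)
model.fit(series)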
System (please complete the following information):
- Python version: 3.11.11
- darts version: 0.32.0
Additional context
The code in "To Reproduce" is taken from here.