Terminate called after throwing an instance of ‘c10::CUDAError’ what(): CUDA error: initialization error #9197
thepurpleowl
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Ok, replying here (the forum no longer seems to be the way to go)! In my case it is related to DDP: #8821 (comment)
Dear @thepurpleowl,
It seems your machine has both CUDA and TPU available. Since there is the CUDA issue linked above, this might be why you are experiencing it on TPUs:
GPU available: True, used: False
TPU available: True, using: 8 TPU cores
IPU available: False, using: 0 IPUs
Best,
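If the visible CUDA device is indeed what triggers the error during a TPU run, one thing that may be worth trying is hiding the GPU from the process before PyTorch is imported. The sketch below is only an assumption about a possible workaround, not a confirmed fix for this issue. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable; the helper name `hide_cuda_devices` is made up for illustration.

```python
import os


def hide_cuda_devices(env=None):
    """Mark all CUDA devices as hidden in the given environment mapping.

    Hypothetical helper: defaults to the current process environment.
    """
    env = os.environ if env is None else env
    # An empty value means the CUDA runtime exposes no GPUs to this process.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


# This has to run before the first `import torch`; spawned TPU worker
# processes inherit the parent environment.
hide_cuda_devices()

# A TPU run would then continue as usual, for example (not executed here):
#   import pytorch_lightning as pl
#   trainer = pl.Trainer(tpu_cores=8)
#   trainer.fit(model)
```

With the variable set this way, the startup log would be expected to report no GPU as available, which makes it easy to check whether the setting took effect.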
I am trying to run my pytorch-lightning code on a TPU in GCP.
I am getting the following error:
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
...
torch.multiprocessing.spawn.ProcessExitedException: process 7 terminated with signal SIGABRT
The full stack trace is as follows,
python: 3.7
pytorch: 1.9
pytorch-lightning: 1.4.4
cuda: 11.1
tpu: v2-8
This works on a single GPU, and the error occurs at trainer.fit.
What does c10-cudaerror mean? Is it something related to CUDA version 10? Any ideas what's going wrong?
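The error message itself suggests passing CUDA_LAUNCH_BLOCKING=1 for debugging. A minimal sketch of doing that for a child process is shown here. The helper name `run_with_blocking_launches` is hypothetical, and the child command is a stand-in that only echoes the variable; the real training script would go in its place. Whether this makes an initialization error any easier to trace is not certain.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(cmd):
    """Run `cmd` with CUDA_LAUNCH_BLOCKING=1 added to a copy of the environment."""
    env = dict(os.environ)  # copy, so the parent environment stays unchanged
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in child process: it prints the variable so the setting can be verified.
# Replace the command with the actual training entry point.
result = run_with_blocking_launches(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
)
print(result.stdout.strip())
```

Setting the variable on a copy of the environment keeps the change limited to the debugging run.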