Kernel crashes on a dimension error on an Apple Silicon device #497
-
Hi @pypdeveloper, I'm not quite sure what could be happening here. Does it happen even with small tensors (e.g. without passing them to a model)? Have you tried it with a simple model? From memory, I was going to see if this link helped: https://discuss.pytorch.org/t/kernel-dies-on-loss-backward/152231 but it looks like that was for a bigger model with a large batch size.
One hot fix could be to experiment with models on the CPU first (making sure they work) and then move to MPS when actually training. This is not ideal, but it might work for now, as PyTorch for MPS still has a few rough edges.
Is there anything else you've found online? Otherwise, I'd recommend checking out the PyTorch for MPS ops thread: pytorch/pytorch#77764 (comment) (it shows how many operations are still to be implemented for the MPS device).
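A rough sketch of the CPU-first idea, in case it helps. Everything named here is an assumption for illustration: `pick_device`, `dry_run_on_cpu` and `make_model` are hypothetical helpers, and the small network only stands in for `model_2`, with a deliberately wrong `in_features` to trigger a shape mismatch. The point is that on the CPU a shape mismatch normally surfaces as a Python `RuntimeError` you can read.

```python
from typing import Optional

import torch
from torch import nn


def pick_device() -> torch.device:
    """Return the MPS device when PyTorch reports it as available, else the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def dry_run_on_cpu(model: nn.Module, sample: torch.Tensor) -> Optional[str]:
    """Run one forward pass on the CPU; return the error message, or None if it worked."""
    was_training = model.training
    model.to("cpu")  # nn.Module.to moves the parameters in place
    model.eval()
    try:
        with torch.no_grad():
            model(sample.to("cpu"))
    except RuntimeError as err:
        # Shape mismatches are raised as RuntimeError on the CPU backend.
        return str(err)
    finally:
        model.train(was_training)  # restore the original train/eval mode
    return None


def make_model(in_features: int) -> nn.Module:
    """Hypothetical stand-in for model_2: one conv block followed by a linear classifier."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(in_features, 3),
    )


rand_image_tensor = torch.randn(3, 64, 64)
sample = rand_image_tensor.unsqueeze(0)  # add a batch dimension: [1, 3, 64, 64]

# After pooling, the feature map is 8 x 32 x 32, so 8 * 16 * 16 is deliberately wrong.
broken = make_model(in_features=8 * 16 * 16)
fixed = make_model(in_features=8 * 32 * 32)

broken_error = dry_run_on_cpu(broken, sample)
fixed_error = dry_run_on_cpu(fixed, sample)
print("broken model:", broken_error)
print("fixed model:", fixed_error)

# Only move to the accelerator once the CPU dry run is clean.
if fixed_error is None:
    device = pick_device()
    output = fixed.to(device)(sample.to(device))
    print(output.shape)
```

If the dry run returns a message, fix the shapes on the CPU first and only then switch the device back to "mps".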
-
Hello everybody, I have noticed that whenever there is a dimension error with a tensor, or a mismatch between the input shape of a model and the shape of the data, the kernel crashes. For example, at timestamp 18:17:55, when the code is being debugged, if I run the same code (making sure that I am running it on the "mps" device) the kernel crashes. I am running a conda environment and have installed JupyterLab the way it is shown on the official website. The issue also persists with the VS Code JupyterLab extension. No error is displayed; the kernel just crashes. I am running Python 3.11 with the stable PyTorch packages on an M1 Max chip. Is this a bug on all computers? And is there a fix, so that when I am building models without a guide I can see what the error is and debug it properly?
This is the code snippet that I was talking about above at video timestamp 18:17:55 where the kernel crashes:
model_2(rand_image_tensor.unsqueeze(0).to(device))
Note: I am running PyTorch version 2.0.1.
Thank you so much for your help!