Description
I am running into issues when training the model on an Apple M3 Mac with a 12-core CPU, an 18-core GPU, and 18 GB of RAM. Below are the details of what I’m seeing:
1. Device Selection Issue:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print("Using device:", device)
```
Although MPS is detected and selected, training crashes immediately after starting, with the following error:
```
/opt/anaconda3/envs/mynewenv/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
2. CPU Training:
When I switch to CPU training on the same machine, it runs without any issues using the same batch size of 8.
3. Google Colab Training:
There are no issues when running the same code on Google Colab.
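A minimal sketch for narrowing down the MPS crash, assuming the training script feeds batches through a standard `torch.utils.data.DataLoader` (the data pipeline is not shown in this issue). It checks MPS with `torch.backends.mps.is_available()` (the `torch.has_mps` attribute is deprecated in recent PyTorch releases) and, on MPS, sets `num_workers=0` so batches are loaded in the main process. The leaked-semaphore warning from `resource_tracker` usually means a process that created multiprocessing semaphores exited without releasing them, so it is more likely a side effect of the crash than its cause; removing worker processes helps show whether the crash comes from data loading or from the MPS computation itself. The helpers `choose_device`, `detect_device` and `loader_kwargs` and the default `workers=4` are hypothetical.

```python
"""Hypothetical helpers for isolating an MPS training crash (not part of the project)."""


def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU without PyTorch."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return choose_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )


def loader_kwargs(device: str, batch_size: int = 8, workers: int = 4) -> dict:
    """Keyword arguments for torch.utils.data.DataLoader on the given device.

    Assumption: on MPS, load data in the main process (num_workers=0) to rule
    out worker processes as the source of the crash and the leaked semaphores.
    """
    return {
        "batch_size": batch_size,
        "num_workers": 0 if device == "mps" else workers,
        # Pinned host memory only speeds up host-to-CUDA copies.
        "pin_memory": device == "cuda",
    }


if __name__ == "__main__":
    device = detect_device()
    print("Using device:", device)
    print("DataLoader kwargs:", loader_kwargs(device))
    # In the training script: DataLoader(dataset, **loader_kwargs(device))
```

On macOS, Python 3.8 and later start multiprocessing children with `spawn` by default, so a script that uses `num_workers > 0` needs its training entry point under an `if __name__ == "__main__":` guard.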
I’m looking for insight into what causes these failures on MPS and how to resolve them. In particular, I’d like to understand the semaphore leak and the bus error, which seem to occur only when using MPS.
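A "bus error" is how the shell reports a process killed by `SIGBUS`, which normally leaves no Python traceback. A minimal sketch, assuming a standard PyTorch install: the standard-library `faulthandler` module dumps the Python stack of every thread on `SIGBUS` (as well as `SIGSEGV`, `SIGFPE`, `SIGABRT` and `SIGILL`), and a one-step smoke test with a tiny stand-in model (`torch.nn.Linear(16, 4)`, hypothetical and not the project's model) checks whether MPS fails even on trivial work. The same tracebacks are available without code changes by running the script with `python -X faulthandler`.

```python
import faulthandler


def enable_crash_tracebacks() -> None:
    """Print Python tracebacks for all threads on fatal signals such as SIGBUS."""
    faulthandler.enable(all_threads=True)


def smoke_test(device: str, batch_size: int = 8) -> float:
    """One forward/backward step of a tiny, hypothetical stand-in model."""
    import torch  # imported lazily so this module loads without PyTorch

    torch.manual_seed(0)
    # Build weights and inputs on the CPU first so every device sees the same values.
    model = torch.nn.Linear(16, 4).to(device)
    inputs = torch.randn(batch_size, 16).to(device)
    loss = model(inputs).pow(2).mean()
    loss.backward()
    # .item() copies the result to the host, so pending device work must finish.
    return loss.item()


if __name__ == "__main__":
    enable_crash_tracebacks()
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; skipping the smoke test.")
    else:
        devices = ["cpu"]
        if torch.backends.mps.is_available():
            devices.append("mps")
        for dev in devices:
            print(f"{dev}: loss = {smoke_test(dev):.4f}")
```

Because the weights and inputs are created on the CPU before being moved, the CPU and MPS losses should be close. If this step completes on both devices, the crash is more likely tied to something specific in the real training run (model size, particular operators, or data loading) than to the MPS backend in general.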