“Bus Error and Resource Tracker Warning When Training PyTorch Model on GPU with MPS” #28

@pratheeshkumar99

I am encountering issues while trying to train the model on an Apple Mac M3 (12-core CPU, 18-core GPU, 18 GB RAM). The details are below:

1. Device Selection Issue:

device = "cuda" if torch.cuda.is_available() else "mps" if torch.has_mps or torch.backends.mps.is_available() else "cpu"
print("Using device:", device)

Although MPS is detected and selected, training crashes immediately after starting, with the following error:

/opt/anaconda3/envs/mynewenv/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d ').
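
For comparison, here is a minimal sketch of a stricter version of the device check from item 1. It assumes that `torch.has_mps` only reflects whether the PyTorch build includes MPS support, so the sketch relies on `torch.backends.mps.is_available()` alone. The function name `pick_device` is hypothetical, and the fallback to "cpu" when PyTorch is not installed is there only so the snippet runs anywhere. This tightens the selection logic; it is not claimed to fix the crash.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" based on what is usable at runtime."""
    # Assumption: without PyTorch installed, "cpu" is the only sensible answer.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps exists in PyTorch 1.12 and later; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print("Using device:", device)
```

If this reports "cpu" on the Mac, the original expression was selecting MPS based on the build flag alone, which would be worth ruling out before looking further.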

2. CPU Training:

When I switch to CPU training on the same machine, it runs without any issues using the same batch size of 8.

3. Google Colab Training:

There are no issues when running the same code on Google Colab.

I’m looking for insights into what might be causing these issues on MPS and how I could resolve them. Specifically, I’d like to understand the semaphore leak and the bus error, which seem to occur only when using MPS.
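
One way to narrow down where the bus error originates is Python's standard `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGBUS. The sketch below is a diagnostic aid under stated assumptions, not a fix: the helper name `enable_crash_traceback` and the log file name are hypothetical, and the dump shows only Python-level frames, not the native frames inside the MPS backend.

```python
import faulthandler
import os
import tempfile

# Hypothetical default location for the crash log.
DEFAULT_LOG = os.path.join(tempfile.gettempdir(), "mps_crash_traceback.log")


def enable_crash_traceback(log_path=DEFAULT_LOG):
    """Dump Python tracebacks for all threads to log_path on a fatal signal.

    Returns the open file object; it must stay open for as long as
    faulthandler is enabled, so keep a reference to it.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_traceback()` once at the top of the training script, before the model is moved to the device, should leave the last Python line executed before the crash in the log file.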
