@luclmt luclmt commented Jul 17, 2025

This function queries the GPUs available on the system and returns the one with
the most free memory. It uses PyTorch's CUDA APIs instead of nvidia-smi, which
keeps it consistent with the rest of the code and makes it reliable when
environment variables like CUDA_VISIBLE_DEVICES are set.
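The implementation itself isn't quoted in this comment; a minimal sketch of the approach, using `torch.cuda.mem_get_info` and a hypothetical helper name `get_gpu_with_most_free_memory`, could look like:

```python
import torch

def argmax_free(free_per_device):
    """Return the index of the largest free-memory value."""
    return max(range(len(free_per_device)), key=free_per_device.__getitem__)

def get_gpu_with_most_free_memory() -> int:
    """Pick the visible GPU with the most free memory.

    torch.cuda.mem_get_info(d) returns (free, total) bytes for device
    index d *as CUDA sees it*, so the returned index already respects
    CUDA_VISIBLE_DEVICES; an index parsed from nvidia-smi would not.
    """
    free = [torch.cuda.mem_get_info(d)[0]
            for d in range(torch.cuda.device_count())]
    # Fall back to device 0 when no CUDA device is visible.
    return argmax_free(free) if free else 0
```

Because the indices come from `torch.cuda.device_count()`, the result can be passed straight to `torch.cuda.set_device` without risking an invalid-ordinal error.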

Using nvidia-smi together with CUDA_VISIBLE_DEVICES can lead to errors like this:

torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

because nvidia-smi enumerates every GPU in the system, while CUDA_VISIBLE_DEVICES restricts and renumbers the devices visible at the CUDA API level, so an index taken from nvidia-smi output may not exist as a CUDA device ordinal.

Use torch instead of nvidia-smi to find the GPU with the most free memory
