Waiting for GPU memory to become available #30876

cool-RR · 2025-08-10T03:56:14Z


My code is a CLI app that uses JAX, and I often run multiple instances of it in parallel. Sometimes there isn't enough GPU memory, so when I start another instance it errors out with a message like `jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 1439504488 bytes.`

I would like to add a flag to my command that makes it wait until enough GPU memory is available and then grab it. Right now I use this:

```python
import subprocess
import time as time_module

import click


def get_used_and_total_gpu_ram_mib() -> tuple[int, int]:
    """Return (used, total) GPU memory in MiB, as reported by nvidia-smi."""
    command = 'nvidia-smi --query-gpu=memory.used,memory.total --format=csv,nounits,noheader'
    output = subprocess.check_output(command.split()).decode('ascii').strip().split('\n')
    try:
        line, = output  # exactly one line, i.e. exactly one GPU, is expected
    except ValueError as value_error:
        raise NotImplementedError('Multiple GPUs not supported') from value_error
    used_mib, total_mib = map(int, line.split(','))
    return used_mib, total_mib


def wait_for_gpu_availability() -> None:
    """Wait until GPU memory usage falls below 10%."""
    used_ram_mib, total_ram_mib = get_used_and_total_gpu_ram_mib()
    memory_usage = used_ram_mib / total_ram_mib

    if memory_usage <= 0.1:
        click.echo('GPU is already available, continuing.')
        return

    click.echo('Waiting for GPU to become available...')

    # Poll once per second until usage drops to 10% or less.
    while memory_usage > 0.1:
        time_module.sleep(1)
        used_ram_mib, total_ram_mib = get_used_and_total_gpu_ram_mib()
        memory_usage = used_ram_mib / total_ram_mib

    click.echo('GPU is available, continuing.')
```
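The `nvidia-smi` parsing can be split out as a pure function, which makes it testable without a GPU and lets the caller pick a GPU by index instead of rejecting multi-GPU machines. This is a minimal sketch; `parse_gpu_memory_csv` is a hypothetical helper name, and it assumes the `--format=csv,nounits,noheader` output used above (one `used, total` line per GPU, in MiB).

```python
def parse_gpu_memory_csv(output: str) -> list[tuple[int, int]]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,nounits,noheader`
    into one (used_mib, total_mib) tuple per GPU, in nvidia-smi's order.
    """
    rows = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue  # tolerate stray blank lines
        used_str, total_str = line.split(',')
        # int() ignores the space nvidia-smi puts after each comma.
        rows.append((int(used_str), int(total_str)))
    return rows
```

For example, `parse_gpu_memory_csv("1024, 8192\n0, 8192\n")` returns `[(1024, 8192), (0, 8192)]`, and the caller can index into the list for the GPU it uses. Alternatively, `nvidia-smi -i 0 ...` restricts the query to a single GPU.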

It's not great, because it's not atomic. I want to launch a dozen instances of my program, each waiting for the GPU. With this function, they might all start running as soon as enough memory frees up, but because they all start at once, only one will get the memory and the others will error out. I want an atomic solution: the program waits until the memory is available, allocates it, and proceeds only if the allocation succeeded.
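One way to get that check-and-grab behavior is to skip the pre-check and treat the allocation itself as the test: try to allocate, and on an out-of-memory error sleep and try again. Because the allocation either succeeds or fails in one step, no other instance can take the memory between a check and a grab. This is a minimal sketch, assuming the OOM can be recognized by the `RESOURCE_EXHAUSTED` text in the exception message (as in the error quoted above); `wait_and_allocate`, `is_out_of_memory` and their parameters are hypothetical names, and `allocate` stands in for whatever JAX code actually claims the memory.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar('T')


def is_out_of_memory(exc: BaseException) -> bool:
    # Assumption: XLA's OOM errors carry this status text in their message.
    return 'RESOURCE_EXHAUSTED' in str(exc)


def wait_and_allocate(
    allocate: Callable[[], T],
    poll_interval_s: float = 1.0,
    timeout_s: Optional[float] = None,
    is_oom: Callable[[BaseException], bool] = is_out_of_memory,
) -> T:
    """Call `allocate` until it stops failing with out-of-memory errors.

    Any other exception propagates immediately. Raises TimeoutError if
    `timeout_s` elapses before an allocation succeeds.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        try:
            return allocate()  # success: the memory is ours, proceed
        except Exception as exc:
            if not is_oom(exc):
                raise  # a genuine bug, not memory pressure
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError('GPU memory did not become available') from exc
        time.sleep(poll_interval_s)  # another instance holds the memory; retry
```

In the JAX program, `allocate` could be the first real computation, ending in `.block_until_ready()` so that an error from asynchronous dispatch surfaces inside the retry loop rather than later. Keep in mind that by default JAX preallocates 75% of total GPU memory when the first JAX operation runs (configurable through the `XLA_PYTHON_CLIENT_PREALLOCATE` and `XLA_PYTHON_CLIENT_MEM_FRACTION` environment variables), which determines how much memory each instance actually claims.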

Is that possible with JAX?
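If the memory has to be checked with `nvidia-smi` before JAX touches the GPU, another option is to make the check-and-grab step mutually exclusive across instances with an OS file lock. Each instance takes an exclusive lock on a shared file, waits for memory, performs its first allocation, and only then releases the lock, so at most one instance is ever between "memory looks free" and "memory is mine". This is a minimal sketch for Linux/macOS using `fcntl.flock`; the lock path and the `gpu_launch_lock` name are hypothetical.

```python
import contextlib
import fcntl
import os
from typing import Iterator

DEFAULT_LOCK_PATH = '/tmp/gpu-launch.lock'  # hypothetical shared lock file


@contextlib.contextmanager
def gpu_launch_lock(path: str = DEFAULT_LOCK_PATH) -> Iterator[None]:
    """Hold an exclusive, cross-process lock for the duration of the block.

    flock() blocks until no other process holds the lock, and the OS
    releases the lock automatically if the holding process dies.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # wait for our turn
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)  # let the next instance check memory
    finally:
        os.close(fd)
```

Usage would look like `with gpu_launch_lock(): wait_for_gpu_availability(); ...first JAX allocation...`. The lock only helps if the memory is really claimed before the `with` block exits, so the first allocation should be forced to complete (for example with `.block_until_ready()`) inside the block.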
