Add /scratch_local bind mount for JAX CUDA compilation support #8

@lenardrommel

I am not sure why I get this error, but it happens when using Python 3.11 with luno_experiments.

When using JAX with CUDA in Singularity containers, jobs fail because ptxas, the NVIDIA PTX assembler that XLA invokes during GPU compilation, cannot write its temporary files to /scratch_local.

jaxlib._jax.XlaRuntimeError: INTERNAL: ptxas exited with non-zero error code 65280, 
output: ptxas fatal : Could not open output file '/scratch_local/USERNAME-JOBID/tmp/tmpxft_XXXXX'

Current workaround:

Bind /scratch_local into the container by changing run.yaml to:

mode:
  slurm: 
    pykernel: "singularity exec --bind /mnt:/mnt --bind /scratch_local:/scratch_local --nv python.sif bash -c"
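
For setups where the bind mount cannot be changed, a possible alternative is to check the temporary directory before JAX is imported. The sketch below assumes the path in the error message comes from a TMPDIR variable that the container inherits from the Slurm job environment, which I have not verified. The helper names `is_writable_dir` and `ensure_writable_tmpdir` and the `/tmp` fallback are hypothetical, not part of luno_experiments or JAX.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a file can actually be created inside `path`."""
    try:
        # Creating (and auto-deleting) a real file is a more reliable test
        # than os.access, which can be misleading inside containers.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        # Covers a missing directory, a non-directory path, and permission errors.
        return False


def ensure_writable_tmpdir(fallback="/tmp"):
    """Hypothetical helper: keep TMPDIR if usable, else switch to `fallback`.

    Assumption: the compiler toolchain picks its temp location from TMPDIR.
    """
    current = os.environ.get("TMPDIR")
    if current and is_writable_dir(current):
        return current
    os.environ["TMPDIR"] = fallback
    tempfile.tempdir = None  # drop the cached value so tempfile re-reads TMPDIR
    return fallback


# Intended usage (assumption): call this before `import jax`, so that any
# subprocess started during compilation inherits the updated environment.
```

This only sidesteps the missing mount. If the job also needs /scratch_local for its own data, the bind mount in run.yaml is still required.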
