Fix CUDA V100 compatibility with LocalPreferences.toml #868

Merged
ChrisRackauckas merged 7 commits into SciML:master from ChrisRackauckas-Claude:fix-cuda-v100-compatibility, Mar 20, 2026

@ChrisRackauckas-Claude
Contributor

Summary

This PR fixes the CUDA test failures on V100 runners (demeter3/demeter4) that were caused by CUDA driver version compatibility issues.

Problem

The V100 GPUs on demeter4 have a CUDA driver version (570.172) that reports an NVML version mismatch with newer CUDA toolkit versions. Additionally, CUDA_Driver_jll v13+ drops support for compute capability 7.0 (V100).

The tests were failing with:

CUDA error: unknown error (code 999, ERROR_UNKNOWN)

Solution

Following the pattern established in OrdinaryDiffEq.jl PR #3162:

  1. Added LocalPreferences.toml files in both the root directory and lib/SimpleNonlinearSolve/ to:

    • Pin CUDA Runtime to version 12.6
    • Disable the forward-compat driver (compat = "false")
  2. Updated test/runtests.jl and lib/SimpleNonlinearSolve/test/runtests.jl to add the CUDA_Driver_jll and CUDA_Runtime_jll packages when running the CUDA tests, so that the preferences in LocalPreferences.toml are loaded.
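
For illustration, a LocalPreferences.toml matching the two settings above might look like the fragment below. This is a sketch, not the file from the PR: the section and key names follow the preference layout documented for CUDA.jl (`version` under `CUDA_Runtime_jll`, `compat` under `CUDA_Driver_jll`), and the merged file may contain other entries.

```toml
# Sketch only: assumed layout, not copied from the PR.

[CUDA_Runtime_jll]
# Pin the CUDA runtime (toolkit) to the 12.6 series.
version = "12.6"

[CUDA_Driver_jll]
# Do not load the bundled forward-compatibility driver; use the system driver.
compat = "false"
```

Both values are strings, which matches the `compat = "false"` spelling quoted in the description.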

Files Changed

  • LocalPreferences.toml (new)
  • lib/SimpleNonlinearSolve/LocalPreferences.toml (new)
  • test/runtests.jl
  • lib/SimpleNonlinearSolve/test/runtests.jl
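
The runtests.jl change listed above can be sketched roughly as follows. The `GROUP` environment variable and the `"cuda"` group name are assumptions made for illustration; the actual test scripts may select the CUDA tests differently.

```julia
using Pkg

# Hypothetical group selector; the real runtests.jl may read a different variable.
const GROUP = lowercase(get(ENV, "GROUP", "all"))

if GROUP == "cuda"
    # Preferences are only picked up for packages that the active project
    # lists as dependencies, so the two JLLs are added to the test
    # environment before any CUDA code is loaded.
    Pkg.add(["CUDA_Driver_jll", "CUDA_Runtime_jll"])
end
```

The ordering is the point of the sketch: the packages are added first so that their sections in LocalPreferences.toml apply when CUDA is subsequently loaded.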

ChrisRackauckas and others added 7 commits March 19, 2026 08:42
This fixes the CUDA test failures on V100 runners (demeter3/demeter4) by:

1. Adding LocalPreferences.toml files to pin CUDA Runtime to v12.6 and
   disable the forward-compat driver (CUDA_Driver_jll v13+ drops V100
   compute capability 7.0 support)

2. Updating test runtests.jl to add CUDA_Driver_jll and CUDA_Runtime_jll
   packages so the preferences are properly loaded

Fixes: ChrisRackauckas/InternalJunk#24
…ation

ForwardDiff uses scalar indexing which triggers GPU scalar indexing errors.
Use AutoFiniteDiff which works with CUDA arrays.

The default PolyAlgorithm uses AutoForwardDiff internally which does
scalar indexing on GPU arrays. Since there's no way to pass autodiff
to the default algorithm, remove it from GPU tests.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
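
As a rough illustration of the AutoFiniteDiff switch described in the commit messages above, a GPU test can name a solver and its differentiation backend explicitly rather than relying on the default algorithm. The problem below (a small linear residual) and the choice of `NewtonRaphson` are assumptions made for this sketch, not the tests from the PR, and running it requires a working CUDA device.

```julia
using CUDA
using NonlinearSolve
using ADTypes: AutoFiniteDiff

# Raise an error on scalar indexing instead of silently allowing it.
CUDA.allowscalar(false)

# Hypothetical test problem: find u such that A*u + b = 0, entirely on the GPU.
A = cu(Float32[4 1; 1 3])
b = cu(Float32[1, 2])
u0 = cu(Float32[0, 0])

# Out-of-place residual written with array-level operations only.
residual(u, p) = A * u .+ b

prob = NonlinearProblem(residual, u0)

# Select the finite-difference backend explicitly on a named solver.
sol = solve(prob, NewtonRaphson(; autodiff = AutoFiniteDiff()))
```

Passing `autodiff` on a named solver is what makes this workaround possible. The commit message notes that the default algorithm offers no such option, which is why it was dropped from the GPU tests.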
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `needs: nonlinearsolve-cuda` to SimpleNonlinearSolve CUDA job so
  they don't compete for GPU memory on the same V100 runner
- Add GC.gc() and CUDA.reclaim() before kernel launch tests to free
  memory from prior test items on shared GPU runners

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
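
The memory cleanup mentioned in the commit above amounts to two calls placed before the memory-hungry test. A minimal sketch, assuming it runs at the start of a test item on a machine with a CUDA device:

```julia
using CUDA

# Run Julia's garbage collector so that unreachable GPU arrays are finalized
# and their buffers are handed back to CUDA.jl's memory pool.
GC.gc()

# Ask CUDA.jl to release cached pool memory back to the driver.
CUDA.reclaim()
```

This only frees memory held by the current Julia process. It cannot reduce pressure from other processes sharing the same GPU.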
The `needs` dependency doesn't help since other jobs on the shared
runner contribute to GPU memory pressure regardless. The CUDA.reclaim()
fix in the test file is the actual mitigation.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OOM was transient (other processes on shared runner), not caused by
our tests. The gpu-v100 tag already targets the right runners.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChrisRackauckas ChrisRackauckas merged commit 19cd72e into SciML:master Mar 20, 2026
61 of 65 checks passed