
Fix CUDA tests and docs build #198

Open
ChrisRackauckas-Claude wants to merge 1 commit into SciML:main from ChrisRackauckas-Claude:fix-cuda-tests

@ChrisRackauckas-Claude
Contributor

Summary

Fixes the CUDA test and documentation build failures introduced by the GitHub Actions (GHA) migration in PR #197.

Changes

1. Fix runtests.jl to use BACKEND_GROUP instead of GROUP

The GPU.yml workflow sets BACKEND_GROUP=CUDA, but the tests read the GROUP environment variable instead. As a result, the tests fell back to CPU mode and failed when trying to load LuxCUDA (which isn't needed for CPU tests).

-const GROUP = uppercase(get(ENV, "GROUP", "CPU"))
+const BACKEND_GROUP = uppercase(get(ENV, "BACKEND_GROUP", get(ENV, "GROUP", "CPU")))
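A minimal Julia sketch of the precedence the new line gives: BACKEND_GROUP wins when it is set, the old GROUP variable is still honored as a fallback, and CPU is the default. The helper name backend_group is hypothetical; only the get/uppercase expression comes from the diff.

```julia
# Hypothetical helper wrapping the expression from the runtests.jl diff.
backend_group() = uppercase(get(ENV, "BACKEND_GROUP", get(ENV, "GROUP", "CPU")))

# withenv temporarily sets environment variables; a value of `nothing` unsets one.
withenv("BACKEND_GROUP" => "cuda", "GROUP" => "cpu") do
    println(backend_group())  # prints CUDA: BACKEND_GROUP takes precedence
end
withenv("BACKEND_GROUP" => nothing, "GROUP" => "cuda") do
    println(backend_group())  # prints CUDA: falls back to the legacy GROUP variable
end
withenv("BACKEND_GROUP" => nothing, "GROUP" => nothing) do
    println(backend_group())  # prints CPU: neither variable is set
end
```

Keeping GROUP as the inner default means local invocations that still set GROUP keep working while the GHA workflow uses BACKEND_GROUP.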

2. Add LuxCUDA to test dependencies

shared_testsetup.jl runs using LuxCUDA when running CUDA tests, but LuxCUDA was missing from the test dependencies in Project.toml. This caused the following LoadError:

LoadError: ArgumentError: Package LuxCUDA not found in current path.
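This LoadError means Julia could not resolve LuxCUDA from the active environment. As a hedged sketch (the function check_backend_package is hypothetical and not part of this PR), Base.find_package can detect that situation before the using statement runs and fail with a clearer message:

```julia
# Hypothetical pre-flight check for the CUDA test group.
# Base.find_package returns the path to a package's entry file, or `nothing`
# when the package cannot be resolved from the active environment stack,
# which is the situation behind "Package LuxCUDA not found in current path".
function check_backend_package(group::AbstractString; pkg::String = "LuxCUDA")
    if uppercase(group) == "CUDA" && Base.find_package(pkg) === nothing
        error("BACKEND_GROUP=CUDA requires $pkg, but it is not in the test environment; " *
              "add it to the test dependencies")
    end
    return nothing  # CPU runs (or a resolvable package) pass silently
end
```

For example, calling check_backend_package(BACKEND_GROUP) near the top of shared_testsetup.jl would only improve the error message; adding LuxCUDA to the test dependencies, as this PR does, is what actually fixes the failure.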

3. Add LocalPreferences.toml for docs build (V100 compatibility)

The documentation build failed on demeter4 (the V100 GPU runners) because of a CUDA version incompatibility. Following the pattern used in OrdinaryDiffEq.jl and the fix documented in ChrisRackauckas/InternalJunk#19, this PR does the following:

  • Pin the CUDA runtime to 12.6
  • Disable the forward-compat driver (V100 runners need the system driver, since CUDA_Driver_jll v13+ drops support for compute capability 7.0)
  • Add CUDA_Driver_jll and CUDA_Runtime_jll to docs deps
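LocalPreferences.toml entries are keyed by package name. A minimal Julia sketch that writes the two preferences described above, assuming the key names used by CUDA.jl's JLL packages (version for CUDA_Runtime_jll, compat for CUDA_Driver_jll); the helper write_cuda_preferences is hypothetical, and the file actually committed in this PR is authoritative:

```julia
using TOML

# Hypothetical helper: writes <dir>/LocalPreferences.toml with the runtime pin
# and the forward-compat driver switch described in the bullets above.
function write_cuda_preferences(dir::AbstractString; runtime::String = "12.6")
    prefs = Dict(
        # Pin the CUDA runtime instead of letting CUDA.jl pick the newest one.
        "CUDA_Runtime_jll" => Dict("version" => runtime),
        # Use the system driver rather than the forward-compatible CUDA_Driver_jll.
        "CUDA_Driver_jll" => Dict("compat" => false),
    )
    path = joinpath(dir, "LocalPreferences.toml")
    open(path, "w") do io
        TOML.print(io, prefs; sorted = true)  # sorted for a stable, diff-friendly file
    end
    return path
end
```

For example, write_cuda_preferences("docs") would create docs/LocalPreferences.toml, assuming the docs environment lives in docs/. The third bullet presumably matters because Julia only applies a package's LocalPreferences.toml entries when that package is a dependency of the same project, which adding the two JLLs to the docs deps ensures (e.g. Pkg.activate("docs"); Pkg.add(["CUDA_Driver_jll", "CUDA_Runtime_jll"])).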

Related Issues

Fixes: ChrisRackauckas/InternalJunk#22
