Resolve Metal.jl type instability for saveat and literals; Metal and AMDGPU build fix #381

Ambar-13 · 2025-10-30T14:19:59Z

This PR fixes the `InvalidIRError: unsupported use of double value` crashes encountered when using `EnsembleGPUKernel` on the Metal.jl backend, originally reported in Issue #379.

The fixes are as follows:

1. Changes in `src/ensemblegpukernel/nlsolve/utils.jl` to resolve issues related to Metal and AMDGPU:
* nlsolve Stiff Solver Bug: The `unsupported dynamic function invocation` error (which was failing on AMDGPU and Metal) is fixed by replacing `norm()` with the GPU-safe `diffeqgpunorm(..., t)`.
* The successful Metal, AMDGPU (and oneAPI Julia v1) builds show that the previous issues have been resolved.

2. `saveat` Type Instability (Fixes #379):
* In `src/ensemblegpukernel/lowerlevel_solve.jl`, the `saveat` argument (when a `Number` or `AbstractRange`) was being converted into a `StepRangeLen`, which has internal `Float64` fields and crashed the `Float32` Metal kernel.
* This PR changes the logic to convert `saveat` into a `Vector{Tt}` (where `Tt` is the problem's time type) before passing it to the kernel.
* This also includes fixes for an `UndefVarError` (by initializing `saveat_converted = nothing`) and adds the missing `adapt(backend, ...)` call to the SDE solver.

3. Numeric Literal Instability:
* In `src/ensemblegpukernel/perform_step/gpu_tsit5_perform_step.jl`, hardcoded numeric literals (e.g., `1.0f-14`) were used.
* These have been wrapped with `T(...)` (e.g., `T(1.0e-14)`) so that they match the kernel's float type, preventing type instability.
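A minimal sketch of why wrapping literals matters, runnable on the CPU. The function names `error_scale_unwrapped` and `error_scale_wrapped` are hypothetical and are not taken from this PR. The sketch only demonstrates Julia's promotion rule: a bare `Float64` literal promotes a `Float32` computation to `Float64`, which is the kind of double-precision value a backend without `Float64` support may refuse to compile.

```julia
# Hypothetical helpers for illustration only; not code from this PR.

# A bare Float64 literal promotes the whole expression to Float64.
error_scale_unwrapped(x::T) where {T <: AbstractFloat} = x + 1.0e-14

# Wrapping the literal in T(...) keeps the expression in the caller's float type.
error_scale_wrapped(x::T) where {T <: AbstractFloat} = x + T(1.0e-14)

@show typeof(error_scale_unwrapped(1.0f0))  # Float64: promoted by the literal
@show typeof(error_scale_wrapped(1.0f0))    # Float32: literal converted first
@show typeof(error_scale_wrapped(1.0))      # Float64: unchanged for Float64 problems
```

The wrapped form behaves the same as before for `Float64` inputs, so it should not affect existing CUDA or CPU usage.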

These changes ensure robust type handling for `Float32` problems on backends without `Float64` support, while remaining compatible with standard `Float64` usage on CUDA and CPU.
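To illustrate the `saveat` conversion described above, here is a CPU-only sketch. `to_saveat_vector` is a hypothetical helper written for this example and is not the code in `lowerlevel_solve.jl`; it assumes a scalar `saveat` means a step size across `tspan`, and it does not cover the `saveat=0.0` edge case.

```julia
# Sketch only: `to_saveat_vector` is a hypothetical helper, not the PR's code.
function to_saveat_vector(saveat, tspan::Tuple{Tt, Tt}) where {Tt}
    if saveat isa Number
        # Assumption: a scalar saveat is a step size across the time span.
        return collect(Tt, tspan[1]:Tt(saveat):tspan[2])
    else
        # Ranges and arrays are materialized with element type Tt.
        return collect(Tt, saveat)
    end
end

tspan = (0.0f0, 1.0f0)
r = 0.0f0:0.25f0:1.0f0

# A Float32 range is a StepRangeLen whose reference and step fields are Float64.
@show typeof(r)

# After conversion only Float32 data remains.
ts = to_saveat_vector(r, tspan)
@show typeof(ts)
@show to_saveat_vector(0.5f0, tspan)
```

A plain `Vector{Tt}` carries no hidden higher-precision fields, which is why it is a safer thing to hand to a `Float32`-only kernel than the range itself.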

Note for Maintainers: I do not have access to Apple Metal hardware, so I was unable to test this fix locally. Requesting verification from a maintainer or user with a Metal-capable device.

Fix: Resolve Metal.jl type instability for saveat and literals

Fixes type instability errors (`InvalidIRError: unsupported use of double value`) encountered when using `EnsembleGPUKernel` on the `Metal.jl` backend (Apple M-series GPUs), specifically:

1. Issue SciML#379: Corrects the handling of the `saveat` argument in `src/ensemblegpukernel/lowerlevel_solve.jl`. Converts `saveat` (whether a `Number`, `AbstractRange`, or `AbstractArray`) into a `Vector{Tt}` matching the problem's time type (`Tt = eltype(prob.tspan)`) before passing it to the GPU kernel. This prevents the internal `Float64` fields within `StepRangeLen` from causing compilation errors. Includes edge case handling for `saveat=0.0`.
2. Related Literal Fix: Wraps hardcoded numeric literals (e.g., `1e-7`, `1e-14`) with `Tt(...)` in various `src/ensemblegpukernel/perform_step/` files (like `gpu_em_perform_step.jl`, `gpu_tsit5_perform_step.jl`, etc.) to ensure type consistency within the GPU kernels, addressing issues similar to those discussed on Zulip/Slack.

These changes ensure robust type handling for `Float32` problems on backends without `Float64` support, while remaining compatible with standard `Float64` usage on CUDA and CPU.

Fixes SciML#379

Initializes `saveat_converted = nothing` in both `vectorized_solve` functions (for ODEs and SDEs) to fix an `UndefVarError`. Also adds the missing `adapt(backend, ...)` call in the SDE function to move the converted `saveat` vector to the GPU.
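The `UndefVarError` pattern can be reproduced in a few lines on the CPU. Both functions below are hypothetical stand-ins, not the PR's `vectorized_solve` code; they only show why initializing `saveat_converted = nothing` up front helps. The `adapt(backend, ...)` step is left out because it needs a GPU backend.

```julia
# Hypothetical stand-ins for the conversion logic; not code from this PR.

# Buggy shape: the variable is only assigned on one branch.
function convert_saveat_buggy(saveat, ::Type{Tt}) where {Tt}
    if saveat !== nothing
        saveat_converted = collect(Tt, saveat)
    end
    return saveat_converted  # throws UndefVarError when saveat === nothing
end

# Fixed shape: initialize first so every path returns a defined value.
function convert_saveat_fixed(saveat, ::Type{Tt}) where {Tt}
    saveat_converted = nothing
    if saveat !== nothing
        saveat_converted = collect(Tt, saveat)
    end
    return saveat_converted
end

# Record whether the buggy version fails in the expected way.
hit_undef = try
    convert_saveat_buggy(nothing, Float32)
    false
catch err
    err isa UndefVarError
end

@show hit_undef
@show convert_saveat_fixed(nothing, Float32)
@show convert_saveat_fixed(0:2, Float32)
```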

Ambar-13 · 2025-10-30T17:22:15Z

Hello!
I’ve pushed a few additional commits to address the remaining issues.

The latest CI run shows that Metal and AMDGPU tests are now passing, confirming that the fixes in this branch successfully resolve the backend-related issues.
The remaining failures on CUDA and oneAPI (Julia v1.1) appear to be pre-existing CI issues (cuMemFreeAsync and OpenSSL_jll errors).

Thank you for your time and for maintaining this package!

— Ambar

Ambar-13 added 5 commits October 30, 2025 00:40

Critical fixes on lowerlevel_solve.jl

a377b43

Fixes to integrator

fa8ec8a

Ambar-13 changed the title ~~Resolve Metal.jl type instability for saveat and literals~~ Resolve Metal.jl type instability for saveat and literals; Metal and AMDGPU build fix Oct 31, 2025

Ambar-13 marked this pull request as draft November 1, 2025 16:02

Ambar-13 added 7 commits November 2, 2025 01:04

Enable saveat tests on Metal backen

c523734

fixes for saveat

d9342e8

Update to match new Vector output type

e57a647

Updates the `saveat` range tests to compare the result against a `collect`ed Vector (Vector{Tt}) instead of a StepRangeLen. This matches the new, correct output type from the `saveat` fix.

Update lowerlevel_solve.jl

e3d09b3

Update lowerlevel_solve.jl

028e9bf

Revert "Update lowerlevel_solve.jl"

2368c7a

This reverts commit e557187.

Changes to Integrator

aac6f57

Ambar-13 force-pushed the fix-metal-saveat-types branch from 099a9ac to aac6f57 Compare November 1, 2025 17:05

Ambar-13 closed this Nov 1, 2025

Ambar-13 deleted the fix-metal-saveat-types branch November 1, 2025 17:08
