Add AMDGPUOffloadFactorization algorithm support #708

ChrisRackauckas · 2025-08-10T13:25:34Z

Summary

This PR adds support for AMD GPU-accelerated linear solving through the new AMDGPUOffloadFactorization algorithm:

• Added AMDGPUOffloadFactorization struct in src/extension_algs.jl with proper error handling when AMDGPU.jl is not loaded
• Created LinearSolveAMDGPUExt extension in ext/LinearSolveAMDGPUExt.jl implementing GPU-offloaded LU factorization using AMDGPU.rocSOLVER
• Added AMDGPU as weak dependency and extension configuration in Project.toml
• Exported AMDGPUOffloadFactorization in src/LinearSolve.jl
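
The "error when AMDGPU.jl is not loaded" behaviour from the first bullet is typically done by checking for the extension inside the constructor. The following is only a sketch of that pattern, not the code merged here: `DemoOffloadFactorization` and the extension name `DemoAMDGPUExt` are hypothetical, and it assumes Julia 1.9 or later for `Base.get_extension`.

```julia
# Hypothetical stand-in for an algorithm type whose solver lives in a package
# extension. The names here are illustrative and not part of LinearSolve.jl.
struct DemoOffloadFactorization
    function DemoOffloadFactorization()
        # Base.get_extension returns `nothing` when the extension is not loaded.
        ext = Base.get_extension(@__MODULE__, :DemoAMDGPUExt)
        if ext === nothing
            error("DemoOffloadFactorization requires that AMDGPU is loaded, i.e. `using AMDGPU`")
        end
        return new()
    end
end

# Without the extension, construction fails with a readable message.
msg = try
    DemoOffloadFactorization()
    "constructed"
catch err
    sprint(showerror, err)
end
println(msg)
```

Failing in the constructor means the user sees the problem when choosing the algorithm, rather than later as a missing `solve!` method.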

Implementation Details

The implementation follows the same pattern as CudaOffloadFactorization, using rocSOLVER.getrf! for LU factorization and rocSOLVER.getrs! for solve operations on AMD GPUs via ROCArrays. The algorithm provides GPU acceleration for sufficiently large matrices where the computation benefits outweigh the data transfer costs.
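
To make the factorize-once, solve-many flow concrete, here is a CPU-only analogue built on the standard library's `LinearAlgebra.LAPACK.getrf!` and `getrs!`. It is a sketch under stated assumptions: `DemoCache` and `demo_solve!` are hypothetical names that only mimic the `isfresh`/`cacheval` fields visible in the diff, and the rocSOLVER wrappers may not share these exact signatures.

```julia
using LinearAlgebra

# Hypothetical miniature of a solver cache: `isfresh` marks that A changed
# and must be refactorized; `cacheval` stores the factorization.
mutable struct DemoCache
    A::Matrix{Float64}
    b::Vector{Float64}
    isfresh::Bool
    cacheval::Any
end

function demo_solve!(cache::DemoCache)
    if cache.isfresh
        # LU-factorize a copy of A in place; returns the factors, pivots and status.
        F, ipiv, info = LAPACK.getrf!(copy(cache.A))
        info == 0 || error("matrix is singular to working precision")
        cache.cacheval = (F, ipiv)
        cache.isfresh = false
    end
    F, ipiv = cache.cacheval
    # Reuse the stored factors to solve A * x = b ('N' means no transpose).
    return LAPACK.getrs!('N', F, ipiv, copy(cache.b))
end

cache = DemoCache([4.0 3.0; 6.0 3.0], [10.0, 12.0], true, nothing)
x = demo_solve!(cache)

# A second right-hand side reuses the cached factorization.
cache.b = [7.0, 9.0]
x2 = demo_solve!(cache)
```

In the GPU version the same two steps run on device arrays, so each solve also pays for moving `b` to the GPU and the result back, which is why the offload only helps for large matrices.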

Test plan

• Verify the extension loads correctly when AMDGPU.jl is available
• Verify proper error handling when AMDGPU.jl is not loaded
• Test basic linear solve functionality with AMD GPU hardware
• Ensure compatibility with existing LinearSolve.jl interfaces

🤖 Generated with Claude Code

This commit adds support for AMD GPU-accelerated linear solving through the new AMDGPUOffloadFactorization algorithm:

- Added AMDGPUOffloadFactorization struct in src/extension_algs.jl with proper error handling when AMDGPU.jl is not loaded
- Created LinearSolveAMDGPUExt extension in ext/LinearSolveAMDGPUExt.jl implementing GPU-offloaded LU factorization using AMDGPU.rocSOLVER
- Added AMDGPU as weak dependency and extension configuration in Project.toml
- Exported AMDGPUOffloadFactorization in src/LinearSolve.jl

The implementation follows the same pattern as CudaOffloadFactorization, using rocSOLVER for LU factorization and solve operations on AMD GPUs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

…zation

- Renamed AMDGPUOffloadFactorization to AMDGPUOffloadLUFactorization for clarity
- Added AMDGPUOffloadQRFactorization for QR-based solving
- Updated extension to support both LU and QR factorizations
- LU uses rocSOLVER.getrf! and getrs!
- QR uses rocSOLVER.geqrf!, ormqr!, and rocBLAS.trsv!

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

ChrisRackauckas · 2025-08-10T13:41:10Z

Updated the PR with the following changes:

• Renamed AMDGPUOffloadFactorization to AMDGPUOffloadLUFactorization for clarity and consistency
• Added AMDGPUOffloadQRFactorization for QR-based solving on AMD GPUs
• Both algorithms follow the same pattern as CudaOffloadFactorization
• LU factorization uses rocSOLVER.getrf! and getrs!
• QR factorization uses rocSOLVER.geqrf!, ormqr!, and rocBLAS.trsv!

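The implementation now provides two factorization options for AMD GPU offloading, so users can choose based on their numerical stability and performance requirements: LU is the cheaper choice for well-conditioned square systems, while QR costs more but is more robust on ill-conditioned ones.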
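
For readers unfamiliar with the QR route, this CPU-only sketch mirrors the three steps named above (`geqrf!`, `ormqr!`, `trsv!`) using the standard library's `LinearAlgebra.LAPACK` and `LinearAlgebra.BLAS`. It assumes a square, real matrix; the rocSOLVER/rocBLAS wrappers may take different arguments, so this is an illustration of the algorithm rather than of the extension's code.

```julia
using LinearAlgebra

A = [4.0 3.0; 6.0 3.0]
b = [10.0, 12.0]

# Step 1: Householder QR in place. The upper triangle of F holds R; the
# reflectors that define Q are stored below the diagonal together with tau.
F, tau = LAPACK.geqrf!(copy(A))

# Step 2: overwrite the right-hand side with Q' * b.
# 'L' applies Q from the left; 'T' is the transpose for real element types
# (a complex matrix would need 'C' instead).
y = LAPACK.ormqr!('L', 'T', F, tau, copy(b))

# Step 3: back-substitute with the upper-triangular R ('U'), no transpose
# ('N'), non-unit diagonal ('N'). Only the upper triangle of F is read.
x = BLAS.trsv!('U', 'N', 'N', F, y)
```

Because `F` and `tau` do not depend on `b`, they can be cached and reused across right-hand sides, in the same way as the LU factors.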

github-actions · 2025-08-10T13:51:52Z

ext/LinearSolveAMDGPUExt.jl

+using LinearSolve.LinearAlgebra, LinearSolve.SciMLBase
+
+# LU Factorization
+function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadLUFactorization;


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

-function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadLUFactorization;
+function SciMLBase.solve!(
+cache::LinearSolve.LinearCache, alg::AMDGPUOffloadLUFactorization;

github-actions · 2025-08-10T13:51:52Z

ext/LinearSolveAMDGPUExt.jl

+        cache.cacheval = fact
+        cache.isfresh = false
+    end
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:52Z

ext/LinearSolveAMDGPUExt.jl

+
+    A_gpu, ipiv = cache.cacheval
+    b_gpu = AMDGPU.ROCArray(cache.b)
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:52Z

ext/LinearSolveAMDGPUExt.jl

+    b_gpu = AMDGPU.ROCArray(cache.b)
+
+    AMDGPU.rocSOLVER.getrs!('N', A_gpu, ipiv, b_gpu)
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:52Z

ext/LinearSolveAMDGPUExt.jl

+end
+
+# QR Factorization
+function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadQRFactorization;


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

-function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadQRFactorization;
+function SciMLBase.solve!(
+cache::LinearSolve.LinearCache, alg::AMDGPUOffloadQRFactorization;

github-actions · 2025-08-10T13:51:53Z

ext/LinearSolveAMDGPUExt.jl

+        cache.cacheval = (A_gpu, tau)
+        cache.isfresh = false
+    end
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:53Z

ext/LinearSolveAMDGPUExt.jl

+
+    A_gpu, tau = cache.cacheval
+    b_gpu = AMDGPU.ROCArray(cache.b)
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:53Z

ext/LinearSolveAMDGPUExt.jl

+
+    # Apply Q^T to b
+    AMDGPU.rocSOLVER.ormqr!('L', 'T', A_gpu, tau, b_gpu)
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:53Z

ext/LinearSolveAMDGPUExt.jl

+    # Solve the upper triangular system
+    m, n = size(A_gpu)
+    AMDGPU.rocBLAS.trsv!('U', 'N', 'N', n, A_gpu, b_gpu)
+


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

github-actions · 2025-08-10T13:51:53Z

ext/LinearSolveAMDGPUExt.jl

+    (A_gpu, tau)
+end
+
+end


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

end

end

ChrisRackauckas and others added 2 commits August 10, 2025 09:25

github-actions bot reviewed Aug 10, 2025

ChrisRackauckas merged commit 9400fb7 into main Aug 10, 2025
104 of 120 checks passed

ChrisRackauckas deleted the amdgpu-offload-factorization branch August 10, 2025 16:22
