ChrisRackauckas
Member

Summary

This PR adds support for AMD GPU-accelerated linear solving through the new AMDGPUOffloadFactorization algorithm:

• Added AMDGPUOffloadFactorization struct in src/extension_algs.jl with proper error handling when AMDGPU.jl is not loaded
• Created LinearSolveAMDGPUExt extension in ext/LinearSolveAMDGPUExt.jl implementing GPU-offloaded LU factorization using AMDGPU.rocSOLVER
• Added AMDGPU as a weak dependency, together with the extension configuration, in Project.toml
• Exported AMDGPUOffloadFactorization in src/LinearSolve.jl
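
As an illustration of the "error when AMDGPU.jl is not loaded" behavior, here is a minimal sketch of one common way to guard an extension-backed algorithm type. It is not the code from this PR: `DemoSolve`, `DemoOffloadFactorization`, and `DemoAMDGPUExt` are hypothetical names, and the sketch assumes Julia 1.9 or later, where `Base.get_extension` is available.

```julia
module DemoSolve  # hypothetical stand-in for the parent package

struct DemoOffloadFactorization
    function DemoOffloadFactorization()
        # Ask Julia whether the (hypothetical) extension module has been loaded.
        ext = Base.get_extension(@__MODULE__, :DemoAMDGPUExt)
        if ext === nothing
            error("DemoOffloadFactorization requires that AMDGPU is loaded, i.e. `using AMDGPU`")
        end
        return new()
    end
end

end # module

# With no extension loaded, constructing the algorithm is expected to fail early
# with a readable message instead of a MethodError deep inside solve!.
try
    DemoSolve.DemoOffloadFactorization()
catch err
    println("constructor refused: ", sprint(showerror, err))
end
```

Failing in the constructor keeps the error close to the user's mistake (forgetting `using AMDGPU`) rather than surfacing later during the solve.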

Implementation Details

The implementation follows the same pattern as CudaOffloadFactorization: it uses rocSOLVER.getrf! for the LU factorization and rocSOLVER.getrs! for the solve, operating on AMD GPUs via ROCArrays. Offloading pays off only for sufficiently large matrices, where the GPU speedup outweighs the cost of transferring data to and from the device.
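
The factor-once, solve-many pattern described above can be sketched on the CPU with Julia's standard LAPACK wrappers, which share the getrf!/getrs! naming. This is only an illustration under that assumption, not the extension's code: `lu_factor` and `lu_solve` are hypothetical helper names, and the GPU version would work on ROCArrays instead of plain arrays.

```julia
using LinearAlgebra

# Factor once. getrf! overwrites its argument with the packed L and U factors,
# so a copy is used to leave the caller's matrix untouched.
function lu_factor(A::Matrix{Float64})
    F, ipiv, info = LAPACK.getrf!(copy(A))
    info == 0 || error("matrix is singular to working precision (info = $info)")
    return F, ipiv
end

# Reuse the factors for each right-hand side. getrs! overwrites the
# right-hand side with the solution, so again work on a copy.
function lu_solve(F::Matrix{Float64}, ipiv::Vector{<:Integer}, b::Vector{Float64})
    x = copy(b)
    LAPACK.getrs!('N', F, ipiv, x)
    return x
end

A = [4.0 3.0; 6.0 3.0]
b = [10.0, 12.0]
F, ipiv = lu_factor(A)
x = lu_solve(F, ipiv, b)
println(x)
```

Keeping the factors and pivots together mirrors the `A_gpu, ipiv = cache.cacheval` line in the diff: the expensive factorization is stored and only the cheap triangular solves are repeated.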

Test plan

  • Verify the extension loads correctly when AMDGPU.jl is available
  • Verify proper error handling when AMDGPU.jl is not loaded
  • Test basic linear solve functionality with AMD GPU hardware
  • Ensure compatibility with existing LinearSolve.jl interfaces

🤖 Generated with Claude Code

ChrisRackauckas and others added 2 commits August 10, 2025 09:25
This commit adds support for AMD GPU-accelerated linear solving through the new AMDGPUOffloadFactorization algorithm:

- Added AMDGPUOffloadFactorization struct in src/extension_algs.jl with proper error handling when AMDGPU.jl is not loaded
- Created LinearSolveAMDGPUExt extension in ext/LinearSolveAMDGPUExt.jl implementing GPU-offloaded LU factorization using AMDGPU.rocSOLVER
- Added AMDGPU as a weak dependency, together with the extension configuration, in Project.toml
- Exported AMDGPUOffloadFactorization in src/LinearSolve.jl

The implementation follows the same pattern as CudaOffloadFactorization, using rocSOLVER for LU factorization and solve operations on AMD GPUs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…zation

- Renamed AMDGPUOffloadFactorization to AMDGPUOffloadLUFactorization for clarity
- Added AMDGPUOffloadQRFactorization for QR-based solving
- Updated extension to support both LU and QR factorizations
- LU uses rocSOLVER.getrf! and getrs!
- QR uses rocSOLVER.geqrf!, ormqr!, and rocBLAS.trsv!

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
@ChrisRackauckas
Member Author

Updated the PR with the following changes:

  • Renamed AMDGPUOffloadFactorization to AMDGPUOffloadLUFactorization for clarity and consistency
  • Added AMDGPUOffloadQRFactorization for QR-based solving on AMD GPUs
  • Both algorithms follow the same pattern as CudaOffloadFactorization
  • LU factorization uses rocSOLVER.getrf! and getrs!
  • QR factorization uses rocSOLVER.geqrf!, ormqr!, and rocBLAS.trsv!
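
For reference, the three-step QR solve (factor, apply the transpose of Q, back-substitute with R) can be sketched on the CPU with Julia's LinearAlgebra wrappers standing in for the rocSOLVER and rocBLAS calls. This is not the extension's code: `qr_solve` is a hypothetical helper, and the sketch assumes a square, nonsingular, real-valued system.

```julia
using LinearAlgebra

function qr_solve(A::Matrix{Float64}, b::Vector{Float64})
    size(A, 1) == size(A, 2) || error("this sketch assumes a square system")

    # geqrf! overwrites the copy: R lands in the upper triangle and the
    # Householder reflectors that define Q are stored below the diagonal.
    F, tau = LAPACK.geqrf!(copy(A))

    x = copy(b)
    # x <- Q' * b, applied from the left using the stored reflectors.
    LAPACK.ormqr!('L', 'T', F, tau, x)
    # Solve R * x = Q' * b; trsv! reads only the upper triangle of F.
    BLAS.trsv!('U', 'N', 'N', F, x)
    return x
end

A = [2.0 1.0; 1.0 3.0]
b = [3.0, 5.0]
x = qr_solve(A, b)
println(x)
```

For complex element types the conjugate transpose ('C') would be needed instead of 'T'; the sketch sidesteps that by restricting itself to Float64.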

The implementation now provides two factorization options for AMD GPU offloading, allowing users to choose based on their numerical stability and performance requirements.
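
A usage sketch, assuming a machine with a supported AMD GPU and both LinearSolve.jl and AMDGPU.jl installed. The matrix size and the diagonal shift are arbitrary choices for illustration, and no performance claim is implied.

```julia
using LinearSolve, LinearAlgebra
using AMDGPU  # loading AMDGPU.jl is what activates the extension

n = 2000                   # arbitrary size chosen for illustration
A = rand(n, n) + n * I     # diagonal shift keeps the system well conditioned
b = rand(n)
prob = LinearProblem(A, b)

# One-shot solves with either offloading algorithm.
sol_lu = solve(prob, AMDGPUOffloadLUFactorization())
sol_qr = solve(prob, AMDGPUOffloadQRFactorization())

# Reusing the factorization through the standard LinearSolve caching interface:
# the first solve! factorizes, later solves with a new b reuse the stored factors.
cache = init(prob, AMDGPUOffloadLUFactorization())
sol1 = solve!(cache)
cache.b = rand(n)
sol2 = solve!(cache)
```

Since every solve copies `b` to the device and the result back, timing both algorithms against a CPU factorization on representative matrix sizes is the safest way to decide whether offloading helps.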

using LinearSolve.LinearAlgebra, LinearSolve.SciMLBase

# LU Factorization
function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadLUFactorization;

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadLUFactorization;
+ function SciMLBase.solve!(
+ cache::LinearSolve.LinearCache, alg::AMDGPUOffloadLUFactorization;

cache.cacheval = fact
cache.isfresh = false
end

[JuliaFormatter] reported by reviewdog 🐶

Suggested change


A_gpu, ipiv = cache.cacheval
b_gpu = AMDGPU.ROCArray(cache.b)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change

b_gpu = AMDGPU.ROCArray(cache.b)

AMDGPU.rocSOLVER.getrs!('N', A_gpu, ipiv, b_gpu)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change

end

# QR Factorization
function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadQRFactorization;

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::AMDGPUOffloadQRFactorization;
+ function SciMLBase.solve!(
+ cache::LinearSolve.LinearCache, alg::AMDGPUOffloadQRFactorization;

cache.cacheval = (A_gpu, tau)
cache.isfresh = false
end

[JuliaFormatter] reported by reviewdog 🐶

Suggested change


A_gpu, tau = cache.cacheval
b_gpu = AMDGPU.ROCArray(cache.b)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change


# Apply Q^T to b
AMDGPU.rocSOLVER.ormqr!('L', 'T', A_gpu, tau, b_gpu)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# Solve the upper triangular system
m, n = size(A_gpu)
AMDGPU.rocBLAS.trsv!('U', 'N', 'N', n, A_gpu, b_gpu)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change

(A_gpu, tau)
end

end

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
end
end

@ChrisRackauckas ChrisRackauckas merged commit 9400fb7 into main Aug 10, 2025
104 of 120 checks passed
@ChrisRackauckas ChrisRackauckas deleted the amdgpu-offload-factorization branch August 10, 2025 16:22