Add CUDA extension for GPU-accelerated solvers #574
zazabap wants to merge 1 commit into JuliaManifolds:master from
Conversation
Add a `ManoptCUDAExt` package extension that enables Manopt solvers (`gradient_descent`, `conjugate_gradient_descent`) to work transparently with `CuArray`-backed manifold points. The extension overrides `ManifoldsBase.allocate` for `CuArray` and patches `linesearch_backtrack!` to handle CPU workspace / GPU iterate mismatches. Supported stepsizes: `ConstantLength`, `ArmijoLinesearch`, `NonmonotoneLinesearch`. Includes tests for both the allocate and solver paths on Euclidean manifolds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
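A minimal usage sketch of what the description above implies, assuming a CUDA-capable GPU and that loading CUDA.jl activates the extension; the quadratic cost, `target`, and dimensions are only illustrative:

```julia
using Manopt, Manifolds, CUDA

M = Euclidean(10)
target = CUDA.rand(10)                # illustrative target point, already on the GPU
f(M, p) = sum(abs2, p - target) / 2   # cost evaluates as a GPU reduction
grad_f(M, p) = p - target             # Euclidean gradient, also a CuArray

p0 = CUDA.zeros(10)                   # GPU start point
p_opt = gradient_descent(M, f, grad_f, p0;
    stepsize=ConstantLength(0.1),
    stopping_criterion=StopAfterIteration(100),
)
```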
Audit: Unresolved & Untested Items

What the extension currently covers
- Solvers
- Stepsize Types
- Direction Update Rules
- Stopping Criteria
Root cause for "will break" items
All: missing test coverage
mateuszbaran left a comment
Thanks for starting this. I think it would be good to have a well-designed solution here so it will require a bit of discussion.
```julia
function ManifoldsBase.allocate(::ManifoldsBase.AbstractManifold, p::CuArray{T,N}) where {T,N}
    return CuArray{T,N}(undef, size(p))
end

function ManifoldsBase.allocate(
    ::ManifoldsBase.AbstractManifold, p::CuArray{T,N}, ::Type{S}
) where {T,N,S}
    return CuArray{S,N}(undef, size(p))
end

# Allocate without manifold argument (used by some ManifoldsBase paths)
function ManifoldsBase.allocate(p::CuArray{T,N}) where {T,N}
    return CuArray{T,N}(undef, size(p))
end

function ManifoldsBase.allocate(p::CuArray{T,N}, ::Type{S}) where {T,N,S}
    return CuArray{S,N}(undef, size(p))
end
```
Could you write a test which is fixed by these methods? I would expect these to not be necessary. All added allocation tests pass on unpatched JuliaManifolds.
```julia
function Manopt.linesearch_backtrack!(
    M::ManifoldsBase.AbstractManifold,
    q::Array,    # CPU workspace (from stepsize.candidate_point)
    f::TF,
    p::CuArray,  # GPU iterate
    s,
    decrease,
    contract,
    η;
    kwargs...,
) where {TF}
    q_gpu = ManifoldsBase.allocate(M, p)
    return Manopt.linesearch_backtrack!(M, q_gpu, f, p, s, decrease, contract, η; kwargs...)
end
```
I think a better solution would be to extend `_produce_type` to pass a point along with the manifold, which would allow us to properly propagate the type. @kellertuer, what do you think about it?
Effectively, `_produce_type(stepsize, M)` would become `_produce_type(stepsize, M, p)`, and then the `ArmijoLinesearchStepsize` constructor could use that `p` to initialize `candidate_point::P = allocate(p)`.
While it is an internal function and we could even break it without that being a breaking change, it would also be possible to do `p = rand(M)` or so in order to pass the point type as well; but sure, we could do that.
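A rough, self-contained sketch of that idea with stand-in types (these are not the actual Manopt.jl factory or stepsize types):

```julia
# Stand-ins only; the real Manopt.jl types and fields differ.
struct ArmijoFactorySketch
    contraction_factor::Float64
end

struct ArmijoStepsizeSketch{P}
    candidate_point::P
    contraction_factor::Float64
end

# Proposed three-argument form: the start point `p` (or a fallback like `rand(M)`)
# determines the array type of the pre-allocated candidate point, so a CuArray
# iterate yields a CuArray workspace.
function _produce_type(factory::ArmijoFactorySketch, M, p)
    # `similar(p)` stands in for `ManifoldsBase.allocate(p)` from the discussion above.
    return ArmijoStepsizeSketch(similar(p), factory.contraction_factor)
end
```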
Thanks for the start. I can see that the initial post is for sure AI-generated and a bit verbose, but besides that, a bit of a road map is a good idea for sure. Here are a few generic comments not directly related to specific code lines:
We should probably move those to
This is absolutely not possible. For performance reasons, the point and tangent vector types (with parameters
Not sure what this refers to, but things breaking is not something I would prefer.
This should be a PR in Manifolds.jl then. I do not understand the points listed last in the “test plan”.
```julia
# Note: WolfePowellLinesearchStepsize and CubicBracketingLinesearchStepsize
# use candidate_point directly (not via linesearch_backtrack!). GPU support
# for these stepsizes requires changes to the Manopt.jl base package to make
# the candidate_point field non-parametric or to add a dispatch hook. For now,
```
What is a parametric field? It is unclear what this refers to, but I would prefer generic solutions that work with as many step sizes as possible; in the best case, all of them.
```julia
p0_cpu = zeros(10)
result_cpu = gradient_descent(
    M, f_cpu, grad_f_cpu, p0_cpu;
    stopping_criterion=StopAfterIteration(100),
    stepsize=ConstantLength(0.1),
)
```
For the quadratic task above this is not necessary; you know the closed-form solution is just `target`.
I am also considering directly modifying ManifoldsBase in Manifolds.jl. Since the update is made in #577, I will follow it and later merge it into my PR.

I apologize that the current PR is still a bit verbose. I will spend more time reviewing the changes and updating them.
Please wait with further changes to Manopt.jl until #577 is finished; it will resolve many issues encountered here. I think the changes required in Manifolds.jl are largely independent, so you could try addressing those.
The general scheme and idea here is that these are very modular:
I do not understand this sentence. High cost is never intended. Of the manifolds you list, only two use specific types (not mandatory):
A probably interesting case that does always need an internal structure is the fixed-rank manifold, but that too is just a parametric type storing 3 abstract arrays (a truncated SVD), so that should also be possible (hopefully as easy as the others). For the table, I do not understand what “OK” and “Fail” actually mean.
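For illustration only, a hypothetical point type (not the actual Manifolds.jl definition) storing a truncated SVD as three abstract arrays would accept CuArrays without any GPU-specific code:

```julia
# Hypothetical sketch: the fields are only constrained to abstract array types,
# so CPU Arrays and CuArrays both fit the same parametric struct.
struct TruncatedSVDPointSketch{TU<:AbstractMatrix,TS<:AbstractVector,TV<:AbstractMatrix}
    U::TU
    S::TS
    Vt::TV
end
```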
Summary
Adds a `ManoptCUDAExt` package extension enabling Manopt solvers to work transparently with `CuArray`-backed manifold points on NVIDIA GPUs.

What the extension does
- Overrides `ManifoldsBase.allocate` for `CuArray` (4 methods) so workspace buffers stay on the GPU
- Patches `linesearch_backtrack!` to handle CPU workspace / GPU iterate type mismatches
- Wires up the package extension (`[weakdeps]` + `[extensions]`) with compat `CUDA = "5"` (a quick load check is sketched below)
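A quick way to check that wiring, assuming Julia ≥ 1.9 (where package extensions exist):

```julia
using Manopt, CUDA  # loading CUDA should trigger the ManoptCUDAExt extension

ext = Base.get_extension(Manopt, :ManoptCUDAExt)
ext === nothing && @warn "ManoptCUDAExt was not loaded"
```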
Files changed

- `ext/ManoptCUDAExt.jl`
- `test/test_cuda_ext.jl`
- `Project.toml`
- `Changelog.md`

Solver and Stepsize Coverage
Tested
- `gradient_descent` + `ConstantLength` on `Euclidean`
- `gradient_descent` + `ArmijoLinesearch` (default) on `Euclidean`

Patched but not tested

- `NonmonotoneLinesearchStepsize` — patched via the same `linesearch_backtrack!` override
- `conjugate_gradient_descent` — uses the same Armijo default, should work

Likely work (no workspace arrays), not tested

- `DecreasingStepsize`, `PolyakStepsize`, `AdaptiveWNGradient`, `ConstantStepsize`
- `WolfePowellBinaryLinesearchStepsize` — allocating `retract_fused`, benefits from the `allocate` override
- `IdentityUpdateRule`, `ConjugateDescent`, `FletcherReeves`
- `GradientNormLess`, `CostLess`, `CostChangeLess`, `StepsizeLess`

Will break — requires base-package changes
These stepsizes pre-allocate `candidate_point::P` via `allocate_result(M, rand)` at construction time (before the iterate type is known) and use it directly in `retract_fused!`, bypassing `linesearch_backtrack!`:

- `WolfePowellLinesearchStepsize` — affects the `quasi_Newton` default
- `CubicBracketingLinesearchStepsize` — `candidate_point` + `candidate_direction`
- `DomainBackTrackingStepsize` — affects `convex_bundle_method`
- `NullStepBackTrackingStepsize` — `candidate_point` + `X`
- `ProximalGradientMethodBacktrackingStepsize` — affects `proximal_gradient_method`

Possible fix: lazy-init `candidate_point` on the first solver step (when the iterate type is known), or make the field non-parametric; a sketch of the lazy-init idea follows.
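A minimal sketch of the lazy-init idea with a hypothetical type (not Manopt.jl's):

```julia
# The stepsize stores no workspace until the first step, then allocates it from
# the actual iterate, so a CuArray iterate gets a CuArray candidate point.
mutable struct LazyCandidateWorkspace
    candidate_point::Any   # `nothing` until first use
    LazyCandidateWorkspace() = new(nothing)
end

function candidate_point!(ws::LazyCandidateWorkspace, p)
    if ws.candidate_point === nothing
        ws.candidate_point = similar(p)  # stand-in for ManifoldsBase.allocate(M, p)
    end
    return ws.candidate_point
end
```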
Will break — direction/stopping criterion issues
- `MomentumGradientRule`, `AverageGradientRule`, `NesterovRule` — stored state + vector transport type mismatch
- `DirectionUpdateRuleStorage` / `StoreStateAction` — stores CPU iterates
- `StopWhenIterateNaN` — `any(isnan.(get_iterate(s)))` triggers scalar indexing on `CuArray`

Manifolds.jl GPU Compatibility (tested locally)
The Manopt extension alone is sufficient for `Euclidean` and `Sphere` manifolds. Other manifolds need work in Manifolds.jl.

Three failure modes in Manifolds.jl (not in scope for this PR):
- `p[i,j]` access or CPU-only `eigen`/`exp`
- `copyto!` ambiguity (SPD) — `copyto!(CuArray, Symmetric{CuArray})` is ambiguous between LinearAlgebra and GPUArrays (a minimal reproducer is sketched below)
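A minimal reproducer sketch for the second bullet (hypothetical sizes; whether the ambiguity actually triggers depends on the LinearAlgebra/GPUArrays versions in use):

```julia
using CUDA, LinearAlgebra

A = CUDA.rand(3, 3)
B = CUDA.rand(3, 3)
copyto!(A, Symmetric(B))  # reported above as an ambiguous dispatch
```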
A `ManifoldsCUDAExt` in Manifolds.jl would be needed to address these.

Test plan
- `test/test_cuda_ext.jl` — GPU allocate, GD + ConstantLength, GD + Armijo
- `conjugate_gradient_descent` with CuArray
- `record=[:Cost]` + `return_state=true`
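A sketch of what the "GPU allocate" item could look like, guarded by `CUDA.functional()` (test names and sizes are illustrative):

```julia
using Test, Manifolds, ManifoldsBase, CUDA

if CUDA.functional()
    @testset "CuArray allocate stays on the GPU" begin
        M = Euclidean(5)
        p = CUDA.rand(5)
        q = ManifoldsBase.allocate(M, p)
        @test q isa CuArray
        @test size(q) == size(p)
        q2 = ManifoldsBase.allocate(M, p, Float64)
        @test q2 isa CuArray{Float64}
    end
end
```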