Add CUDA extension for GPU-accelerated solvers by zazabap · Pull Request #574 · JuliaManifolds/Manopt.jl

zazabap · 2026-02-11T16:18:20Z

Summary

Adds ManoptCUDAExt package extension enabling Manopt solvers to work transparently with CuArray-backed manifold points on NVIDIA GPUs.

What the extension does

Overrides ManifoldsBase.allocate for CuArray (4 methods) so workspace buffers stay on GPU
Patches linesearch_backtrack! to handle CPU workspace / GPU iterate type mismatches
Adds CUDA as a weak dependency ([weakdeps] + [extensions]) with compat CUDA = "5"

Files changed

File	Description
`ext/ManoptCUDAExt.jl`	Extension module (109 lines)
`test/test_cuda_ext.jl`	GPU tests for allocate + solver paths
`Project.toml`	CUDA weak dep, extension registration, compat
`Changelog.md`	v0.5.33 entry

Solver and Stepsize Coverage

Tested

gradient_descent + ConstantLength on Euclidean
gradient_descent + ArmijoLinesearch (default) on Euclidean

Patched but not tested

NonmonotoneLinesearchStepsize — patched via same linesearch_backtrack! override
conjugate_gradient_descent — uses same Armijo default, should work

Likely work (no workspace arrays), not tested

DecreasingStepsize, PolyakStepsize, AdaptiveWNGradient, ConstantStepsize
WolfePowellBinaryLinesearchStepsize — allocating retract_fused, benefits from allocate override
Stateless direction rules: IdentityUpdateRule, ConjugateDescent, FletcherReeves
Scalar stopping criteria: GradientNormLess, CostLess, CostChangeLess, StepsizeLess

Will break — requires base-package changes

These stepsizes pre-allocate candidate_point::P via allocate_result(M, rand) at construction time (before iterate type is known) and use it directly in retract_fused!, bypassing linesearch_backtrack!:

WolfePowellLinesearchStepsize — affects quasi_Newton default
CubicBracketingLinesearchStepsize — candidate_point + candidate_direction
DomainBackTrackingStepsize — affects convex_bundle_method
NullStepBackTrackingStepsize — candidate_point + X
ProximalGradientMethodBacktrackingStepsize — affects proximal_gradient_method

Possible fix: lazy-init candidate_point on first solver step (when iterate type is known), or make the field non-parametric.

Will break — direction/stopping criterion issues

MomentumGradientRule, AverageGradientRule, NesterovRule — stored state + vector transport type mismatch
DirectionUpdateRuleStorage / StoreStateAction — stores CPU iterates
StopWhenIterateNaN — any(isnan.(get_iterate(s))) triggers scalar indexing on CuArray

Manifolds.jl GPU Compatibility (tested locally)

The Manopt extension alone is sufficient for Euclidean and Sphere manifolds. Other manifolds need work in Manifolds.jl:

Manifold	retract_fused	inner/norm	vector_transport	distance
Euclidean	all OK	OK	OK	OK
Sphere	Exp OK, Proj OK	OK	Parallel OK	OK
Stiefel	Polar OK, Exp FAIL	OK	FAIL	FAIL
Rotations	Exp FAIL, Proj FAIL	OK	FAIL	FAIL
SPD	Exp OK, Proj FAIL	FAIL	FAIL	FAIL
Hyperbolic	FAIL	FAIL	FAIL	FAIL

Three failure modes in Manifolds.jl (not in scope for this PR):

Scalar indexing (Stiefel exp, Rotations, Hyperbolic) — implementations use element-wise p[i,j] access or CPU-only eigen/exp
copyto! ambiguity (SPD) — copyto!(CuArray, Symmetric{CuArray}) is ambiguous between LinearAlgebra and GPUArrays
Missing GPU dispatch — matrix exp/log/eigen don't dispatch to CUSOLVER for CuArray

A ManifoldsCUDAExt in Manifolds.jl would be needed to address these.

Test plan

test/test_cuda_ext.jl — GPU allocate, GD + ConstantLength, GD + Armijo
conjugate_gradient_descent with CuArray
Matrix-valued iterates (Euclidean(n,n))
Float32 iterates
record=[:Cost] + return_state=true
Sphere manifold end-to-end
CI with CUDA runner (tested locally on RTX 3090)
Verify no regressions on CPU-only paths

Add ManoptCUDAExt package extension that enables Manopt solvers (gradient_descent, conjugate_gradient_descent) to work transparently with CuArray-backed manifold points. The extension overrides ManifoldsBase.allocate for CuArray and patches linesearch_backtrack! to handle CPU workspace / GPU iterate mismatches. Supported stepsizes: ConstantLength, ArmijoLinesearch, NonmonotoneLinesearch. Includes tests for both allocate and solver paths on Euclidean manifolds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zazabap · 2026-02-11T16:31:47Z

Audit: Unresolved & Untested Items

What the extension currently covers

ManifoldsBase.allocate overrides for CuArray (4 methods)
linesearch_backtrack! patch for CPU workspace / GPU iterate mismatch

Solvers

gradient_descent (ConstantLength + Armijo)
conjugate_gradient_descent — claimed supported, not tested
quasi_Newton — will break (default WolfePowellLinesearch uses candidate_point directly)
stochastic_gradient_descent, subgradient_method, FrankWolfe — likely work (scalar stepsizes), not tested
All other solvers (20+) — not tested

Stepsize Types

ConstantStepsize — works (no workspace)
ArmijoLinesearchStepsize — patched via linesearch_backtrack!
NonmonotoneLinesearchStepsize — patched, not tested
WolfePowellLinesearchStepsize — will break (uses retract_fused!(M, a.candidate_point, p, ...) directly, not via linesearch_backtrack!)
CubicBracketingLinesearchStepsize — will break (same pattern + candidate_direction)
WolfePowellBinaryLinesearchStepsize — uses allocating retract_fused, should work via allocate override, not tested
DomainBackTrackingStepsize, NullStepBackTrackingStepsize, ProximalGradientMethodBacktrackingStepsize — will break (pre-allocated candidate_point)
DecreasingStepsize, PolyakStepsize, AdaptiveWNGradient, DistanceOverGradients, AffineCovariant — likely work, not tested

Direction Update Rules

IdentityUpdateRule — works
MomentumGradientRule, AverageGradientRule, NesterovRule — will break (stored state + vector transport type mismatch)
DirectionUpdateRuleStorage — will break (StoreStateAction stores iterates)
Stateless coefficient rules (ConjugateDescent, FletcherReeves, etc.) — likely work, not tested

Stopping Criteria

StopAfterIteration — works
StopWhenIterateNaN — will break (any(isnan.(get_iterate(s))) on CuArray triggers scalar indexing)
StopWhenChangeLess, StopWhenEntryChangeLess — risk from StoreStateAction type mismatch
Scalar criteria (GradientNormLess, CostLess, etc.) — likely work, not tested

Root cause for "will break" items

All candidate_point issues stem from stepsize constructors calling allocate_result(M, rand) at construction time (before the iterate type is known), storing the result in a parametric field candidate_point::P where P = Vector{Float64}. The extension cannot fix this — it requires a base-package change (e.g. lazy-init or non-parametric field).

Missing test coverage

conjugate_gradient_descent with CuArray
Matrix-valued iterates (Euclidean(n,n))
Float32 iterates
record=[:Cost] + return_state=true
Non-Euclidean manifolds (Sphere, Stiefel)
Transpose/Adjoint CuArray wrappers (union CUDAManifoldPoint defined but unused)
CI with CUDA runner

mateuszbaran

Thanks for starting this. I think it would be good to have a well-designed solution here so it will require a bit of discussion.

mateuszbaran · 2026-02-11T17:28:38Z

ext/ManoptCUDAExt.jl

+function ManifoldsBase.allocate(::ManifoldsBase.AbstractManifold, p::CuArray{T,N}) where {T,N}
+    return CuArray{T,N}(undef, size(p))
+end
+
+function ManifoldsBase.allocate(
+    ::ManifoldsBase.AbstractManifold, p::CuArray{T,N}, ::Type{S}
+) where {T,N,S}
+    return CuArray{S,N}(undef, size(p))
+end
+
+# Allocate without manifold argument (used by some ManifoldsBase paths)
+function ManifoldsBase.allocate(p::CuArray{T,N}) where {T,N}
+    return CuArray{T,N}(undef, size(p))
+end
+
+function ManifoldsBase.allocate(p::CuArray{T,N}, ::Type{S}) where {T,N,S}
+    return CuArray{S,N}(undef, size(p))
+end


Could you write a test which is fixed by these methods? I would expect these to not be necessary. All added allocation tests pass on unpatched JuliaManifolds.

mateuszbaran · 2026-02-11T17:36:52Z

ext/ManoptCUDAExt.jl

+function Manopt.linesearch_backtrack!(
+    M::ManifoldsBase.AbstractManifold,
+    q::Array,       # CPU workspace (from stepsize.candidate_point)
+    f::TF,
+    p::CuArray,     # GPU iterate
+    s,
+    decrease,
+    contract,
+    η;
+    kwargs...,
+) where {TF}
+    q_gpu = ManifoldsBase.allocate(M, p)
+    return Manopt.linesearch_backtrack!(M, q_gpu, f, p, s, decrease, contract, η; kwargs...)
+end


I think a better solution would be to extend _produce_type to pass a point along the manifold, which would allow us to properly propagate the type. @kellertuer , what do you think about it?

Effectively _produce_type(stepsize, M) would be _produce_type(stepsize, M, p) and then ArmijoLinesearchStepsize constructor could use that p to initialize candidate_point::P = allocate(p).

While it is an internal function and we could even break that without having a breaking change, it would even be possible to do p = rand(M) or so in order pass the point type a well; but sure, we could do that.

kellertuer · 2026-02-12T00:04:27Z

Thanks for the start. I can see that the initial post is for sure AI, a bit verbose, but besides that a bit of road map is a good idea for sure

Here is a few generic comments not directly related to specific code lines

Overrides ManifoldsBase.allocate

We should probably move those to ManifoldsBase if they are really necessary; similarly if manifold do need updates to be “CUDA ready” that should probably go into an extension within Manifolds.jl so other people using manifolds would benefit as well.

Possible fix: lazy-init candidate_point on first solver step

This is absolutely not possible. For performance reasons the point and tangent vectors types (with parameters P and T are parameters of all structs they appear in, so if we postpone their initialisation for the initialisation (also not first solver step but before that) we can not offer that parametric type any longer (probably have to type the field ::Any) and that would be a huge performance loss also for the CPU case.

Will break — direction/stopping criterion issues [...]

not sure what this refers to, but things breaking is not something I would prefer

Manifolds.jl GPU Compatibility

This should be a PR in Manifolds.jl then.

I do not understand the points listed last in the “test plan”.

kellertuer · 2026-02-11T23:59:35Z

ext/ManoptCUDAExt.jl

+# Note: WolfePowellLinesearchStepsize and CubicBracketingLinesearchStepsize
+# use candidate_point directly (not via linesearch_backtrack!). GPU support
+# for these stepsizes requires changes to the Manopt.jl base package to make
+# the candidate_point field non-parametric or to add a dispatch hook. For now,


What is a parametric field? Unclear what this refers to, but I would prefer generic solutions that work with as many step sizes as possible; for best of cases all.

kellertuer · 2026-02-12T00:05:54Z

test/test_cuda_ext.jl

+            p0_cpu = zeros(10)
+            result_cpu = gradient_descent(
+                M, f_cpu, grad_f_cpu, p0_cpu;
+                stopping_criterion=StopAfterIteration(100),
+                stepsize=ConstantLength(0.1),
+            )


For the quadratic task above this is not necessary, you know the closed form solution to just be target.

zazabap · 2026-02-13T16:04:42Z

For the Manopt.jl, if using manifold structure defined in the Manifold.jl not on CUDA array, it might cause some problems like overhead. The computation cost is intended to be extremely high in such case. I am still trying to grab the core structure for how manifold.jl and manopt.jl interact so it will take me a bit more time to think a better method for implementation. For the table listed here it is mandate to update the original Manifold.jl for CUDA extension.

Manifold	retract_fused	inner/norm	vector_transport	distance
Euclidean	all OK	OK	OK	OK
Sphere	Exp OK, Proj OK	OK	Parallel OK	OK
Stiefel	Polar OK, Exp FAIL	OK	FAIL	FAIL
Rotations	Exp FAIL, Proj FAIL	OK	FAIL	FAIL
SPD	Exp OK, Proj FAIL	FAIL	FAIL	FAIL
Hyperbolic	FAIL	FAIL	FAIL	FAIL

I am also considering directly modify the Manifoldbase in the Manifold.jl. --> Since the update is made in the #577 I will follow it and later merge into my PR.

For the lazy-init, break point and some other features. I will remove the opinions and keep the code more efficient in the coding region and resolve your comments, please give me slightly a bit more time.

I apologize the current PR is still a bit verbose. I will spend more time to review the changes and update the change.

mateuszbaran · 2026-02-13T16:11:48Z

Please wait with further changes to Manopt.jl until #577 is finished, it will resolve many issues encountered here.

I think the changes required in Manifolds.jl are largely independent, so you could try addressing those.

kellertuer · 2026-02-13T21:47:24Z

I am still trying to grab the core structure for how manifold.jl and manopt.jl interact

The general scheme and idea here is that these are very modular:

ManifoldsBase.jl defines an interface for defining/using manifolds, e.g. it declares function signatures like exp(M, p, X) and exp!(M, q, p, X) (working in place of given memory q.)
Manopt.jl only has this interface package ManifoldsBase.jl as a dependency and hence its algorithms aim to work (and hopefully do work) on arbitrary manifolds. Manopt is not aware of any concrete manifold, only the interface
Manifolds.jl is a library of manifolds using the interface ManifoldsBase.jl.

The computation cost is intended to be extremely high in such case.

I do not understand this sentence. high cost is never intended.

Of the manifold you list, only two use specific types (not mandatory):

SPD uses an internal structure (optionally) to store the matrix sqrt (and its inverse) since they are computed very often, but if you do not use the and “just an abstract array” it should work at the cost of repeated svd computations
Hyperbolic works just on arrays in the Hyperboloid model. For Poincaré half-line and Poincaré the abstract array to represent these are wrapped in point/vector types to distinguish which of the three representation is used (Hyperboloid has an optional wrapper for completeness.

A probably interesting case that does always need an internal structure is the fixed rank manifold, but also that is just a parametric type storing 3 abstract arrays (truncated svd) – so that should also be possible (hopefully as easy as the others).

For the table, I do not understand what “OK” and “Fail” actually mean.

mateuszbaran reviewed Feb 11, 2026

View reviewed changes

kellertuer reviewed Feb 12, 2026

View reviewed changes

mateuszbaran mentioned this pull request Feb 12, 2026

Add point capability to _produce_type to improve GPU compatibility #577

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CUDA extension for GPU-accelerated solvers#574

Add CUDA extension for GPU-accelerated solvers#574
zazabap wants to merge 1 commit intoJuliaManifolds:masterfrom
zazabap:cuda-gpu-extension

zazabap commented Feb 11, 2026 •

edited

Loading

Uh oh!

zazabap commented Feb 11, 2026

Uh oh!

mateuszbaran left a comment

Uh oh!

mateuszbaran Feb 11, 2026

Uh oh!

mateuszbaran Feb 11, 2026

Uh oh!

mateuszbaran Feb 11, 2026

Uh oh!

kellertuer Feb 11, 2026

Uh oh!

kellertuer commented Feb 12, 2026

Uh oh!

kellertuer Feb 11, 2026

Uh oh!

kellertuer Feb 12, 2026

Uh oh!

zazabap commented Feb 13, 2026 •

edited

Loading

Uh oh!

mateuszbaran commented Feb 13, 2026

Uh oh!

kellertuer commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

zazabap commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What the extension does

Files changed

Solver and Stepsize Coverage

Tested

Patched but not tested

Likely work (no workspace arrays), not tested

Will break — requires base-package changes

Will break — direction/stopping criterion issues

Manifolds.jl GPU Compatibility (tested locally)

Test plan

Uh oh!

zazabap commented Feb 11, 2026

Audit: Unresolved & Untested Items

What the extension currently covers

Solvers

Stepsize Types

Direction Update Rules

Stopping Criteria

Root cause for "will break" items

Missing test coverage

Uh oh!

mateuszbaran left a comment

Choose a reason for hiding this comment

Uh oh!

mateuszbaran Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

mateuszbaran Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

mateuszbaran Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

kellertuer Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

kellertuer commented Feb 12, 2026

Uh oh!

kellertuer Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

kellertuer Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

zazabap commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mateuszbaran commented Feb 13, 2026

Uh oh!

kellertuer commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zazabap commented Feb 11, 2026 •

edited

Loading

zazabap commented Feb 13, 2026 •

edited

Loading