Skip to content

Nonlinear least squares CuArrays support for arbitrary residual shapeΒ #746

@xDerekY

Description

@xDerekY

Describe the bug 🐞

Nonlinear least squares does not support arbitrary dimension residual shape when using CuArrays aka the case when size(du) != size(u)

Expected behavior

I expected same behaviour as the cpu version with Arrays

Minimal Reproducible Example πŸ‘‡

using NonlinearSolve
using ForwardDiff # Optional, I used it to check jacobian correctness
using CUDA # This is needed

# shape residuals != shape u
N = 4
du,u,p,jac = (zeros(N),randn(2),ones(N),zeros(N,2))

function my_residuals!(du,u,p)
    for idx in eachindex(du)
        if (idx-1) % 2 == 0
            du[idx] = u[1]^2 - p[idx]
        else
            du[idx] = u[2]^2 - p[idx]
        end
    end
end

function my_jac!(jac,u,p)    
    indices = CartesianIndices((N,2))
    for _idx in indices
        idx = _idx[1]
        if (idx-1) % 2 == 0
            jac[idx,1] = 2u[1]
        else
            jac[idx,2] = 2u[2]
        end
    end
end

# Checking that my jacobian is correct using autodiff
@assert (my_jac!(jac,u,p);reshape(jac,:,2)) β‰ˆ ForwardDiff.jacobian((du,u)->my_residuals!(du,u,p),du,u)

# Let's solve
prob = NonlinearLeastSquaresProblem(NonlinearFunction(my_residuals!; resid_prototype = du, jac=my_jac! ,jac_prototype = jac),u,p)
solve(prob, GaussNewton())

# CPU VERSION WORKS! BUT THE GPU VERSION DOES NOT:
du,u,p,jac = (du,u,p,jac) .|> cu
prob = NonlinearLeastSquaresProblem(NonlinearFunction(my_residuals!; resid_prototype = du, jac=my_jac! ,jac_prototype = jac),u,p)
CUDA.@allowscalar solve(prob, GaussNewton())

# THIS ALSO FAILS
prob = NonlinearLeastSquaresProblem(NonlinearFunction(my_residuals!; resid_prototype = du),u,p)
CUDA.@allowscalar solve(prob, GaussNewton())

Error & Stacktrace ⚠️

ERROR: DimensionMismatch: matrix is not square: dimensions are (4, 2)
Stacktrace:

checksquare at LinearAlgebra.jl

cholesky!(A::CuArray{Float32, 2, CUDA.DeviceMemory}, ::LinearAlgebra.NoPivot; check::Bool) at cholesky.jl

cholesky! at cholesky.jl (repeats 2 times)

#_cholesky#175 at cholesky.jl

_cholesky at cholesky.jl

#cholesky#171 at cholesky.jl

cholesky at cholesky.jl (repeats 2 times)

init_cacheval at factorization.jl

macro expansion at default.jl

init_cacheval(alg::LinearSolve.DefaultLinearSolver, A::CuArray{…}, b::CuArray{…}, u::CuArray{…}, Pl::IdentityOperator, Pr::IdentityOperator, maxiters::Int64, abstol::Float32, reltol::Float32, verbose::LinearVerbosity{…}, assump::OperatorAssumptions{…}) at default.jl

__init(::LinearProblem{…}, ::LinearSolve.DefaultLinearSolver; alias::LinearAliasSpecifier, abstol::Float32, reltol::Float32, maxiters::Int64, verbose::SciMLLogging.None, Pl::Nothing, Pr::Nothing, assumptions::OperatorAssumptions{…}, sensealg::LinearSolveAdjoint{…}, kwargs::Base.Pairs{…}) at common.jl

__init at common.jl

#init#15 at common.jl

init at common.jl

#init#169 at default.jl

init at default.jl

#construct_linear_solver#42 at linear_solve.jl

construct_linear_solver at linear_solve.jl

#init#113 at newton.jl

init at newton.jl

__init(::NonlinearLeastSquaresProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, maxiters::Int64, abstol::Nothing, reltol::Nothing, maxtime::Nothing, termination_condition::Nothing, internalnorm::typeof(NonlinearSolveBase.L2_NORM), verbose::NonlinearVerbosity{…}, linsolve_kwargs::@NamedTuple{}, initializealg::NonlinearSolveBase.NonlinearSolveDefaultInit, kwargs::Base.Pairs{…}) at solve.jl

__init at solve.jl

#__solve#155 at solve.jl

__solve at solve.jl

#solve_call#150 at solve.jl

solve_call at solve.jl

solve_up(prob::NonlinearLeastSquaresProblem{…}, sensealg::Nothing, u0::CuArray{…}, p::CuArray{…}, args::GeneralizedFirstOrderAlgorithm{…}; originator::SciMLBase.ChainRulesOriginator, kwargs::Base.Pairs{…}) at solve.jl

solve_up at solve.jl

#solve#148 at solve.jl

solve(prob::NonlinearLeastSquaresProblem{…}, args::GeneralizedFirstOrderAlgorithm{…}) at solve.jl

top-level scope at MWE_manual_jac.jl

macro expansion at GPUArraysCore.jl

Environment (please complete the following information):

  • Output of using Pkg; Pkg.status()
[052768ef] CUDA v5.9.5
βŒ… [f68482b8] Cthulhu v2.16.6
  [a0c0ee7d] DifferentiationInterface v0.7.12
  [31c24e10] Distributions v0.25.122
  [7da242da] Enzyme v0.13.109
  [f6369f11] ForwardDiff v1.3.0
  [e9467ef8] GLMakie v0.13.8
  [5c1252a2] GeometryBasics v0.5.10
  [5903a43b] Infiltrator v1.9.5
βŒƒ [da2b9cff] Mooncake v0.4.183
βŒƒ [b8a86587] NearestNeighbors v0.4.25
  [8913a72c] NonlinearSolve v4.12.0
  [429524aa] Optim v1.13.3
  [7f7a1694] Optimization v5.2.0
  [36348300] OptimizationOptimJL v0.4.8
  [42171d58] PlyIO v1.2.0
  [d236fae5] PreallocationTools v0.4.34
βŒƒ [3c362404] Reactant v0.2.182
  [6038ab10] Rotations v1.7.1
  [90137ffa] StaticArrays v1.9.15
  [bc48ee85] Tullio v0.3.8
  [e88e6eb3] Zygote v0.7.10
βŒ… [0192cb87] Reactant_jll v0.0.270+0
  • Output of versioninfo()
Julia Version 1.12.2
Commit ca9b6662be (2025-11-20 16:25 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 Γ— AMD Ryzen 7 1700 Eight-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver1)
  GC: Built with stock GC
Threads: 16 default, 1 interactive, 16 GC (on 16 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_VSCODE_REPL = 1

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions