-
-
Notifications
You must be signed in to change notification settings - Fork 61
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug π
Nonlinear least squares does not support arbitrary dimension residual shape when using CuArrays aka the case when size(du) != size(u)
Expected behavior
I expected same behaviour as the cpu version with Arrays
Minimal Reproducible Example π
using NonlinearSolve
using ForwardDiff # Optional, I used it to check jacobian correctness
using CUDA # This is needed
# shape residuals != shape u
N = 4
du,u,p,jac = (zeros(N),randn(2),ones(N),zeros(N,2))
function my_residuals!(du,u,p)
for idx in eachindex(du)
if (idx-1) % 2 == 0
du[idx] = u[1]^2 - p[idx]
else
du[idx] = u[2]^2 - p[idx]
end
end
end
function my_jac!(jac,u,p)
indices = CartesianIndices((N,2))
for _idx in indices
idx = _idx[1]
if (idx-1) % 2 == 0
jac[idx,1] = 2u[1]
else
jac[idx,2] = 2u[2]
end
end
end
# Checking that my jacobian is correct using autodiff
@assert (my_jac!(jac,u,p);reshape(jac,:,2)) β ForwardDiff.jacobian((du,u)->my_residuals!(du,u,p),du,u)
# Let's solve
prob = NonlinearLeastSquaresProblem(NonlinearFunction(my_residuals!; resid_prototype = du, jac=my_jac! ,jac_prototype = jac),u,p)
solve(prob, GaussNewton())
# CPU VERSION WORKS! BUT THE GPU VERSION DOES NOT:
du,u,p,jac = (du,u,p,jac) .|> cu
prob = NonlinearLeastSquaresProblem(NonlinearFunction(my_residuals!; resid_prototype = du, jac=my_jac! ,jac_prototype = jac),u,p)
CUDA.@allowscalar solve(prob, GaussNewton())
# THIS ALSO FAILS
prob = NonlinearLeastSquaresProblem(NonlinearFunction(my_residuals!; resid_prototype = du),u,p)
CUDA.@allowscalar solve(prob, GaussNewton())Error & Stacktrace
ERROR: DimensionMismatch: matrix is not square: dimensions are (4, 2)
Stacktrace:
checksquare at LinearAlgebra.jl
cholesky!(A::CuArray{Float32, 2, CUDA.DeviceMemory}, ::LinearAlgebra.NoPivot; check::Bool) at cholesky.jl
cholesky! at cholesky.jl (repeats 2 times)
#_cholesky#175 at cholesky.jl
_cholesky at cholesky.jl
#cholesky#171 at cholesky.jl
cholesky at cholesky.jl (repeats 2 times)
init_cacheval at factorization.jl
macro expansion at default.jl
init_cacheval(alg::LinearSolve.DefaultLinearSolver, A::CuArray{β¦}, b::CuArray{β¦}, u::CuArray{β¦}, Pl::IdentityOperator, Pr::IdentityOperator, maxiters::Int64, abstol::Float32, reltol::Float32, verbose::LinearVerbosity{β¦}, assump::OperatorAssumptions{β¦}) at default.jl
__init(::LinearProblem{β¦}, ::LinearSolve.DefaultLinearSolver; alias::LinearAliasSpecifier, abstol::Float32, reltol::Float32, maxiters::Int64, verbose::SciMLLogging.None, Pl::Nothing, Pr::Nothing, assumptions::OperatorAssumptions{β¦}, sensealg::LinearSolveAdjoint{β¦}, kwargs::Base.Pairs{β¦}) at common.jl
__init at common.jl
#init#15 at common.jl
init at common.jl
#init#169 at default.jl
init at default.jl
#construct_linear_solver#42 at linear_solve.jl
construct_linear_solver at linear_solve.jl
#init#113 at newton.jl
init at newton.jl
__init(::NonlinearLeastSquaresProblem{β¦}, ::GeneralizedFirstOrderAlgorithm{β¦}; stats::SciMLBase.NLStats, alias_u0::Bool, maxiters::Int64, abstol::Nothing, reltol::Nothing, maxtime::Nothing, termination_condition::Nothing, internalnorm::typeof(NonlinearSolveBase.L2_NORM), verbose::NonlinearVerbosity{β¦}, linsolve_kwargs::@NamedTuple{}, initializealg::NonlinearSolveBase.NonlinearSolveDefaultInit, kwargs::Base.Pairs{β¦}) at solve.jl
__init at solve.jl
#__solve#155 at solve.jl
__solve at solve.jl
#solve_call#150 at solve.jl
solve_call at solve.jl
solve_up(prob::NonlinearLeastSquaresProblem{β¦}, sensealg::Nothing, u0::CuArray{β¦}, p::CuArray{β¦}, args::GeneralizedFirstOrderAlgorithm{β¦}; originator::SciMLBase.ChainRulesOriginator, kwargs::Base.Pairs{β¦}) at solve.jl
solve_up at solve.jl
#solve#148 at solve.jl
solve(prob::NonlinearLeastSquaresProblem{β¦}, args::GeneralizedFirstOrderAlgorithm{β¦}) at solve.jl
top-level scope at MWE_manual_jac.jl
macro expansion at GPUArraysCore.jlEnvironment (please complete the following information):
- Output of
using Pkg; Pkg.status()
[052768ef] CUDA v5.9.5
β
[f68482b8] Cthulhu v2.16.6
[a0c0ee7d] DifferentiationInterface v0.7.12
[31c24e10] Distributions v0.25.122
[7da242da] Enzyme v0.13.109
[f6369f11] ForwardDiff v1.3.0
[e9467ef8] GLMakie v0.13.8
[5c1252a2] GeometryBasics v0.5.10
[5903a43b] Infiltrator v1.9.5
β [da2b9cff] Mooncake v0.4.183
β [b8a86587] NearestNeighbors v0.4.25
[8913a72c] NonlinearSolve v4.12.0
[429524aa] Optim v1.13.3
[7f7a1694] Optimization v5.2.0
[36348300] OptimizationOptimJL v0.4.8
[42171d58] PlyIO v1.2.0
[d236fae5] PreallocationTools v0.4.34
β [3c362404] Reactant v0.2.182
[6038ab10] Rotations v1.7.1
[90137ffa] StaticArrays v1.9.15
[bc48ee85] Tullio v0.3.8
[e88e6eb3] Zygote v0.7.10
β
[0192cb87] Reactant_jll v0.0.270+0- Output of
versioninfo()
Julia Version 1.12.2
Commit ca9b6662be (2025-11-20 16:25 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 16 Γ AMD Ryzen 7 1700 Eight-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, znver1)
GC: Built with stock GC
Threads: 16 default, 1 interactive, 16 GC (on 16 virtual cores)
Environment:
JULIA_EDITOR = code
JULIA_VSCODE_REPL = 1Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working