Describe the bug
Calculating the norm of a Hermitian-typed matrix triggers a scalar indexing error.
To reproduce
The Minimal Working Example (MWE) for this bug:
using CUDA
using LinearAlgebra
X = CuArray(randn(ComplexF64, 100, 30))
M = Hermitian(X' * X)
@show norm(M)
Manifest.toml
[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Compiler_jll", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "GPUToolbox", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics", "demumble_jll"]
git-tree-sha1 = "756f031a1ef3137f497ee73ed595e4acf65d753f"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.9.3"
[deps.CUDA.extensions]
ChainRulesCoreExt = "ChainRulesCore"
EnzymeCoreExt = "EnzymeCore"
SparseMatricesCSRExt = "SparseMatricesCSR"
SpecialFunctionsExt = "SpecialFunctions"
[deps.CUDA.weakdeps]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
SparseMatricesCSR = "a0a7dd2c-ebf4-11e9-1f05-cf50bc540ca1"
SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
[[deps.CUDA_Compiler_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "CUDA_Runtime_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "b63428872a0f60d87832f5899369837cd930b76d"
uuid = "d1e2174e-dfdc-576e-b43e-73b79eb1aca8"
version = "0.3.0+0"
[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "2023be0b10c56d259ea84a94dbfc021aa452f2c6"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "13.0.2+0"
[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "f9a521f52d236fe49f1028d69e549e7f2644bb72"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "1.0.0"
[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "KernelAbstractions", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "ScopedValues", "Serialization", "Statistics"]
git-tree-sha1 = "8ddb438e956891a63a5367d7fab61550fc720026"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "11.2.6"
[deps.GPUArrays.extensions]
JLD2Ext = "JLD2"
[deps.GPUArrays.weakdeps]
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "83cf05ab16a73219e5f6bd1bdfa9848fa24ac627"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.2.0"
[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "PrecompileTools", "Preferences", "Scratch", "Serialization", "TOML", "Tracy", "UUIDs"]
git-tree-sha1 = "9a8b92a457f55165923fcfe48997b7b93b712fca"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "1.7.2"
[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Unicode"]
git-tree-sha1 = "ce8614210409eaa54ed5968f4b50aa96da7ae543"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "9.4.4"
weakdeps = ["BFloat16s"]
[deps.LLVM.extensions]
BFloat16sExt = "BFloat16s"
Expected behavior
An efficient norm calculation on the GPU, without scalar indexing.
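For what it's worth, the unwrapped CuArray seems fine; only the Hermitian wrapper appears to fall back to the generic LinearAlgebra implementation, which indexes elements one by one. A minimal check, reusing M from the MWE above:
norm(parent(M))  # works, stays on the GPU
norm(M)          # triggers the scalar indexing error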
Version info
Details on Julia:
Julia Version 1.10.10
Commit 95f30e51f41 (2025-06-27 09:51 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 288 × unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, generic)
Threads: 1 default, 0 interactive, 1 GC (on 288 virtual cores)
Environment:
JULIA_REVISE_POLL = 1
JULIA_DEPOT_PATH = /capstor/scratch/cscs/abussy/dftk_santis/.julia
Details on CUDA:
CUDA toolchain:
- runtime 12.8, artifact installation
- driver 550.54.15 for 13.0
- compiler 12.9
CUDA libraries:
- CUBLAS: 12.8.4
- CURAND: 10.3.9
- CUFFT: 11.3.3
- CUSOLVER: 11.7.3
- CUSPARSE: 12.5.8
- CUPTI: 2025.1.1 (API 12.8.0)
- NVML: 12.0.0+550.54.15
Julia packages:
- CUDA: 5.9.3
- CUDA_Driver_jll: 13.0.2+0
- CUDA_Compiler_jll: 0.3.0+0
- CUDA_Runtime_jll: 0.19.2+0
Toolchain:
- Julia: 1.10.10
- LLVM: 15.0.7
Preferences:
- CUDA_Runtime_jll.version: 12.8
4 devices:
0: NVIDIA GH200 120GB (sm_90, 69.740 GiB / 95.577 GiB available)
1: NVIDIA GH200 120GB (sm_90, 70.279 GiB / 95.577 GiB available)
2: NVIDIA GH200 120GB (sm_90, 70.374 GiB / 95.577 GiB available)
3: NVIDIA GH200 120GB (sm_90, 70.349 GiB / 95.577 GiB available)
Additional context
I use the following workaround in my code. It computes the Frobenius norm from a single triangle, using the fact that each off-diagonal entry of a Hermitian matrix appears twice with equal absolute value, so norm(A)^2 = 2 * sum(abs2, triu(A)) - sum(abs2, diag(A)) (the diagonal is counted twice in the triu term):
using LinearAlgebra, GPUArrays

function LinearAlgebra.norm(A::Hermitian{T, <:AbstractGPUArray}) where {T}
    # Assumes the data lives in the upper triangle (uplo == 'U', the default);
    # for a Hermitian built with :L, use tril instead of triu.
    upper_triangle = sum(abs2, triu(parent(A)))  # upper triangle, diagonal included
    diago = sum(abs2, diag(parent(A)))           # diagonal, counted twice in 2*upper_triangle
    sqrt(2upper_triangle - diago)
end
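A quick sanity check of the workaround against the CPU result (a sketch, reusing M from the MWE above):
M_cpu = Hermitian(Matrix(parent(M)))
@assert norm(M) ≈ norm(M_cpu)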