Norm of Hermitian matrices triggers scalar indexing error #2965

@abussy

Description

Describe the bug
Calculating the norm of a Hermitian-typed matrix triggers a scalar indexing error.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA
using LinearAlgebra

X = CuArray(randn(ComplexF64, 100, 30))
M = Hermitian(X' * X)

@show norm(M)
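Presumably the generic LinearAlgebra fallback is being hit here; as a rough check (slow, and not a fix), the call does go through when scalar indexing is explicitly allowed:

# Not a fix: allow scalar indexing so the generic fallback runs.
# Very slow, since elements are fetched from the GPU one by one.
CUDA.@allowscalar norm(M)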
Manifest.toml

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Compiler_jll", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "GPUToolbox", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics", "demumble_jll"]
git-tree-sha1 = "756f031a1ef3137f497ee73ed595e4acf65d753f"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.9.3"

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    EnzymeCoreExt = "EnzymeCore"
    SparseMatricesCSRExt = "SparseMatricesCSR"
    SpecialFunctionsExt = "SpecialFunctions"

    [deps.CUDA.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
    SparseMatricesCSR = "a0a7dd2c-ebf4-11e9-1f05-cf50bc540ca1"
    SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[[deps.CUDA_Compiler_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "CUDA_Runtime_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "b63428872a0f60d87832f5899369837cd930b76d"
uuid = "d1e2174e-dfdc-576e-b43e-73b79eb1aca8"
version = "0.3.0+0"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "2023be0b10c56d259ea84a94dbfc021aa452f2c6"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "13.0.2+0"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "f9a521f52d236fe49f1028d69e549e7f2644bb72"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "1.0.0"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "KernelAbstractions", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "ScopedValues", "Serialization", "Statistics"]
git-tree-sha1 = "8ddb438e956891a63a5367d7fab61550fc720026"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7" 
version = "11.2.6"
    
    [deps.GPUArrays.extensions]
    JLD2Ext = "JLD2"
    
    [deps.GPUArrays.weakdeps]
    JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "83cf05ab16a73219e5f6bd1bdfa9848fa24ac627"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.2.0"
    
[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "PrecompileTools", "Preferences", "Scratch", "Serialization", "TOML", "Tracy", "UUIDs"] 
git-tree-sha1 = "9a8b92a457f55165923fcfe48997b7b93b712fca"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "1.7.2"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Unicode"]
git-tree-sha1 = "ce8614210409eaa54ed5968f4b50aa96da7ae543"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "9.4.4"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

Expected behavior

An efficient norm calculation on the GPU, without scalar indexing.

Version info

Details on Julia:

Julia Version 1.10.10
Commit 95f30e51f41 (2025-06-27 09:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: 288 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, generic)
Threads: 1 default, 0 interactive, 1 GC (on 288 virtual cores)
Environment:
  JULIA_REVISE_POLL = 1
  JULIA_DEPOT_PATH = /capstor/scratch/cscs/abussy/dftk_santis/.julia

Details on CUDA:

CUDA toolchain: 
- runtime 12.8, artifact installation
- driver 550.54.15 for 13.0
- compiler 12.9

CUDA libraries: 
- CUBLAS: 12.8.4
- CURAND: 10.3.9
- CUFFT: 11.3.3
- CUSOLVER: 11.7.3
- CUSPARSE: 12.5.8
- CUPTI: 2025.1.1 (API 12.8.0)
- NVML: 12.0.0+550.54.15

Julia packages: 
- CUDA: 5.9.3
- CUDA_Driver_jll: 13.0.2+0
- CUDA_Compiler_jll: 0.3.0+0
- CUDA_Runtime_jll: 0.19.2+0

Toolchain:
- Julia: 1.10.10
- LLVM: 15.0.7

Preferences:
- CUDA_Runtime_jll.version: 12.8

4 devices:
  0: NVIDIA GH200 120GB (sm_90, 69.740 GiB / 95.577 GiB available)
  1: NVIDIA GH200 120GB (sm_90, 70.279 GiB / 95.577 GiB available)
  2: NVIDIA GH200 120GB (sm_90, 70.374 GiB / 95.577 GiB available)
  3: NVIDIA GH200 120GB (sm_90, 70.349 GiB / 95.577 GiB available)

Additional context

I use the following workaround in my code. It relies on the fact that, for a Hermitian matrix, the squared Frobenius norm equals twice the squared norm of the upper triangle (diagonal included) minus the squared norm of the diagonal:

using LinearAlgebra
using GPUArrays: AbstractGPUArray

# Frobenius norm of a Hermitian GPU matrix without scalar indexing.
# Note: this assumes the relevant data lives in the upper triangle of parent(A).
function LinearAlgebra.norm(A::Hermitian{T, <:AbstractGPUArray}) where {T}
    upper_triangle = sum(abs2, triu(parent(A)))  # upper triangle incl. diagonal
    diago = sum(abs2, diag(parent(A)))
    sqrt(2upper_triangle - diago)
end
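
As a quick sanity check of the workaround (assuming a dense CPU copy of the matrix is affordable), the GPU result can be compared against the plain CPU norm:

# Sanity check: compare the GPU workaround against the CPU norm of a dense copy.
M_cpu = Hermitian(Array(parent(M)))
@show norm(M) ≈ norm(M_cpu)   # expected to print true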
