randn gives code 700, ERROR_ILLEGAL_ADDRESS, but not rand #3028

@mattsignorelli

Describe the bug

When randn is called inside a kernel, we get the error in the title (code 700, ERROR_ILLEGAL_ADDRESS). Switching the exact same code to use rand instead of randn works just fine.

To reproduce

I am struggling to reproduce this error in a simple kernel; the bug only appears deep within our code. However, it is immediately solved by changing the randn calls to rand, so unfortunately this is the best MWE I can provide. You'll need the attached file esr-v6.3.1-tapered.jl (in the ZIP below).

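import Pkg
Pkg.add(Pkg.PackageSpec(; name="BeamTracking", version="0.5.2"))
Pkg.add("Beamlines")

using CUDA
using BeamTracking, Beamlines  # needed for Bunch, Yoshida, track! and the lattice file

# Bunch of 10 particles with 6 phase-space coordinates each, allocated on the GPU
b0 = Bunch(CUDA.rand(10, 6))

ring = include("esr-v6.3.1-tapered.jl") # File in attached ZIP
# Enable radiation damping and fluctuations (the randn code path) on every element
foreach(x -> x.tracking_method = Yoshida(radiation_damping_on=true, radiation_fluctuations_on=true), ring.line)
track!(b0, ring)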

Archive 2.zip

Note that we use KernelAbstractions.jl. The exact call causing the problem is here: https://github.com/bmad-sim/BeamTracking.jl/blob/744e3a2cd9defe1e17ee38957caf35cc9c6029ea/src/kernels/radiation.jl#L162-L163 and the gaussian_random implementation is here: https://github.com/bmad-sim/BeamTracking.jl/blob/744e3a2cd9defe1e17ee38957caf35cc9c6029ea/src/utils/math_simd.jl#L161-L163 . As I said, simply changing the randn() call to rand() in this function fixes the error.
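
A reduced kernel of the following shape could serve as a starting point for narrowing this down. It is an illustrative sketch only: fill_gaussian! and run_fill! are hypothetical names, not code from BeamTracking.jl, and this kernel is not confirmed to trigger the error. It only shows the general pattern of one randn draw per work item under KernelAbstractions.jl.

```julia
using KernelAbstractions

# Hypothetical reduced kernel: one standard-normal draw per work item.
@kernel function fill_gaussian!(out)
    i = @index(Global)
    out[i] = randn(eltype(out))
end

# Launch on whichever backend owns `out` (a plain Array runs on the CPU backend).
function run_fill!(out)
    backend = get_backend(out)
    fill_gaussian!(backend)(out; ndrange = length(out))
    KernelAbstractions.synchronize(backend)
    return out
end

out = run_fill!(zeros(Float32, 1024))
```

To exercise the GPU path, load CUDA and pass CUDA.zeros(Float32, 1024) instead of the CPU array. Swapping randn for rand in the kernel body gives the comparison described above.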

Manifest.toml

The manifest is in the attached ZIP file (it has too many characters for GitHub to accept here).

Expected behavior

randn should also work, just like rand.

Version info

Details on Julia:

Julia Version 1.10.9
Commit 5595d20a287 (2025-03-10 12:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)
Environment:
  LD_LIBRARY_PATH = /global/common/software/nersc9/darshan/default/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/math_libs/12.4/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/extras/CUPTI/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/extras/Debugger/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/nvvm/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/lib64:/opt/cray/pe/papi/7.1.0.2/lib64:/opt/cray/libfabric/1.22.0/lib64:/opt/cray/libfabric/default/lib64

Details on CUDA:

CUDA toolchain: 
- runtime 13.0, artifact installation
- driver 550.163.1 for 13.1
- compiler 13.1

CUDA libraries: 
- CUBLAS: 13.1.0
- CURAND: 10.4.0
- CUFFT: 12.0.0
- CUSOLVER: 12.0.4
- CUSPARSE: 12.6.3
- CUPTI: 2025.3.1 (API 13.0.1)
- NVML: 12.0.0+550.163.1

Julia packages: 
- CUDA: 5.9.6
- GPUArrays: 11.3.4
- GPUCompiler: 1.8.2
- KernelAbstractions: 0.9.39
- CUDA_Driver_jll: 13.1.0+2
- CUDA_Compiler_jll: 0.4.1+1
- CUDA_Runtime_jll: 0.19.2+0

Toolchain:
- Julia: 1.10.9
- LLVM: 15.0.7

1 device:
  0: NVIDIA A100-SXM4-40GB (sm_80, 38.918 GiB / 40.000 GiB available)

Labels: bug