
Problem with RNN and CUDA. #2352

@anbjos

Description

I want to run an RNN (https://fluxml.ai/Flux.jl/stable/models/recurrence/) on the GPU, using explicit gradients (https://fluxml.ai/Flux.jl/stable/training/training/#Implicit-or-Explicit?).

This works fine on the CPU, but fails on the GPU with an error pointing to the last statement in the loss function, called from gradient:

ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\libcuda.jl:27
 [2] isdone
   @ C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:171
 [5] device_synchronize()
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:169
 [6] top-level scope
   @ C:\Users\XXX\.julia\packages\CUDA\nIZkq\src\initialization.jl:210

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\libcuda.jl:27

Since I am dealing with an RNN, the history must be available for the gradient (backpropagation through time, https://en.wikipedia.org/wiki/Backpropagation_through_time). The Flux recurrence documentation specifies that the input should be structured as a vector (over time steps) of vectors (over features).
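To make that layout concrete, this is my understanding of it (the sizes here are arbitrary, and the batched form is only shown for illustration):

x_unbatched = [rand(Float32, 2) for t in 1:3]    # 3 time steps, 2 features per step
x_batched   = [rand(Float32, 2, 4) for t in 1:3] # 3 time steps, 2 features, batch of 4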

The reproducer below is simplified to expose the problem; the training loop has been stripped away.

using Flux
using ChainRulesCore
using CUDA

dev = gpu # with dev = cpu everything works fine

m = Chain(RNN(2 => 5), Dense(5 => 1)) |> dev

x = [rand(Float32, 2) for i in 1:3] |> dev; # sequence: vector over 3 time steps, 2 features each
y = [rand(Float32, 1) for i in 1:1] |> dev  # target for the final time step

[m(xi) for xi in x] # run the model step by step over the sequence

using Flux.Losses: mse

function loss(m, x, y)
    @ignore_derivatives Flux.reset!(m) # reset the hidden state without differentiating through the reset
    m(x[1]) # output ignored, but the hidden state is updated
    m(x[2]) # output ignored, but the hidden state is updated
    mse(m(x[3]), y[1]) # loss on the final time step
end

loss(m, x, y) # evaluate the loss once (forward only)

grads = Flux.gradient(m, x, y) do m, x, y
    loss(m, x, y) # on the GPU, the CUDA error above is raised from this gradient call
end

optim = Flux.setup(Flux.Adam(), m) 
Flux.update!(optim, m, grads[1])
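For comparison, the identical steps with the CPU device run through for me, including the gradient and the update (a minimal check, same code as above except for dev):

dev = cpu # only change relative to the script above

m_cpu = Chain(RNN(2 => 5), Dense(5 => 1)) |> dev
x_cpu = [rand(Float32, 2) for i in 1:3] |> dev
y_cpu = [rand(Float32, 1) for i in 1:1] |> dev

loss(m_cpu, x_cpu, y_cpu)
grads_cpu = Flux.gradient((m, x, y) -> loss(m, x, y), m_cpu, x_cpu, y_cpu)
Flux.update!(Flux.setup(Flux.Adam(), m_cpu), m_cpu, grads_cpu[1])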

Versions (in a clean environment):

Julia 1.9.3
CUDA v5.1.0
ChainRulesCore v1.18.0
Flux v0.14.6
