I want to run an RNN (https://fluxml.ai/Flux.jl/stable/models/recurrence/) on the GPU, using explicit gradients (https://fluxml.ai/Flux.jl/stable/training/training/#Implicit-or-Explicit?).
This seems to work fine on the CPU, but it fails on the GPU with an error pointing to the last statement in the loss function, called from gradient:
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\libcuda.jl:27
[2] isdone
@ C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\stream.jl:111 [inlined]
[3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
@ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:79
[4] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:171
[5] device_synchronize()
@ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:169
[6] top-level scope
@ C:\Users\XXX\.julia\packages\CUDA\nIZkq\src\initialization.jl:210
caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\libcuda.jl:27
Since I am dealing with an RNN, the history must be available for the gradient (backpropagation through time, https://en.wikipedia.org/wiki/Backpropagation_through_time). The Flux recurrence documentation specifies that the input should be structured as a vector (over time steps) of vectors (over features).
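For reference, the layout described there looks like this (a toy sketch with made-up values, not taken from the documentation):

xs = [Float32[0.1, 0.2],  # time step 1, 2 features
      Float32[0.3, 0.4],  # time step 2
      Float32[0.5, 0.6]]  # time step 3
# the model is applied one step at a time, carrying the hidden state across steps:
# [m(xt) for xt in xs]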
The code below is simplified to expose the problem; the training loop is stripped away.
using Flux
using ChainRulesCore
using CUDA
dev = gpu  # cpu is working fine
m = Chain(RNN(2 => 5), Dense(5 => 1)) |> dev
x = [rand(Float32, 2) for i = 1:3] |> dev;  # 3 time steps, 2 features each
y = [rand(Float32, 1) for i = 1:1] |> dev   # target compared against the final output
[m(xi) for xi in x]  # forward pass over the sequence, one step at a time
using Flux.Losses: mse
function loss(m, x, y)
    @ignore_derivatives Flux.reset!(m)  # reset the hidden state; excluded from differentiation
    m(x[1])  # output ignored, but the hidden state is updated
    m(x[2])  # second output ignored as well
    mse(m(x[3]), y[1])  # loss on the final time step only
end
loss(m, x, y)
grads = Flux.gradient(m, x, y) do m, x, y
    loss(m, x, y)  # the GPU error shown above is raised from this gradient call
end
optim = Flux.setup(Flux.Adam(), m)
Flux.update!(optim, m, grads[1])
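For context, the training loop I stripped away looks roughly like this (a sketch only; `data` is a hypothetical iterator of (sequence, target) pairs and is not part of the MWE above; optim is reused from the setup call):

for epoch in 1:100
    for (x, y) in data  # hypothetical (sequence, target) iterator
        grads = Flux.gradient(m -> loss(m, x, y), m)
        Flux.update!(optim, m, grads[1])
    end
end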
Versions (in a clean environment):
Julia 1.9.3
CUDA v5.1.0
ChainRulesCore v1.18.0
Flux v0.14.6