FluxML/NNlibCUDA.jl
#52
using CUDA, NNlib
using Flux: gpu  # this imports NNlibCUDA
using BenchmarkTools
CUDA.allowscalar(false)
a = CUDA.rand(200, 3000, 64)   # source array on the GPU
idx = rand(1:64, 500)          # indices on the CPU
idx_gpu = idx |> gpu           # same indices moved to the GPU
@benchmark CUDA.@sync NNlib.gather(a, idx)      # CPU indices
@benchmark CUDA.@sync NNlib.gather(a, idx_gpu)  # GPU indices
BenchmarkTools.Trial: 129 samples with 1 evaluation.
Range (min … max): 12.639 ms … 49.596 ms ┊ GC (min … max): 0.00% … 1.82%
Time (median): 37.477 ms ┊ GC (median): 0.00%
Time (mean ± σ): 38.929 ms ± 4.242 ms ┊ GC (mean ± σ): 0.43% ± 0.75%
▄█
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁███▅▅▅▃▂▁▁▁▁▂▄▄▄▁▃▃▂▃ ▂
12.6 ms Histogram: frequency by time 48.8 ms <
Memory estimate: 1.36 KiB, allocs estimate: 24.
BenchmarkTools.Trial: 131 samples with 1 evaluation.
Range (min … max): 14.508 ms … 48.818 ms ┊ GC (min … max): 0.00% … 1.83%
Time (median): 37.054 ms ┊ GC (median): 0.00%
Time (mean ± σ): 38.231 ms ± 3.701 ms ┊ GC (mean ± σ): 0.42% ± 0.77%
▃█▂
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄███▃▃▂▁▁▂▁▃▃▃▄▄▂▃▁▁▁▂ ▂
14.5 ms Histogram: frequency by time 47.9 ms <
Memory estimate: 1.36 KiB, allocs estimate: 24.
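As far as I can tell, there is no major difference between the two calls (median 37.477 ms with the CPU index vs. 37.054 ms with the GPU index). Moving the index to the GPU could also cause problems, as mentioned in #411.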
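For context on what both calls compute, here is a minimal CPU-only sketch of the shape rule `NNlib.gather` follows for integer indices. The small array sizes are stand-ins chosen for illustration, not the ones benchmarked above. The last lines apply the same rule to the benchmark sizes to estimate the output size, assuming `Float32` elements (the default for `CUDA.rand`). Whether writing that output dominates the timings was not measured here.

```julia
using NNlib

# Small CPU stand-in for the (200, 3000, 64) CuArray used in the benchmark.
src = rand(Float32, 2, 3, 4)
idx = [1, 4, 4]   # integer indices select slices along the last dimension

dst = NNlib.gather(src, idx)

# The result keeps the leading dimensions and takes its last one from idx.
@assert size(dst) == (2, 3, 3)
@assert dst == src[:, :, idx]

# Same shape rule applied to the benchmark sizes, without allocating anything.
out_size  = (200, 3000, 500)
out_bytes = prod(out_size) * sizeof(Float32)   # 1_200_000_000 bytes, about 1.2 GB
```

Under these assumptions, both benchmarked calls produce the same 200×3000×500 output and differ only in where the 500 indices live.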