GPU performance optimization #398

I observed that broadcasting tends to allocate less memory than `map!` for the same in-place operation, e.g.,

```julia
julia> @time X .= .+(X,X); @time map!(+, X, X, X);
  0.000074 seconds (33 allocations: 608 bytes)
  0.000121 seconds (64 allocations: 2.078 KiB)
```
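A minimal sketch of how the two forms above could be compared more carefully. It wraps each in a function so that a warm-up call absorbs compilation before `@allocated` is measured. Plain CPU `Array`s stand in for GPU arrays here, so the byte counts will not match the GPU timings above; the function names `add_broadcast!` and `add_map!` are hypothetical.

```julia
# Hypothetical helpers: the same in-place doubling written two ways.
add_broadcast!(X) = (X .= X .+ X; X)   # same as `X .= .+(X, X)`
add_map!(X) = map!(+, X, X, X)         # destination aliases the inputs, as in the original

# CPU stand-in; on the GPU, X would be a CuArray instead.
X = rand(Float32, 1024)
Y = copy(X)

# Warm-up calls so that compilation is not counted below.
add_broadcast!(X)
add_map!(Y)
@assert X == Y  # both forms double every element in place

# Measure allocations on fresh inputs after compilation.
X1 = rand(Float32, 1024)
Y1 = copy(X1)
bytes_broadcast = @allocated add_broadcast!(X1)
bytes_map = @allocated add_map!(Y1)
println("broadcast: ", bytes_broadcast, " bytes; map!: ", bytes_map, " bytes")
```

A single `@time` call can be noisy; repeating the measurement (or using a benchmarking package) gives a more reliable comparison.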

We should also cache the SpMV buffer to reduce the time spent in sparse matrix-vector multiplication. The buffer is handled here:
https://github.com/JuliaGPU/CUDA.jl/blob/1389800a26078df16cc689ed5138f0691185ce61/lib/cusparse/generic.jl#L181-L189
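The caching idea can be sketched as a grow-only workspace that is reused across calls and reallocated only when a larger size is requested. This is a hypothetical illustration, not CUDA.jl's API: `WorkspaceCache` and `get_workspace!` are made-up names. A CPU `Vector{UInt8}` stands in for device memory. In the SpMV case, the requested size would presumably come from cuSPARSE's buffer-size query (`cusparseSpMV_bufferSize`).

```julia
# Hypothetical grow-only workspace cache (CPU stand-in for a device buffer).
mutable struct WorkspaceCache
    buf::Vector{UInt8}
    nallocs::Int  # number of times a new buffer was actually allocated
end
WorkspaceCache() = WorkspaceCache(UInt8[], 0)

# Return a buffer of at least `nbytes` bytes, allocating only when the
# cached buffer is too small; otherwise the existing buffer is reused.
function get_workspace!(cache::WorkspaceCache, nbytes::Integer)
    if length(cache.buf) < nbytes
        cache.buf = Vector{UInt8}(undef, nbytes)
        cache.nallocs += 1
    end
    return cache.buf
end

# Simulate a sequence of calls with varying workspace requirements.
cache = WorkspaceCache()
for n in (256, 128, 256, 512, 300)
    buf = get_workspace!(cache, n)
    @assert length(buf) >= n
end
println("allocations: ", cache.nallocs)  # prints "allocations: 2"
```

Only the requests for 256 and 512 bytes allocate; the rest reuse the cached buffer. A grow-only policy trades some memory for fewer allocations. A real GPU version would also need to consider which stream or task owns the buffer.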
