unroll convolution

Inner loops in the CUDA convolution code should run faster when marked with a `#pragma unroll` directive. Equivalent options for each compiler:
- [`#pragma unroll N` in CUDA](https://devblogs.nvidia.com/parallelforall/new-compiler-features-cuda-8/)
- [`#pragma unroll N` in OpenCL](https://www.khronos.org/registry/OpenCL/extensions/nv/cl_nv_pragma_unroll.txt)
- Unavailable for OpenACC, but try the `-Munroll` flag with pgcc
- [`#pragma unroll N` for icc]()
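
As a rough illustration of the CUDA case, here is a minimal sketch of a 1D convolution kernel with its inner loop marked for unrolling. It is not this project's kernel: `FILTER_WIDTH`, `conv1d_unrolled`, and `run_conv1d` are hypothetical names, and the filter width is assumed to be a compile-time constant so the loop can be unrolled completely. Whether this is actually faster needs to be measured, since nvcc may already unroll short constant-trip loops on its own.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical compile-time filter width; a constant trip count is what
// allows the compiler to unroll the inner loop completely.
constexpr int FILTER_WIDTH = 5;

// 1D "valid" convolution in correlation form:
//   out[i] = sum over k of in[i + k] * filt[k]
__global__ void conv1d_unrolled(const float* in, const float* filt,
                                float* out, int n_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_out) return;

    float acc = 0.0f;
    // Plain `#pragma unroll` requests full unrolling when the trip count is
    // constant. `#pragma unroll N` requests a factor of N instead, and
    // `#pragma unroll 1` asks the compiler not to unroll.
#pragma unroll
    for (int k = 0; k < FILTER_WIDTH; ++k) {
        acc += in[i + k] * filt[k];
    }
    out[i] = acc;
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, and copies the result back. CUDA error checking is omitted
// for brevity.
std::vector<float> run_conv1d(const std::vector<float>& in,
                              const std::vector<float>& filt)
{
    assert(filt.size() == static_cast<std::size_t>(FILTER_WIDTH));
    assert(in.size() >= static_cast<std::size_t>(FILTER_WIDTH));

    const int n_in  = static_cast<int>(in.size());
    const int n_out = n_in - FILTER_WIDTH + 1;

    float *d_in = nullptr, *d_filt = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,   n_in * sizeof(float));
    cudaMalloc(&d_filt, FILTER_WIDTH * sizeof(float));
    cudaMalloc(&d_out,  n_out * sizeof(float));

    cudaMemcpy(d_in, in.data(), n_in * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_filt, filt.data(), FILTER_WIDTH * sizeof(float),
               cudaMemcpyHostToDevice);

    // One thread per output element.
    const int threads = 128;
    const int blocks  = (n_out + threads - 1) / threads;
    conv1d_unrolled<<<blocks, threads>>>(d_in, d_filt, d_out, n_out);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> out(n_out);
    cudaMemcpy(out.data(), d_out, n_out * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_filt);
    cudaFree(d_out);
    return out;
}
```

Calling `run_conv1d` with a 7-element input and a 5-tap filter yields 3 outputs. To check whether the directive helps, one option is to compare timings (or the generated SASS via `cuobjdump -sass`) with and without the `#pragma unroll` line.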
 

unroll convolution #134
