1 file changed: +7 -0 lines changed
@@ -698,3 +698,10 @@ double each element in a fragment, you can simply use:
```julia
frag = 2.0f0 .* frag
```
+
+!!! note
+    The WMMA instructions don't take advantage of [memory swizzling](https://leimao.github.io/blog/CUDA-Shared-Memory-Swizzling/).
+    The custom load/store operations for WMMA don't allow the programmer to control *how* data is loaded,
+    so shared memory bank conflicts can only be reduced, not entirely eliminated. In general, using the PTX
+    instruction [`mma.sync`](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma)
+    and friends is preferred, as they give the programmer finer control over the memory access pattern.
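
To illustrate what swizzling buys when the access pattern is under the programmer's control, here is a minimal host-side sketch in plain Julia. It only models bank indices and is not part of the CUDA.jl or WMMA API. It assumes 32 shared memory banks of one 4-byte word each and a 32×32 row-major `Float32` tile; the helpers `bank`, `swizzle`, and `max_conflict` are hypothetical names introduced for this example.

```julia
# Illustrative model only: assumes 32 banks, one 4-byte word per bank.
const NUM_BANKS = 32

# Bank holding the word at 0-based (row, col) of a row-major tile
# with `ncols` words per row.
bank(row, col, ncols) = (row * ncols + col) % NUM_BANKS

# XOR swizzle (hypothetical helper): permute the column index within each row.
swizzle(row, col) = xor(col, row % NUM_BANKS)

# Largest number of accesses that land in the same bank.
function max_conflict(banks)
    counts = zeros(Int, NUM_BANKS)
    for b in banks
        counts[b + 1] += 1
    end
    return maximum(counts)
end

ncols = 32
col = 5

# 32 threads of a warp each read one element of the same column.
plain    = [bank(row, col, ncols) for row in 0:31]
swizzled = [bank(row, swizzle(row, col), ncols) for row in 0:31]

println("plain:    ", max_conflict(plain), "-way")     # 32
println("swizzled: ", max_conflict(swizzled), "-way")  # 1
```

In this model, the unswizzled column read sends all 32 accesses to one bank, while the XOR-permuted layout spreads them over 32 distinct banks. Applying such a layout requires choosing the address each thread loads from, which is the kind of control the note attributes to `mma.sync` and not to the WMMA load/store operations.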