diff --git a/gpu-glossary/perf/light-green-mem-coal.svg b/gpu-glossary/perf/light-green-mem-coal.svg
new file mode 100644
index 0000000..367afdb
--- /dev/null
+++ b/gpu-glossary/perf/light-green-mem-coal.svg
@@ -0,0 +1,219 @@
[SVG markup lost in extraction. Surviving text labels, in order: "Stride 1", "CUDA Threads", "128-byte Physical DRAM Burst", "Coalesced Access", "Stride 16", "CUDA Threads", "Fragmented DRAM Fetch"]
diff --git a/gpu-glossary/perf/light-mem-coal.svg b/gpu-glossary/perf/light-mem-coal.svg
new file mode 100644
index 0000000..1e7c78f
--- /dev/null
+++ b/gpu-glossary/perf/light-mem-coal.svg
@@ -0,0 +1,219 @@
[SVG markup lost in extraction. Surviving text labels, in order: "Stride 1", "CUDA Threads", "128-byte Physical DRAM Burst", "Coalesced Access", "Stride 16", "CUDA Threads", "Fragmented DRAM Fetch"]
diff --git a/gpu-glossary/perf/memory-coalescing.md b/gpu-glossary/perf/memory-coalescing.md
index 05b17a2..62d57d6 100644
--- a/gpu-glossary/perf/memory-coalescing.md
+++ b/gpu-glossary/perf/memory-coalescing.md
@@ -54,6 +54,8 @@
 bytes – not coincidentally, enough for each of the 32 [threads](/gpu-glossary/device-software/thread) in a [warp](/gpu-glossary/device-software/warp) to load one 32 bit float.
+![With stride 1 (left), all 32 [threads](/gpu-glossary/device-software/thread) in a [warp](/gpu-glossary/device-software/warp) access contiguous memory addresses, serviced by a single 128-byte DRAM burst. With stride 16 (right), each thread's access is spread across memory, requiring many separate DRAM fetches.](./light-mem-coal.svg)
+
 To demonstrate the performance impact of memory coalescing, let's consider the following [kernel](/gpu-glossary/device-software/kernel), which reads values from an array with a variable `stride`, or spacing between accessed elements.
diff --git a/gpu-glossary/perf/terminal-mem-coal.svg b/gpu-glossary/perf/terminal-mem-coal.svg
new file mode 100644
index 0000000..9631a73
--- /dev/null
+++ b/gpu-glossary/perf/terminal-mem-coal.svg
@@ -0,0 +1,219 @@
[SVG markup lost in extraction. Surviving text labels, in order: "Stride 1", "CUDA Threads", "128-byte Physical DRAM Burst", "Coalesced Access", "Stride 16", "CUDA Threads", "Fragmented DRAM Fetch"]