|
12 | 12 | //! Threads are the fundamental element of GPU computing. Threads execute the same kernel
|
13 | 13 | //! at the same time, controlling their task by retrieving their corresponding global thread ID.
|
14 | 14 | //!
|
15 |
| -//! # Thread Blocks |
| 15 | +//! ## Thread Blocks |
| 16 | +//! |
| 17 | +//! Thread blocks are the most important structure after threads. They arrange threads into |
| 18 | +//! one-dimensional, two-dimensional, or three-dimensional blocks. The dimensionality of the thread block |
| 19 | +//! typically corresponds to the dimensionality of the data being worked with. The number of |
| 20 | +//! threads in the block is configurable. The maximum number of threads in a block is |
| 21 | +//! device-specific, but 1024 is a typical maximum on current GPUs. |
| 22 | +//! |
| 23 | +//! Thread blocks are the primary elements for GPU scheduling. A thread block may be scheduled for |
| 24 | +//! execution on any of the GPU's available streaming multiprocessors. If a GPU does not have |
| 25 | +//! a streaming multiprocessor available to run the block, the block is queued for scheduling. Because |
| 26 | +//! thread blocks are the fundamental scheduling element, they are required to execute |
| 27 | +//! independently and in any order. |
| 28 | +//! |
| 29 | +//! Threads within a block can share data with each other via shared memory and barrier |
| 30 | +//! synchronization. |
| 31 | +//! |
| 32 | +//! The kernel can retrieve the index of a given thread within a block via the |
| 33 | +//! `thread_idx_x`, `thread_idx_y`, and `thread_idx_z` functions (depending on the dimensionality |
| 34 | +//! of the thread block). |
| 35 | +//! |
| 36 | +//! ## Grids |
| 37 | +//! |
| 38 | +//! Multiple thread blocks make up the grid, the highest level of the CUDA thread model. Like thread |
| 39 | +//! blocks, grids can arrange thread blocks into one-dimensional, two-dimensional, or |
| 40 | +//! three-dimensional grids. |
| 41 | +//! |
| 42 | +//! The kernel can retrieve the index of a given block within a grid via the |
| 43 | +//! `block_idx_x`, `block_idx_y`, and `block_idx_z` functions (depending on the dimensionality |
| 44 | +//! of the grid). Additionally, the dimensions of the block (the number of threads along each axis) can be retrieved via the |
| 45 | +//! `block_dim_x`, `block_dim_y`, and `block_dim_z` functions. These functions, along with the |
| 46 | +//! `thread_*` functions mentioned previously, can be used to identify portions of the data the |
| 47 | +//! kernel should operate on. |
16 | 48 | //!
|
17 |
| -//! The most important structure after threads, thread blocks arrange |
18 |
| -
|
19 |
| -// TODO: write some docs about the terms used in this module. |
20 |
| - |
21 | 49 | use cuda_std_macros::gpu_only;
|
22 | 50 | use glam::{UVec2, UVec3};
|
23 | 51 |
|
|
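The new docs say the `block_idx_*`, `block_dim_*`, and `thread_idx_*` functions can be combined to pick the portion of data a thread works on. A minimal host-side sketch of the usual one-dimensional arithmetic is below. It assumes the conventional CUDA formula `block_idx * block_dim + thread_idx`. The helper names `global_index_1d` and `blocks_needed` are hypothetical and not part of this crate, and the nested loops only simulate on the CPU what the GPU would run in parallel.

```rust
/// Hypothetical helper: flattens a (block, thread) pair into a global 1D index.
/// On the GPU the three inputs would come from `block_idx_x`, `block_dim_x`,
/// and `thread_idx_x`; here they are plain arguments so the sketch runs on the host.
fn global_index_1d(block_idx_x: usize, block_dim_x: usize, thread_idx_x: usize) -> usize {
    block_idx_x * block_dim_x + thread_idx_x
}

/// Hypothetical helper: number of blocks needed to cover `len` elements
/// (ceiling division). Assumes `block_dim_x` is nonzero.
fn blocks_needed(len: usize, block_dim_x: usize) -> usize {
    (len + block_dim_x - 1) / block_dim_x
}

fn main() {
    let len = 10;
    let block_dim_x = 4;
    let grid_dim_x = blocks_needed(len, block_dim_x);

    // Simulate every (block, thread) pair sequentially and count visits per element.
    let mut visits = vec![0u32; len];
    for block in 0..grid_dim_x {
        for thread in 0..block_dim_x {
            let idx = global_index_1d(block, block_dim_x, thread);
            // The last block may extend past the data, so guard the access.
            if idx < len {
                visits[idx] += 1;
            }
        }
    }

    // Every element is handled by exactly one simulated thread.
    assert!(visits.iter().all(|&v| v == 1));
    println!("{} blocks of {} threads cover {} elements", grid_dim_x, block_dim_x, len);
}
```

The bounds check matters because the grid is sized by rounding up, so the final block can contain threads with no element to process.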