Commit ce79eef

james-mchugh authored and LegNeato committed
docs: added sections to thread.rs on thread blocks and grids
Signed-off-by: James Riley McHugh <[email protected]>
1 parent 471e3c6 commit ce79eef

crates/cuda_std/src/thread.rs

Lines changed: 33 additions & 5 deletions
@@ -12,12 +12,40 @@
 //! Threads are the fundamental element of GPU computing. Threads execute the same kernel
 //! at the same time, controlling their task by retrieving their corresponding global thread ID.
 //!
-//! # Thread Blocks
+//! ## Thread Blocks
+//!
+//! Thread blocks are the most important structure after threads. They arrange threads into
+//! one-dimensional, two-dimensional, or three-dimensional blocks. The dimensionality of the
+//! thread block typically corresponds to the dimensionality of the data being worked with. The
+//! number of threads in the block is configurable. The maximum number of threads in a block is
+//! device-specific, but 1024 is a typical maximum on current GPUs.
+//!
+//! Thread blocks are the primary elements for GPU scheduling. A thread block may be scheduled for
+//! execution on any of the GPU's available streaming multiprocessors. If a GPU does not have
+//! a streaming multiprocessor available to run the block, the block is queued until one becomes
+//! available. Because thread blocks are the fundamental scheduling element, they are required to
+//! execute independently and in any order.
+//!
+//! Threads within a block can share data with one another via shared memory and barrier
+//! synchronization.
+//!
+//! The kernel can retrieve the index of a given thread within a block via the
+//! `thread_idx_x`, `thread_idx_y`, and `thread_idx_z` functions (depending on the dimensionality
+//! of the thread block).
+//!
+//! ## Grids
+//!
+//! Multiple thread blocks make up the grid, the highest level of the CUDA thread model. Like thread
+//! blocks, grids can arrange thread blocks into one-dimensional, two-dimensional, or
+//! three-dimensional grids.
+//!
+//! The kernel can retrieve the index of a given block within a grid via the
+//! `block_idx_x`, `block_idx_y`, and `block_idx_z` functions (depending on the dimensionality
+//! of the grid). Additionally, the dimensions of the block can be retrieved via the
+//! `block_dim_x`, `block_dim_y`, and `block_dim_z` functions. These functions, along with the
+//! `thread_*` functions mentioned previously, can be used to identify portions of the data the
+//! kernel should operate on.
 //!
-//! The most important structure after threads, thread blocks arrange
-
-// TODO: write some docs about the terms used in this module.
-
 use cuda_std_macros::gpu_only;
 use glam::{UVec2, UVec3};
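
Note on the new Grids section: the last paragraph says the `block_idx_*`, `block_dim_*`, and `thread_idx_*` values together identify the data a thread should work on. A minimal host-side sketch of that arithmetic for the one-dimensional case follows. It is plain Rust that runs on the CPU and does not call `cuda_std`; the names `Coords1d`, `global_index_1d`, `blocks_needed`, and `simulate_launch` are hypothetical and exist only for illustration.

```rust
/// Hypothetical stand-ins for the values a kernel would read through
/// `block_idx_x`, `block_dim_x`, and `thread_idx_x` (assumption: 1D launch).
#[derive(Clone, Copy, Debug)]
struct Coords1d {
    block_idx: u32,
    block_dim: u32,
    thread_idx: u32,
}

/// Global index along one axis: the threads in all preceding blocks,
/// plus this thread's offset inside its own block.
fn global_index_1d(c: Coords1d) -> u32 {
    c.block_idx * c.block_dim + c.thread_idx
}

/// Number of blocks needed to cover `len` elements (ceiling division).
fn blocks_needed(len: u32, block_dim: u32) -> u32 {
    (len + block_dim - 1) / block_dim
}

/// Simulates a launch on the host by visiting every (block, thread) pair
/// and counting how often each element would be touched.
fn simulate_launch(len: u32, block_dim: u32) -> Vec<u32> {
    let mut hits = vec![0u32; len as usize];
    for block_idx in 0..blocks_needed(len, block_dim) {
        for thread_idx in 0..block_dim {
            let i = global_index_1d(Coords1d { block_idx, block_dim, thread_idx });
            // Threads in the last block may land past the end of the data
            // and must skip their work.
            if i < len {
                hits[i as usize] += 1;
            }
        }
    }
    hits
}

fn main() {
    let hits = simulate_launch(1000, 256);
    println!("blocks = {}", blocks_needed(1000, 256));
    println!("every element visited once: {}", hits.iter().all(|&h| h == 1));
}
```

The bounds check matters because the element count is rarely a multiple of the block size, so the final block usually contains threads with no element to process. Since the loops visit blocks in a fixed order only for convenience, the result would be the same in any order, which matches the independence requirement stated in the Thread Blocks section.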
