Commit cba6d1c

Feat: start work on atomics, add atomic intrinsics
1 parent 391e8e4 commit cba6d1c

3 files changed: +1140 -0 lines changed

crates/cuda_std/src/atomic.rs

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+//! Atomic types for soundly modifying numbers from multiple threads.
+//!
+//! # Core Interop
+//!
+//! Every type in this module works on the CPU (targets other than nvptx). However, [`core::sync::atomic`] types
+//! do **NOT** currently work on the GPU. This is because CUDA atomics have some fundamental differences
+//! that make it impossible to represent them fully with the existing core types:
+//!
+//! - CUDA has block-scoped, device-scoped, and system-scoped atomics; core makes no such distinction.
+//! - CUDA trivially supports relaxed/acquire/release orderings on most architectures, but SeqCst and other orderings use
+//!   specialized instructions on compute capabilities 7.x+, and can be emulated with fences/membars below 7.x. This makes it difficult
+//!   to hide such details in the codegen.
+//! - CUDA has hardware atomic floats; core does not.
+//! - CUDA distinguishes between "do operation and return the old value" (`atom`) and "do operation, return nothing" (`red`).
+//! - Core thinks CUDA supports 8-bit and 16-bit atomics. This is a bug in the nvptx target, but it is nevertheless an annoying detail
+//!   to silently trap on.
+//!
+//! Therefore we chose to implement all atomics inside cuda_std. In the future, we may support
+//! a subset of core atomics, but for now, you will have to use cuda_std atomics.
+
+pub mod intrinsics;
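
Two of the differences listed in the module docs can be illustrated on the CPU with plain `std` atomics. The sketch below is not the cuda_std API: `atom_add`, `red_add`, and `atomic_f32_add` are hypothetical names chosen for illustration. It shows the `atom`/`red` split as "return the old value" versus "discard it", and how a float add has to be emulated with a compare-and-swap loop on the bit pattern when, as in core, there is no atomic float type.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// `atom`-style add (hypothetical helper): performs the addition and
/// returns the value that was stored before it.
pub fn atom_add(counter: &AtomicU32, val: u32) -> u32 {
    counter.fetch_add(val, Ordering::Relaxed)
}

/// `red`-style add (hypothetical helper): performs the addition and
/// returns nothing, so the caller cannot observe the old value.
pub fn red_add(counter: &AtomicU32, val: u32) {
    let _ = counter.fetch_add(val, Ordering::Relaxed);
}

/// Float add emulated on top of an integer atomic (hypothetical helper).
/// The f32 is stored as its bit pattern in an `AtomicU32` and updated
/// with a compare-and-swap loop. Returns the previous float value.
pub fn atomic_f32_add(cell: &AtomicU32, val: f32) -> f32 {
    let mut current = cell.load(Ordering::Relaxed);
    loop {
        let new = (f32::from_bits(current) + val).to_bits();
        match cell.compare_exchange_weak(current, new, Ordering::Relaxed, Ordering::Relaxed) {
            // The swap succeeded: `prev` is the bit pattern we replaced.
            Ok(prev) => return f32::from_bits(prev),
            // Another thread changed the value (or the weak CAS failed
            // spuriously): retry with the value actually observed.
            Err(actual) => current = actual,
        }
    }
}
```

On a GPU with hardware float atomics the CAS loop would presumably collapse into a single instruction, which is one reason the module cannot simply reuse the core types.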
