Add intrinsic for dynamic shared memory #146181
@@ -3300,6 +3300,44 @@ pub(crate) const fn miri_promise_symbolic_alignment(ptr: *const (), align: usize

```rust
    )
}

/// Returns the pointer to dynamic group-shared memory on GPUs.
///
/// Group-shared memory is a memory region that is shared between all threads in
/// the same work-group. It is faster to access than other memory, but pointers do not
/// work outside the work-group where they were obtained.
/// Dynamic group-shared memory lives in the group-shared memory region; its allocated
/// size is specified late, after compilation, when launching a gpu-kernel.
/// The size can differ between launches of a gpu-kernel, which is why it is called dynamic.
///
/// The returned pointer is the start of the dynamic group-shared memory region.
/// All calls to `gpu_dynamic_groupshared_mem` in a work-group, independent of the
/// generic type, return the same address and therefore alias the same memory.
/// The returned pointer is aligned to at least the alignment of `T`.
```
Member

Speaking of safety requirements... how does one use this pointer? I get that it is aligned, but does it point to enough memory to store a

Typically, intrinsic documentation should be detailed enough that I can read and write code using the intrinsic and know exactly whether the code is correct and what it will do in all circumstances. I don't know if there's any hope of achieving that with GPU intrinsics, but if not then we need to have a bit of a wider discussion -- we have had bad experience with just importing "externally defined" semantics into Rust without considering all the interactions (in general, it is not logically coherent to have semantics externally defined). The current docs would let me implement this intrinsic by just always returning 1024, and emitting a compile error if

Member

Is there some prior discussion of the design decision to determine the alignment by giving a type parameter? It could also be a const generic parameter, for instance. I don't have an opinion on the matter since I am an outsider to the GPU world, but as a compiler team member it'd be good to know if this is something you thought about for 5 minutes or whether there's some sort of larger design by a team that has a vision of how all these things will fit together.

Contributor (Author)

There is some discussion in #135516. I don't mind either way; I thought (for 5 minutes ;)) that specifying the type of the returned pointer makes sense. For just a struct, static shared memory would make more sense, though we don't support that yet (there's some discussion in the tracking issue, but I think that's more complicated to design and implement).
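To make the two designs in this thread concrete, here is a minimal host-side sketch of both shapes: alignment chosen by a type parameter (as in the PR) or by a const generic. It assumes a mock work-group whose dynamic region is a hypothetical 256-byte, 16-byte-aligned heap buffer; `MockWorkGroup`, `Region`, `dynamic_groupshared_mem`, and `dynamic_groupshared_mem_aligned` are illustrative names, not part of the PR. It models only the documented properties (every call returns the same start address, aligned for the request) and does not model the launch-time size.

```rust
use std::mem::align_of;

// Hypothetical stand-in for the dynamic group-shared region. On a GPU the
// hardware provides it and the size is chosen at kernel launch; here it is a
// fixed 256-byte buffer with 16-byte alignment (both assumptions).
#[repr(C, align(16))]
struct Region([u8; 256]);

// Hypothetical mock of one work-group. The region is created once, so every
// pointer handed out is derived from the same base pointer.
struct MockWorkGroup {
    base: *mut Region,
}

impl MockWorkGroup {
    fn new() -> Self {
        Self { base: Box::into_raw(Box::new(Region([0; 256]))) }
    }

    // Type-parameter design: the requested alignment is `align_of::<T>()`.
    fn dynamic_groupshared_mem<T>(&self) -> *mut T {
        assert!(align_of::<T>() <= align_of::<Region>(), "mock region is not aligned enough for T");
        // Same start address for every T, so all returned pointers alias.
        self.base.cast::<T>()
    }

    // Const-generic alternative: the caller states the alignment directly and
    // gets an untyped byte pointer back.
    fn dynamic_groupshared_mem_aligned<const ALIGN: usize>(&self) -> *mut u8 {
        assert!(ALIGN.is_power_of_two() && ALIGN <= align_of::<Region>(), "unsupported alignment");
        self.base.cast::<u8>()
    }
}

impl Drop for MockWorkGroup {
    fn drop(&mut self) {
        // SAFETY: `base` came from `Box::into_raw` in `new` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.base)) };
    }
}
```

For example, `dynamic_groupshared_mem::<f64>()` and `dynamic_groupshared_mem_aligned::<8>()` return the same address, so an `f64` written through the first pointer is visible as bytes through the second. The type-parameter form carries the pointee type; the const-generic form separates alignment from typing and leaves the cast to the caller.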
```rust
///
/// # Safety
///
/// The pointer is safe to dereference from the start (the returned pointer) up to the
/// size of dynamic group-shared memory that was specified when launching the current
/// gpu-kernel.
///
/// The user must take care of synchronizing access to group-shared memory between
/// threads in a work-group. It is undefined behavior if one thread makes a non-atomic
/// write to a group-shared memory location and another thread simultaneously accesses
/// the same location.
///
/// # Other APIs
///
/// CUDA and HIP call this shared memory, shared between threads in a block.
/// OpenCL and SYCL call this local memory, shared between threads in a work-group.
/// GLSL calls this shared memory, shared between invocations in a work group.
/// DirectX calls this groupshared memory, shared between threads in a thread-group.
#[must_use = "returns a pointer that does nothing unless used"]
#[rustc_intrinsic]
#[rustc_nounwind]
#[unstable(feature = "gpu_dynamic_groupshared_mem", issue = "135513")]
#[cfg(any(target_arch = "amdgpu", target_arch = "nvptx64"))]
pub fn gpu_dynamic_groupshared_mem<T>() -> *mut T;

/// Copies the current location of arglist `src` to the arglist `dst`.
///
/// FIXME: document safety requirements
```
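The Safety section combines a size bound fixed at launch with a synchronization requirement. The host-side analogy below sketches both, assuming OS threads stand in for the threads of one work-group, `std::sync::Barrier` stands in for a work-group barrier, and `dyn_bytes` plays the role of the dynamic size passed at launch. `capacity` and `rotate_in_workgroup` are hypothetical helpers, not part of the PR.

```rust
use std::mem::size_of;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Barrier;
use std::thread;

/// Number of `T` elements that fit in a dynamic region of `dyn_bytes` bytes.
/// Per the Safety section, element indices at or past this count are out of bounds.
fn capacity<T>(dyn_bytes: usize) -> usize {
    assert!(size_of::<T>() != 0, "zero-sized types are not meaningful here");
    dyn_bytes / size_of::<T>()
}

/// Host analogy of one work-group: each of `threads` threads writes its own
/// slot of a shared buffer sized like a `dyn_bytes`-byte region, all threads
/// wait at a barrier, then each reads its neighbour's slot. Without the
/// barrier, a read could race with the neighbour's write.
fn rotate_in_workgroup(threads: usize, dyn_bytes: usize) -> Vec<u32> {
    let slots = capacity::<u32>(dyn_bytes);
    assert!(threads <= slots, "dynamic shared memory too small for one slot per thread");

    // `AtomicU32` keeps this safe Rust; on a GPU these would be plain loads and
    // stores, and the barrier is what keeps them race-free.
    let shared: Vec<AtomicU32> = (0..slots).map(|_| AtomicU32::new(0)).collect();
    let barrier = Barrier::new(threads);
    let mut out = vec![0; threads];

    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|tid| {
                let (shared, barrier) = (&shared, &barrier);
                s.spawn(move || {
                    shared[tid].store(tid as u32 * 10, Ordering::Relaxed); // write phase
                    barrier.wait(); // every write happens before any read below
                    shared[(tid + 1) % threads].load(Ordering::Relaxed) // read phase
                })
            })
            .collect();
        for (tid, handle) in handles.into_iter().enumerate() {
            out[tid] = handle.join().unwrap();
        }
    });
    out
}
```

With four threads and a 64-byte region, 16 `u32` slots fit, and thread `i` reads the value `10 * ((i + 1) % 4)` written by its neighbour.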
@@ -0,0 +1,27 @@

```rust
// Checks that the GPU dynamic group-shared memory intrinsic works.

//@ revisions: amdgpu nvptx
//@ compile-flags: --crate-type=rlib
//
//@ [amdgpu] compile-flags: --target amdgcn-amd-amdhsa -Ctarget-cpu=gfx900
//@ [amdgpu] needs-llvm-components: amdgpu
//@ [nvptx] compile-flags: --target nvptx64-nvidia-cuda
//@ [nvptx] needs-llvm-components: nvptx
//@ add-core-stubs
#![feature(intrinsics, no_core, rustc_attrs)]
#![no_core]

extern crate minicore;

#[rustc_intrinsic]
#[rustc_nounwind]
fn gpu_dynamic_groupshared_mem<T>() -> *mut T;

// CHECK: @gpu_dynamic_groupshared_mem = external addrspace(3) global [0 x i8], align 8
// CHECK: ret ptr addrspacecast (ptr addrspace(3) @gpu_dynamic_groupshared_mem to ptr)
#[unsafe(no_mangle)]
pub fn fun() -> *mut i32 {
    let res = gpu_dynamic_groupshared_mem::<i32>();
    gpu_dynamic_groupshared_mem::<f64>(); // Increase alignment to 8
    res
}
```
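The test's first CHECK line and its "Increase alignment to 8" comment suggest that all instantiations share one external global whose alignment is the largest one requested. The sketch below is a hypothetical model of that reading, inferred from the test rather than taken from the compiler: `global_alignment` and `global_declaration` are illustrative names. On the targets the test uses, `i32` is 4-byte aligned and `f64` is 8-byte aligned, which is why adding the `f64` call raises the alignment from 4 to 8.

```rust
/// Hypothetical model: the single `@gpu_dynamic_groupshared_mem` global takes
/// the largest alignment requested by any instantiation (1 if there are none).
fn global_alignment(requested_aligns: &[usize]) -> usize {
    requested_aligns.iter().copied().fold(1, usize::max)
}

/// Renders the declaration that the test's first CHECK line expects,
/// using the alignment computed by the model above.
fn global_declaration(requested_aligns: &[usize]) -> String {
    format!(
        "@gpu_dynamic_groupshared_mem = external addrspace(3) global [0 x i8], align {}",
        global_alignment(requested_aligns)
    )
}
```

Under this model, `global_declaration(&[4])` (only the `i32` call) would give `align 4`, and `global_declaration(&[4, 8])` gives the `align 8` line the test checks for.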