 
 use crate::gpu_only;
 
-/// Statically allocates a buffer large enough for `len` elements of `array_type`, yielding
-/// a `*mut array_type` that points to uninitialized shared memory. `len` must be a constant expression.
+/// Statically allocates a buffer large enough for `len` elements of `array_type`,
+/// yielding a `*mut array_type` that points to uninitialized shared memory. `len` must
+/// be a constant expression.
 ///
-/// Note that this allocates the memory __statically__, it expands to a static in the `shared` address space.
-/// Therefore, calling this macro multiple times in a loop will always yield the same data. However, separate
-/// invocations of the macro will yield different buffers.
+/// Note that this allocates the memory __statically__: it expands to a static in the
+/// `shared` address space. Therefore, calling this macro multiple times in a loop will
+/// always yield the same data. However, separate invocations of the macro will yield
+/// different buffers.
 ///
-/// The data is uninitialized by default, therefore, you must be careful to not read the data before it is written to.
-/// The semantics of what "uninitialized" actually means on the GPU (i.e. if it yields unknown data or if it is UB to read it whatsoever)
-/// are not well known, so even if the type is valid for any backing memory, make sure to not read uninitialized data.
+/// The data is uninitialized by default, so you must be careful not to read the data
+/// before it is written to. The semantics of what "uninitialized" actually means on
+/// the GPU (i.e. whether it yields unknown data or whether reading it is UB at all)
+/// are not well known, so even if the type is valid for any backing memory, make sure
+/// not to read uninitialized data.
 ///
 /// # Safety
 ///
-/// Shared memory usage is fundamentally extremely unsafe and impossible to statically prove, therefore
-/// the burden of correctness is on the user. Some of the things you must ensure in your usage of
-/// shared memory are:
-/// - Shared memory is only shared across __thread blocks__, not the entire device, therefore it is
-/// unsound to try and rely on sharing data across more than one block.
-/// - You must write to the shared buffer before reading from it as the data is uninitialized by default.
-/// - [`thread::sync_threads`](crate::thread::sync_threads) must be called before relying on the results of other
-/// threads, this ensures every thread has reached that point before going on. For example, reading another thread's
-/// data after writing to the buffer.
-/// - No access may be out of bounds, this usually means making sure the amount of threads and their dimensions are correct.
+/// Shared memory usage is fundamentally unsafe and impossible to statically prove
+/// correct, so the burden of correctness is on the user. Some of the things you must
+/// ensure in your usage of shared memory are:
 ///
-/// It is suggested to run your executable in `cuda-memcheck` to make sure usages of shared memory are right.
+/// - Shared memory is only shared across __thread blocks__, not the entire device,
+///   so it is unsound to try to rely on sharing data across more than one block.
+/// - You must write to the shared buffer before reading from it, as the data is
+///   uninitialized by default.
+/// - [`thread::sync_threads`](crate::thread::sync_threads) must be called before
+///   relying on the results of other threads; this ensures every thread has reached
+///   that point before going on (for example, reading another thread's data after
+///   writing to the buffer).
+/// - No access may be out of bounds; this usually means making sure the number of
+///   threads and their dimensions are correct.
+///
+/// It is suggested to run your executable under `cuda-memcheck` to make sure usages
+/// of shared memory are correct.
 ///
 /// # Examples
 ///
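A minimal sketch of the usage pattern these safety notes describe, assuming the macro being documented is `cuda_std`'s `shared_array!` and that the `#[kernel]` attribute and the `cuda_std::thread` helpers are in scope (those names are assumptions and do not appear in this diff): each thread writes its own slot of the shared tile, the block synchronizes, and only then does any thread read a neighbour's slot.

```rust
// Sketch only: `shared_array!`, `#[kernel]` and the `thread` helpers are assumed
// to come from `cuda_std`; they are not shown in the diff above.
use cuda_std::{kernel, shared_array, thread};

const BLOCK: usize = 64;

#[kernel]
pub unsafe fn rotate_within_block(data: *mut u32, len: usize) {
    // One statically allocated shared tile, visible only to this thread block.
    let tile: *mut u32 = shared_array![u32; BLOCK];

    let tid = thread::thread_idx_x() as usize;
    let idx = thread::block_idx_x() as usize * BLOCK + tid;

    // Write before any read: the buffer starts out uninitialized.
    *tile.add(tid) = if idx < len { *data.add(idx) } else { 0 };

    // Every thread in the block must have written its slot before any
    // thread reads a neighbour's slot.
    thread::sync_threads();

    if idx < len {
        // Read the element written by the neighbouring thread (wrapping within the block).
        *data.add(idx) = *tile.add((tid + 1) % BLOCK);
    }
}
```

If the launch configuration disagrees with `BLOCK`, the out-of-bounds point above is violated; running the binary under `cuda-memcheck` (or its `racecheck` tool) helps surface out-of-bounds shared accesses and missing barriers.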