The way that contexts are handled in cust has been completely overhauled, it now
@@ -17,63 +24,79 @@ overall simplifying the context handling APIs. This does mean that the API chang
 The old context handling is fully present in `cust::context::legacy` for anyone who needs it for specific reasons. If you use `quick_init` you don't need to worry about
 any breaking changes, the API is the same.
 
-- `Stream::add_callback` now internally uses `cuLaunchHostFunc` anticipating the deprecation and removal of `cuStreamAddCallback` per the driver docs. This does however mean that the function no longer takes a device status as a parameter and does not execute on context error.
+### `cust_core`
+
+`DeviceCopy` has now been split into its own crate, `cust_core`. The crate is `#![no_std]`, which allows you to
+pull in `cust_core` in GPU crates for deriving `DeviceCopy` without cfg shenanigans.
+
+### Removed
+
+- Deleted `DeviceBox::wrap`, use `DeviceBox::from_raw`.
+- Deleted `DeviceSlice::as_ptr` and `DeviceSlice::as_mut_ptr`. Use `DeviceSlice::as_device_ptr` then `DevicePointer::as_(mut)_ptr`.
+- Deleted `DeviceSlice::chunks` and consequently `DeviceChunks`.
+- Deleted `DeviceSlice::chunks_mut` and consequently `DeviceChunksMut`.
+- Deleted `DeviceSlice::from_slice` and `DeviceSlice::from_slice_mut` because it was unsound.
+- `DeviceSlice` no longer implements `Index` and `IndexMut`, switching away from `[T]` made this impossible to implement.
+Instead you can now use `DeviceSlice::index` which behaves the same.
+- `vek` is no longer re-exported.
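Why `Index` no longer fits can be seen in a host-only sketch. `std::ops::Index::index` has to return a reference (`&Self::Output`), whereas a sub-range of a pointer-plus-length handle is a freshly computed value, so it has to come from an ordinary method that returns by value. The type `SliceSketch`, its `u64` address field, and the `Range<usize>` argument are assumptions made for illustration; they are not cust's actual definitions or signatures.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ops::Range;

/// Hypothetical stand-in for a device slice: an address plus a length.
/// The `u64` address is an assumption; no device memory is touched.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SliceSketch<T> {
    addr: u64,
    len: usize,
    _marker: PhantomData<T>,
}

impl<T> SliceSketch<T> {
    pub fn new(addr: u64, len: usize) -> Self {
        SliceSketch { addr, len, _marker: PhantomData }
    }

    pub fn addr(&self) -> u64 {
        self.addr
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Returns the sub-range as a new handle, by value.
    /// Panics if the range is out of bounds.
    pub fn index(&self, range: Range<usize>) -> SliceSketch<T> {
        assert!(
            range.start <= range.end && range.end <= self.len,
            "range out of bounds"
        );
        // Offset the address by whole elements, expressed in bytes.
        let byte_offset = (range.start * size_of::<T>()) as u64;
        SliceSketch::new(self.addr + byte_offset, range.end - range.start)
    }
}
```

Because the handle is `Copy`, taking a sub-range leaves the original usable, which is consistent with the changelog's note that `DeviceSlice` is now `Clone` and `Copy`.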
+
+### Deprecated
+
+- Deprecated `Module::from_str`, use `Module::from_ptx` and pass `&[]` for options.
+`ModuleJitOption::MaxRegisters` does not seem to work currently, but NVIDIA is looking into it.
+- Deprecated `Module::load_from_string`, use `Module::from_ptx_cstr`.
+
+### Added
+
 - Added `cust::memory::LockedBox`, same as `LockedBuffer` except for single elements.
 - Added `cust::memory::cuda_malloc_async`.
 - Added `cust::memory::cuda_free_async`.
 - Added `impl AsyncCopyDestination<LockedBox<T>> for DeviceBox<T>` for async HtoD memcpy.
 - Added the `bytemuck` feature which is enabled by default.
-- `zeroed` functions on `DeviceBox` and others are no longer unsafe and instead now require `T: Zeroable`. The functions are only available with the `bytemuck` feature.
 - Added `zeroed_async` to `DeviceBox`.
 - Added `drop_async` to `DeviceBox`.
 - Added `new_async` to `DeviceBox`.
-- `Linker::complete` now only returns the built cubin, and not the cubin and a duration.
-- `Stream`, `Module`, `Linker`, `Function`, `Event`, `UnifiedBox`, `ArrayObject`, `LockedBuffer`, `LockedBox`, `DeviceSlice`, `DeviceBuffer`, and `DeviceBox` all now impl `Send` and `Sync`, this makes
-it much easier to write multigpu code. The CUDA API is fully thread-safe except for graph objects.
-- Features such as `vek` for implementing DeviceCopy are now `impl_cratename`, e.g. `impl_vek`, `impl_half`, etc.
-- `DevicePointer::as_raw` now returns a `CUdeviceptr` instead of a `*const T`.
 - Added `DevicePointer::as_ptr` and `DevicePointer::as_mut_ptr` for returning `*const T` or `*mut T`.
 - Added mint integration behind `impl_mint`.
 - Added half integration behind `impl_half`.
 - Added glam integration behind `impl_glam`.
-- `num-complex` integration is now behind `impl_num_complex`, not `num-complex`.
 - Added experimental linux external memory import APIs through `cust::external::ExternalMemory`.
-- `vek` is no longer re-exported.
-- `DeviceBox` now requires `T: DeviceCopy` (previously it didn't but almost all its methods did).
-- `DeviceBox::from_raw` now takes a `CUdeviceptr` instead of a `*mut T`.
-- `DeviceBox::as_device_ptr` now requires `&self` instead of `&mut self`.
-- Deleted `DeviceBox::wrap`, use `DeviceBox::from_raw`.
-- `DeviceBuffer` now requires `T: DeviceCopy`.
-- `DeviceBuffer` is now `repr(C)` and is represented by a `DevicePointer<T>` and a `usize`.
 - Added `DeviceBuffer::as_slice`.
-- `DeviceSlice` now requires `T: DeviceCopy`.
-- `DeviceSlice` is now represented as a `DevicePointer<T>` and a `usize` (and is repr(C)) instead of `[T]` which was definitely unsound.
-- `DeviceSlice::as_ptr` and `DeviceSlice::as_ptr_mut` now both return a `DevicePointer<T>`.
-- Deleted `DeviceSlice::as_ptr` and `DeviceSlice::as_mut_ptr`. Use `DeviceSlice::as_device_ptr` then `DevicePointer::as_(mut)_ptr`.
-- Deleted `DeviceSlice::chunks` and consequently `DeviceChunks`.
-- Deleted `DeviceSlice::chunks_mut` and consequently `DeviceChunksMut`.
-- Deleted `DeviceSlice::from_slice` and `DeviceSlice::from_slice_mut` because it was unsound.
-- `DeviceSlice` no longer implements `Index` and `IndexMut`, switching away from `[T]` made this impossible to implement.
-Instead you can now use `DeviceSlice::index` which behaves the same.
-- `DeviceSlice` is now `Clone` and `Copy`.
 - Added `DeviceVariable`, a simple wrapper around `DeviceBox<T>` and `T` which allows easy management of a CPU and GPU version of a type.
 - Added `DeviceMemory`, a trait describing any region of GPU memory that can be described with a pointer + a length.
 - Added `memcpy_htod`, a wrapper around `cuMemcpyHtoD_v2`.
 - Added `mem_get_info` to query the amount of free and total memory.
-- `DevicePointer::as_raw` now returns a `CUdeviceptr`, not a `*const T` (use `DevicePointer::as_ptr`).
- Added dependency on `cust_core` for `DeviceCopy`.
 - Added dependency on `goblin` for verifying cubins and fatbins (impossible to implement safe module loading without it).
-- Deprecated `Module::from_str`, use `Module::from_ptx` and pass `&[]` for options.
 - Added `ModuleJitOption`, `JitFallback`, `JitTarget`, and `OptLevel` for specifying options when loading a module. Note that
-`ModuleJitOption::MaxRegisters` does not seem to work currently, but NVIDIA is looking into it.
 - Added `Module::from_fatbin` and `Module::from_fatbin_unchecked`.
 - Added `Module::from_cubin` and `Module::from_cubin_unchecked`.
 - Added `Module::from_ptr` and `Module::from_ptx_cstr`.
-- Deprecated `Module::load_from_string`, use `Module::from_ptx_cstr`.
+- `Stream`, `Module`, `Linker`, `Function`, `Event`, `UnifiedBox`, `ArrayObject`, `LockedBuffer`, `LockedBox`, `DeviceSlice`, `DeviceBuffer`, and `DeviceBox` all now impl `Send` and `Sync`, this makes
+it much easier to write multigpu code. The CUDA API is fully thread-safe except for graph objects.
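What implementing `Send` and `Sync` buys in practice can be sketched without CUDA. A type that holds a raw pointer is neither `Send` nor `Sync` by default, so handing it to another thread only compiles once those traits are implemented explicitly. `HandleSketch` and `address_on_worker` are hypothetical names; the pointer is never dereferenced, and the sketch says nothing about how cust itself implements these traits.

```rust
use std::thread;

/// Hypothetical handle wrapping a raw pointer, as an FFI resource might.
pub struct HandleSketch {
    raw: *mut u8,
}

// SAFETY (assumption of this sketch): the pointer is treated as an opaque
// address and is never dereferenced, so moving or sharing it across
// threads cannot cause a data race.
unsafe impl Send for HandleSketch {}
unsafe impl Sync for HandleSketch {}

impl HandleSketch {
    pub fn new(raw: *mut u8) -> Self {
        HandleSketch { raw }
    }

    /// Returns the wrapped address as an integer.
    pub fn address(&self) -> usize {
        self.raw as usize
    }
}

/// Moves the handle into a worker thread and reads its address there.
/// Without the `Send` impl above, `thread::spawn` would reject the closure.
pub fn address_on_worker(handle: HandleSketch) -> usize {
    thread::spawn(move || handle.address())
        .join()
        .expect("worker thread panicked")
}
```

In a multi-GPU program the same pattern would let each worker thread own the handles for one device, which is the use case the changelog entry points at.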
+
+### Changed
+
+- `zeroed` functions on `DeviceBox` and others are no longer unsafe and instead now require `T: Zeroable`. The functions are only available with the `bytemuck` feature.
+- `Stream::add_callback` now internally uses `cuLaunchHostFunc` anticipating the deprecation and removal of `cuStreamAddCallback` per the driver docs. This does however mean that the function no longer takes a device status as a parameter and does not execute on context error.
+- `Linker::complete` now only returns the built cubin, and not the cubin and a duration.
+- Features such as `vek` for implementing DeviceCopy are now `impl_cratename`, e.g. `impl_vek`, `impl_half`, etc.
+- `DevicePointer::as_raw` now returns a `CUdeviceptr` instead of a `*const T`.
+- `num-complex` integration is now behind `impl_num_complex`, not `num-complex`.
+- `DeviceBox` now requires `T: DeviceCopy` (previously it didn't but almost all its methods did).
+- `DeviceBox::from_raw` now takes a `CUdeviceptr` instead of a `*mut T`.
+- `DeviceBox::as_device_ptr` now requires `&self` instead of `&mut self`.
+- `DeviceBuffer` now requires `T: DeviceCopy`.
+- `DeviceBuffer` is now `repr(C)` and is represented by a `DevicePointer<T>` and a `usize`.
+- `DeviceSlice` now requires `T: DeviceCopy`.
+- `DeviceSlice` is now represented as a `DevicePointer<T>` and a `usize` (and is repr(C)) instead of `[T]` which was definitely unsound.
+- `DeviceSlice::as_ptr` and `DeviceSlice::as_ptr_mut` now both return a `DevicePointer<T>`.
+- `DeviceSlice` is now `Clone` and `Copy`.
+- `DevicePointer::as_raw` now returns a `CUdeviceptr`, not a `*const T` (use `DevicePointer::as_ptr`).
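The `zeroed` change moves the safety obligation from each call site to a trait bound. A host-only sketch of that pattern follows; `ZeroableSketch` is a hypothetical stand-in for `bytemuck::Zeroable`, and `zeroed_value` is an illustrative free function rather than anything in cust's API.

```rust
/// Hypothetical stand-in for `bytemuck::Zeroable`.
///
/// # Safety
/// Implementors promise that the all-zero bit pattern is a valid value.
pub unsafe trait ZeroableSketch: Sized {}

// Plain numeric types are valid when every byte is zero.
unsafe impl ZeroableSketch for u32 {}
unsafe impl ZeroableSketch for f32 {}
// An array of zeroable elements is itself zeroable.
unsafe impl<T: ZeroableSketch, const N: usize> ZeroableSketch for [T; N] {}

/// Safe to call: the `unsafe` promise was made once, at the trait impl,
/// instead of at every call site.
pub fn zeroed_value<T: ZeroableSketch>() -> T {
    // SAFETY: `T: ZeroableSketch` guarantees all-zero bytes form a valid `T`.
    unsafe { std::mem::zeroed() }
}
```

A type such as `&u8` or `std::num::NonZeroU32` would simply not implement the trait, so calling `zeroed_value` on it fails to compile instead of producing an invalid value.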