Commit 8013378

Feat: sort changelog and implement DeviceCopy derive in cust_core
1 parent 5f5e451 commit 8013378

29 files changed: 281 additions and 233 deletions

crates/cust/CHANGELOG.md

Lines changed: 54 additions & 31 deletions
@@ -4,6 +4,13 @@ Notable changes to this project will be documented in this file.

 ## [Unreleased]

+### TLDR
+
+This release is gigantic, so here are the main things you need to worry about:
+
+`Context::create_and_push(FLAGS, device)` -> `Context::new(device)`.
+`Module::from_str(PTX)` -> `Module::from_ptx(PTX, &[])`.
+
 ### Context handling overhaul

 The way that contexts are handled in cust has been completely overhauled, it now
@@ -17,63 +24,79 @@ overall simplifying the context handling APIs. This does mean that the API chang
 The old context handling is fully present in `cust::context::legacy` for anyone who needs it for specific reasons. If you use `quick_init` you don't need to worry about
 any breaking changes, the API is the same.

-- `Stream::add_callback` now internally uses `cuLaunchHostFunc` anticipating the deprecation and removal of `cuStreamAddCallback` per the driver docs. This does however mean that the function no longer takes a device status as a parameter and does not execute on context error.
+### `cust_core`
+
+`DeviceCopy` has now been split into its own crate, `cust_core`. The crate is `#![no_std]`, which allows you to
+pull in `cust_core` in GPU crates for deriving `DeviceCopy` without cfg shenanigans.
+
+### Removed
+
+- Deleted `DeviceBox::wrap`, use `DeviceBox::from_raw`.
+- Deleted `DeviceSlice::as_ptr` and `DeviceSlice::as_mut_ptr`. Use `DeviceSlice::as_device_ptr` then `DevicePointer::as_(mut)_ptr`.
+- Deleted `DeviceSlice::chunks` and consequently `DeviceChunks`.
+- Deleted `DeviceSlice::chunks_mut` and consequently `DeviceChunksMut`.
+- Deleted `DeviceSlice::from_slice` and `DeviceSlice::from_slice_mut` because it was unsound.
+- Deleted `DevicePointer::as_raw_mut` (use `DevicePointer::as_mut_ptr`).
+- Deleted `DevicePointer::wrap` (use `DevicePointer::from_raw`).
+- `DeviceSlice` no longer implements `Index` and `IndexMut`, switching away from `[T]` made this impossible to implement.
+Instead you can now use `DeviceSlice::index` which behaves the same.
+- `vek` is no longer re-exported.
+
+### Deprecated
+
+- Deprecated `Module::from_str`, use `Module::from_ptx` and pass `&[]` for options.
+`ModuleJitOption::MaxRegisters` does not seem to work currently, but NVIDIA is looking into it.
+- Deprecated `Module::load_from_string`, use `Module::from_ptx_cstr`.
+
+### Added
+
 - Added `cust::memory::LockedBox`, same as `LockedBuffer` except for single elements.
 - Added `cust::memory::cuda_malloc_async`.
 - Added `cust::memory::cuda_free_async`.
 - Added `impl AsyncCopyDestination<LockedBox<T>> for DeviceBox<T>` for async HtoD memcpy.
 - Added the `bytemuck` feature which is enabled by default.
-- `zeroed` functions on `DeviceBox` and others are no longer unsafe and instead now require `T: Zeroable`. The functions are only available with the `bytemuck` feature.
 - Added `zeroed_async` to `DeviceBox`.
 - Added `drop_async` to `DeviceBox`.
 - Added `new_async` to `DeviceBox`.
-- `Linker::complete` now only returns the built cubin, and not the cubin and a duration.
-- `Stream`, `Module`, `Linker`, `Function`, `Event`, `UnifiedBox`, `ArrayObject`, `LockedBuffer`, `LockedBox`, `DeviceSlice`, `DeviceBuffer`, and `DeviceBox` all now impl `Send` and `Sync`, this makes
-it much easier to write multigpu code. The CUDA API is fully thread-safe except for graph objects.
-- Features such as `vek` for implementing DeviceCopy are now `impl_cratename`, e.g. `impl_vek`, `impl_half`, etc.
-- `DevicePointer::as_raw` now returns a `CUdeviceptr` instead of a `*const T`.
 - Added `DevicePointer::as_ptr` and `DevicePointer::as_mut_ptr` for returning `*const T` or `*mut T`.
 - Added mint integration behind `impl_mint`.
 - Added half integration behind `impl_half`.
 - Added glam integration behind `impl_glam`.
-- `num-complex` integration is now behind `impl_num_complex`, not `num-complex`.
 - Added experimental linux external memory import APIs through `cust::external::ExternalMemory`.
-- `vek` is no longer re-exported.
-- `DeviceBox` now requires `T: DeviceCopy` (previously it didn't but almost all its methods did).
-- `DeviceBox::from_raw` now takes a `CUdeviceptr` instead of a `*mut T`.
-- `DeviceBox::as_device_ptr` now requires `&self` instead of `&mut self`.
-- Deleted `DeviceBox::wrap`, use `DeviceBox::from_raw`.
-- `DeviceBuffer` now requires `T: DeviceCopy`.
-- `DeviceBuffer` is now `repr(C)` and is represented by a `DevicePointer<T>` and a `usize`.
 - Added `DeviceBuffer::as_slice`.
-- `DeviceSlice` now requires `T: DeviceCopy`.
-- `DeviceSlice` is now represented as a `DevicePointer<T>` and a `usize` (and is repr(C)) instead of `[T]` which was definitely unsound.
-- `DeviceSlice::as_ptr` and `DeviceSlice::as_ptr_mut` now both return a `DevicePointer<T>`.
-- Deleted `DeviceSlice::as_ptr` and `DeviceSlice::as_mut_ptr`. Use `DeviceSlice::as_device_ptr` then `DevicePointer::as_(mut)_ptr`.
-- Deleted `DeviceSlice::chunks` and consequently `DeviceChunks`.
-- Deleted `DeviceSlice::chunks_mut` and consequently `DeviceChunksMut`.
-- Deleted `DeviceSlice::from_slice` and `DeviceSlice::from_slice_mut` because it was unsound.
-- `DeviceSlice` no longer implements `Index` and `IndexMut`, switching away from `[T]` made this impossible to implement.
-Instead you can now use `DeviceSlice::index` which behaves the same.
-- `DeviceSlice` is now `Clone` and `Copy`.
 - Added `DeviceVariable`, a simple wrapper around `DeviceBox<T>` and `T` which allows easy management of a CPU and GPU version of a type.
 - Added `DeviceMemory`, a trait describing any region of GPU memory that can be described with a pointer + a length.
 - Added `memcpy_htod`, a wrapper around `cuMemcpyHtoD_v2`.
 - Added `mem_get_info` to query the amount of free and total memory.
-- `DevicePointer::as_raw` now returns a `CUdeviceptr`, not a `*const T` (use `DevicePointer::as_ptr`).
-- Deleted `DevicePointer::as_raw_mut` (use `DevicePointer::as_mut_ptr`).
 - Added `DevicePointer::as_ptr` and `DevicePointer::as_mut_ptr` for `*const T` and `*mut T`.
 - Added `DevicePointer::from_raw` for `CUdeviceptr -> DevicePointer<T>` with a safe function.
-- Deleted `DevicePointer::wrap` (use `DevicePointer::from_raw`).
 - Added dependency on `cust_core` for `DeviceCopy`.
 - Added dependency on `goblin` for verifying cubins and fatbins (impossible to implement safe module loading without it).
-- Deprecated `Module::from_str`, use `Module::from_ptx` and pass `&[]` for options.
 - Added `ModuleJitOption`, `JitFallback`, `JitTarget`, and `OptLevel` for specifying options when loading a module. Note that
-`ModuleJitOption::MaxRegisters` does not seem to work currently, but NVIDIA is looking into it.
 - Added `Module::from_fatbin` and `Module::from_fatbin_unchecked`.
 - Added `Module::from_cubin` and `Module::from_cubin_unchecked`.
 - Added `Module::from_ptr` and `Module::from_ptx_cstr`.
-- Deprecated `Module::load_from_string`, use `Module::from_ptx_cstr`.
+- `Stream`, `Module`, `Linker`, `Function`, `Event`, `UnifiedBox`, `ArrayObject`, `LockedBuffer`, `LockedBox`, `DeviceSlice`, `DeviceBuffer`, and `DeviceBox` all now impl `Send` and `Sync`, this makes
+it much easier to write multigpu code. The CUDA API is fully thread-safe except for graph objects.
+
+### Changed
+
+- `zeroed` functions on `DeviceBox` and others are no longer unsafe and instead now require `T: Zeroable`. The functions are only available with the `bytemuck` feature.
+- `Stream::add_callback` now internally uses `cuLaunchHostFunc` anticipating the deprecation and removal of `cuStreamAddCallback` per the driver docs. This does however mean that the function no longer takes a device status as a parameter and does not execute on context error.
+- `Linker::complete` now only returns the built cubin, and not the cubin and a duration.
+- Features such as `vek` for implementing DeviceCopy are now `impl_cratename`, e.g. `impl_vek`, `impl_half`, etc.
+- `DevicePointer::as_raw` now returns a `CUdeviceptr` instead of a `*const T`.
+- `num-complex` integration is now behind `impl_num_complex`, not `num-complex`.
+- `DeviceBox` now requires `T: DeviceCopy` (previously it didn't but almost all its methods did).
+- `DeviceBox::from_raw` now takes a `CUdeviceptr` instead of a `*mut T`.
+- `DeviceBox::as_device_ptr` now requires `&self` instead of `&mut self`.
+- `DeviceBuffer` now requires `T: DeviceCopy`.
+- `DeviceBuffer` is now `repr(C)` and is represented by a `DevicePointer<T>` and a `usize`.
+- `DeviceSlice` now requires `T: DeviceCopy`.
+- `DeviceSlice` is now represented as a `DevicePointer<T>` and a `usize` (and is repr(C)) instead of `[T]` which was definitely unsound.
+- `DeviceSlice::as_ptr` and `DeviceSlice::as_ptr_mut` now both return a `DevicePointer<T>`.
+- `DeviceSlice` is now `Clone` and `Copy`.
+- `DevicePointer::as_raw` now returns a `CUdeviceptr`, not a `*const T` (use `DevicePointer::as_ptr`).

 ## 0.2.2 - 12/5/21
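Review note on the `DeviceSlice` entries in this changelog: the sketch below models, in plain host-side Rust, why a slice stored as a pointer plus a length can be `Copy` and `repr(C)` but cannot implement `std::ops::Index`. The types `DevPtr` and `DevSlice` and their methods are hypothetical stand-ins written for illustration; they are not cust's actual definitions, and no CUDA calls are made.

```rust
use std::marker::PhantomData;
use std::ops::Range;

/// Hypothetical stand-in for a device pointer: an address in GPU memory
/// that host code must never dereference. Not cust's `DevicePointer<T>`.
#[repr(transparent)]
struct DevPtr<T> {
    addr: u64,
    _marker: PhantomData<*mut T>,
}

// Manual impls so that `T` itself does not need to be `Clone`/`Copy`.
impl<T> Clone for DevPtr<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for DevPtr<T> {}

/// Hypothetical model of a slice stored as (pointer, length) instead of `[T]`.
#[repr(C)]
struct DevSlice<T> {
    ptr: DevPtr<T>,
    len: usize,
}

impl<T> Clone for DevSlice<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for DevSlice<T> {}

impl<T> DevSlice<T> {
    /// Builds a slice from a raw address and an element count (illustrative only).
    fn from_raw_parts(addr: u64, len: usize) -> Self {
        DevSlice { ptr: DevPtr { addr, _marker: PhantomData }, len }
    }

    fn addr(&self) -> u64 {
        self.ptr.addr
    }

    fn len(&self) -> usize {
        self.len
    }

    /// Returns a sub-slice *by value*. `Index::index` has to return
    /// `&Self::Output`, a reference to something that already exists, so it
    /// cannot hand back a freshly computed (pointer, length) pair.
    fn index(&self, range: Range<usize>) -> DevSlice<T> {
        assert!(
            range.start <= range.end && range.end <= self.len,
            "range out of bounds"
        );
        let byte_offset = (range.start * std::mem::size_of::<T>()) as u64;
        DevSlice::from_raw_parts(self.ptr.addr + byte_offset, range.end - range.start)
    }
}

fn main() {
    let whole: DevSlice<f32> = DevSlice::from_raw_parts(0x1000, 8);
    let tail = whole.index(2..6);
    // `whole` is still usable afterwards because the model type is `Copy`.
    let copy = whole;
    println!("whole: {:#x} x{}", copy.addr(), copy.len());
    println!("tail:  {:#x} x{}", tail.addr(), tail.len());
}
```

In this model a sub-slice is only arithmetic on the address and length, which is why an inherent `index` method returning a new value is the natural replacement for the `Index` trait.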

crates/cust/src/lib.rs

Lines changed: 2 additions & 2 deletions
@@ -75,9 +75,9 @@ mod surface;
 mod texture;
 pub mod util;

-pub use cust_raw as sys;
-
+pub use cust_core;
 pub use cust_derive::DeviceCopy;
+pub use cust_raw as sys;

 use crate::context::{Context, ContextFlags};
 use crate::device::Device;

crates/cust/src/memory/mod.rs

Lines changed: 2 additions & 1 deletion
@@ -89,7 +89,8 @@ pub use self::unified::*;

 use crate::error::*;

-pub use cust_core::DeviceCopy;
+pub use crate::DeviceCopy;
+pub use cust_core::_hidden::DeviceCopy;

 use std::ffi::c_void;

crates/cust_core/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ glam = { version = "0.20", features=["cuda", "libm"], default-features=false, op
 mint = { version = "^0.5", optional = true }
 half = { version = "1.8", optional = true }
 num-complex = { version = "0.4", optional = true }
+cust_derive = { path = "../cust_derive", version = "0.1" }

 [features]
 default = ["vek", "glam", "mint"]
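
Review note on the new `cust_derive` dependency: as a rough illustration of what deriving a `DeviceCopy`-style marker trait involves, the sketch below writes out by hand what such a derive might plausibly generate. Everything here is an assumption made for illustration: `DeviceCopyLike`, `Particle`, and `upload_size` are hypothetical names, and the code that `cust_derive` actually emits may differ. Only `core` items are used, in keeping with the `#![no_std]` goal stated in the changelog.

```rust
/// Hypothetical stand-in for a `DeviceCopy`-style marker trait: implementing
/// it is a promise that the type can be copied to the GPU bit-for-bit.
unsafe trait DeviceCopyLike: Copy {}

// Primitive impls that a core crate would typically provide.
unsafe impl DeviceCopyLike for f32 {}
unsafe impl DeviceCopyLike for u32 {}
unsafe impl<T: DeviceCopyLike, const N: usize> DeviceCopyLike for [T; N] {}

#[derive(Clone, Copy)]
#[repr(C)]
struct Particle {
    position: [f32; 3],
    id: u32,
}

// Hand-written equivalent of what a derive could emit (an assumption, not
// the real macro output): the impl itself, plus a compile-time check that
// every field type also implements the trait.
unsafe impl DeviceCopyLike for Particle {}
const _: () = {
    const fn assert_device_copy<T: DeviceCopyLike>() {}
    assert_device_copy::<[f32; 3]>();
    assert_device_copy::<u32>();
};

/// Hypothetical helper: number of bytes a host-to-device copy of `items`
/// would transfer. Only types carrying the marker are accepted.
fn upload_size<T: DeviceCopyLike>(items: &[T]) -> usize {
    core::mem::size_of_val(items)
}

fn main() {
    let particles = [Particle { position: [0.0; 3], id: 7 }; 4];
    println!("first id: {}", particles[0].id);
    println!("bytes to upload: {}", upload_size(&particles));
}
```

If a field of `Particle` were changed to a type without the marker (a `String`, say), the `const _` block would stop compiling, which is the kind of guarantee a derive can enforce without any host-only dependencies.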
