-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Unsafe fields #3458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Unsafe fields #3458
Changes from 1 commit
95dd295
fa712be
46766ed
bb6e76a
aafd5f4
ed5ceb4
3f851ae
9908419
247bc41
2d7a512
000075f
1d0d8be
e8aebe2
9521d12
27628e0
f04701b
ed920e9
cf0516b
aa0c333
322c4a8
a5cc19f
b229496
f4426ea
f4e85fb
94c173a
462cee4
d1d7388
f5a82f8
d4cfd14
7769117
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
- Feature Name: `unsafe_fields` | ||
- Start Date: 2023-07-13 | ||
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) | ||
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) | ||
|
||
# Summary | ||
|
||
Fields may be declared `unsafe`. Unsafe fields may only be mutated (excluding interior mutability) | ||
jswrenn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
or initialized in an unsafe context. Reading the value of an unsafe field may occur in either safe | ||
or unsafe contexts. An unsafe field may be relied upon as a safety invariant in other unsafe code. | ||
|
||
# Motivation | ||
|
||
Emphasis on safety is a key strength of Rust. A major point here is that any code path that can | ||
result in undefined behavior must be explicitly marked with the `unsafe` keyword. However, the | ||
current system is insufficient. While Rust provides the `unsafe` keyword at the function level, | ||
there is currently no mechanism to mark fields as `unsafe`. | ||
|
||
For a real-world example, consider the `Vec` type in the standard library. It has a `len` field that | ||
is used to store the number of elements present. Setting this field is exposed publicly in the | ||
`Vec::set_len` method, which has safety requirements: | ||
|
||
- `new_len` must be less than or equal to `capacity()`. | ||
- The elements at `old_len..new_len` must be initialized. | ||
|
||
This field is safe to read, but unsafe to mutate or initialize due to the invariants. These | ||
invariants cannot be expressed in the type system, so they must be enforced manually. Failure to do | ||
so may result in undefined behavior elsewhere in `Vec`. | ||
|
||
By introducing unsafe fields, Rust can improve the situation where a field that is otherwise safe is | ||
used as a safety invariant. | ||
|
||
# Guide-level explanation | ||
|
||
Fields may be declared `unsafe`. Unsafe fields may only be initialized or accessed mutably in an | ||
unsafe context. Reading the value of an unsafe field may occur in either safe or unsafe contexts. An | ||
unsafe field may be relied upon as a safety invariant in other unsafe code. | ||
|
||
Here is an example to illustrate usage: | ||
|
||
```rust | ||
struct Foo { | ||
safe_field: u32, | ||
/// Safety: Value must be an odd number. | ||
unsafe unsafe_field: u32, | ||
} | ||
|
||
// Unsafe field initialization requires an `unsafe` block. | ||
// Safety: `unsafe_field` is odd. | ||
let mut foo = unsafe { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I’m not sure I like that the entire struct expression is now inside an unsafe block (though I’m not sure what a better syntax would be). If the safety invariant requires the entire struct, this makes sense. However, if it is more specific, a larger unsafe block is too broad as it also allows unsafe code usage to initialise the safe fields, which should get their own unsafe blocks. Though perhaps this could just be linted against, e.g. don’t use unsafe expressions in a struct initialiser without putting them inside nested unsafe blocks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The alternative is to do this: let mut foo = Foo {
safe_field: 0,
unsafe_field: unsafe { 1 },
}; That feels worse to me. Presumably the safety invariant being promised is an invariant within the struct, even if it is not strictly required that that be the case. Ideally we'd also have partial initialization, such that it would be possible to do let mut foo = Foo { safe_field: 0 };
unsafe { foo.unsafe_field = 1; } but I expect that's quite a bit farther away. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that just unsafe in one field initialiser expression is insufficient as it means something very different. I do like however that it clearly shows which field the unsafety applies to. It reminds me a bit of unsafe blocks in unsafe functions (#2585). Ideally, the fact that struct init is unsafe would not allow unsafe field init expressions. I doubt that special-casing struct inits to not propagate outer unsafe blocks would be backwards compatible, so a new lint would be the only avenue in this direction. Some new syntax like this let mut foo = unsafe Foo {
safe_field: 0,
unsafe unsafe_field: 1,
}; would communicate intent better but looks quite unnatural to me. Perhaps in the future there might be explicit safe blocks to reassert a safe context within an unsafe block, so that it could be written as follows: let mut foo = unsafe {
Foo {
safe_field: safe { /* some more complex expr here */ },
unsafe_field: safe { /* some more complex expr here */ },
}
}; which could be encouraged with clippy lints. Though for now just moving safe non-trivial struct field initialisers into variables that are initialised outside the unsafe block. Overall, I think going with the original syntax of let mut foo = unsafe {
Foo {
safe_field: 2,
unsafe_field: 1,
}
}; looks like a good and intuitive solution, though I still think that a lint against using unsafe expressions inside the struct expression without a nested unsafe block would still help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe this? let mut foo = unsafe Foo {
safe_field: 0,
unsafe_field: 1,
};
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like it - the syntax is still close enough to the unsafe block syntax (except when initialising tuple structs) that it's relatively intuitive what the unsafety applies to, but coupled strongly to the struct name so that it also makes sense why no unsafe context is introduced. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think there's something nice about let mut foo = unsafe Foo {
safe_field: 0,
unsafe_field: 1,
}; but to me that would go with Spitballing: if per-field safety is really needed, then maybe let mut foo = Foo {
safe_field: 0,
unsafe unsafe_field: 1,
}; though maybe the answer is the same as for After all, there's always field shorthand available, so you can do let safe_field = unsafe { complex to construct };
let unsafe_field = easy and safe to construct;
unsafe { Foo { safe_field, unsafe_field } } Perhaps the more important factor would be the workflow for the errors the programmer gets when changing a (internal and thus not semver break) field to Of course, if the safety is at the struct level (not the field level) then that doesn't come up. Hmm, the "you added an additional requirement to something that's already in an unsafe block and there's no way to help you make sure you handed that" is a pre-existing problem that needs tooling for that everywhere, so maybe it's not something this RFC needs to think about. If we had There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why not Foo {
safe_field: 0,
unsafe { unsafe_field: 1 },
} ? |
||
Foo { | ||
safe_field: 0, | ||
unsafe_field: 1, | ||
} | ||
}; | ||
|
||
// Safe field: no unsafe block. | ||
foo.safe_field = 1; | ||
|
||
// Unsafe field with mutation: unsafe block is required. | ||
// Safety: The value is odd. | ||
unsafe { foo.unsafe_field = 3; } | ||
|
||
// Unsafe field without mutation: no unsafe block. | ||
println!("{}", foo.unsafe_field); | ||
``` | ||
|
||
For a full description of where a mutable access is considered to have occurred (and why), see | ||
[RFC 3323]. Keep in mind that due to reborrowing, a mutable access of an unsafe field is not | ||
necessarily explicit. | ||
|
||
[RFC 3323]: https://rust-lang.github.io/rfcs/3323-restrictions.html#where-does-a-mutation-occur | ||
|
||
```rust | ||
fn change_unsafe_field(foo: &mut Foo) { | ||
// Safety: An odd number plus two remains an odd number. | ||
unsafe { foo.unsafe_field += 2; } | ||
jswrenn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
``` | ||
|
||
# Reference-level explanation | ||
|
||
## Syntax | ||
|
||
Using the syntax from [the reference for structs][struct syntax], the change needed to support | ||
unsafe fields is minimal. | ||
|
||
[struct syntax]: https://doc.rust-lang.org/stable/reference/items/structs.html#structs | ||
|
||
```diff | ||
StructField : | ||
OuterAttribute* | ||
Visibility? | ||
+ unsafe? | ||
IDENTIFIER : Type | ||
|
||
TupleField : | ||
OuterAttribute* | ||
Visibility? | ||
+ unsafe? | ||
Type | ||
``` | ||
|
||
## Behavior | ||
|
||
An unsafe field may only be mutated or initialized in an unsafe context. Failure to do so is a compile error. | ||
|
||
## "Mutable use" in the compiler | ||
|
||
The concept of a "mutable use" [already exists][mutating use] within the compiler. This catches all | ||
situations that are relevant here, including `ptr::addr_of_mut!`, `&mut`, and direct assignment to a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One could argue that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider this in conjunction with the examples present in the RFC. fn make_zero(p: *mut u8) {
unsafe { *p = 0; }
}
let p = ptr::addr_of_mut!(foo.unsafe_field);
make_zero(p); Ignoring thread-safety, which can be easily achieved with locks, no single step appears wrong. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "assigning zero to an arbitrary There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True — I typed that far too quickly and without thinking. Regardless, it's not immediately obvious to me that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not fully convinced either way, but I fear it would just be confusing to require There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
While you can do that, it's undefined behavior to actually mutate the resulting mut pointer. I've just confirmed this with miri. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's UB under Stacked Borrows, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's surprising. I'm not familiar with tree borrows, admittedly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
field, while excluding interior mutability. As such, formal semantics of what constitutes a "mutable | ||
use" are not stated here. | ||
|
||
[mutating use]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/visit/enum.PlaceContext.html#method.is_mutating_use | ||
|
||
# Drawbacks | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would add something about the proliferation of This isn't necessarily a bad thing, but it does seem like it starts uplifting There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If a crate uses no unsafe, then it has no need for unsafe fields. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The proposal is guided toward more broadly preventing logic errors rather than strictly what is now known as UB via There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I stand corrected on the flags case - but the RFC example is using these fields to ensure that a value is odd with no other There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's fair. The example of an even number isn't the best, as it's contrived. There's an unstated assumption that it's being relied upon as a safety invariant somewhere else in the program. I can definitely improve that example. |
||
|
||
- Additional syntax for macros to handle | ||
- More syntax to learn | ||
|
||
# Prior art | ||
|
||
Some items in the Rust standard library have `#[rustc_layout_scalar_valid_range_start]`, | ||
`#[rustc_layout_scalar_valid_range_end]`, or both. These items have identical behavior to that of | ||
unsafe fields described here. It is likely (though not required by this RFC) that these items will | ||
be required to use unsafe fields, which would reduce special-casing of the standard library. | ||
jswrenn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Unresolved questions | ||
|
||
- If the syntax for restrictions does not change, what is the ordering of keywords on a field that | ||
is both unsafe and mut-restricted? | ||
- Are there any interactions or edge cases with other language features that need to be considered? | ||
|
||
# Future possibilities | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As this RFC is written, and as unstably implemented,
I scanned the discussion but did not see any mention of supporting an unsafe-to-modify-only variety. Is that something that could be a future possibility? Or has it been considered and rejected? (One could argue that it is pointless because, in such cases, one can always provide a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I argue the same thing here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That is true. Since this RFC is the first major extension of field safety tooling since 1.0, we take a syntactically and semantically conservative approach: The keyword for denoting an invariant is simply
It is true that there are a lot of use-cases in which mutation is specifically problematic, but there are also a lot of use-cases for which other uses are problematic, too. Lately, I've been reviewing more of the latter than the former! In any case, this RFC is aimed at folks who want to be very diligent about their use of fields that carry invariants. I've found that it's no so hard to write "
Whoops, I thought I had written about this somewhere. An unsafe-to-modify-only variety is difficult to support well due to the possibility of interior mutation.
Perhaps! If Rust has more information about the kind of invariant the field has, its safety analysis can be more refined and tailored to the restrictions imposed by that invariant. We might also pair additional syntactic information with facts known about the type, like whether or not it contains There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see also: #3458 (comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That (and the thing about interior mutability being a hazard) makes sense to me. However, getters impede borrow splitting (unless and until we get some kind of subsetting type for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @moulins It's impossible to determine whether or not a field can be written through an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In fact, because of the moving-out problem, the wrapper struct solution can enable better scoped Compare these two examples (for simplicity, we assume that // With unsafe-to-read fields
mod string_stuff {
/// Safe to call even if `input` is not valid UTF-8
pub fn process_sketchy_string(input: String) {
if core::str::from_utf8(input.as_bytes()).is_ok() {
dbg!(input);
} else {
println!("not valid utf8");
}
}
}
struct Foo {
unsafe maybe_invalid_utf8: String,
}
fn do_thing(foo: Foo) {
// SAFETY: we immediately pass to `process_sketchy_string()`,
// which is documented safe to call even when `s` is invalid UTF-8
let s = unsafe { foo.maybe_invalid_utf8 };
process_sketchy_string(s);
}
fn just_drop_the_string(foo: Foo) {
// SAFETY: we immediately drop the `String` without reading its contents
let s = unsafe { foo.maybe_invalid_utf8 };
drop(s);
} // With unsafe-to-write fields + wrapper struct
mod string_stuff {
pub fn process_sketchy_string(input: UnsafeReadWrapper<String>) {
let string: Option<String> = {
// SAFETY: we immediately check for UTF-8 validity
// before doing anything else with the string
let input = unsafe { input.into_inner() };
core::str::from_utf8(input.as_bytes())
.is_ok()
.then_some(input)
};
if let Some(string) = string {
dbg!(string);
} else {
println!("not valid utf8");
}
}
}
struct Foo {
unsafe maybe_invalid_utf8: UnsafeReadWrapper<String>,
}
fn do_thing(foo: Foo) {
let s = foo.maybe_invalid_utf8;
process_sketchy_string(s);
}
fn just_drop_the_string(foo: Foo) {
let s = foo.maybe_invalid_utf8;
drop(s);
} With the wrapper type solution, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Would be good if the RFC would contain this example, since the question is bound to come up again :)
This only works if you have a type that exactly matches the invariant you care for. Presumably There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The RFC now documents the alternatives of making moving safe, and copying safe: cf0516b There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Even if the call to |
||
|
||
?? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. unsafe fields could give the compiler a signal that is useful for tons of things -- safe transmute, for instance. But we should also at least consider whether e.g. deriving auto traits like |
Uh oh!
There was an error while loading. Please reload this page.