Skip to content

Conversation

HKalbasi
Copy link
Member

This PR adds a -Zmiri-tree-borrows-strong which tries to be what requested in rust-lang/unsafe-code-guidelines#573. Currently, I only implemented the bullet 3 (implicit writes on &mut retags) to get feedback, but the plan is to eventually add flags for the other bullets under this -Zmiri-tree-borrows-strong and retire stacked-borrows by replacing it with tree-borrows-strong.

About the implementation, I didn't do an actual write, and instead set the initial permission to Active which I think has the same effect if I understand tree borrows correctly.

If anyone wants to run this PR against the ecosystem crates, please check my implementation first. I wrote a test which is passing, but I'm not sure about the corner cases.

@rustbot
Copy link
Collaborator

rustbot commented Sep 14, 2025

Thank you for contributing to Miri!
Please remember to not force-push to the PR branch except when you need to rebase due to a conflict or when the reviewer asks you for it.

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label Sep 14, 2025
@RalfJung
Copy link
Member

Thanks for the PR! However, it would have been better to ask first before just going ahead and implementing something. I have a student who was meant to work on this if there is time (though it's not clear yet whether there will be time).

@RalfJung
Copy link
Member

About the implementation, I didn't do an actual write, and instead set the initial permission to Active which I think has the same effect if I understand tree borrows correctly.

Not quite. There's also this logic here which has to do writes instead of reads:

tree_borrows.perform_access(
orig_tag,
Some((range_in_alloc, AccessKind::Read, diagnostics::AccessCause::Reborrow)),
this.machine.borrow_tracker.as_ref().unwrap(),
alloc_id,
this.machine.current_span(),
)?;
// Also inform the data race model (but only if any bytes are actually affected).
if range_in_alloc.size.bytes() > 0 {
if let Some(data_race) = alloc_extra.data_race.as_vclocks_ref() {
data_race.read(
alloc_id,
range_in_alloc,
NaReadType::Retag,
Some(place.layout.ty),
&this.machine,
)?
}
}

I expect this will cause widespread issues...

@HKalbasi
Copy link
Member Author

it would have been better to ask first before just going ahead and implementing something. I have a student who was meant to work on this if there is time (though it's not clear yet whether there will be time).

Sorry for not coordinating in advance. I thought there is no news on it since there were no comments from ~3 months ago that the issue was created, and I looked at open PRs and didn't find something related. I just liked to see progress on this, so I opened this PR. I didn't put too much effort on the implementation and tried to make the scope of this PR as small as possible to get fast feedback on the technical details, but didn't expect that it may interfere with your planned works. Feel free to close this if it is causing problems for your student.

Not quite. There's also this logic here which has to do writes instead of reads:

Can you give me a test which should be UB if we want to use llvm's writeable attribute, but it is not in my implementation? From the given code I guessed it may have something to do with data races, but I added a data race test, and it is UB with current -Ztree-borrows-strong. My understanding is that all data races with &mut x are now UB since it is both active and protected so it rejects every foreign read and write.

@RalfJung
Copy link
Member

RalfJung commented Sep 17, 2025

Sorry for not coordinating in advance. I thought there is no news on it since there were no comments from ~3 months ago that the issue was created, and I looked at open PRs and didn't find something related. I just liked to see progress on this, so I opened this PR. I didn't put too much effort on the implementation and tried to make the scope of this PR as small as possible to get fast feedback on the technical details, but didn't expect that it may interfere with your planned works. Feel free to close this if it is causing problems for your student.

Yeah fair, there was no way for you to know.
And as I said it is unclear whether the student will get to this. It's just always good to have some extra tasks available in case the student finishes the original project way too fast and the semester is not over yet. ;)

So, if this starts to become difficult and needs actual design work, I might hold off on this. For now, we can see how it goes. However as you noticed tests are already failing, because creating new Active nodes without pushing a "write" side-effect through the tree violates some core Tree Borrows invariants.

Can you give me a test which should be UB if we want to use llvm's writeable attribute, but it is not in my implementation?

First of all, your approach allows trees where there are two Active adjacent to each other. The general assumption everywhere in Tree Borrows is that for a node to become "active", a write has occurred, which will update the states of all the other nodes accordingly. Your proposal breaks this fundamental invariant of Tree Borrows, and I think it is related to why the tests are failing. It also means much of the theory we developed for Tree Borrows in terms of correctness proofs for optimizations goes out of the window.

I don't think there is any world in which we want to have "active" nodes without propagating an actual write through the entire tree. Do you have a particular motivation for this approach?

In terms of examples, looking at your tests, it's actually quite interesting. The reason you get these behaviors is that when the function returns, an implicit write is performed on all locations that are "active". So the implicit write actually happens, but it happens oddly late (on function return, rather than already on function call). So examples that surprisingly have no UB would be examples that never return from the function, for instance:

fn f(x: &mut i32) {
    std::process::exit(0);
}

fn main() {
    let mut a = &0;
    let ptr = &raw const *a as *mut i32;
    unsafe { f(&mut *ptr) };
}

I don't know if this is actually problematic, but it is certainly surprising.

The problematic examples are the ones that are UB now, and they are the reason why I think some deeper design work is required. I assume this is UB now:

fn foo(x: &mut [i32]) {
    assert!(x.len() >= 2);
    unsafe {
        let to = x.as_mut_ptr();
        let from = x.as_ptr().add(1);
        to.copy_from(from, 1);
    }
}

foo(&mut [0, 1]);

Stacked Borrows accepts this (thanks to a terrible hack). I don't think we can actually roll out an aliasing model that rejects this. (Worse, it remains UB even if you swap to and from. If at least one of the two orders would work that'd be something at least.)

@HKalbasi
Copy link
Member Author

Your proposal breaks this fundamental invariant of Tree Borrows, and I think it is related to why the tests are failing. ... Do you have a particular motivation for this approach?

No, I thought doing a write and starting with active are almost equivalent, but now I understand that they are very different. I will change it to an actual write once we have decided on more important issues. The reason of my stubbornness is that I want to learn the reasons behind decisions. Thank you for your patience and detailed responses.

The problematic examples are the ones that are UB now, and they are the reason why I think some deeper design work is required. I assume this is UB now:

This one is interesting. I think the problem is that .as_mut_ptr() is intended to work like an as cast to a raw pointer, but it may get spurious writes if we add writable to every function argument. That is, if we have code that has two identical reads from a reference, and an .as_mut_ptr() in the middle of those reads, any aliasing model should do one of these:

  1. Declare the code as UB, which limits the usability of .as_mut_ptr(). (stacked-borrows)
  2. Miss "using the old value, eliminating a read" optimization
  3. Don't use writeable attribute on the argument of .as_mut_ptr(). (tree-borrows)

Otherwise, eliminating the read is a wrong optimization, since a spurious write can happen in the .as_mut_ptr().

The correct option I think is the option 3, since the writeable attribute is useless for the .as_mut_ptr(). But that attribute is generally useful, so if I want to "deduce" a design from this observation, it would be some way to opt-out from the writable attribute, and use that in .as_mut_ptr() and similar functions.

For opting out of the writable attribute, these choices come to my mind:

  1. Hard code it for some std functions (makes spec and miri more complex, so not desired, but might be a good way to start experimenting)
  2. Use some type or wrapper on the reference (very hard to do since .as_mut_ptr() is stable since forever and takes a simple &mut)
  3. Use some function attribute

For function attribute, there are many choices, for example a simple #[dont_use_llvm_writable] would do the job, but it is very LLVM-centric. I thought about #[opsem_inline] which basically disables all protectors and FnEntry related things for the given function.

What do you think? Do you agree with me that we need some way to opt-out from llvm's writable attribute for the .as_mut_ptr()? Do you want to run an experiment on top crates, with initial write on every function entry except some hard-coded std functions?

@RalfJung
Copy link
Member

No, I thought doing a write and starting with active are almost equivalent, but now I understand that they are very different. I will change it to an actual write once we have decided on more important issues.

I am not suggesting to do an actual write, just to propagate the effects of a write through the tree. That's not quite the same thing.

Declare the code as UB, which limits the usability of .as_mut_ptr(). (stacked-borrows)

SB doesn't make this code UB.

@HKalbasi
Copy link
Member Author

SB doesn't make this code UB.

Your example isn't UB, but I changed the order of let to and let from and it became UB with SB. So I'm fairly confident that my example with two reads would be UB. I will make a concrete example and run it with miri to make sure.

@RalfJung
Copy link
Member

RalfJung commented Sep 19, 2025

Use some function attribute

Indeed, that's the usual answer here. (This has been discussed a few times in various t-opsem venues.) This requires a bunch of design work for figuring the best way to indicate which references allow spurious writes and which do not.

Your example isn't UB, but I changed the order of let to and let from and it became UB with SB.

Indeed. But it's quite crucial that one of the two orders works at least. Ideally both should work.

So I'm fairly confident that my example with two reads would be UB.

I have no idea which example you are referring to.

Do you want to run an experiment on top crates, with initial write on every function entry except some hard-coded std functions?

That would probably end very similar to rust-lang/rust#142170.

@HKalbasi
Copy link
Member Author

That would probably end very similar to rust-lang/rust#142170.

So, hard coding std functions doesn't help, since there are similar functions in the ecosystem. I have another idea for making an experiment. Instead of adding a new attribute and causing churn in the ecosystem, we can disable protectors for #[inline] functions. My reasoning is that if a function is going to be inlined, there is almost no benefit in annotating it with noalias or writeable since the llvm would see the more accurate context and doesn't need these tags. The offending functions in smallvec are already marked as #[inline]. If the experiment becomes successful (we found no additional major UB in the ecosystem when doing the writes on FnEntry but disabling it for inline functions) then we have these actions for the next step:

  1. We can actually include this behavior of #[inline] in the model, and maybe introducing a new #[inline(opsem_isolated)] for people who want #[inline] as just a hint and don't want this semantic.
  2. We can add a dedicated attribute like #[opsem_inline] and invite the ecosystem to use this new attribute for functions that need to opt out from writeable.
  3. We can move this behavior to #[inline(always)]

The next step really depends on how missing noalias on #[inline] functions affects performance. We can do an experiment for this as well. We can also only omit the additional write for the #[inline] functions and keep the protectors, but doing this on a general attribute like #[inline] doesn't look good and I think it would be only useful for experiments.

@saethlin
Copy link
Member

I think #[inline] has too many other effects to staple additional properties onto it. As it is (often) an optimization hint, making it disable optimizations seems deeply wrong. I also offer my usual warning that there is a clippy lint missing_inline_in_public_items which I have seen in several libraries, and your proposed change would turn that into quite the footgun.

In addition one of the primary problems with #[inline] is that it tends to cause compile time explosion, and #[inline(always)] causes significantly worse compile time explosion, so encouraging their use for other purpose is quite dubious.

That being said, disabling noalias for #[inline] and #[inline(always)] seems like a good compiler experiment. The optimization value of noalias is nonzero but marginal, so I'm not sure how I'd interpret the results. Perhaps by comparison to -Zmutable-noalias=no?

@RalfJung
Copy link
Member

RalfJung commented Sep 20, 2025

If we're to do an experiment here I'd suggest to just disable the new "implicit write" behavior for inline functions, but keep the protectors (just make them reserved as before this PR).

I think what we'd want is that x.as_mut_ptr() becomes the same as x as *mut T, but that requires a bunch of things to change. Currently, the pointer returned by as_mut_ptr is retagged (at least) twice:

  • First the argument x passed to the method is retagged
  • Then inside the method it gets retagged+protected

Ideally we'd suppress both of these retags. Then we fully avoid the issues discussed in rust-lang/unsafe-code-guidelines#450. But it's unclear how to do that since one of the retags happens in the caller...

The weaker version of this just suppresses spurious writes, which is much easier: just retag the as_mut_ptr argument as reserved (like today) instead of active (as would have to be the case to allow spurious writes). Still unclear how to best express that. The crude version is an attribute that suppresses spurious writes for all reference arguments of a function.

@HKalbasi
Copy link
Member Author

I think #[inline] has too many other effects to staple additional properties onto it. As it is (often) an optimization hint, making it disable optimizations seems deeply wrong. I also offer my usual warning that there is a clippy lint missing_inline_in_public_items which I have seen in several libraries, and your proposed change would turn that into quite the footgun.

That would be a footgun if my hypothesis "noalias do not help inline functions anyway" is wrong. Let's test my hypothesis in rust-lang/rust#146823

I think what we'd want is that x.as_mut_ptr() becomes the same as x as *mut T, but that requires a bunch of things to change. Currently, the pointer returned by as_mut_ptr is retagged (at least) twice:

Do you see a world where we omit the retags and have noalias on the argument at the same time? I see that we can do noalias on x.as_mut_ptr() and treat it like x as *mut T at the same time (since noalias on as_mut_ptr is basically meaningless, there is no other arguments or function calls to alias with) but I guess we should get rid of noalias if we want to give a general mechanism for opting out of these retags.

If we're to do an experiment here I'd suggest to just disable the new "implicit write" behavior for inline functions, but keep the protectors (just make them reserved as before this PR).

Not touching the protectors makes sense to me, as we are not going to benefit from it anyway due to the caller's retag, and this is a tree-borrows-strong experiment, not tree-borrows-relaxed. I will try to do an experiment after fixing my code to propagate a write instead of using active as initial state.

@RalfJung
Copy link
Member

Do you see a world where we omit the retags and have noalias on the argument at the same time?

Absolutely not, the noalias is only sound if we retag there. Otherwise we'd have LLVM-level UB without Miri-level UB, aka a soundness bug in Miri.

@rustbot

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Sep 25, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@HKalbasi
Copy link
Member Author

Based on the experiments in rust-lang/rust#146823, I ended up not filtering functions based on #[inline], but based on whether they have raw pointers in their signature. You may say I'm overfitting to the .as_mut_ptr() but I looked at many functions in rustc that have raw pointers in their signature, and not having noalias/writable for references in their arguments seems reasonable to me.

I added tree_strong mode to all tests in pass/both_borrows, including the smallvec one which has an .as_mut_ptr() like function, and they are all passing. I also removed the initial active state and tried to perform an implicit write instead of it.

@RalfJung
Copy link
Member

You may say I'm overfitting to the .as_mut_ptr()

Yeah I might say that. ;)
Not sure how I feel about this as a new language-level guarantee... seems rather ad-hoc TBH. I can already see people add dummy raw ptr arguments to their functions just to appease Miri.^^

Also this should definitely include newtypes around raw pointers, in particular NonNull.

I also removed the initial active state and tried to perform an implicit write instead of it.

I wasn't suggesting an "instead of", I meant that both should happen. But this might also work.

@HKalbasi
Copy link
Member Author

Not sure how I feel about this as a new language-level guarantee... seems rather ad-hoc TBH. I can already see people add dummy raw ptr arguments to their functions just to appease Miri.^^

I think at the end we need an attribute to control whether a function can have noalias/writeable on it. But this "raw pointers in signature" can serve as a default for that attribute, to make existing code work without annotation.

Also this should definitely include newtypes around raw pointers, in particular NonNull.

How we should detect these newtypes? #[repr(transparent)] types with a raw pointer field?

I wasn't suggesting an "instead of", I meant that both should happen. But this might also work.

I think it works, since the immediate write will make the state immediately Active. I thought setting the initial state to Active is wrong since tree borrows is designed with not having Active as an initial state, but I will bring it back if you think that one is more correct.


Do you think a new experiment like the one in rust-lang/rust#142170 yields new results, now that the .as_mut_ptr() case has been worked around?

@RalfJung
Copy link
Member

I think at the end we need an attribute to control whether a function can have noalias/writeable on it. But this "raw pointers in signature" can serve as a default for that attribute, to make existing code work without annotation.

We can't use that heuristic/hack for noalias -- we already add noalias in codegen so we must do the corresponding retag in Miri, or else Miri might miss UB that LLVM exploits.

What we can consider as a hack/experiment is to tie the decision of whether to create reserved or unique mutable references to a heuristic that looks for raw pointers.

How we should detect these newtypes? #[repr(transparent)] types with a raw pointer field?

That's an option, though it'd be the first time (outside of ABI guarantees) that we make repr(transparent) relevant for Miri. Ideally we'd scan all fields... not sure what to do with enums.

For a first hack, just recursing below transparent isn't the most hacky part of this so 🤷 ;)

I think it works, since the immediate write will make the state immediately Active. I thought setting the initial state to Active is wrong since tree borrows is designed with not having Active as an initial state, but I will bring it back if you think that one is more correct.

We don't usually rely on the implicit read/write adjusting the state of the new node, that seems odd.

What is wrong is setting the initial state to active without doing an implicit write to do the corresponding updates to everything else.

But all the sanity checks we have (that failed in earlier versions of this PR) are written assuming there's only implicit reads, no implicit writes. It might be a bunch of work to fix them, and it's certainly a bunch of work to review any such change for correctness.

It really boils down to what the point of this PR is. The goal of rust-lang/unsafe-code-guidelines#573 is to remove SB from Miri. But we won't remove SB in favor of a variant of TB with a heuristic like this. It also seem rather obvious that if we have an attribute to control whether functions immediately "activate" their mutable references, then there's a way to make existing code sound by adding the attribute in enough places. So... what's the new information we learn from playing around with this heuristic?

@HKalbasi
Copy link
Member Author

We can't use that heuristic/hack for noalias -- we already add noalias in codegen so we must do the corresponding retag in Miri, or else Miri might miss UB that LLVM exploits.

If we continue with this, we need to remove noalias from codegen as well. Based on rust-lang/rust#146823 removing noalias from all functions involving raw pointers in their signature is perf neutral. At the end, we want to remove noalias at least for .as_mut_ptr() like functions, so some changes in codegen is needed anyway.

But noalias/retag is not touched in this PR, I just want to have a look on the future of solutions.

It really boils down to what the point of this PR is .... what's the new information we learn from playing around with this heuristic?

Using this heuristic, we can find other problems of making tree borrows strong beside .as_mut_ptr() like functions. If it ends up not having any other problems, and works as good as SB, why not having this as a temporary solution and retire SB? SB needs to eventually go, and this hack is forward compatible with solutions rust-lang/unsafe-code-guidelines#450. We can basically remove this hack when the correct solution is found, whether it be an attribute on functions, changing receiver type of .as_mut_ptr() and do some deref magic, and even changing the model to address it. The benefit of doing this instead of an explicit attribute is that we can change our mind and attempt other solutions without causing churn in the ecosystem.

Comment on lines +433 to +439
tree_borrows.perform_access(
new_tag,
Some((range_in_alloc, AccessKind::Write, AccessCause::Reborrow)),
this.machine.borrow_tracker.as_ref().unwrap(),
alloc_id,
this.machine.current_span(),
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already code a bit further up that does a write, that should be adjusted instead of adding a copy. We definitely don't want to do a read and a write.

@RalfJung
Copy link
Member

RalfJung commented Sep 30, 2025

If we continue with this, we need to remove noalias from codegen as well. Based on rust-lang/rust#146823 removing noalias from all functions involving raw pointers in their signature is perf neutral.

It's perf neutral for rustc itself. That's not necessarily a representative benchmark of its effect on other Rust code.

At the end, we want to remove noalias at least for .as_mut_ptr() like functions

Maybe we do, maybe we don't, that question is still open.

But yeah as you say your PR doesn't remove the retag for any function so it'd be sound without any codegen changes. That wasn't clear to me from your description.


If it ends up not having any other problems, and works as good as SB, why not having this as a temporary solution and retire SB?

Because it's too ad-hoc to be an acceptable long-term solution in my eyes. We won't remove a model that could be acceptable in favor of one that isn't.

We can use this to do a crater-at-home run if @saethlin has time, but I don't think we want to land something like this.

Comment on lines +216 to +219
do_a_write: start_mut_ref_on_fn_entry_as_active
&& is_protected
&& !current_function_involves_raw_pointers
&& matches!(ref_mutability, Some(Mutability::Mut)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a new field, this should set both freeze_perm and nonfreeze_perm to Active.

That should then be used as a signal to control the AccessKind for the (already existing) initial access logic in tb_reborrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Waiting for a review to complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants