Fix: stack-use-after-free after recent run_loop changes #1742
Conversation
The changes introduced in NVIDIA#1683 (changing `run_loop` from a condition variable to atomics) introduced a stack-use-after-free when task completion happens on a different thread: `run_loop::finish` might cause `sync_wait_t::apply_sender` to return before scheduling the noop task, ending the lifetime of the queue before the noop-task push could be finalized:

- Thread 1 sets `__finishing_` to true.
- Thread 2 observes `__finishing_` is true before `__noop_task` has been pushed, finishes, and returns successfully.
- Thread 1 tries to push `__noop_task` to the queue, which is now out of scope.

To avoid this situation, the sync_wait state is additionally synchronized on `finish` returning successfully (avoiding the aforementioned race condition).
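To make the interleaving concrete, here is a minimal stand-alone model of the problematic shape (hypothetical names, not stdexec code); the dangerous push is reduced to a comment so the model itself stays well defined:

```cpp
#include <atomic>
#include <thread>

// Stand-in for the queue living in sync_wait's stack frame.
struct Queue {
  std::atomic<bool> finishing{false};

  void finish() {  // called from the completing thread (Thread 1)
    finishing.store(true, std::memory_order_release);
    // Thread 1 would push __noop_task here -- in the buggy code the waiter
    // below may already have returned, so the queue being pushed to is gone.
  }

  void run() {  // runs on the sync_wait thread (Thread 2)
    while (!finishing.load(std::memory_order_acquire)) {
      // ... drain tasks ...
    }
    // finishing observed: run() returns and the queue's lifetime ends even
    // though finish() may not have completed yet.
  }
};

int main() {
  auto* q = new Queue{};  // stands in for the stack frame
  std::thread completer([q] { q->finish(); });
  q->run();
  completer.join();  // joined only to keep this model well defined;
  delete q;          // sync_wait has no such join, hence the race window
}
```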
```cpp
void wait() noexcept {
  // Account for spurios wakeups
  while (!__done_.load(stdexec::__std::memory_order_acquire)) {
```
There's still a use-after-free bug: a background thread can store true to `__done_`, and then, before it calls notify, the waiter thread may see that `__done_` is true and break (say, after a spurious wakeup, or because the wait never reached the first iteration of the loop); sync_wait then returns and the state is destroyed, so the background thread's subsequent notify touches freed memory.
Btw, I don't think you need to loop here: std::atomic::wait only returns when the value has changed, accounting for spurious wakeups (it has a loop internally).
I was able to detect this with Relacy, which I am in the process of updating to work with stdexec. The test is https://github.com/NVIDIA/stdexec/compare/main...ccotter:stdexec:sync-wait-rrd-bug?expand=1
To run the test, follow these instructions, except use this branch instead of relacy's main branch: dvyukov/relacy#33
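For reference, a minimal sketch of the loop-free form suggested above, assuming C++20 atomic wait/notify (illustrative names, not the actual patch):

```cpp
#include <atomic>

struct Waiter {
  std::atomic<bool> done{false};

  void wait() noexcept {
    // std::atomic::wait(old) only returns once the value is observed to
    // differ from `old`, so spurious wakeups are handled internally.
    done.wait(false, std::memory_order_acquire);
  }

  void signal() noexcept {
    done.store(true, std::memory_order_release);
    done.notify_one();
  }
};
```

Note that this keeps the use-after-free window described above: the store can be observed and the waiter can return before `notify_one()` runs.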
I'm actually not sure how this can be addressed. In code like
```cpp
exec::static_thread_pool pool{1};
{
  auto s = ex::schedule(pool.get_scheduler()) | ex::then([] { return 0; });
  ex::sync_wait(s);
}
```
when the background thread notifies the foreground thread to wake up and unblock, the communication channel (condition variable, atomic, or other) may be destroyed when the sync_wait objects on the stack created by the foreground thread go out of scope. Whether the notifier uses `done.store(true); done.notify_one();` or `done = true; cv.notify();`, I'm not sure how to ensure the foreground thread wakes up and continues, but only after the background thread has finished doing the notification...
@lewissbaker - wondering if you might be able to shed some light on this... I seem to have hit a mental dead end in not understanding how it's possible to have a sync_wait() without allocating the internal run_loop on the heap and giving a strong reference to the receiver to guarantee lifetime.
@ccotter The receiver is the one allocating the shared state (currently on the stack).
In fact, the code you mentioned above was very similar to the code where I detected the bug this fix is for (see here: https://github.com/olympus-robotics/hephaestus/blob/main/modules/concurrency/tests/context_tests.cpp#L96).
The solution I came up with should actually be safe. I don't have a full proof, but I ran my test case around a million times.
Let me try to explain why what you are describing should be safe:
- Both the state and the receiver are owned by the thread calling sync_wait
- The execution context on a different thread merely has a reference to the shared state
The execution context signals completion by performing the following steps (in a strict "happens before" order):
1.1. Set the run loop's finishing flag to true
1.2. (optional) Signal the run loop
1.3. Set done to true
1.4. Signal done
The thread waiting on completion performs the following steps:
2.1. If finishing is true, go to 2.5
2.2. Wait for a new task
2.3. Execute the task
2.4. Go to 2.1
2.5. Wait for done to be set to true
2.6. Return (ends the lifetime of the state)
Without the 'done' steps (1.3, 1.4 and 2.5), we run into the situation that step 1.2 might access invalid memory because the waiting thread already reached step 2.6 too early.
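A minimal self-contained model of this ordering might look as follows (hypothetical names, with a plain std::deque standing in for the run_loop queue; a sketch of the reasoning above, not stdexec's actual code):

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

struct SharedState {
  std::mutex mtx;
  std::deque<std::function<void()>> queue;  // stands in for the run_loop queue
  std::atomic<bool> finishing{false};
  std::atomic<bool> done{false};

  // Completing thread: steps 1.1-1.4.
  void complete() {
    finishing.store(true, std::memory_order_release);  // 1.1
    {                                                   // 1.2: signal the run loop
      std::lock_guard lock{mtx};
      queue.push_back([] {});                           //      (the noop task)
    }
    done.store(true, std::memory_order_release);        // 1.3
    done.notify_one();                                   // 1.4
  }

  // Waiting (sync_wait) thread: steps 2.1-2.6.
  void run_and_wait() {
    for (;;) {
      if (finishing.load(std::memory_order_acquire))    // 2.1
        break;
      std::function<void()> task;
      {
        std::lock_guard lock{mtx};
        if (!queue.empty()) {                            // 2.2 (polling stand-in)
          task = std::move(queue.front());
          queue.pop_front();
        }
      }
      if (task)
        task();                                          // 2.3
    }                                                     // 2.4: loop back to 2.1
    done.wait(false, std::memory_order_acquire);          // 2.5
  }                                                       // 2.6: state may now be destroyed
};

int main() {
  SharedState state;  // owned by the thread calling sync_wait
  std::thread completer([&] { state.complete(); });
  state.run_and_wait();
  completer.join();
}
```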
Previously, this was ensured by actually holding the mutex while signalling the condition variable (`__cv_.notify_all();`).
I can see how there is still a tiny window which can cause a use after free as you described (even after removing the loop).
I think the only way this can be truly fixed is to introduce a count of 'tasks in flight' for the run_loop:
- Each `run_loop::__push_back` call increments this count
- Each `run_loop::__execute_all` decrements this count by the amount it handled
- `run_loop::finish` increments the count before setting the finishing flag and decrements it before returning
- `run_loop::run` keeps looping as long as the count is non-zero and finishing is set to true
I wanted to avoid this, since it sounded expensive and the solution I propose here at least seems to work...
Thanks for pointing this out!
We need to synchronize returning from `__run_loop_base::run` with potentially concurrent calls to `__run_loop_base::finish`. This is done by introducing a counter, ensuring proper completion of all tasks in flight. Also see NVIDIA#1742 for additional information.
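A rough sketch of what such a tasks-in-flight counter could look like (method names mirror `__push_back` / `__execute_all` / `finish` / `run` from the discussion, but this is illustrative only; see #1746 for the actual change):

```cpp
#include <atomic>
#include <cstddef>

// Illustrative counter bookkeeping for a run_loop-like type; not the actual
// stdexec implementation.
class RunLoopModel {
  std::atomic<std::size_t> tasks_in_flight{0};
  std::atomic<bool> finishing{false};

 public:
  void push_back(/* task */) {
    tasks_in_flight.fetch_add(1, std::memory_order_relaxed);
    // ... enqueue the task ...
  }

  void execute_all(std::size_t n_handled) {
    // ... run the n_handled dequeued tasks, then account for them ...
    tasks_in_flight.fetch_sub(n_handled, std::memory_order_release);
  }

  void finish() {
    // Count finish() itself as "in flight" so run() cannot return while this
    // call is still touching the queue.
    tasks_in_flight.fetch_add(1, std::memory_order_relaxed);
    finishing.store(true, std::memory_order_release);
    // ... wake the run loop (e.g. push the noop task) ...
    tasks_in_flight.fetch_sub(1, std::memory_order_release);
  }

  // run() keeps draining the queue and only returns once this is true.
  bool may_return() const noexcept {
    return finishing.load(std::memory_order_acquire)
        && tasks_in_flight.load(std::memory_order_acquire) == 0;
  }
};
```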
Closing in favor of #1746