| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Rust & the case of the disappearing stack frames" |
| 4 | +author: Kyle Strand on behalf of the FFI-unwind project group |
| 5 | +description: "introducing an exploration of how `longjmp` and similar functions can be handled in Rust" |
| 6 | +team: the FFI-unwind project group <https://www.rust-lang.org/governance/teams/lang#wg-ffi-unwind> |
| 7 | +--- |
| 8 | + |
| 9 | +Now that the [FFI-unwind Project Group][proj-group-gh] has merged [an |
| 10 | +RFC][c-unwind-rfc] specifying the `"C unwind"` ABI and removing some instances |
| 11 | +of undefined behavior in the `"C"` ABI, we are ready to establish new goals for |
| 12 | +the group. |
| 13 | + |
| 14 | +Our most important task, of course, is to implement the newly-specified |
| 15 | +behavior. This work has been undertaken by Katelyn Martin and can be followed |
| 16 | +[here][c-unwind-pr]. |
| 17 | + |
| 18 | +The requirements of our current charter, and the [RFC creating the |
| 19 | +group][proj-group-rfc], are effectively fulfilled by the specification of `"C |
| 20 | +unwind"`, so one option is to simply wind down the project group. While |
| 21 | +drafting the `"C unwind"` RFC, however, we discovered that the existing |
| 22 | +guarantees around `longjmp` and similar functions could be improved. Although |
| 23 | +these functions are not strictly unwinding mechanisms<sup>[1](#longjmp-unwind)</sup>, the two |
| 24 | +are closely related: both are "non-local" control-flow mechanisms that |
| 25 | +prevent functions from returning normally. Because one of the goals of the Rust |
| 26 | +project is for Rust to interoperate with existing C-like languages, and these |
| 27 | +control-flow mechanisms are widely used in practice, we believe that Rust must |
| 28 | +have some level of support for them. |
| 29 | + |
| 30 | +This blog post will explain the problem space. If you're interested in helping |
| 31 | +specify this behavior, please come join us in [our Zulip |
| 32 | +stream][proj-group-zulip]! |
| 33 | + |
| 34 | +## `longjmp` and its ilk |
| 35 | + |
| 36 | +Above, I mentioned `longjmp` and "similar functions". Within the context of the |
| 37 | +`"C unwind"` PR, this referred to functions that have different implementations |
| 38 | +on different platforms, and which, on *some* platforms, rely on [forced |
| 39 | +unwinding][forced-unwinding]. In our next specification effort, however, we |
| 40 | +would like to ignore the connection to unwinding entirely, and define a class |
| 41 | +of functions with the following characteristic: |
| 42 | + |
| 43 | +> a function that causes a "jump" in control flow by deallocating some number of |
| 44 | +> stack frames without performing any additional "clean-up" such as running |
| 45 | +> destructors |
| 46 | + |
| 47 | +This is the class of functions we would like to address. The other primary |
| 48 | +example is `pthread_exit`. As part of our specification, we would like to |
| 49 | +create a name for this type of function, but we have not settled on one yet; |
| 50 | +for now, we are referring to them as "cancelable", "`longjmp`-like", or |
| 51 | +"stack-deallocating" functions. |
| 52 | + |
| 53 | +## Our constraints |
| 54 | + |
| 55 | +Taking a step back, we have two mandatory constraints on our design: |
| 56 | + |
| 57 | +* There must be a sound way to call `libc` functions that may `pthread_cancel`. |
| 58 | +* There must be a sound way for Rust code to invoke C code that may `longjmp` |
| 59 | + over Rust frames. |
| 60 | + |
| 61 | +In addition, we would like to adhere to several design principles: |
| 62 | + |
| 63 | +* The specified behavior can't be target-platform-specific; in other words, our |
| 64 | + specification of Rust's interaction with `longjmp` should not depend on |
| 65 | + whether `longjmp` deallocates frames or initiates a forced-unwind. |
| 66 | + Optimizations, however, *can* be target-platform-specific. |
| 67 | +* There should be no difference in the specified behavior of frame-deallocation |
| 68 | + performed by `longjmp` versus that performed by `pthread_cancel`. |
| 69 | +* We will only permit canceling POFs ("Plain Old Frames", explained in the next |
| 70 | + section). |
| 71 | + |
| 72 | +## POFs and stack-deallocating functions |
| 73 | + |
| 74 | +The `"C unwind"` RFC introduced a new concept designed to help us deal with |
| 75 | +force-unwinding or stack-deallocating functions: the [POF, or "Plain Old |
| 76 | +Frame"][POF-definition]. These are frames that can be trivially deallocated, |
| 77 | +i.e., they do no "cleanup" (such as running `Drop` destructors) before |
| 78 | +returning. |
| 79 | + |
| 80 | +From the definition, it should be clear that it is dangerous to call a |
| 81 | +stack-deallocating function in a context that could destroy a non-POF stack |
| 82 | +frame. A simple specification for Rust's interaction with stack-deallocating |
| 83 | +functions, then, could be that they are safe to call as long as only POFs are |
| 84 | +deallocated. This would make Rust's guarantees for `longjmp` essentially the |
| 85 | +same as C++'s. |
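
To make the danger concrete, here is a minimal sketch of what is lost when a non-POF frame is discarded without cleanup. It does not call `longjmp` at all: `std::mem::forget` merely stands in for "the frame disappeared before its destructor ran". The `Guard` type and both functions are hypothetical names invented for this illustration.

```rust
use std::cell::Cell;

/// Hypothetical guard: marks a resource as held and releases it in `Drop`.
struct Guard<'a> {
    held: &'a Cell<bool>,
}

impl<'a> Guard<'a> {
    fn acquire(held: &'a Cell<bool>) -> Self {
        held.set(true);
        Guard { held }
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        // The "cleanup" that makes the enclosing frame a non-POF.
        self.held.set(false);
    }
}

/// Normal return: the destructor runs and the resource is released.
fn returns_normally(held: &Cell<bool>) {
    let _guard = Guard::acquire(held);
}

/// Stand-in for a frame that is deallocated without cleanup.
/// `mem::forget` skips the destructor, as a jump over this frame would.
fn frame_discarded(held: &Cell<bool>) {
    let guard = Guard::acquire(held);
    std::mem::forget(guard);
}

fn main() {
    let held = Cell::new(false);

    returns_normally(&held);
    println!("after normal return, held = {}", held.get());

    frame_discarded(&held);
    println!("after skipped cleanup, held = {}", held.get());
}
```

In the second case the resource stays marked as held forever. Skipping a destructor this way is memory-safe in itself, but any code that relies on the cleanup having happened is now wrong, which is why only POFs should be deallocated.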
| 86 | + |
| 87 | +For now, however, we are considering POFs to be "necessary but not sufficient." |
| 88 | +We believe that a more restrictive specification may provide the following |
| 89 | +advantages: |
| 90 | + |
| 91 | +* more opportunities for helpful compiler warnings or errors to prevent misuse |
| 92 | + of stack-deallocation functions |
| 93 | +* semantic traceability: we can make reliance on stack-frame-deallocation |
| 94 | + visible for all functions involved |
| 95 | +* increased optimization potential when cleanup is "guaranteed" (i.e., the |
| 96 | + compiler may turn a POF into a non-POF if it knows that this is safe and that |
| 97 | + the newly inserted cleanup operation is necessary for an optimization) |
| 98 | + |
| 99 | +## Annotating POFs |
| 100 | + |
| 101 | +Our current plan is to introduce a new annotation for functions whose frames are intended |
| 102 | +to be safe to cancel. The frames of these functions, of course, must be POFs. The |
| 103 | +annotation would be "transitive", just like `async`: functions without this |
| 104 | +annotation either must not invoke any annotated functions or must guarantee |
| 105 | +that they will cause the stack-deallocation to terminate (for instance, a |
| 106 | +non-POF, non-annotated function may call `setjmp`). |
| 107 | + |
| 108 | +### Open questions |
| 109 | + |
| 110 | +The name of the annotation should be based on the terminology used to refer to |
| 111 | +functions that are safe to deallocate. Because this terminology is not |
| 112 | +finalized, we do not yet have a name for the annotation. |
| 113 | + |
| 114 | +It is also not yet clear whether annotated functions should be able to invoke |
| 115 | +any functions without this annotation. As long as the function call does not |
| 116 | +return a new `Drop` resource (making the annotated function no longer a POF), |
| 117 | +it may be safe, as long as we guarantee that the annotated function cannot be |
| 118 | +canceled while the un-annotated function is still on the stack; i.e., |
| 119 | +cancelation must happen during an active call to an annotated cancelable |
| 120 | +function. |
| 121 | + |
| 122 | +Most importantly, we do not have a plan for how to indicate that a |
| 123 | +non-annotated function can safely call an annotated function. The example of |
| 124 | +using `setjmp` to ensure that a `longjmp` will not discard a stack frame is |
| 125 | +non-trivial: |
| 126 | + |
| 127 | +* `setjmp` is not a function but a C macro. There is no way to call it directly |
| 128 | + in Rust. |
| 129 | +* `setjmp` does not prevent arbitrary `longjmp`s from crossing over a frame, |
| 130 | + the way C++'s `catch` can catch any exception. Instead, `setjmp` creates an |
| 131 | + object of type `jmp_buf`, which must be passed to `longjmp`; this causes the |
| 132 | + jump to stop at the corresponding `setjmp` call. |
| 133 | + |
| 134 | +And, of course, `setjmp`/`longjmp` is not the only example of such a mechanism! |
| 135 | +Thus, there is probably no way for the compiler to guarantee that this is safe, |
| 136 | +and it's unclear what heuristics could be applied to make it as safe as |
| 137 | +possible. |
| 138 | + |
| 139 | +### Examples |
| 140 | + |
| 141 | +Let us use `#[pof-longjmp]` as a placeholder for the annotation indicating a |
| 142 | +function that can be safely deallocated, and let us assume that the following |
| 143 | +function is a wrapper around `longjmp`: |
| 144 | + |
| 145 | +```rust |
| 146 | +extern "C" { |
| 147 | + #[pof-longjmp] |
| 148 | + fn longjmp(jmp_buf: CJmpBuf) -> !; |
| 149 | +} |
| 150 | +``` |
| 151 | + |
| 152 | +The compiler would not allow this: |
| 153 | + |
| 154 | +```rust |
| 155 | +fn has_drop(jmp_buf: CJmpBuf) { |
| 156 | + let s = "string data".to_owned(); |
| 157 | + unsafe { longjmp(jmp_buf); } |
| 158 | + println!("{}", s); |
| 159 | +} |
| 160 | +``` |
| 161 | + |
| 162 | +Here, `s` implements `Drop`, so `has_drop` is not a POF. Since `longjmp` is |
| 163 | +annotated `#[pof-longjmp]`, the un-annotated function `has_drop` can't call it |
| 164 | +(even in an `unsafe` block). If, however, `has_drop` is annotated: |
| 165 | + |
| 166 | +```rust |
| 167 | +#[pof-longjmp] |
| 168 | +fn has_drop(jmp_buf: CJmpBuf) { |
| 169 | + let s = "string data".to_owned(); |
| 170 | + unsafe { longjmp(jmp_buf); } |
| 171 | + println!("{}", s); |
| 172 | +} |
| 173 | +``` |
| 174 | + |
| 175 | +...there is a different error: `#[pof-longjmp]` can only be applied to POFs, |
| 176 | +and since `s` implements `Drop`, `has_drop` is not a POF. |
| 177 | + |
| 178 | +An example of a permissible `longjmp` call would be: |
| 179 | + |
| 180 | +```rust |
| 181 | +#[pof-longjmp] |
| 182 | +fn no_drop(jmp_buf: CJmpBuf) { |
| 183 | + let s = "string data"; |
| 184 | + unsafe { longjmp(jmp_buf); } |
| 185 | + println!("{}", s); |
| 186 | +} |
| 187 | +``` |
| 188 | + |
| 189 | +## Join us! |
| 190 | + |
| 191 | +If you would like to help us create this specification and write an RFC for it, |
| 192 | +please join us in [Zulip][proj-group-zulip]! |
| 193 | + |
| 194 | +#### Footnotes |
| 195 | + |
| 196 | +<a name="longjmp-unwind">1</a>: As mentioned in the RFC, on Windows, |
| 197 | +`longjmp` actually *is* an unwinding operation. On other platforms, however, |
| 198 | +`longjmp` is unrelated to unwinding. |
| 199 | + |
| 200 | +[proj-group-gh]: https://github.com/rust-lang/project-ffi-unwind |
| 201 | +[proj-group-rfc]: https://github.com/rust-lang/rfcs/blob/master/text/2797-project-ffi-unwind.md |
| 202 | +[proj-group-zulip]: https://rust-lang.zulipchat.com/#narrow/stream/210922-project-ffi-unwind/topic/welcome.2C.20redux/near/216807277 |
| 203 | +[c-unwind-rfc]: https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md |
| 204 | +[c-unwind-pr]: https://github.com/rust-lang/rust/pull/76570 |
| 205 | +[forced-unwinding]: https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md#forced-unwinding |
| 206 | +[POF-definition]: https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md#plain-old-frames |