Commit 8e03527

[BOLT] Address review
- improve docs/PacRetDesign.md - improve pacret-split-funcs test
1 parent 5cbd4ce commit 8e03527

2 files changed

+117
-48
lines changed

bolt/docs/PacRetDesign.md

Lines changed: 109 additions & 37 deletions
@@ -1,9 +1,14 @@
11
# Optimizing binaries with pac-ret hardening
22

3-
This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state` DWARF instruction in BOLT. As is describes internal design decisions, the intended audience is BOLT developers. The document is an updated version of the [RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).
3+
This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state`
4+
DWARF instruction in BOLT. As it describes internal design decisions, the
5+
intended audience is BOLT developers. The document is an updated version of the
6+
[RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).
47

58

6-
`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state` in assembly, or `OpNegateRAState` is BOLT sources. In this document, I will use **negate-ra-state** as a shorthand.
9+
`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state`
10+
in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use
11+
**negate-ra-state** as a shorthand.
712

813
## Introduction
914

@@ -13,17 +18,26 @@ Refer to the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalys
1318

1419
### DW_CFA_AARCH64_negate_ra_state
1520

16-
The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).
21+
The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in
22+
the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).
1723

1824
```
1925
The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
2026
```
2127

22-
This bit indicates to the unwinder whether the current return address is signed or not (hence the name). The unwinder uses this information to authenticate the pointer, and remove the Pointer Authentication Code (PAC) bits. Incorrect negate-ra-state placement can lead to the unwinder trying to authenticate an unsigned pointer (which segfaults), or skipping authenticating a signed pointer, and trying to access an incorrect location (also leading to a segfault).
28+
This bit indicates to the unwinder whether the current return address is signed
29+
or not (hence the name). The unwinder uses this information to authenticate the
30+
pointer, and remove the Pointer Authentication Code (PAC) bits. Incorrect
31+
negate-ra-state placement can lead to the unwinder trying to authenticate an
32+
unsigned pointer (which segfaults), or skipping authenticating a signed pointer,
33+
and trying to access an incorrect location (also leading to a segfault).
2334

24-
(Note: not *all* unwinders do this. Some use the `xpac` instruction to strip the PAC bits without authenticating the pointer. This is incorrect, as it allows control-flow modification in the case of unwinding.)
35+
Note: not *all* unwinders do this. Some use the `xpac` instruction to strip the
36+
PAC bits without authenticating the pointer. This is an incorrect (incomplete)
37+
implementation, as it allows control-flow modification in the case of unwinding.
2538

26-
There are no DWARF instructions to directly set or clear the RA State. However, two other CFIs can also affect the RA state:
39+
There are no DWARF instructions to directly set or clear the RA State. However,
40+
two other CFIs can also affect the RA state:
2741
- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
2842
- `DW_CFA_restore_state`: this CFI pops rules from this stack.
2943

@@ -37,13 +51,25 @@ Example:
3751
| DW_CFA_AARCH64_negate_ra_state | 1 -> 0 |
3852
| DW_CFA_restore_state | 0 -> 1 (popped from the stack) |
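
The interaction of the three CFIs can be replayed with a small model. This is a sketch for illustration, not BOLT or unwinder code: the function name `apply_cfis` and the string names for the CFIs are made up here, and only the RA state is tracked, whereas a real `DW_CFA_remember_state` saves all register rules.

```python
def apply_cfis(cfis, initial_state=0):
    """Replay RA-state-affecting CFIs and return the RA state after each one."""
    state = initial_state
    stack = []  # implicit stack used by remember_state / restore_state
    trace = []
    for cfi in cfis:
        if cfi == "negate_ra_state":
            state ^= 1  # negate bit[0] of RA_SIGN_STATE
        elif cfi == "remember_state":
            stack.append(state)  # save the current rule
        elif cfi == "restore_state":
            state = stack.pop()  # go back to the saved rule
        else:
            raise ValueError(f"unexpected CFI: {cfi}")
        trace.append(state)
    return trace


# Hypothetical sequence: sign, remember, authenticate, restore.
trace = apply_cfis(
    ["negate_ra_state", "remember_state", "negate_ra_state", "restore_state"]
)
```

The last step shows why restore-state has to be tracked: it moves the state from 0 back to 1 without any negate-ra-state being involved.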
3953

40-
The Arm ABI also defines the DW_CFA_AARCH64_negate_ra_state_with_pc CFI, but it is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).
54+
The Arm ABI also defines the `DW_CFA_AARCH64_negate_ra_state_with_pc` CFI, but it
55+
is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).
4156

4257
### Where are these CFIs needed?
4358

44-
In all locations, where two consecutive instructions have different RA state, this need to be indicated to the unwinder. This happens at pointer signing and authenticating. The other case where two consecutive instructions have different RA state, but neither of them is signing or authenticating means that they are not next to each other in control flow. One is part of an execution path with signed RA, the other is part of a path with an unsigned RA.
45-
46-
In the example below, the first BasicBlock ends in a conditional branch, and jumps to two different BasicBlocks, each with their own authentication, and return. The instructions on the border of the second and third BasicBlock have different RA states. The `ret` at the end of the second BasicBlock is in unsigned state. The start of the third BasicBlock is after the `paciasp` in the control flow, but before the authentication. In this case, a negate-ra-state is needed at the end of the second BasicBlock.
59+
In all locations where two consecutive instructions have different RA states,
60+
this needs to be indicated to the unwinder. This happens at pointer signing and
61+
authenticating. In the other case, two consecutive instructions have different
62+
RA states but neither of them signs or authenticates, which means that they are
63+
not next to each other in control flow. One is part of an execution path with
64+
signed RA, the other is part of a path with an unsigned RA.
65+
66+
In the example below, the first BasicBlock ends in a conditional branch, and
67+
jumps to two different BasicBlocks, each with their own authentication, and
68+
return. The instructions on the border of the second and third BasicBlock have
69+
different RA states. The `ret` at the end of the second BasicBlock is in unsigned
70+
state. The start of the third BasicBlock is after the `paciasp` in the control
71+
flow, but before the authentication. In this case, a negate-ra-state is needed
72+
at the end of the second BasicBlock.
4773

4874
```
4975
+----------------+
@@ -69,86 +95,132 @@ In the example below, the first BasicBlock ends in a conditional branch, and jum
6995
```
7096

7197
> [!important]
72-
> The unwinder does not follow the control flow graph. It reads unwind information in the layout order.
98+
> The unwinder does not follow the control flow graph. It reads unwind
99+
> information in the layout order.
73100
74-
Because these locations are dependent on how the function layout looks, negate-ra-state CFIs will become invalid during BasicBlock reordering.
101+
Because these locations are dependent on how the function layout looks,
102+
negate-ra-state CFIs will become invalid during BasicBlock reordering.
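
To make the layout dependence concrete, the sketch below lists the points where the RA state changes for two orderings of the same blocks. It is an illustrative model only: the block names, the `(mnemonic, signed)` tuples and the function `state_change_points` are assumptions made for this example, loosely following the shape of the `pacret-split-funcs.s` test.

```python
def state_change_points(block_order, blocks, initial_signed=False):
    """Return (block, mnemonic) pairs before which a negate-ra-state is needed."""
    points = []
    prev_signed = initial_signed
    for name in block_order:
        for mnemonic, signed in blocks[name]:
            if signed != prev_signed:
                points.append((name, mnemonic))
            prev_signed = signed
    return points


# Each instruction carries the RA state it executes in.
blocks = {
    "entry": [("paciasp", False), ("cmp", True), ("b.eq", True)],
    "hot": [("autiasp", True), ("ret", False)],
    "cold": [("mov", True), ("retaa", True)],
}

original = state_change_points(["entry", "hot", "cold"], blocks)
reordered = state_change_points(["entry", "cold", "hot"], blocks)
```

In the first layout a CFI is needed at the start of `cold`, because it follows an unsigned `ret`. In the second layout that boundary disappears, so the CFI from the input would be wrong if it were kept.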
75103

76104
## Solution design
77105

78106
The patch introduces two new passes:
79-
1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs in the input binary
80-
2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct places: wherever there is a state change between two consecutive instructions in the layout order.
107+
1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs
108+
in the input binary
109+
2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after
110+
optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct
111+
places: wherever there is a state change between two consecutive instructions
112+
in the layout order.
81113

82-
To track metadata on individual instructions, the `MCAnnotation` class was extended. These also have helper function in `MCPlusBuilder`.
114+
To track metadata on individual instructions, the `MCAnnotation` class was
115+
extended. These also have helper functions in `MCPlusBuilder`.
83116

84117
### Saving annotations at CFI reading
85118

86-
CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`. At this point, we add MCAnnotations about negate-ra-state, remember-state and restore-state CFIs to the instructions they refer to. This is to not interfere with the CFI processing that already happens in BOLT (e.g. remember-state and restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).
119+
CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`.
120+
At this point, we add MCAnnotations about negate-ra-state, remember-state and
121+
restore-state CFIs to the instructions they refer to. This is to not interfere
122+
with the CFI processing that already happens in BOLT (e.g. remember-state and
123+
restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).
87124

88-
As we add the MCAnnotations *to instructions*, we have to account for the case where the function starts with a CFI altering the RA state. If a function starts with a negate-ra-state CFI for example, we cannot save the annotation on the first instruction, because that itself should already be signed. This is why all BinaryFunctions have an `initialRAState` bool. If the `Offset` the CFI refers to is zero, we don't store an annotation, but set the `initialRAState` in `FillCFIInfoFor`. This info is then used in `MarkRAStates`.
125+
As we add the MCAnnotations *to instructions*, we have to account for the case
126+
where the function starts with a CFI altering the RA state. If a function starts
127+
with a negate-ra-state CFI for example, we cannot save the annotation on the
128+
first instruction, because that itself should already be signed. This is why all
129+
BinaryFunctions have an `initialRAState` bool. If the `Offset` the CFI refers to
130+
is zero, we don't store an annotation, but set the `initialRAState` in
131+
`FillCFIInfoFor`. This information is then used in `MarkRAStates`.
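
A rough model of this step is sketched below. It assumes that a CFI at a non-zero offset is recorded on the instruction immediately preceding that offset (which is what the `paciasp` row in the MarkRAStates table suggests), and it only handles negate-ra-state at offset zero. The function name `attach_ra_annotations` and the data shapes are hypothetical and do not mirror the real `FillCFIInfoFor` code.

```python
def attach_ra_annotations(inst_offsets, cfis):
    """Attach CFI names to instructions; return (initial_ra_state, annotations).

    inst_offsets: sorted start offsets of the function's instructions.
    cfis: (offset, name) pairs in the order they appear in the unwind table.
    """
    annotations = {offset: [] for offset in inst_offsets}
    initial_ra_state = False
    for offset, name in cfis:
        if offset == 0:
            # No instruction precedes offset zero, so nothing can be annotated.
            if name != "negate_ra_state":
                raise NotImplementedError("only negate_ra_state is modelled at offset 0")
            initial_ra_state = not initial_ra_state
            continue
        # Assumption: annotate the instruction that ends where the CFI applies.
        previous = max(o for o in inst_offsets if o < offset)
        annotations[previous].append(name)
    return initial_ra_state, annotations


# paciasp at offset 0 and autiasp at offset 16, each followed by a CFI.
initial, annotations = attach_ra_annotations(
    [0, 4, 8, 12, 16, 20],
    [(4, "negate_ra_state"), (20, "negate_ra_state")],
)
```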
89132

90133
### Binaries without DWARF info
91134

92-
In some cases, the DWARF tables are stripped from the binary. These programs usually have some other unwind-mechanism. To account for code that uses Pointer Authentication, but does not have DWARF CFIs, the passes only run on functions that had at least one negate-ra-state CFI. This is marked during CFI reading.
135+
In some cases, the DWARF tables are stripped from the binary. These programs
136+
usually have some other unwind mechanism. To account for code that uses Pointer
137+
Authentication, but does not have DWARF CFIs, the passes only run on functions
138+
that had at least one negate-ra-state CFI. This information is saved on the
139+
functions during CFI reading.
93140

94-
This also makes sure that the passes don't run on functions that do not store the return address to the stack, and don't need Pointer Authentication, saving on runtime overhead.
141+
This also makes sure that the passes don't run on functions that do not store
142+
the return address to the stack, and don't need Pointer Authentication, saving
143+
on runtime overhead.
95144

96145
In summary:
97146
- pointer auth is not used: no change, the new passes do not run.
98-
- pointer auth is used, but DWARF info is stripped: no change, the new passes do not run.
99-
- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the negate-ra-state CFI.
147+
- pointer auth is used, but DWARF info is stripped: no change, the new passes
148+
do not run.
149+
- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the
150+
negate-ra-state CFI.
100151

101152
### MarkRAStates Pass
102153

103154
This pass runs before optimizations reorder anything.
104155

105-
It processes MCAnnotations generated during the CFI reading stage to check if instructions have either of the three CFIs that can modify RA state:
156+
It processes MCAnnotations generated during the CFI reading stage to check if
157+
instructions have any of the three CFIs that can modify RA state:
106158
- negate-ra-state
107159
- remember-state
108160
- restore-state
109161

110-
Then it adds new MCAnnotations to each instruction, indicating their RA state. Those annotations are:
162+
Then it adds new MCAnnotations to each instruction, indicating their RA state.
163+
Those annotations are:
111164
- Signed
112165
- Unsigned
113166

114-
Below is a simple example, that shows the two different type of annotations: what we have before the pass, and after it.
167+
Below is a simple example that shows the two different types of annotations:
168+
what we have before the pass, and after it.
115169

116-
| Instruction | Before | After |
117-
| --------------------------- | --------------- | -------- |
118-
| paciasp | negate-ra-state | unsigned |
170+
| Instruction | Before | After |
171+
| ----------------------------- | --------------- | -------- |
172+
| paciasp | negate-ra-state | unsigned |
119173
| stp x29, x30, [sp, #-0x10]! | | signed |
120174
| mov x29, sp | | signed |
121175
| ldp x29, x30, [sp], #0x10 | | signed |
122-
| autiasp | negate-ra-state | signed |
123-
| ret | | unsigned |
176+
| autiasp | negate-ra-state | signed |
177+
| ret | | unsigned |
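
The table can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions, not the pass itself: each instruction carries at most one CFI annotation, the recorded state is the one the instruction executes in (so an annotation takes effect on the following instruction), and the name `mark_ra_states` is made up for this example.

```python
def mark_ra_states(instructions, initial_signed=False):
    """Return "signed"/"unsigned" for each (mnemonic, cfi_annotation) pair."""
    signed = initial_signed
    stack = []  # saved states for remember-state / restore-state
    states = []
    for _mnemonic, cfi in instructions:
        # Record the state this instruction executes in.
        states.append("signed" if signed else "unsigned")
        # The annotation changes the state seen by the next instruction.
        if cfi == "negate-ra-state":
            signed = not signed
        elif cfi == "remember-state":
            stack.append(signed)
        elif cfi == "restore-state":
            signed = stack.pop()
    return states


function_body = [
    ("paciasp", "negate-ra-state"),
    ("stp x29, x30, [sp, #-0x10]!", None),
    ("mov x29, sp", None),
    ("ldp x29, x30, [sp], #0x10", None),
    ("autiasp", "negate-ra-state"),
    ("ret", None),
]
states = mark_ra_states(function_body)
```

Passing `initial_signed=True` models a function whose `initialRAState` was set during CFI reading.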
124178

125179
##### Error handling in MarkRAStates Pass:
126180

127-
Whenever the MarkRAStates pass finds inconsistencies in the current BinaryFunction, it ignores it by calling `BF.setIgnored()`. This prevents BOLT from optimizing that function, but it will still be emitted as part of the original section (`.bolt.org.text`) in its original form.
181+
Whenever the MarkRAStates pass finds inconsistencies in the current
182+
BinaryFunction, it marks the function as ignored by calling `BF.setIgnored()`. This prevents BOLT
183+
from optimizing that function, but it will still be emitted as part of the
184+
original section (`.bolt.org.text`) in its original form.
128185

129186
The inconsistencies are as follows:
130187
- finding a `pac*` instruction when already in signed state
131188
- finding an `aut*` instruction when already in unsigned state
132189
- finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`.
133190

134-
Users will be informed about the number of ignored function in the pass, and the exact functions ignored.
191+
Users will be informed about the number of ignored functions in the pass, the
192+
exact functions ignored, and the found inconsistency.
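
The three checks can be sketched as follows. This is an illustration only: the mnemonic-prefix test is a stand-in for however BOLT actually recognises instructions that sign or authenticate the return address (it would misclassify, for example, a `pac*` on another register, and it does not cover combined instructions such as `retaa`), and `find_inconsistency` is a hypothetical name.

```python
def find_inconsistency(instructions, initial_signed=False):
    """Return (index, reason) for the first inconsistency, or None if consistent.

    instructions: (mnemonic, cfi_annotation) pairs in input order.
    Only negate-ra-state is modelled; remember/restore are left out for brevity.
    """
    signed = initial_signed
    for index, (mnemonic, cfi) in enumerate(instructions):
        is_pac = mnemonic.startswith("pac")
        is_aut = mnemonic.startswith("aut")
        if is_pac and signed:
            return index, "pac* instruction in signed state"
        if is_aut and not signed:
            return index, "aut* instruction in unsigned state"
        if (is_pac or is_aut) and cfi != "negate-ra-state":
            return index, "pac*/aut* instruction without .cfi_negate_ra_state"
        if cfi == "negate-ra-state":
            signed = not signed
    return None


ok = find_inconsistency(
    [("paciasp", "negate-ra-state"), ("autiasp", "negate-ra-state"), ("ret", None)]
)
double_sign = find_inconsistency(
    [("paciasp", "negate-ra-state"), ("paciasp", "negate-ra-state")]
)
```

A function for which such a check reports anything is the kind that the pass would leave untouched via `BF.setIgnored()`.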
135193

136194
### InsertNegateRAStatePass
137195

138-
This pass runs after the optimizations are done. In essence, it does the _inverse_ of MarkRAState pass:
196+
This pass runs after the optimizations are done. In essence, it does the _inverse_
197+
of the MarkRAStates pass:
139198
1. it reads the RA state annotations attached to the instructions, and
140-
2. whenever the state changes, it adds a PseudoInstruction that holds an OpNegateRAState CFI.
199+
2. whenever the state changes, it adds a PseudoInstruction that holds an
200+
OpNegateRAState CFI.
141201

142202
##### Covering newly generated instructions:
143203

144-
Some BOLT passes can add new Instructions. In InsertNegateRAStatePass, we have to know what RA state these have.
204+
Some BOLT passes can add new Instructions. In InsertNegateRAStatePass, we have
205+
to know what RA state these have.
145206

146-
The current solution has the `inferUnknownStates` function to cover these, using a fairly simple strategy: unknown states are inherited from last known state. Testing so far has shown this implementation is sufficient, but to prove correctness, we would need to examine all passes that insert new instructions.
207+
The current solution has the `inferUnknownStates` function to cover these, using
208+
a fairly simple strategy: unknown states inherit the last known state. Testing so
209+
far has shown that this implementation is sufficient.
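
The strategy can be written down in a few lines. The sketch below is an assumption-laden model, not the real `inferUnknownStates`: states are `True` (signed), `False` (unsigned) or `None` (unknown), they are processed in layout order, and an unknown state at the very start falls back to the function's initial state.

```python
def infer_unknown_states(states, initial_signed=False):
    """Fill in None entries with the last known RA state, in layout order."""
    last_known = initial_signed
    result = []
    for state in states:
        if state is None:
            state = last_known  # newly inserted instruction: inherit
        result.append(state)
        last_known = state
    return result


# Two instructions were inserted by some pass after a signed instruction.
filled = infer_unknown_states([False, True, None, None, True, False])
```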
147210

148211
### Optimizations requiring special attention
149212

150-
Marking states before optimizations assure that instructions can be moved around freely. The only special case is function splitting. When a function is split, the split part becomes a new function in the emitted binary. For unwinding to work, it needs to "replay" all CFI that lead up to the split point. BOLT does this for other CFIs. As negate-ra-state is not read (only stored as an Annotation), we have to do this "manually" in InsertNegateRAStatePass. Here, if the split part starts with an instruction that has Signed RA state, we add a negate-ra-state CFI to indicate this.
213+
Marking states before optimizations ensures that instructions can be moved around
214+
freely. The only special case is function splitting. When a function is split,
215+
the split part becomes a new function in the emitted binary. For unwinding to
216+
work, it needs to "replay" all CFIs that lead up to the split point. BOLT does
217+
this for other CFIs. As negate-ra-state is not read (only stored as an Annotation),
218+
we have to do this manually in InsertNegateRAStatePass. Here, if the split part
219+
starts with an instruction that has Signed RA state, we add a negate-ra-state CFI
220+
to indicate this.
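
The effect on a split function can be modelled as below. This is a sketch, not the pass: it assumes every emitted fragment starts in the unsigned state because no earlier CFIs are replayed for it, and the names `emit_fragments` and the `(mnemonic, signed)` tuples are made up for illustration. The input mirrors the `pacret-split-funcs.s` test, split at the fallthrough label.

```python
def emit_fragments(fragments):
    """Insert "OpNegateRAState" markers into each fragment independently."""
    emitted = []
    for fragment in fragments:
        out = []
        prev_signed = False  # assumption: a fragment starts out unsigned
        for mnemonic, signed in fragment:
            if signed != prev_signed:
                out.append("OpNegateRAState")
            out.append(mnemonic)
            prev_signed = signed
        emitted.append(out)
    return emitted


hot = [("paciasp", False), ("cmp x0, #0", True), ("b.eq .Lcold_bb1", True)]
cold = [("mov x0, #1", True), ("autiasp", True), ("ret", False)]
hot_out, cold_out = emit_fragments([hot, cold])
```

The cold fragment starts with an instruction in signed state, so it begins with an `OpNegateRAState`, which is the pattern the test's `CHECK` lines look for after the split point.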
151221

152222
## Option to disallow the feature
153223

154-
To aid debugging, we added the `--disallow-pacret` flag. If the flag is used, and a function `containedNegateRAState()` after `FillCFIInfoFor()`, BOLT exits with an error. With this flag, the feature is on by default.
224+
To aid debugging, we added the `--disallow-pacret` flag. If the flag is used,
225+
and a function `containedNegateRAState()` after `FillCFIInfoFor()`, BOLT exits
226+
with an error. The feature itself is on by default; the flag only opts out of it.

bolt/test/AArch64/pacret-split-funcs.s

Lines changed: 8 additions & 11 deletions
@@ -1,10 +1,10 @@
11
# Checking that we generate an OpNegateRAState CFI after the split point,
22
# when splitting a region with signed RA state.
3+
# We split at the fallthrough label.
34

45
# REQUIRES: system-linux
56

6-
# RUN: %clang %cflags -o %t %s
7-
# RUN: %clang %s %cflags -Wl,-q -o %t
7+
# RUN: %clang %s %cflags -march=armv8.3-a -Wl,-q -o %t
88
# RUN: link_fdata --no-lbr %s %t %t.fdata
99
# RUN: llvm-bolt %t -o %t.bolt --data %t.fdata -split-functions \
1010
# RUN: --print-only foo --print-split --print-all 2>&1 | FileCheck %s
@@ -19,14 +19,11 @@
1919
# CHECK: ------- HOT-COLD SPLIT POINT -------
2020

2121
# CHECK: OpNegateRAState
22+
# CHECK-NEXT: mov x0, #0x1
2223
# CHECK-NEXT: autiasp
2324
# CHECK-NEXT: OpNegateRAState
2425
# CHECK-NEXT: ret
2526

26-
# CHECK: autiasp
27-
# CHECK-NEXT: OpNegateRAState
28-
# CHECK-NEXT: ret
29-
3027
# End of the insert-negate-ra-state-pass logs
3128
# CHECK: Binary Function "foo" after finalize-functions
3229

@@ -41,15 +38,15 @@ foo:
4138
.cfi_negate_ra_state // indicating that paciasp changed the RA state to signed
4239
cmp x0, #0
4340
b.eq .Lcold_bb1
44-
.Lfallthrough:
41+
.Lfallthrough: // split point
42+
mov x0, #1
4543
autiasp
4644
.cfi_negate_ra_state // indicating that autiasp changed the RA state to unsigned
4745
ret
46+
.Lcold_bb1: // Instructions below are not important, they are just here so the cold block is not empty.
4847
.cfi_negate_ra_state // ret has unsigned RA state, but the next inst (mov) has signed RA state
49-
.Lcold_bb1: // split point
50-
autiasp
51-
.cfi_negate_ra_state // indicating that autiasp changed the RA state to unsigned
52-
ret
48+
mov x0, #2
49+
retaa
5350
.cfi_endproc
5451
.size foo, .-foo
5552
