
Commit 889bfd9

Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (#162435)
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)" (#162353)

This reverts commit c7d776b.

#120064 was reverted for breaking builders.
Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.

---

Original message:

OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records whether the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
1 parent 4967bc1 commit 889bfd9

25 files changed, +1239 -27 lines changed

bolt/docs/PacRetDesign.md

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
# Optimizing binaries with pac-ret hardening

This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state`
DWARF instruction in BOLT. As it describes internal design decisions, the
intended audience is BOLT developers. The document is an updated version of the
[RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).

`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state`
in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use
**negate-ra-state** as a shorthand.

## Introduction

### Pointer Authentication

For more information, see the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis).

### DW_CFA_AARCH64_negate_ra_state

The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in
the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).

```
The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
```

This bit indicates to the unwinder whether the current return address is signed
or not (hence the name). The unwinder uses this information to authenticate the
pointer, and remove the Pointer Authentication Code (PAC) bits.
Incorrect placement of negate-ra-state CFIs causes the unwinder to either attempt
to authenticate an unsigned pointer (resulting in a segmentation fault), or skip
authentication on a signed pointer, which can also cause a fault.

Note: some unwinders use the `xpac` instruction to strip the PAC bits without
authenticating the pointer. This is an incorrect (incomplete) implementation,
as it allows control-flow modification in the case of unwinding.

There are no DWARF instructions to directly set or clear the RA state. However,
two other CFIs can also affect the RA state:
- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
- `DW_CFA_restore_state`: this CFI pops rules from this stack.

Example:

| CFI                            | Effect on RA state             |
| ------------------------------ | ------------------------------ |
| (default)                      | 0                              |
| DW_CFA_AARCH64_negate_ra_state | 0 -> 1                         |
| DW_CFA_remember_state          | 1 pushed to the stack          |
| DW_CFA_AARCH64_negate_ra_state | 1 -> 0                         |
| DW_CFA_restore_state           | 0 -> 1 (popped from the stack) |
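
The sequence in the table can be replayed with a small model. This is only a sketch: `CFIOp` and `simulateRAState` are hypothetical names, not BOLT or unwinder API, and the model tracks only bit[0] of RA_SIGN_STATE, whereas a real unwinder saves and restores all register rules on remember/restore.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Hypothetical names for illustration; not BOLT or libunwind API.
enum class CFIOp { NegateRAState, RememberState, RestoreState };

// Returns the value of bit[0] of RA_SIGN_STATE after each CFI is applied,
// starting from the default value 0 (return address not signed).
inline std::vector<bool> simulateRAState(const std::vector<CFIOp> &Ops) {
  bool State = false;
  std::stack<bool> Remembered; // models the implicit remember/restore stack
  std::vector<bool> Trace;
  for (CFIOp Op : Ops) {
    switch (Op) {
    case CFIOp::NegateRAState:
      State = !State;
      break;
    case CFIOp::RememberState:
      Remembered.push(State);
      break;
    case CFIOp::RestoreState:
      assert(!Remembered.empty() && "restore_state without remember_state");
      State = Remembered.top();
      Remembered.pop();
      break;
    }
    Trace.push_back(State);
  }
  return Trace;
}
```

Feeding it negate, remember, negate, restore yields the states 1, 1, 0, 1, matching the rows of the table after the default.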

The Arm ABI also defines the `DW_CFA_AARCH64_negate_ra_state_with_pc` CFI, but it
is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).

### Where are these CFIs needed?

Whenever two consecutive instructions have different RA states, the unwinder must
be informed of the change. This typically occurs during pointer signing or
authentication. If adjacent instructions differ in RA state but neither signs
nor authenticates the return address, they must belong to different control flow
paths. One is part of an execution path with signed RA, the other is part of a
path with an unsigned RA.

In the example below, the first BasicBlock ends in a conditional branch that
leads to two different BasicBlocks, each with its own authentication and
return. The instructions on the border of the second and third BasicBlock have
different RA states. The `ret` at the end of the second BasicBlock is in unsigned
state. The start of the third BasicBlock is after the `paciasp` in the control
flow, but before the authentication. In this case, a negate-ra-state is needed
at the end of the second BasicBlock.

```
        +----------------+
        |     paciasp    |
        |                |
        |      b.cc      |
        +--------+-------+
                 |
+----------------+
|                |
|       +--------v-------+
|       |                |
|       |     autiasp    |
|       |       ret      |   // RA: unsigned
|       +----------------+
+----------------+
                 |
        +--------v-------+   // RA: signed
        |                |
        |     autiasp    |
        |       ret      |
        +----------------+
```

> [!important]
> The unwinder does not follow the control flow graph. It reads unwind
> information in the layout order.

Because these locations depend on the function layout, negate-ra-state CFIs
become invalid when BasicBlocks are reordered.

## Solution design

The implementation introduces two new passes:
1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs
   in the input binary
2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after
   optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct
   places: wherever there is a state change between two consecutive instructions
   in the layout order.

To track metadata on individual instructions, the `MCAnnotation` class was
extended. The new annotation kinds also have helper functions in `MCPlusBuilder`.

### Saving annotations at CFI reading

CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`.
At this point, we add MCAnnotations about negate-ra-state, remember-state and
restore-state CFIs to the instructions they refer to. This is to not interfere
with the CFI processing that already happens in BOLT (e.g. remember-state and
restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).

As we add the MCAnnotations *to instructions*, we have to account for the case
where the function starts with a CFI altering the RA state. The annotation for a
CFI is attached to the instruction preceding it, so a CFI at the very start of
the function has no instruction that could carry the annotation.
This special case is handled by adding an `initialRAState` bool to each BinaryFunction.
If the `Offset` the CFI refers to is zero, we don't store an annotation, but set
the `initialRAState` in `FillCFIInfoFor`. This information is then used in
`MarkRAStates`.
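
The placement rule can be sketched on a simplified model. Everything here is an assumption made for illustration: `FunctionModel` and `recordNegateRAState` are hypothetical names, the model ignores the NOP skipping that the real offset correction performs, and it assumes that a CFI at offset zero means the function starts in signed state.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Simplified stand-in for a BinaryFunction; all names here are hypothetical.
struct FunctionModel {
  // Instruction offset -> "carries a negate-ra-state annotation".
  std::map<uint64_t, bool> NegateAnnotation;
  bool InitialRAState = false;
};

// Records a negate-ra-state CFI that refers to the instruction at CFIOffset.
inline void recordNegateRAState(FunctionModel &F, uint64_t CFIOffset) {
  if (CFIOffset == 0) {
    // No earlier instruction can carry the annotation, so only the
    // function-level flag is set (assumed here: the state becomes signed).
    F.InitialRAState = true;
    return;
  }
  auto It = F.NegateAnnotation.lower_bound(CFIOffset);
  assert(It != F.NegateAnnotation.begin() && "no preceding instruction");
  // The instruction before the CFI carries the annotation.
  std::prev(It)->second = true;
}
```

With instructions at offsets 0, 4 and 8, a CFI referring to offset 4 marks the instruction at offset 0, while a CFI referring to offset 0 only sets the flag.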

### Binaries without DWARF info

In some cases, the DWARF tables are stripped from the binary. These programs
usually have some other unwind mechanism.
These passes only run on functions that include at least one negate-ra-state CFI.
This avoids processing functions that do not use Pointer Authentication, or
functions that use Pointer Authentication but do not have DWARF info.

In summary:
- pointer auth is not used: no change, the new passes do not run.
- pointer auth is used, but DWARF info is stripped: no change, the new passes
  do not run.
- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the
  negate-ra-state CFI.

### MarkRAStates pass

This pass runs before optimizations reorder anything.

It processes MCAnnotations generated during the CFI reading stage to check if
instructions have any of the three CFIs that can modify RA state:
- negate-ra-state,
- remember-state,
- restore-state.

Then it adds new MCAnnotations to each instruction, indicating their RA state.
Those annotations are:
- Signed,
- Unsigned.

Below is a simple example that shows the two different types of annotations:
what we have before the pass, and after it.

| Instruction                   | Before          | After    |
| ----------------------------- | --------------- | -------- |
| paciasp                       | negate-ra-state | unsigned |
| stp x29, x30, [sp, #-0x10]!   |                 | signed   |
| mov x29, sp                   |                 | signed   |
| ldp x29, x30, [sp], #0x10     |                 | signed   |
| autiasp                       | negate-ra-state | signed   |
| ret                           |                 | unsigned |
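
The "Before" to "After" step in the table can be modelled as a single forward walk. This is a minimal sketch, not the pass itself: `CFIAnnot`, `RAState` and `markRAStates` are hypothetical names, and the ordering (an instruction keeps the state in effect before its own annotation is applied) is an assumption read off the table above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names for illustration; not the BOLT annotation API.
enum class CFIAnnot { None, Negate, Remember, Restore };
enum class RAState { Unsigned, Signed };

// Assigns an RA state to every instruction from the CFI annotations
// saved at CFI reading time.
inline std::vector<RAState> markRAStates(const std::vector<CFIAnnot> &Annots,
                                         bool InitialSigned = false) {
  bool Signed = InitialSigned;
  std::vector<bool> Remembered; // remember-state / restore-state stack
  std::vector<RAState> States;
  States.reserve(Annots.size());
  for (CFIAnnot A : Annots) {
    // The instruction keeps the state in effect before its annotation.
    States.push_back(Signed ? RAState::Signed : RAState::Unsigned);
    switch (A) {
    case CFIAnnot::None:
      break;
    case CFIAnnot::Negate:
      Signed = !Signed;
      break;
    case CFIAnnot::Remember:
      Remembered.push_back(Signed);
      break;
    case CFIAnnot::Restore:
      assert(!Remembered.empty() && "restore-state without remember-state");
      Signed = Remembered.back();
      Remembered.pop_back();
      break;
    }
  }
  return States;
}
```

For the six instructions in the table (negate on `paciasp` and `autiasp`, nothing elsewhere), the walk produces unsigned, signed, signed, signed, signed, unsigned.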

##### Error handling in the MarkRAStates pass:

Whenever the MarkRAStates pass finds inconsistencies in the current
BinaryFunction, it marks the function as ignored using `BF.setIgnored()`. BOLT
will not optimize this function but will emit it unchanged in the original section
(`.bolt.org.text`).

The inconsistencies are as follows:
- finding a `pac*` instruction when already in signed state
- finding an `aut*` instruction when already in unsigned state
- finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`.

Users will be informed about the number of ignored functions in the pass, the
exact functions ignored, and the found inconsistency.
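
The three checks listed above could look roughly like the sketch below. It is not the pass's code: `InstKind`, `Inst` and `findInconsistency` are hypothetical names, the messages are made up, and remember-state and restore-state are left out to keep the example short.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model.
enum class InstKind { Other, Pac, Aut };
struct Inst {
  InstKind Kind;
  bool HasNegateRAState; // negate-ra-state annotation saved at CFI reading
};

// Returns a description of the first inconsistency, or nullopt if none is
// found. A caller would mark the function as ignored on a non-empty result.
inline std::optional<std::string>
findInconsistency(const std::vector<Inst> &Insts, bool InitialSigned = false) {
  bool Signed = InitialSigned;
  for (const Inst &I : Insts) {
    if (I.Kind == InstKind::Pac && Signed)
      return std::string("pac* found while RA is already signed");
    if (I.Kind == InstKind::Aut && !Signed)
      return std::string("aut* found while RA is already unsigned");
    if (I.Kind != InstKind::Other && !I.HasNegateRAState)
      return std::string("pac*/aut* without .cfi_negate_ra_state");
    if (I.HasNegateRAState)
      Signed = !Signed;
  }
  return std::nullopt;
}
```

A well-formed `paciasp ... autiasp; ret` sequence passes, while two signing instructions in a row, or a signing instruction without the CFI, are reported.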

### InsertNegateRAStatePass

This pass runs after optimizations. It performs the _inverse_ of the MarkRAStates pass:
1. it reads the RA state annotations attached to the instructions, and
2. whenever the state changes, it adds a PseudoInstruction that holds an
   OpNegateRAState CFI.
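
The two steps above amount to a scan over the final layout. The sketch below assumes the per-instruction states are already available as a flat list in layout order; `negateRAStatePositions` is a hypothetical helper, not a BOLT function, and it returns positions instead of emitting CFIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// SignedInLayout[i] is true if instruction i (in layout order) has Signed
// RA state. Returns the indices of instructions that need a negate-ra-state
// CFI placed immediately before them.
inline std::vector<std::size_t>
negateRAStatePositions(const std::vector<bool> &SignedInLayout,
                       bool StartSigned = false) {
  std::vector<std::size_t> Positions;
  bool Prev = StartSigned; // state the unwinder assumes at function entry
  for (std::size_t I = 0; I < SignedInLayout.size(); ++I) {
    if (SignedInLayout[I] != Prev)
      Positions.push_back(I);
    Prev = SignedInLayout[I];
  }
  return Positions;
}
```

For the three-block example from the introduction laid out in order (`paciasp`, `b.cc`, `autiasp`, `ret`, `autiasp`, `ret`), the states are unsigned, signed, signed, unsigned, signed, unsigned, so a CFI is needed before indices 1, 3, 4 and 5. Index 4 is the change on the border of the second and third BasicBlock.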

##### Covering newly generated instructions:

Some BOLT passes can add new Instructions. In InsertNegateRAStatePass, we have
to know what RA state these have.

The current solution has the `inferUnknownStates` function to cover these, using
a fairly simple strategy: unknown states inherit the last known state.

This will be updated to a more robust solution.
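
The "inherit the last known state" strategy can be sketched as follows. This is not the body of `inferUnknownStates`: `fillUnknownStates` is a hypothetical stand-in, and the `Fallback` value used for unknown states at the very beginning is an assumption of this example.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// States[i] holds the Signed flag of instruction i, or nullopt if the
// instruction was added by a later pass and has no RA state annotation.
inline std::vector<bool>
fillUnknownStates(const std::vector<std::optional<bool>> &States,
                  bool Fallback = false) {
  std::vector<bool> Result;
  Result.reserve(States.size());
  bool LastKnown = Fallback; // assumed state before the first known one
  for (const std::optional<bool> &S : States) {
    if (S.has_value())
      LastKnown = *S;
    Result.push_back(LastKnown);
  }
  return Result;
}
```

For example, the sequence signed, unknown, unknown, unsigned, unknown becomes signed, signed, signed, unsigned, unsigned.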

> [!important]
> As issue #160989 describes, unwind info is incorrect in stubs with multiple callers.
> For this same reason, we cannot generate correct pac-specific unwind info: the signedness
> of the _incorrect_ return address is meaningless.

### Optimizations requiring special attention

Marking states before optimizations ensures that instructions can be moved around
freely. The only special case is function splitting. When a function is split,
the split part becomes a new function in the emitted binary. For unwinding to
work, it needs to "replay" all CFIs that lead up to the split point. BOLT does
this for other CFIs. As negate-ra-state is not read (only stored as an Annotation),
we have to do this manually in InsertNegateRAStatePass. Here, if the split part
starts with an instruction that has Signed RA state, we add a negate-ra-state CFI
to indicate this.

## Option to disallow the feature

The feature can be guarded with the `--update-branch-protection` flag, which is
on by default. If the flag is set to false and `containedNegateRAState()` is
true for a function after `FillCFIInfoFor()`, BOLT exits with an error.

bolt/include/bolt/Core/BinaryFunction.h

Lines changed: 56 additions & 0 deletions
@@ -148,6 +148,11 @@ class BinaryFunction {
     PF_MEMEVENT = 4, /// Profile has mem events.
   };
 
+  void setContainedNegateRAState() { HadNegateRAState = true; }
+  bool containedNegateRAState() const { return HadNegateRAState; }
+  void setInitialRAState(bool State) { InitialRAState = State; }
+  bool getInitialRAState() { return InitialRAState; }
+
   /// Struct for tracking exception handling ranges.
   struct CallSite {
     const MCSymbol *Start;
@@ -218,6 +223,12 @@ class BinaryFunction {
   /// Current state of the function.
   State CurrentState{State::Empty};
 
+  /// Indicates if the Function contained .cfi-negate-ra-state. These are not
+  /// read from the binary. This boolean is used when deciding to run the
+  /// .cfi-negate-ra-state rewriting passes on a function or not.
+  bool HadNegateRAState{false};
+  bool InitialRAState{false};
+
   /// A list of symbols associated with the function entry point.
   ///
   /// Multiple symbols would typically result from identical code-folding
@@ -1640,6 +1651,51 @@ class BinaryFunction {
 
   void setHasInferredProfile(bool Inferred) { HasInferredProfile = Inferred; }
 
+  /// Find corrected offset the same way addCFIInstruction does it to skip NOPs.
+  std::optional<uint64_t> getCorrectedCFIOffset(uint64_t Offset) {
+    assert(!Instructions.empty());
+    auto I = Instructions.lower_bound(Offset);
+    if (Offset == getSize()) {
+      assert(I == Instructions.end() && "unexpected iterator value");
+      // Sometimes compiler issues restore_state after all instructions
+      // in the function (even after nop).
+      --I;
+      Offset = I->first;
+    }
+    assert(I->first == Offset && "CFI pointing to unknown instruction");
+    if (I == Instructions.begin())
+      return {};
+
+    --I;
+    while (I != Instructions.begin() && BC.MIB->isNoop(I->second)) {
+      Offset = I->first;
+      --I;
+    }
+    return Offset;
+  }
+
+  void setInstModifiesRAState(uint8_t CFIOpcode, uint64_t Offset) {
+    std::optional<uint64_t> CorrectedOffset = getCorrectedCFIOffset(Offset);
+    if (CorrectedOffset) {
+      auto I = Instructions.lower_bound(*CorrectedOffset);
+      I--;
+
+      switch (CFIOpcode) {
+      case dwarf::DW_CFA_AARCH64_negate_ra_state:
+        BC.MIB->setNegateRAState(I->second);
+        break;
+      case dwarf::DW_CFA_remember_state:
+        BC.MIB->setRememberState(I->second);
+        break;
+      case dwarf::DW_CFA_restore_state:
+        BC.MIB->setRestoreState(I->second);
+        break;
+      default:
+        assert(0 && "CFI Opcode not covered by function");
+      }
+    }
+  }
+
   void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
     assert(!Instructions.empty());

bolt/include/bolt/Core/MCPlus.h

Lines changed: 6 additions & 1 deletion
@@ -72,7 +72,12 @@ class MCAnnotation {
     kLabel,         /// MCSymbol pointing to this instruction.
     kSize,          /// Size of the instruction.
     kDynamicBranch, /// Jit instruction patched at runtime.
-    kGeneric        /// First generic annotation.
+    kRASigned,      /// Inst is in a range where RA is signed.
+    kRAUnsigned,    /// Inst is in a range where RA is unsigned.
+    kRememberState, /// Inst has rememberState CFI.
+    kRestoreState,  /// Inst has restoreState CFI.
+    kNegateState,   /// Inst has OpNegateRAState CFI.
+    kGeneric,       /// First generic annotation.
   };
 
   virtual void print(raw_ostream &OS) const = 0;
