Commit 3be96da

Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)
This reverts commit c7d776b.
1 parent 89e2d58 commit 3be96da

25 files changed: +1241 -27 lines changed

bolt/docs/PacRetDesign.md

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
# Optimizing binaries with pac-ret hardening

This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state`
DWARF instruction in BOLT. As it describes internal design decisions, the
intended audience is BOLT developers. The document is an updated version of the
[RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).

`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state`
in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use
**negate-ra-state** as a shorthand.

## Introduction

### Pointer Authentication

For more information, see the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis).

### DW_CFA_AARCH64_negate_ra_state

The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in
the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).

```
The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
```

This bit indicates to the unwinder whether the current return address is signed
or not (hence the name). The unwinder uses this information to authenticate the
pointer, and remove the Pointer Authentication Code (PAC) bits.
Incorrect placement of negate-ra-state CFIs causes the unwinder to either attempt
to authenticate an unsigned pointer (resulting in a segmentation fault), or skip
authentication on a signed pointer, which can also cause a fault.

Note: some unwinders use the `xpac` instruction to strip the PAC bits without
authenticating the pointer. This is an incorrect (incomplete) implementation,
as it allows control-flow modification in the case of unwinding.

There are no DWARF instructions to directly set or clear the RA State. However,
two other CFIs can also affect the RA state:
- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
- `DW_CFA_restore_state`: this CFI pops rules from this stack.

Example:

| CFI                            | Effect on RA state             |
| ------------------------------ | ------------------------------ |
| (default)                      | 0                              |
| DW_CFA_AARCH64_negate_ra_state | 0 -> 1                         |
| DW_CFA_remember_state          | 1 pushed to the stack          |
| DW_CFA_AARCH64_negate_ra_state | 1 -> 0                         |
| DW_CFA_restore_state           | 0 -> 1 (popped from the stack) |

The Arm ABI also defines the `DW_CFA_AARCH64_negate_ra_state_with_pc` CFI, but it
is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).
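
The state transitions in the table above can be reproduced with a small simulation. This is only an illustrative sketch: `CFIOp` and `simulateRAState` are hypothetical names invented here, not part of BOLT or of any unwinder, and the model tracks only the RA state rather than the full set of register rules.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the three CFIs that can affect the RA state.
enum class CFIOp { NegateRAState, RememberState, RestoreState };

// Returns the RA state (true = signed) in effect after each CFI is applied.
std::vector<bool> simulateRAState(const std::vector<CFIOp> &Ops) {
  bool RAState = false;    // default state: unsigned (0)
  std::vector<bool> Saved; // models the implicit remember/restore stack
  std::vector<bool> Trace;
  for (CFIOp Op : Ops) {
    switch (Op) {
    case CFIOp::NegateRAState:
      RAState = !RAState;
      break;
    case CFIOp::RememberState:
      Saved.push_back(RAState);
      break;
    case CFIOp::RestoreState:
      assert(!Saved.empty() && "restore_state without remember_state");
      RAState = Saved.back();
      Saved.pop_back();
      break;
    }
    Trace.push_back(RAState);
  }
  return Trace;
}
```

Feeding it the four CFIs from the table (negate, remember, negate, restore) yields the states signed, signed, unsigned, signed.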

### Where are these CFIs needed?

Whenever two consecutive instructions have different RA states, the unwinder must
be informed of the change. This typically occurs during pointer signing or
authentication. If adjacent instructions differ in RA state but neither signs
nor authenticates the return address, they must belong to different control flow
paths. One is part of an execution path with signed RA, the other is part of a
path with an unsigned RA.

In the example below, the first BasicBlock ends in a conditional branch, and
jumps to two different BasicBlocks, each with their own authentication, and
return. The instructions on the border of the second and third BasicBlock have
different RA states. The `ret` at the end of the second BasicBlock is in unsigned
state. The start of the third BasicBlock is after the `paciasp` in the control
flow, but before the authentication. In this case, a negate-ra-state is needed
at the end of the second BasicBlock.

```
        +----------------+
        |     paciasp    |
        |                |
        |      b.cc      |
        +--------+-------+
                 |
+----------------+
|                |
|       +--------v-------+
|       |                |
|       |     autiasp    |
|       |       ret      |   // RA: unsigned
|       +----------------+
+----------------+
                 |
        +--------v-------+   // RA: signed
        |                |
        |     autiasp    |
        |       ret      |
        +----------------+
```

> [!important]
> The unwinder does not follow the control flow graph. It reads unwind
> information in the layout order.

Because these locations are dependent on how the function layout looks,
negate-ra-state CFIs will become invalid during BasicBlock reordering.

## Solution design

The implementation introduces two new passes:
1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs
   in the input binary
2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after
   optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct
   places: wherever there is a state change between two consecutive instructions
   in the layout order.

To track metadata on individual instructions, the `MCAnnotation` class was
extended. The new annotations also have helper functions in `MCPlusBuilder`.

### Saving annotations at CFI reading

CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`.
At this point, we add MCAnnotations about negate-ra-state, remember-state and
restore-state CFIs to the instructions they refer to. This avoids interfering
with the CFI processing that already happens in BOLT (e.g. remember-state and
restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).

As we add the MCAnnotations *to instructions*, we have to account for the case
where the function starts with a CFI altering the RA state. The annotation for a
CFI is attached to the instruction preceding it, so a CFI at the very start of
the function has no instruction that could carry its annotation.
This special case is handled by adding an `initialRAState` bool to each BinaryFunction.
If the `Offset` the CFI refers to is zero, we don't store an annotation, but set
the `initialRAState` in `FillCFIInfoFor`. This information is then used in
`MarkRAStates`.

### Binaries without DWARF info

In some cases, the DWARF tables are stripped from the binary. These programs
usually have some other unwind mechanism.
These passes only run on functions that include at least one negate-ra-state CFI.
This avoids processing functions that do not use Pointer Authentication, or
functions that use Pointer Authentication but do not have DWARF info.

In summary:
- pointer auth is not used: no change, the new passes do not run.
- pointer auth is used, but DWARF info is stripped: no change, the new passes
  do not run.
- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the
  negate-ra-state CFIs.

### MarkRAStates pass

This pass runs before optimizations reorder anything.

It processes MCAnnotations generated during the CFI reading stage to check if
instructions have any of the three CFIs that can modify RA state:
- negate-ra-state,
- remember-state,
- restore-state.

Then it adds new MCAnnotations to each instruction, indicating their RA state.
Those annotations are:
- Signed,
- Unsigned.

Below is a simple example that shows the two types of annotations: those present
before the pass, and those present after it.

| Instruction                   | Before          | After    |
| ----------------------------- | --------------- | -------- |
| paciasp                       | negate-ra-state | unsigned |
| stp x29, x30, [sp, #-0x10]!   |                 | signed   |
| mov x29, sp                   |                 | signed   |
| ldp x29, x30, [sp], #0x10     |                 | signed   |
| autiasp                       | negate-ra-state | signed   |
| ret                           |                 | unsigned |

##### Error handling in MarkRAStates pass:

Whenever the MarkRAStates pass finds inconsistencies in the current
BinaryFunction, it marks the function as ignored using `BF.setIgnored()`. BOLT
will not optimize this function but will emit it unchanged in the original section
(`.bolt.org.text`).

The inconsistencies are as follows:
- finding a `pac*` instruction when already in signed state
- finding an `aut*` instruction when already in unsigned state
- finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`.

Users will be informed about the number of functions ignored by the pass, the
exact functions ignored, and the inconsistency found.
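
The marking step and its consistency checks can be sketched together as below. This is a simplified model, not BOLT's implementation: `Kind`, `Mark`, `Inst` and `markRAStates` are hypothetical names, the annotations are plain enum values rather than MCAnnotations, and treating a restore-state without a matching remember-state as an inconsistency is an assumption made here. The sketch follows the convention of the table above, where an instruction keeps the state in effect before its own annotation is applied.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for BOLT's instruction queries and annotations.
enum class Kind { Other, Sign, Auth }; // Sign ~ pac*, Auth ~ aut*
enum class Mark { None, NegateRAState, RememberState, RestoreState };

struct Inst {
  Kind K;
  Mark M;
};

// Returns one RA state per instruction (true = signed), or std::nullopt when
// an inconsistency is found and the function would be marked as ignored.
std::optional<std::vector<bool>> markRAStates(const std::vector<Inst> &Insts,
                                              bool InitialRAState) {
  bool RAState = InitialRAState;
  std::vector<bool> Saved; // remember-state / restore-state stack
  std::vector<bool> States;
  for (const Inst &I : Insts) {
    if (I.K == Kind::Sign && RAState)
      return std::nullopt; // pac* while already in signed state
    if (I.K == Kind::Auth && !RAState)
      return std::nullopt; // aut* while already in unsigned state
    if (I.K != Kind::Other && I.M != Mark::NegateRAState)
      return std::nullopt; // pac*/aut* without negate-ra-state

    // The instruction keeps the state in effect before its own annotation.
    States.push_back(RAState);

    switch (I.M) {
    case Mark::NegateRAState:
      RAState = !RAState;
      break;
    case Mark::RememberState:
      Saved.push_back(RAState);
      break;
    case Mark::RestoreState:
      if (Saved.empty())
        return std::nullopt; // assumption: treated as an inconsistency
      RAState = Saved.back();
      Saved.pop_back();
      break;
    case Mark::None:
      break;
    }
  }
  return States;
}
```

With the six instructions from the example table and an initial state of unsigned, the result is unsigned, signed, signed, signed, signed, unsigned.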

### InsertNegateRAStatePass

This pass runs after optimizations. It performs the _inverse_ of the MarkRAStates pass:
1. it reads the RA state annotations attached to the instructions, and
2. whenever the state changes, it adds a PseudoInstruction that holds an
   OpNegateRAState CFI.
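
The two steps above can be illustrated with a minimal sketch. It models instructions as strings paired with their RA state and represents the CFI pseudo-instruction as the text `.cfi_negate_ra_state`; `AnnotatedInst` and `insertNegateRAState` are hypothetical names, and the assumption that the sequence starts in unsigned state is made here for simplicity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Instruction text plus its RA state annotation (true = signed).
using AnnotatedInst = std::pair<std::string, bool>;

// Walks the instructions in layout order and emits a negate-ra-state marker
// wherever the RA state differs from that of the previous instruction.
std::vector<std::string>
insertNegateRAState(const std::vector<AnnotatedInst> &Layout) {
  std::vector<std::string> Out;
  bool Prev = false; // assumption: the sequence starts in unsigned state
  for (const AnnotatedInst &AI : Layout) {
    if (AI.second != Prev)
      Out.push_back(".cfi_negate_ra_state");
    Out.push_back(AI.first);
    Prev = AI.second;
  }
  return Out;
}
```

Because the walk only looks at neighbours in layout order, it produces valid CFI placement regardless of how the instructions were reordered after the states were marked.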

##### Covering newly generated instructions:

Some BOLT passes can add new instructions. In InsertNegateRAStatePass, we have
to know what RA state these have.

The current solution has the `inferUnknownStates` function to cover these, using
a fairly simple strategy: unknown states inherit the last known state.

This will be updated to a more robust solution.

> [!important]
> As issue #160989 describes, unwind info is incorrect in stubs with multiple callers.
> For this same reason, we cannot generate correct pac-specific unwind info: the signedness
> of the _incorrect_ return address is meaningless.
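
The "inherit the last known state" strategy can be sketched as follows. This is not the body of BOLT's `inferUnknownStates`; `fillUnknownStates` is a hypothetical name, unknown states are modelled as empty `std::optional` values, and the `InitialState` parameter used for leading unknowns is an assumption of this sketch.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Each element is the RA state of one instruction (true = signed), or
// std::nullopt for an instruction added by a pass without an annotation.
std::vector<bool>
fillUnknownStates(const std::vector<std::optional<bool>> &States,
                  bool InitialState) {
  std::vector<bool> Out;
  bool LastKnown = InitialState; // used until the first known state is seen
  for (const std::optional<bool> &S : States) {
    if (S)
      LastKnown = *S;
    Out.push_back(LastKnown);
  }
  return Out;
}
```

A new instruction inserted between two signed instructions is therefore treated as signed, which is the simple behaviour the text describes.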

### Optimizations requiring special attention

Marking states before optimizations ensures that instructions can be moved around
freely. The only special case is function splitting. When a function is split,
the split part becomes a new function in the emitted binary. For unwinding to
work, it needs to "replay" all CFIs that lead up to the split point. BOLT does
this for other CFIs. As negate-ra-state is not read (only stored as an Annotation),
we have to do this manually in InsertNegateRAStatePass. Here, if the split part
starts with an instruction that has Signed RA state, we add a negate-ra-state CFI
to indicate this.
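
The split case can be sketched by restarting the state tracking for each fragment. The helper below is hypothetical and not BOLT's code; it assumes that the unwind information of every emitted fragment starts in the default unsigned state, so a fragment whose first instruction is signed gets a CFI at position 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each inner vector holds the RA states (true = signed) of one emitted
// fragment in layout order. The result lists, per fragment, the indices of
// instructions that need a negate-ra-state CFI placed in front of them.
std::vector<std::vector<std::size_t>>
negatePositionsPerFragment(const std::vector<std::vector<bool>> &Fragments) {
  std::vector<std::vector<std::size_t>> Result;
  for (const std::vector<bool> &Fragment : Fragments) {
    std::vector<std::size_t> Positions;
    bool Prev = false; // assumption: every fragment starts as unsigned
    for (std::size_t I = 0; I < Fragment.size(); ++I) {
      if (Fragment[I] != Prev)
        Positions.push_back(I);
      Prev = Fragment[I];
    }
    Result.push_back(Positions);
  }
  return Result;
}
```

For a main fragment with states unsigned, signed, signed, unsigned and a split fragment with states signed, signed, unsigned, the split fragment receives a CFI before its first instruction.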

## Option to disallow the feature

The feature can be guarded with the `--update-branch-protection` flag, which is
on by default. If the flag is set to false, and `containedNegateRAState()`
returns true for a function after `FillCFIInfoFor()`, BOLT exits with an error.

bolt/include/bolt/Core/BinaryFunction.h

Lines changed: 56 additions & 0 deletions
@@ -148,6 +148,11 @@ class BinaryFunction {
     PF_MEMEVENT = 4, /// Profile has mem events.
   };

+  void setContainedNegateRAState() { HadNegateRAState = true; }
+  bool containedNegateRAState() const { return HadNegateRAState; }
+  void setInitialRAState(bool State) { InitialRAState = State; }
+  bool getInitialRAState() { return InitialRAState; }
+
   /// Struct for tracking exception handling ranges.
   struct CallSite {
     const MCSymbol *Start;
@@ -218,6 +223,12 @@ class BinaryFunction {
   /// Current state of the function.
   State CurrentState{State::Empty};

+  /// Indicates if the Function contained .cfi-negate-ra-state. These are not
+  /// read from the binary. This boolean is used when deciding to run the
+  /// .cfi-negate-ra-state rewriting passes on a function or not.
+  bool HadNegateRAState{false};
+  bool InitialRAState{false};
+
   /// A list of symbols associated with the function entry point.
   ///
   /// Multiple symbols would typically result from identical code-folding
@@ -1640,6 +1651,51 @@ class BinaryFunction {

   void setHasInferredProfile(bool Inferred) { HasInferredProfile = Inferred; }

+  /// Find corrected offset the same way addCFIInstruction does it to skip NOPs.
+  std::optional<uint64_t> getCorrectedCFIOffset(uint64_t Offset) {
+    assert(!Instructions.empty());
+    auto I = Instructions.lower_bound(Offset);
+    if (Offset == getSize()) {
+      assert(I == Instructions.end() && "unexpected iterator value");
+      // Sometimes compiler issues restore_state after all instructions
+      // in the function (even after nop).
+      --I;
+      Offset = I->first;
+    }
+    assert(I->first == Offset && "CFI pointing to unknown instruction");
+    if (I == Instructions.begin())
+      return {};
+
+    --I;
+    while (I != Instructions.begin() && BC.MIB->isNoop(I->second)) {
+      Offset = I->first;
+      --I;
+    }
+    return Offset;
+  }
+
+  void setInstModifiesRAState(uint8_t CFIOpcode, uint64_t Offset) {
+    std::optional<uint64_t> CorrectedOffset = getCorrectedCFIOffset(Offset);
+    if (CorrectedOffset) {
+      auto I = Instructions.lower_bound(*CorrectedOffset);
+      I--;
+
+      switch (CFIOpcode) {
+      case dwarf::DW_CFA_AARCH64_negate_ra_state:
+        BC.MIB->setNegateRAState(I->second);
+        break;
+      case dwarf::DW_CFA_remember_state:
+        BC.MIB->setRememberState(I->second);
+        break;
+      case dwarf::DW_CFA_restore_state:
+        BC.MIB->setRestoreState(I->second);
+        break;
+      default:
+        assert(0 && "CFI Opcode not covered by function");
+      }
+    }
+  }
+
   void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
     assert(!Instructions.empty());

bolt/include/bolt/Core/MCPlus.h

Lines changed: 6 additions & 1 deletion
@@ -72,7 +72,12 @@ class MCAnnotation {
     kLabel, /// MCSymbol pointing to this instruction.
     kSize, /// Size of the instruction.
     kDynamicBranch, /// Jit instruction patched at runtime.
-    kGeneric /// First generic annotation.
+    kRASigned, /// Inst is in a range where RA is signed.
+    kRAUnsigned, /// Inst is in a range where RA is unsigned.
+    kRememberState, /// Inst has rememberState CFI.
+    kRestoreState, /// Inst has restoreState CFI.
+    kNegateState, /// Inst has OpNegateRAState CFI.
+    kGeneric, /// First generic annotation.
   };

   virtual void print(raw_ostream &OS) const = 0;
