_posts/2024-11-25-cheri-myths-safety-critical.markdown
The claims are usually of the form 'our code may contain memory-safety bugs but they don't impact the safety-critical parts and so it's better to leave them'.
This is a dangerous view, as I will explain below.

## Why are memory-safety errors so bad?

Before we go into detail about what CHERI gives and why this may or may not be problematic, it's worth reconsidering why memory-safety bugs are bad.
We often quote the Microsoft and Google numbers that 70% of vulnerabilities come from memory-safety bugs, but if you dig deeper you'll see that the severity of memory-safety bugs is often disproportionately high.

```c
if (uid = 0)
```

The programmer thought that they were checking the user ID was the root user, but in fact they were assigning the root user ID to the `uid` variable.
This kind of bug is often a critical vulnerability because it can directly lead to privilege elevation.
You can tell, because you can look at where the `uid` variable is used later and follow the control flow throughout the source code of the program.
For those who don't read C: `==` compares two values and evaluates to zero or one depending on whether they're equal; `=` assigns the right value to the left but evaluates to the right value, which the `if` statement then compares to zero (any non-zero value is true in C).

With a memory-safety bug, you cannot do this.
If you write outside the bounds of a buffer, or through a dangling pointer, then you have stepped outside of the language's *abstract machine* and into the world of *undefined behaviour*.
The compiler will assume that this cannot happen when optimising and so may:

- Reorder loads and stores, for example by pulling your memory-safety bug out of a loop.

To answer the complaint about CHERI trapping, it's worth considering what happens if you have a memory-safety bug today.
You will see one of the following:
- It's deterministically 'benign': it reads some constant or writes over something that's never read.
- It's deterministically outside a mapped / accessible region, so it will be caught by your MMU / MPU and trap.
- It's deterministically leaking information or corrupting other state.
- It's data dependent and so will nondeterministically do one of the above three options.

Any change to the compiler, or the global memory layout, may turn them from benign into something harmful.
Modern compilers support reproducible builds, but unless you're using all of the options to enable them, even recompiling the same source with the same compiler may trigger these issues.

Memory safety bugs exist outside of the language abstract machine and so may do a different non-deterministic thing.
Moving to a different SoC with a different memory layout and a different set of instructions may change these so-called benign bugs into data corruption that affects safety-critical operation.
It is absolutely not acceptable to claim that you have a safety-critical system if that safety depends on behaviour that is not specified.

In the second case, the only change is the trap reason and the fact that the trap on a CHERI system gives you more information.

CHERIoT RTOS provides three ways for a compartment to handle CHERI failures.
For stateless compartments, the last is often the right approach: if something goes wrong, let the caller know.
This is often the best-fit approach for safety-critical systems, where things should *never* go wrong and you've got static analysis receipts to demonstrate this.

If you're relying on the notion of a 'benign' error, the first kind of error handler can (among other things) be used to emulate the current behaviour (though I don't recommend it if you actually care about correctness!).
You can decode the current instruction and determine whether it's a load or store.
If it's a store, you can skip it; if it's a load, you can also put a zero in the target register.
Ideally, you'd also write some telemetry that would let you fix the bug.