Skip to content

Commit 1125472

Browse files
committed
docs: update klee verification pipeline
1 parent b9dca0d commit 1125472

File tree

1 file changed

+121
-19
lines changed

1 file changed

+121
-19
lines changed

docs/KLEEVerification.md

Lines changed: 121 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,10 @@
1010
4. [Tool Design: `patchir-klee-harness`](#4-tool-design-patchir-klee-harness)
1111
5. [Generated Harness Structure](#5-generated-harness-structure)
1212
6. [Predicate-to-IR Translation](#6-predicate-to-ir-translation)
13+
- [6.1 Targets](#61-targets)
14+
- [6.2 Values and Constants](#62-values-and-constants)
15+
- [6.3 Translation by Predicate Kind](#63-translation-by-predicate-kind)
16+
- [6.4 Metadata String Format](#64-metadata-string-format)
1317
7. [YAML Override and Configuration](#7-yaml-override-and-configuration)
1418
8. [Runtime Contract Support](#8-runtime-contract-support)
1519
9. [Open Questions and Next Steps](#9-open-questions-and-next-steps)
@@ -46,13 +50,18 @@ When a contract is violated, KLEE produces a **concrete counterexample input**
4650

4751
Patchestry encodes contracts at two levels:
4852

49-
**Static contracts** — encoded as LLVM metadata strings on call instructions:
53+
**Static contracts** — encoded as two-operand LLVM MDNode metadata on instructions.
54+
`patchir-cir2llvm` emits a `!static_contract` MDNode with two operands: a tag
55+
string (`"static_contract"`) and the serialized contract string:
5056

5157
```llvm
5258
!static_contract !56
53-
!56 = !{!"preconditions=[{...}], postconditions=[{...}]"}
59+
!56 = !{!"static_contract", !"preconditions=[{...}], postconditions=[{...}]"}
5460
```
5561

62+
The harness parser must extract the **second operand** of the MDNode to obtain
63+
the contract string (see `tools/patchir-cir2llvm/main.cpp` lines 432-440).
64+
5665
**Runtime contracts** — compiled C functions merged into the IR module:
5766

5867
```llvm
@@ -67,7 +76,7 @@ Both contract types can be present in the patched `.ll` output and must be handl
6776

6877
```text
6978
Firmware Binary
70-
→ [Ghidra 12.0.1] → P-Code JSON
79+
→ [Ghidra] → P-Code JSON
7180
→ [patchir-decomp] → .cir
7281
→ [patchir-transform -spec] → patched.cir
7382
→ [patchir-cir2llvm -S] → patched.ll (with !static_contract metadata)
@@ -98,9 +107,27 @@ patchir-klee-harness \
98107
# when omitted
99108
```
100109

101-
### 4.2 Contract Source
110+
### 4.2 Shared Predicate Infrastructure
111+
112+
The SeaHorn integration (`analysis/seahorn/`) and this KLEE harness tool both
113+
need to parse `!static_contract` metadata and translate predicates into
114+
verification-tool-specific IR. To avoid duplicating this logic, the following
115+
should be extracted into a shared library (e.g. `patchestry_contract_ir`):
116+
117+
- **Metadata parsing**: reading `!static_contract` MDNodes and deserializing
118+
them into in-memory `Predicate` structures.
119+
- **Predicate-to-IR translation**: converting `PredicateKind` values (range,
120+
non-null, alignment, expr, etc.) into LLVM IR `icmp`/`and`/`or` sequences.
121+
- **Function-level contract collection**: walking a module to associate
122+
contracts with their enclosing functions.
123+
124+
Each verification backend then only needs a thin driver layer: SeaHorn emits
125+
`__VERIFIER_assert`/`__VERIFIER_assume`, while KLEE emits
126+
`klee_assume`/`abort()`.
127+
128+
### 4.3 Contract Source
102129

103-
Contracts are always read from the `!static_contract` metadata strings embedded on call instructions by `patchir-cir2llvm`, using a local `parseStaticContractString()` helper that attributes each contract to its enclosing LLVM function.
130+
Contracts are always read from the `!static_contract` metadata strings embedded on call instructions by `patchir-cir2llvm`, using the shared metadata parsing helpers described above.
104131

105132
If `-spec <yaml>` is provided, the `contracts:` section of the spec is also parsed (reusing `ContractSpec` and `Predicate` types from `include/patchestry/YAML/ContractSpec.hpp`) and merged with the metadata-derived contracts. Spec entries take precedence over metadata for the same function, allowing per-deployment overrides without re-running the full transform pipeline.
106133

@@ -127,10 +154,19 @@ For a patched function `int f(int arg0, char* arg1)` with:
127154
- **Precondition:** `arg0 >= 0`
128155
- **Postcondition:** `return_value in [0, 32]`
129156

157+
**Note on pointer width:** The harness must derive integer widths from the
158+
module's `DataLayout` rather than hardcoding `i64`. For 32-bit firmware targets
159+
(e.g. `ARM:LE:32:Cortex`), the size parameter to `klee_make_symbolic` and the
160+
argument to `klee_assume` should use the target's native pointer-sized integer
161+
(e.g. `i32`). The examples below use `i64` for readability; the implementation
162+
must use `DataLayout::getIntPtrType()` to select the correct width.
163+
130164
The tool generates the following LLVM IR appended to the patched module:
131165

132166
```llvm
133167
; ─── KLEE runtime declarations (added once per module) ───────────────────────
168+
; NOTE: size_t and assume argument widths are derived from DataLayout;
169+
; i64 is shown here for a 64-bit target. For 32-bit targets these become i32.
134170
declare void @klee_make_symbolic(ptr, i64, ptr)
135171
declare void @klee_assume(i64)
136172
declare void @abort() noreturn
@@ -184,27 +220,93 @@ done:
184220

185221
## 6. Predicate-to-IR Translation
186222

187-
The following table defines how each predicate kind in the contract spec is translated into harness IR:
223+
This section defines how each predicate kind from `Contract.td` is lowered to KLEE harness IR. Preconditions emit `klee_assume` (constraining symbolic inputs); postconditions emit `if (!cond) abort()` (KLEE reports the violation).
224+
225+
### 6.1 Targets
226+
227+
Targets identify what a predicate applies to. The YAML spec and serialized metadata use different spellings; the parser normalizes both:
228+
229+
| YAML | Serialized (in `!static_contract`) | Operand in IR |
230+
|---|---|---|
231+
| `arg0` .. `argN` | `Arg(0)` .. `Arg(N)` | Loaded value of the Nth symbolic argument |
232+
| `return_value` | `ReturnValue` | `%result` from `call @f(...)` |
233+
| `symbol` + `symbol: "name"` | `Symbol(@name)` | `load @name` from the module |
234+
235+
### 6.2 Values and Constants
236+
237+
String values in `relation.value` and `range.min`/`max` resolve as follows:
238+
239+
- Numeric literals (`"0"`, `"-1"`, `"0x1000"`) — parsed to `ConstantInt` matching the target type.
240+
- `"NULL"` / `"null"``ConstantPointerNull` for pointers; zero for integers.
241+
- Named constants (`"USB_MAX_PACKET_SIZE"`) — looked up as a module global; diagnostic + skip if absent. C `#define` values are not available as LLVM symbols and must be pre-resolved in the YAML spec.
242+
243+
### 6.3 Translation by Predicate Kind
244+
245+
Each subsection shows the IR pattern per target. `V` denotes the loaded target value; `PRE` = `klee_assume(cond)`, `POST` = `br i1 cond, %done, %fail` where `%fail` calls `abort()`.
246+
247+
#### `nonnull`
248+
249+
| Target | PRE | POST |
250+
|---|---|---|
251+
| `Arg(N)` ptr | No-op (harness allocates stack buffer; always non-null) | `icmp eq V, null` → abort (pointer-to-pointer output args) |
252+
| `ReturnValue` | N/A | `icmp eq %result, null` → abort |
253+
| `Symbol(@s)` | `klee_assume(load @s != null)` | `icmp eq load @s, null` → abort |
254+
255+
#### `relation`
256+
257+
Maps `relation` field to `icmp` (signed by default; unsigned for pointer-width / unsigned types):
258+
259+
| Relation | `icmp` | | Relation | `icmp` |
260+
|---|---|---|---|---|
261+
| `eq` | `eq` | | `gt` | `sgt` / `ugt` |
262+
| `neq` | `ne` | | `gte` | `sge` / `uge` |
263+
| `lt` | `slt` / `ult` | | `none` | no IR (existence check) |
264+
| `lte` | `sle` / `ule` | | | |
265+
266+
For `Arg(N)`: `PRE = klee_assume(icmp <rel> V, const)`. For `ReturnValue` / `Symbol(@s)`: `POST = if (!(icmp <rel> V, const)) abort()`. `relation: none` emits no IR — it is an existence assertion only.
267+
268+
#### `range`
269+
270+
Emits two `icmp` instructions ANDed together:
271+
272+
```
273+
%lo = icmp sge V, resolved_min
274+
%hi = icmp sle V, resolved_max
275+
%ok = and i1 %lo, %hi
276+
```
277+
278+
`Arg(N)` → PRE (`klee_assume`). `ReturnValue` / `Symbol(@s)` → POST (`abort` on `!%ok`).
279+
280+
#### `alignment`
281+
282+
Checks `(ptr & (align-1)) == 0` using `ptrtoint` + `and` + `icmp eq`:
283+
284+
| Target | PRE | POST |
285+
|---|---|---|
286+
| `Arg(N)` ptr | `klee_assume(...)` | N/A |
287+
| `Symbol(@s)` | `klee_assume(...)` | N/A |
288+
| `ReturnValue` | N/A | N/A (not meaningful) |
289+
290+
#### `expr`
291+
292+
Free-form C-like expression string (e.g., `"size > 0 && size < MAX_LEN"`). Translation depends on complexity:
293+
294+
- **Simple** (`"arg0 != 0"`) — lowered as a `relation` predicate.
295+
- **Compound** (`"a > 0 && a < 100"`) — parsed into `icmp` + `and`/`or` tree.
296+
- **Member access** (`"dev->state == CONFIGURED"`) — emits `getelementptr` + `load`; requires struct type metadata.
297+
- **Unresolvable** — emits a warning comment in output IR; predicate is skipped.
188298

189-
| Predicate Kind | Target | Precondition IR | Postcondition IR |
190-
|---|---|---|---|
191-
| `nonnull` | `Arg(N)` pointer | No assume needed (stack buffer is always non-null) | N/A |
192-
| `nonnull` | `ReturnValue` | N/A | `if (result == null) abort()` |
193-
| `relation` | `Arg(N)` | `klee_assume(icmp <rel> arg_N, value)` | N/A |
194-
| `relation` | `ReturnValue` | N/A | `if (!(icmp <rel> result, value)) abort()` |
195-
| `range` | `Arg(N)` | `klee_assume(arg_N >= min && arg_N <= max)` | N/A |
196-
| `range` | `ReturnValue` | N/A | `if (!(result >= min && result <= max)) abort()` |
197-
| `alignment` | `Arg(N)` pointer | `klee_assume(((uintptr_t)ptr & (align-1)) == 0)` | N/A |
299+
PRE: `klee_assume(compiled_result)`. POST: `if (!compiled_result) abort()`.
198300

199-
### 6.1 Metadata String Format
301+
### 6.4 Metadata String Format
200302

201-
`patchir-cir2llvm`'s `serializeStaticContract()` produces the following string format (from `tools/patchir-cir2llvm/main.cpp` lines 136–231):
303+
`serializeStaticContract()` (`tools/patchir-cir2llvm/main.cpp:136–231`) produces:
202304

203305
```text
204-
"preconditions=[{id=\"dest_size_nonzero\", kind=relation, target=Arg(1), relation=neq, value=0}], postconditions=[{id=return_value_range, kind=range, target=ReturnValue, range=[min=0, max=32]}]"
306+
"preconditions=[{id=\"...\", kind=relation, target=Arg(1), relation=neq, value=0}], postconditions=[{id=..., kind=range, target=ReturnValue, range=[min=0, max=32]}]"
205307
```
206308

207-
`parseStaticContractString()` in `MetadataParser.hpp` parses this with a hand-written lexer: splits on `preconditions=[` / `postconditions=[`, extracts `{...}` blocks, then parses `key=value` pairs within each block. Results map to `ParsedPredicate` structs that mirror the existing `::contracts::PredicateAttr` field names.
309+
Fields per `{...}` block: `id` (always), `kind` (always), `target` (all except `expr`), plus kind-specific fields — `relation`+`value`, `range=[min=..., max=...]`, `align`, `expr`, or `symbol` (when target is `Symbol`). `parseStaticContractString()` in `MetadataParser.hpp` lexes this into `ParsedPredicate` structs.
208310

209311
---
210312

0 commit comments

Comments
 (0)