Skip to content

Commit b83df56

Browse files
committed
docs: update klee verification pipeline
1 parent fa45f3b commit b83df56

File tree

1 file changed

+122
-58
lines changed

1 file changed

+122
-58
lines changed

docs/KLEEVerification.md

Lines changed: 122 additions & 58 deletions
Original file line numberDiff line numberDiff line change
@@ -2,20 +2,6 @@
22

33
---
44

5-
## Table of Contents
6-
7-
1. [Executive Summary](#1-executive-summary)
8-
2. [Background and Motivation](#2-background-and-motivation)
9-
3. [Existing Pipeline and Gap Analysis](#3-existing-pipeline-and-gap-analysis)
10-
4. [Tool Design: `patchir-klee-harness`](#4-tool-design-patchir-klee-harness)
11-
5. [Generated Harness Structure](#5-generated-harness-structure)
12-
6. [Predicate-to-IR Translation](#6-predicate-to-ir-translation)
13-
7. [YAML Override and Configuration](#7-yaml-override-and-configuration)
14-
8. [Runtime Contract Support](#8-runtime-contract-support)
15-
9. [Open Questions and Next Steps](#9-open-questions-and-next-steps)
16-
17-
---
18-
195
## 1. Executive Summary
206

217
Patchestry's `patchir-transform` + `patchir-cir2llvm` pipeline embeds formal contracts (preconditions, postconditions, and invariants) into patched LLVM IR as `!static_contract` metadata strings. A SeaHorn integration exists in `analysis/seahorn/` using `__VERIFIER_assert` / `__VERIFIER_assume`. What is currently **missing** is a standalone tool that bridges these contracts to KLEE-based symbolic execution.
@@ -46,13 +32,18 @@ When a contract is violated, KLEE produces a **concrete counterexample input**
4632

4733
Patchestry encodes contracts at two levels:
4834

49-
**Static contracts** — encoded as LLVM metadata strings on call instructions:
35+
**Static contracts** — encoded as two-operand LLVM MDNode metadata on instructions.
36+
`patchir-cir2llvm` emits a `!static_contract` MDNode with two operands: a tag
37+
string (`"static_contract"`) and the serialized contract string:
5038

5139
```llvm
5240
!static_contract !56
53-
!56 = !{!"preconditions=[{...}], postconditions=[{...}]"}
41+
!56 = !{!"static_contract", !"preconditions=[{...}], postconditions=[{...}]"}
5442
```
5543

44+
The harness parser must extract the **second operand** of the MDNode to obtain
45+
the contract string (see `tools/patchir-cir2llvm/main.cpp` lines 432-440).
46+
5647
**Runtime contracts** — compiled C functions merged into the IR module:
5748

5849
```llvm
@@ -67,13 +58,13 @@ Both contract types can be present in the patched `.ll` output and must be handl
6758

6859
```text
6960
Firmware Binary
70-
→ [Ghidra 12.0.1] → P-Code JSON
61+
→ [Ghidra] → P-Code JSON
7162
→ [patchir-decomp] → .cir
7263
→ [patchir-transform -spec] → patched.cir
7364
→ [patchir-cir2llvm -S] → patched.ll (with !static_contract metadata)
7465
```
7566

76-
No tool currently exists to automatically generate KLEE harnesses from patched IR. Manual harness creation is error-prone, does not scale, and requires deep knowledge of both the contract format and KLEE's API. The proposed verification tool will close this gap.
67+
The planned **`patchir-klee-harness`** tool will read the `!static_contract` metadata from patched IR, generate a symbolic driver (`@klee_main_<func>()`) that makes all function arguments symbolic, injects `klee_assume` for preconditions, and asserts postconditions and invariants on every return path, producing a self-contained harness ready for KLEE execution.
7768

7869
```text
7970
patched.ll
@@ -98,11 +89,29 @@ patchir-klee-harness \
9889
# when omitted
9990
```
10091

101-
### 4.2 Contract Source
92+
### 4.2 Shared Predicate Infrastructure
93+
94+
The SeaHorn integration and KLEE harness tool both need to parse
95+
`!static_contract` metadata and translate predicates into
96+
verification-tool-specific IR. To avoid duplicating this logic, the following
97+
should be extracted into a shared library (e.g. `patchestry_contract_ir`):
98+
99+
- **Metadata parsing**: reading `!static_contract` MDNodes and deserializing
100+
them into in-memory `Predicate` structures.
101+
- **Predicate-to-IR translation**: converting `PredicateKind` values (range,
102+
non-null, alignment, expr, etc.) into LLVM IR `icmp`/`and`/`or` sequences if needed.
103+
- **Function-level contract collection**: walking a module to associate
104+
contracts with their enclosing functions.
102105

103-
Contracts are always read from the `!static_contract` metadata strings embedded on call instructions by `patchir-cir2llvm`, using a local `parseStaticContractString()` helper that attributes each contract to its enclosing LLVM function.
106+
Each verification backend then only needs a thin driver layer: SeaHorn emits
107+
`__VERIFIER_assert`/`__VERIFIER_assume`, while KLEE emits
108+
`klee_assume`/`abort()`.
104109

105-
If `-spec <yaml>` is provided, the `contracts:` section of the spec is also parsed (reusing `ContractSpec` and `Predicate` types from `include/patchestry/YAML/ContractSpec.hpp`) and merged with the metadata-derived contracts. Spec entries take precedence over metadata for the same function, allowing per-deployment overrides without re-running the full transform pipeline.
110+
### 4.3 Contract Source
111+
112+
Contracts are always read from the `!static_contract` metadata strings embedded on call instructions by `patchir-cir2llvm`, using the shared metadata parsing helpers described above.
113+
114+
If `-spec <yaml>` is provided, the `contracts:` section of the spec is also parsed and merged with the metadata-derived contracts. Spec entries take precedence over metadata for the same function, allowing per-deployment overrides without re-running the full transform pipeline.
106115

107116
### 4.3 Function Discovery
108117

@@ -127,10 +136,19 @@ For a patched function `int f(int arg0, char* arg1)` with:
127136
- **Precondition:** `arg0 >= 0`
128137
- **Postcondition:** `return_value in [0, 32]`
129138

139+
**Note on pointer width:** The harness must derive integer widths from the
140+
module's `DataLayout` rather than hardcoding `i64`. For 32-bit firmware targets
141+
(e.g. `ARM:LE:32:Cortex`), the size parameter to `klee_make_symbolic` and the
142+
argument to `klee_assume` should use the target's native pointer-sized integer
143+
(e.g. `i32`). The examples below use `i64` for readability; the implementation
144+
must use `DataLayout::getIntPtrType()` to select the correct width.
145+
130146
The tool generates the following LLVM IR appended to the patched module:
131147

132148
```llvm
133149
; ─── KLEE runtime declarations (added once per module) ───────────────────────
150+
; NOTE: size_t and assume argument widths are derived from DataLayout;
151+
; i64 is shown here for a 64-bit target. For 32-bit targets these become i32.
134152
declare void @klee_make_symbolic(ptr, i64, ptr)
135153
declare void @klee_assume(i64)
136154
declare void @abort() noreturn
@@ -184,31 +202,97 @@ done:
184202

185203
## 6. Predicate-to-IR Translation
186204

187-
The following table defines how each predicate kind in the contract spec is translated into harness IR:
205+
This section defines how each predicate kind from `Contract.td` is lowered to KLEE harness IR. Preconditions emit `klee_assume` (constraining symbolic inputs); postconditions emit `if (!cond) abort()` (KLEE reports the violation).
206+
207+
### 6.1 Targets
208+
209+
Targets identify what a predicate applies to. The YAML spec and serialized metadata use different spellings; the parser normalizes both:
210+
211+
| YAML | Serialized (in `!static_contract`) | Operand in IR |
212+
|---|---|---|
213+
| `arg0` .. `argN` | `Arg(0)` .. `Arg(N)` | Loaded value of the Nth symbolic argument |
214+
| `return_value` | `ReturnValue` | `%result` from `call @f(...)` |
215+
| `symbol` + `symbol: "name"` | `Symbol(@name)` | `load @name` from the module |
216+
217+
### 6.2 Values and Constants
218+
219+
String values in `relation.value` and `range.min`/`max` resolve as follows:
220+
221+
- Numeric literals (`"0"`, `"-1"`, `"0x1000"`) — parsed to `ConstantInt` matching the target type.
222+
- `"NULL"` / `"null"``ConstantPointerNull` for pointers; zero for integers.
223+
- Named constants (`"USB_MAX_PACKET_SIZE"`) — looked up as a module global; diagnostic + skip if absent. C `#define` values are not available as LLVM symbols and must be pre-resolved in the YAML spec.
224+
225+
### 6.3 Translation by Predicate Kind
226+
227+
Each subsection shows the IR pattern per target. `V` denotes the loaded target value; `PRE` = `klee_assume(cond)`, `POST` = `br i1 cond, %done, %fail` where `%fail` calls `abort()`.
228+
229+
#### `nonnull`
230+
231+
| Target | PRE | POST |
232+
|---|---|---|
233+
| `Arg(N)` ptr | No-op (harness allocates stack buffer; always non-null) | `icmp eq V, null` → abort (pointer-to-pointer output args) |
234+
| `ReturnValue` | N/A | `icmp eq %result, null` → abort |
235+
| `Symbol(@s)` | `klee_assume(load @s != null)` | `icmp eq load @s, null` → abort |
236+
237+
#### `relation`
238+
239+
Maps `relation` field to `icmp` (signed by default; unsigned for pointer-width / unsigned types):
188240

189-
| Predicate Kind | Target | Precondition IR | Postcondition IR |
190-
|---|---|---|---|
191-
| `nonnull` | `Arg(N)` pointer | No assume needed (stack buffer is always non-null) | N/A |
192-
| `nonnull` | `ReturnValue` | N/A | `if (result == null) abort()` |
193-
| `relation` | `Arg(N)` | `klee_assume(icmp <rel> arg_N, value)` | N/A |
194-
| `relation` | `ReturnValue` | N/A | `if (!(icmp <rel> result, value)) abort()` |
195-
| `range` | `Arg(N)` | `klee_assume(arg_N >= min && arg_N <= max)` | N/A |
196-
| `range` | `ReturnValue` | N/A | `if (!(result >= min && result <= max)) abort()` |
197-
| `alignment` | `Arg(N)` pointer | `klee_assume(((uintptr_t)ptr & (align-1)) == 0)` | N/A |
241+
| Relation | `icmp` | | Relation | `icmp` |
242+
|---|---|---|---|---|
243+
| `eq` | `eq` | | `gt` | `sgt` / `ugt` |
244+
| `neq` | `ne` | | `gte` | `sge` / `uge` |
245+
| `lt` | `slt` / `ult` | | `none` | no IR (existence check) |
246+
| `lte` | `sle` / `ule` | | | |
198247

199-
### 6.1 Metadata String Format
248+
For `Arg(N)`: `PRE = klee_assume(icmp <rel> V, const)`. For `ReturnValue` / `Symbol(@s)`: `POST = if (!(icmp <rel> V, const)) abort()`. `relation: none` emits no IR — it is an existence assertion only.
200249

201-
`patchir-cir2llvm`'s `serializeStaticContract()` produces the following string format (from `tools/patchir-cir2llvm/main.cpp` lines 136–231):
250+
#### `range`
251+
252+
Emits two `icmp` instructions ANDed together:
253+
254+
```
255+
%lo = icmp sge V, resolved_min
256+
%hi = icmp sle V, resolved_max
257+
%ok = and i1 %lo, %hi
258+
```
259+
260+
`Arg(N)` → PRE (`klee_assume`). `ReturnValue` / `Symbol(@s)` → POST (`abort` on `!%ok`).
261+
262+
#### `alignment`
263+
264+
Checks `(ptr & (align-1)) == 0` using `ptrtoint` + `and` + `icmp eq`:
265+
266+
| Target | PRE | POST |
267+
|---|---|---|
268+
| `Arg(N)` ptr | `klee_assume(...)` | N/A |
269+
| `Symbol(@s)` | `klee_assume(...)` | N/A |
270+
| `ReturnValue` | N/A | N/A (not meaningful) |
271+
272+
#### `expr`
273+
274+
Free-form C-like expression string (e.g., `"size > 0 && size < MAX_LEN"`). Translation depends on complexity:
275+
276+
- **Simple** (`"arg0 != 0"`) — lowered as a `relation` predicate.
277+
- **Compound** (`"a > 0 && a < 100"`) — parsed into `icmp` + `and`/`or` tree.
278+
- **Member access** (`"dev->state == CONFIGURED"`) — emits `getelementptr` + `load`; requires struct type metadata.
279+
- **Unresolvable** — emits a warning comment in output IR; predicate is skipped.
280+
281+
PRE: `klee_assume(compiled_result)`. POST: `if (!compiled_result) abort()`.
282+
283+
### 6.4 Metadata String Format
284+
285+
`serializeStaticContract()` (`tools/patchir-cir2llvm/main.cpp:136–231`) produces:
202286

203287
```text
204-
"preconditions=[{id=\"dest_size_nonzero\", kind=relation, target=Arg(1), relation=neq, value=0}], postconditions=[{id=return_value_range, kind=range, target=ReturnValue, range=[min=0, max=32]}]"
288+
"preconditions=[{id=\"...\", kind=relation, target=Arg(1), relation=neq, value=0}], postconditions=[{id=..., kind=range, target=ReturnValue, range=[min=0, max=32]}]"
205289
```
206290

207-
`parseStaticContractString()` in `MetadataParser.hpp` parses this with a hand-written lexer: splits on `preconditions=[` / `postconditions=[`, extracts `{...}` blocks, then parses `key=value` pairs within each block. Results map to `ParsedPredicate` structs that mirror the existing `::contracts::PredicateAttr` field names.
291+
Fields per `{...}` block: `id` (always), `kind` (always), `target` (all except `expr`), plus kind-specific fields — `relation`+`value`, `range=[min=..., max=...]`, `align`, `expr`, or `symbol` (when target is `Symbol`). `parseStaticContractString()` in `MetadataParser.hpp` lexes this into `ParsedPredicate` structs.
208292

209293
---
210294

211-
## 7. YAML Override and Configuration
295+
## 7. YAML Override and Configuration (Optional)
212296

213297
A minimal optional section can be added to the existing patch YAML. If absent, the tool uses `contracts:` and `meta_contracts:` as-is.
214298

@@ -219,26 +303,6 @@ klee_harness:
219303
buffer_sizes: # per-arg buffer size override (default: 64 bytes)
220304
1: 256 # arg index → byte count
221305
```
222-
223-
### 7.1 YAML Type Definitions
224-
225-
New types are added to `include/patchestry/YAML/KleeSpec.hpp`:
226-
227-
```cpp
228-
struct KleeArgSpec {
229-
unsigned arg_index;
230-
unsigned buffer_size; // bytes for pointer args
231-
};
232-
233-
struct KleeHarnessSpec {
234-
std::string function;
235-
std::string contract;
236-
std::vector<KleeArgSpec> buffer_sizes;
237-
};
238-
```
239-
240-
`llvm::yaml::MappingTraits` implementations are provided in `lib/patchestry/YAML/KleeSpec.cpp`, following the pattern established in `include/patchestry/YAML/BaseSpec.hpp`.
241-
242306
---
243307
244308
## 8. Runtime Contract Support
@@ -268,7 +332,7 @@ Before writing the output module, `patchir-klee-harness` scans for declared-but-
268332

269333
| Symbol | Injected Stub Behavior |
270334
|---|---|
271-
| `__patchestry_assert_fail(ptr, ptr, i32, ptr)` | `call void @abort(); unreachable` |
335+
| `__patchestry_assert_fail(...)` | `call void @abort(); unreachable` |
272336

273337
These stubs are **only injected if the symbol has no existing definition**, so they never override user-defined stubs. Injection happens once per output module regardless of how many target functions are processed.
274338

@@ -288,6 +352,6 @@ Both contract types are handled in the same `klee_main_<funcname>()` driver. A f
288352
## 9. Open Questions and Next Steps
289353

290354
- **Path explosion mitigation.** Functions with deep loop nests or large switch tables may cause KLEE to time out. Investigate `--max-depth`, `--max-time`, and search-heuristic flags to keep exploration tractable for typical firmware patches.
291-
- **Composite struct contracts.** The current predicate set covers scalars, pointers, and ranges. Contracts that refer to individual struct fields (e.g., "field `len` must equal `strlen(buf)`") will require extending `ParsedPredicate` with a field-path accessor.
355+
- **Composite struct contracts.** The current predicate set covers scalars, pointers, and ranges. Contracts that refer to individual struct fields (e.g., "field `len` must equal `strlen(buf)`") will require extending it.
292356
- **Multi-function call-chain verification.** The current design generates one driver per function. Verifying contracts across a call chain (caller preconditions imply callee preconditions) would require an interprocedural harness mode.
293357
- **SeaHorn convergence.** Evaluate whether the harness generator can emit both KLEE and SeaHorn drivers from the same contract metadata, reducing duplicate tooling in `analysis/`.

0 commit comments

Comments
 (0)