You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
9.[Open Questions and Next Steps](#9-open-questions-and-next-steps)
16
-
17
-
---
18
-
19
5
## 1. Executive Summary
20
6
21
7
Patchestry's `patchir-transform` + `patchir-cir2llvm` pipeline embeds formal contracts (preconditions, postconditions, and invariants) into patched LLVM IR as `!static_contract` metadata strings. A SeaHorn integration exists in `analysis/seahorn/` using `__VERIFIER_assert` / `__VERIFIER_assume`. What is currently **missing** is a standalone tool that bridges these contracts to KLEE-based symbolic execution.
@@ -46,13 +32,18 @@ When a contract is violated, KLEE produces a **concrete counterexample input**
46
32
47
33
Patchestry encodes contracts at two levels:
48
34
49
-
**Static contracts** — encoded as LLVM metadata strings on call instructions:
35
+
**Static contracts** — encoded as two-operand LLVM MDNode metadata on instructions.
36
+
`patchir-cir2llvm` emits a `!static_contract` MDNode with two operands: a tag
37
+
string (`"static_contract"`) and the serialized contract string:
No tool currently exists to automatically generate KLEE harnesses from patched IR. Manual harness creation is error-prone, does not scale, and requires deep knowledge of both the contract format and KLEE's API. The proposed verification tool will close this gap.
67
+
The planned **`patchir-klee-harness`** tool will read the `!static_contract` metadata from patched IR, generate a symbolic driver (`@klee_main_<func>()`) that makes all function arguments symbolic, injects `klee_assume` for preconditions, and asserts postconditions and invariants on every return path, producing a self-contained harness ready for KLEE execution.
77
68
78
69
```text
79
70
patched.ll
@@ -98,11 +89,29 @@ patchir-klee-harness \
98
89
# when omitted
99
90
```
100
91
101
-
### 4.2 Contract Source
92
+
### 4.2 Shared Predicate Infrastructure
93
+
94
+
The SeaHorn integration and KLEE harness tool both need to parse
95
+
`!static_contract` metadata and translate predicates into
96
+
verification-tool-specific IR. To avoid duplicating this logic, the following
97
+
should be extracted into a shared library (e.g. `patchestry_contract_ir`):
98
+
99
+
-**Metadata parsing**: reading `!static_contract` MDNodes and deserializing
non-null, alignment, expr, etc.) into LLVM IR `icmp`/`and`/`or` sequences if needed.
103
+
-**Function-level contract collection**: walking a module to associate
104
+
contracts with their enclosing functions.
102
105
103
-
Contracts are always read from the `!static_contract` metadata strings embedded on call instructions by `patchir-cir2llvm`, using a local `parseStaticContractString()` helper that attributes each contract to its enclosing LLVM function.
106
+
Each verification backend then only needs a thin driver layer: SeaHorn emits
107
+
`__VERIFIER_assert`/`__VERIFIER_assume`, while KLEE emits
108
+
`klee_assume`/`abort()`.
104
109
105
-
If `-spec <yaml>` is provided, the `contracts:` section of the spec is also parsed (reusing `ContractSpec` and `Predicate` types from `include/patchestry/YAML/ContractSpec.hpp`) and merged with the metadata-derived contracts. Spec entries take precedence over metadata for the same function, allowing per-deployment overrides without re-running the full transform pipeline.
110
+
### 4.3 Contract Source
111
+
112
+
Contracts are always read from the `!static_contract` metadata strings embedded on call instructions by `patchir-cir2llvm`, using the shared metadata parsing helpers described above.
113
+
114
+
If `-spec <yaml>` is provided, the `contracts:` section of the spec is also parsed and merged with the metadata-derived contracts. Spec entries take precedence over metadata for the same function, allowing per-deployment overrides without re-running the full transform pipeline.
106
115
107
116
### 4.3 Function Discovery
108
117
@@ -127,10 +136,19 @@ For a patched function `int f(int arg0, char* arg1)` with:
127
136
-**Precondition:**`arg0 >= 0`
128
137
-**Postcondition:**`return_value in [0, 32]`
129
138
139
+
**Note on pointer width:** The harness must derive integer widths from the
140
+
module's `DataLayout` rather than hardcoding `i64`. For 32-bit firmware targets
141
+
(e.g. `ARM:LE:32:Cortex`), the size parameter to `klee_make_symbolic` and the
142
+
argument to `klee_assume` should use the target's native pointer-sized integer
143
+
(e.g. `i32`). The examples below use `i64` for readability; the implementation
144
+
must use `DataLayout::getIntPtrType()` to select the correct width.
145
+
130
146
The tool generates the following LLVM IR appended to the patched module:
131
147
132
148
```llvm
133
149
; ─── KLEE runtime declarations (added once per module) ───────────────────────
150
+
; NOTE: size_t and assume argument widths are derived from DataLayout;
151
+
; i64 is shown here for a 64-bit target. For 32-bit targets these become i32.
134
152
declare void @klee_make_symbolic(ptr, i64, ptr)
135
153
declare void @klee_assume(i64)
136
154
declare void @abort() noreturn
@@ -184,31 +202,97 @@ done:
184
202
185
203
## 6. Predicate-to-IR Translation
186
204
187
-
The following table defines how each predicate kind in the contract spec is translated into harness IR:
205
+
This section defines how each predicate kind from `Contract.td` is lowered to KLEE harness IR. Preconditions emit `klee_assume` (constraining symbolic inputs); postconditions emit `if (!cond) abort()` (KLEE reports the violation).
206
+
207
+
### 6.1 Targets
208
+
209
+
Targets identify what a predicate applies to. The YAML spec and serialized metadata use different spellings; the parser normalizes both:
210
+
211
+
| YAML | Serialized (in `!static_contract`) | Operand in IR |
212
+
|---|---|---|
213
+
|`arg0` .. `argN`|`Arg(0)` .. `Arg(N)`| Loaded value of the Nth symbolic argument |
214
+
|`return_value`|`ReturnValue`|`%result` from `call @f(...)`|
215
+
|`symbol` + `symbol: "name"`|`Symbol(@name)`|`load @name` from the module |
216
+
217
+
### 6.2 Values and Constants
218
+
219
+
String values in `relation.value` and `range.min`/`max` resolve as follows:
220
+
221
+
- Numeric literals (`"0"`, `"-1"`, `"0x1000"`) — parsed to `ConstantInt` matching the target type.
222
+
-`"NULL"` / `"null"` — `ConstantPointerNull` for pointers; zero for integers.
223
+
- Named constants (`"USB_MAX_PACKET_SIZE"`) — looked up as a module global; diagnostic + skip if absent. C `#define` values are not available as LLVM symbols and must be pre-resolved in the YAML spec.
224
+
225
+
### 6.3 Translation by Predicate Kind
226
+
227
+
Each subsection shows the IR pattern per target. `V` denotes the loaded target value; `PRE` = `klee_assume(cond)`, `POST` = `br i1 cond, %done, %fail` where `%fail` calls `abort()`.
|`lt`|`slt` / `ult`||`none`| no IR (existence check) |
246
+
|`lte`|`sle` / `ule`||||
198
247
199
-
### 6.1 Metadata String Format
248
+
For `Arg(N)`: `PRE = klee_assume(icmp <rel> V, const)`. For `ReturnValue` / `Symbol(@s)`: `POST = if (!(icmp <rel> V, const)) abort()`. `relation: none` emits no IR — it is an existence assertion only.
200
249
201
-
`patchir-cir2llvm`'s `serializeStaticContract()` produces the following string format (from `tools/patchir-cir2llvm/main.cpp` lines 136–231):
250
+
#### `range`
251
+
252
+
Emits two `icmp` instructions ANDed together:
253
+
254
+
```
255
+
%lo = icmp sge V, resolved_min
256
+
%hi = icmp sle V, resolved_max
257
+
%ok = and i1 %lo, %hi
258
+
```
259
+
260
+
`Arg(N)` → PRE (`klee_assume`). `ReturnValue` / `Symbol(@s)` → POST (`abort` on `!%ok`).
`parseStaticContractString()` in `MetadataParser.hpp` parses this with a hand-written lexer: splits on `preconditions=[` / `postconditions=[`, extracts `{...}` blocks, then parses `key=value` pairs within each block. Results map to `ParsedPredicate` structs that mirror the existing `::contracts::PredicateAttr` field names.
291
+
Fields per `{...}` block: `id` (always), `kind` (always), `target` (all except `expr`), plus kind-specific fields — `relation`+`value`, `range=[min=..., max=...]`, `align`, `expr`, or `symbol` (when target is `Symbol`). `parseStaticContractString()` in `MetadataParser.hpp` lexes this into `ParsedPredicate` structs.
208
292
209
293
---
210
294
211
-
## 7. YAML Override and Configuration
295
+
## 7. YAML Override and Configuration (Optional)
212
296
213
297
A minimal optional section can be added to the existing patch YAML. If absent, the tool uses `contracts:` and `meta_contracts:` as-is.
New types are added to `include/patchestry/YAML/KleeSpec.hpp`:
226
-
227
-
```cpp
228
-
struct KleeArgSpec {
229
-
unsigned arg_index;
230
-
unsigned buffer_size; // bytes for pointer args
231
-
};
232
-
233
-
struct KleeHarnessSpec {
234
-
std::string function;
235
-
std::string contract;
236
-
std::vector<KleeArgSpec> buffer_sizes;
237
-
};
238
-
```
239
-
240
-
`llvm::yaml::MappingTraits`implementations are provided in `lib/patchestry/YAML/KleeSpec.cpp`, following the pattern established in `include/patchestry/YAML/BaseSpec.hpp`.
241
-
242
306
---
243
307
244
308
## 8. Runtime Contract Support
@@ -268,7 +332,7 @@ Before writing the output module, `patchir-klee-harness` scans for declared-but-
These stubs are **only injected if the symbol has no existing definition**, so they never override user-defined stubs. Injection happens once per output module regardless of how many target functions are processed.
274
338
@@ -288,6 +352,6 @@ Both contract types are handled in the same `klee_main_<funcname>()` driver. A f
288
352
## 9. Open Questions and Next Steps
289
353
290
354
- **Path explosion mitigation.** Functions with deep loop nests or large switch tables may cause KLEE to time out. Investigate `--max-depth`, `--max-time`, and search-heuristic flags to keep exploration tractable for typical firmware patches.
291
-
- **Composite struct contracts.** The current predicate set covers scalars, pointers, and ranges. Contracts that refer to individual struct fields (e.g., "field `len` must equal `strlen(buf)`") will require extending `ParsedPredicate` with a field-path accessor.
355
+
- **Composite struct contracts.** The current predicate set covers scalars, pointers, and ranges. Contracts that refer to individual struct fields (e.g., "field `len` must equal `strlen(buf)`") will require extending it.
292
356
- **Multi-function call-chain verification.** The current design generates one driver per function. Verifying contracts across a call chain (caller preconditions imply callee preconditions) would require an interprocedural harness mode.
293
357
- **SeaHorn convergence.** Evaluate whether the harness generator can emit both KLEE and SeaHorn drivers from the same contract metadata, reducing duplicate tooling in `analysis/`.
0 commit comments