Description
The following section presents klean, a tool to generate Lean 4 programs from kompiled K definitions.
Code Generation
The following describes the technical details of the code generator.
Code Generation Workflow
The starting point of code generation is a kompiled definition.kore file. After parsing the definition, as an optimization step, the model is minimized to only contain sorts, symbols, and rule axioms that are relevant with respect to rewrite rules. These sorts, symbols, and axioms are then mapped to Lean 4 concepts. To support this, parts of the Lean 4 syntax, in particular modules and declarations, are modeled as dataclass-es whose __str__ method serializes the object to Lean 4 code (implementation). The model is not complete; i.e., not all syntactic elements are modeled, and not all facets of a modeled syntactic element are represented. Notably, terms are not represented structurally.
The rest of the section describes the Lean 4 artifacts generated by klean in detail.
Prelude
The code generator defines a prelude that provides an interpretation (and in some cases, an axiomatic interface) for relevant sorts and hooked functions from K's domains.md. The prelude is included verbatim in each project generated by klean.
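As a rough illustration of the two styles mentioned above, the sketch below contrasts an interpreted sort and hooked function with an axiomatic interface. The names `divIntSketch` and `hashSketch` are hypothetical and do not come from the actual prelude; only the `SortInt` naming and the `Option` result type are taken from the examples elsewhere in this document.

```lean
-- Interpretation: primitive K sorts mapped onto existing Lean types.
abbrev SortBool : Type := Bool
abbrev SortInt : Type := Int

-- Interpreted hooked function (hypothetical name). The result is an
-- `Option` because division by zero is undefined. Lean's `/` on `Int`
-- is a stand-in; the rounding convention of K's hook is not modeled.
def divIntSketch (a b : SortInt) : Option SortInt :=
  if b = 0 then none else some (a / b)

-- Axiomatic interface (hypothetical name): only the signature is
-- fixed, no implementation is given.
axiom hashSketch (input : SortInt) : Option SortInt
```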
Sorts
Each non-primitive K sort is mapped to a Lean 4 type definition.
Cell sorts are mapped to structure-s. If the cell has no subcells (and therefore, wraps a single value), the field name is val.
| K | Lean 4 |
|---|---|
|
|
If the cell has subcells, each field name is derived from the sort of a corresponding subcell.
| K | Lean 4 |
|---|---|
|
|
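The following sketch shows what the two cell mappings could look like for a hypothetical configuration `<state> <counter> Int </counter> <limit> Int </limit> </state>`. The cell names, and the exact way field names are derived from subcell sorts, are assumptions made for illustration.

```lean
abbrev SortInt : Type := Int

-- Leaf cells wrap a single value, so the only field is `val`.
structure SortCounterCell : Type where
  val : SortInt

structure SortLimitCell : Type where
  val : SortInt

-- A cell with subcells gets one field per subcell; the field names
-- here are derived from the subcell sorts (naming scheme assumed).
structure SortStateCell : Type where
  counter : SortCounterCell
  limit : SortLimitCell

-- Building a configuration value with structure instance notation.
def initState : SortStateCell :=
  { counter := { val := 0 }, limit := { val := 10 } }
```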
List, Set, and Map are special-cased (i.e., their syntax is not mapped directly), and each is also mapped to a structure.
| K | Lean 4 |
|---|---|
|
|
|
|
|
|
Cell collections follow the same pattern, except the element / key-value types are cell types.
| K | Lean 4 |
|---|---|
|
|
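A minimal sketch of the collection mappings follows, assuming each collection is a `structure` wrapping a Lean list in a single field. The field name `coll`, the list-based representation, and the `Account` cell names are assumptions for illustration and may differ from what klean emits.

```lean
-- Hypothetical element sort, reduced to a single constructor.
inductive SortKItem : Type where
  | inj_SortInt (x : Int) : SortKItem

-- K collections as structures (field name `coll` is assumed).
structure SortList : Type where
  coll : List SortKItem

structure SortSet : Type where
  coll : List SortKItem

structure SortMap : Type where
  coll : List (SortKItem × SortKItem)

-- Cell collection: same shape, but keys and values are cell types.
structure SortAcctIDCell : Type where
  val : Int

structure SortAccountCell : Type where
  val : Int

structure SortAccountCellMap : Type where
  coll : List (SortAcctIDCell × SortAccountCell)
```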
All other (non-primitive) sorts are translated as inductive, where the constructors are induced by subsort and symbol productions.
| K | Lean 4 |
|---|---|
|
|
Symbol names that contain special characters are quoted using guillemets (« »).
| K | Lean 4 |
|---|---|
|
|
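To make the last two mappings concrete, here is a sketch for a hypothetical sort `Exp` with productions `Exp ::= Int | Exp "+" Exp | "neg" Exp`. The sort and its productions are invented for illustration; the `inj_*` constructor naming follows the convention referred to in the Injections section.

```lean
abbrev SortInt : Type := Int

inductive SortExp : Type where
  -- Induced by the subsort production `Exp ::= Int`.
  | inj_SortInt (x : SortInt) : SortExp
  -- Induced by a symbol production; the name contains special
  -- characters, so it is quoted with guillemets.
  | «_+_» (x0 : SortExp) (x1 : SortExp) : SortExp
  -- Induced by a symbol production with an ordinary name.
  | neg (x0 : SortExp) : SortExp

-- Quoted names are used like any other identifier.
def onePlusTwo : SortExp :=
  SortExp.«_+_» (.inj_SortInt 1) (.inj_SortInt 2)
```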
In the generated program, declarations are ordered according to the sort dependency relation in two steps.
- First, sorts are partitioned so that any two sorts of a given class depend on each other. Each such class induces a `mutual` command in the generated program.
- Classes are then ordered topologically.
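The effect of the two steps can be sketched with three hypothetical sorts: `Stmt` and `Block` depend on each other and therefore share a `mutual` command, while `Pgm` depends on that class and is emitted after it.

```lean
-- One dependency class: the two sorts refer to each other.
mutual
  inductive SortStmt : Type where
    | skip : SortStmt
    | block (b : SortBlock) : SortStmt

  inductive SortBlock : Type where
    | empty : SortBlock
    | cons (s : SortStmt) (rest : SortBlock) : SortBlock
end

-- A later class in the topological order: it uses `SortBlock`,
-- but nothing above depends on it.
inductive SortPgm : Type where
  | pgm (body : SortBlock) : SortPgm
```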
Injections
The polymorphic injection and retraction functions are defined in the prelude as follows.
```lean
class Inj (From To : Type) : Type where
  inj (x : From) : To
  retr (x : To) : Option From

def inj {From To : Type} [inst : Inj From To] := inst.inj
def retr {From To : Type} [inst : Inj From To] := inst.retr
```

For each pair of sorts `S1`, `S2` such that `S1` is a subsort of `S2`, an instance `Inj S1 S2` is generated. It ensures that `inj` is transitive, and that `inj_*` constructors can be retracted into a more specific supersort. The latter is used for subsort matching on the left-hand side of K function rules.
| K | Lean 4 |
|---|---|
|
|
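As an illustration, the sketch below writes out one such instance by hand for a hypothetical subsort relation `Int < KItem`. The class is repeated so that the snippet stands alone; the shape of `SortKItem` is an assumption.

```lean
class Inj (From To : Type) : Type where
  inj (x : From) : To
  retr (x : To) : Option From

abbrev SortInt : Type := Int

-- Hypothetical supersort with two injection constructors.
inductive SortKItem : Type where
  | inj_SortInt (x : SortInt) : SortKItem
  | inj_SortBool (x : Bool) : SortKItem

-- Instance for the subsort relation Int < KItem.
instance : Inj SortInt SortKItem where
  inj := SortKItem.inj_SortInt
  retr := fun x =>
    match x with
    | SortKItem.inj_SortInt n => some n
    | _ => none

-- Retracting an injected value yields the original value.
example : (Inj.retr (Inj.inj (3 : SortInt) : SortKItem) : Option SortInt) = some 3 := rfl

-- Retraction fails for values built from another subsort.
example : (Inj.retr (SortKItem.inj_SortBool true) : Option SortInt) = none := rfl
```

The `none` case is what makes subsort matching possible: a rule whose left-hand side expects an `Int` inside a `KItem` simply does not apply when `retr` returns `none`.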
Functions
For each K function symbol not defined in the prelude, a Lean 4 function is declared. In order to support non-total functions, the result type is Option.
If the symbol has no corresponding function rules, it is mapped as an axiom.
```lean
axiom «_^Int_;» (x0 : SortInt) (x1 : SortInt) : Option SortInt
```

Otherwise, a `def` is generated that applies the functions generated for each function rule in priority order until the first `some` result.
| K | Lean 4 |
|---|---|
|
|
The definitions are implemented in the Option monad to handle non-matching patterns on the left-hand side, unsatisfied preconditions (guard _Val0) and undefined subterms (let _Val3 <- «_/Int_» _Val2 I2).
If a definition depends on an axiom, the noncomputable modifier is applied accordingly.
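The following sketch puts these pieces together for a hypothetical K function `safeDiv` with two rules: `safeDiv(A, B) => A /Int B requires A >=Int 0`, and a lower-priority fallback `safeDiv(_, _) => 0`. All names (`divInt`, `safeDiv`, `powInt`, `square`) are invented for illustration, and the generated code may be structured differently.

```lean
abbrev SortInt : Type := Int

-- Stand-in for a partial hooked function from the prelude.
def divInt (a b : SortInt) : Option SortInt :=
  if b = 0 then none else some (a / b)

-- Rule 1: fails (returns `none`) if the precondition does not hold
-- or if the division is undefined.
def safeDiv_rule1 (x0 x1 : SortInt) : Option SortInt := do
  let v0 := decide (x0 ≥ 0)   -- precondition, computed as a Bool
  guard v0                    -- unsatisfied precondition => none
  let v1 ← divInt x0 x1       -- undefined subterm => none
  return v1

-- Rule 2: lower-priority fallback that always applies.
def safeDiv_rule2 (_x0 _x1 : SortInt) : Option SortInt := some 0

-- The function tries the rules in priority order and returns the
-- first `some` result.
def safeDiv (x0 x1 : SortInt) : Option SortInt :=
  safeDiv_rule1 x0 x1 <|> safeDiv_rule2 x0 x1

#eval safeDiv 6 3     -- some 2 (rule 1 applies)
#eval safeDiv (-6) 3  -- some 0 (guard fails, rule 2 applies)
#eval safeDiv 6 0     -- some 0 (division undefined, rule 2 applies)

-- A symbol without rules becomes an axiom, and any definition that
-- uses it has to be marked `noncomputable`.
axiom powInt (x0 : SortInt) (x1 : SortInt) : Option SortInt

noncomputable def square (x0 : SortInt) : Option SortInt := powInt x0 2
```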
Special handling is needed for left-hand-side patterns that the K Framework supports but Lean 4 does not, namely subsort matching, K collection patterns, and non-unique (repeated) variables in a pattern. Roughly, such patterns are processed in the following steps.
- First, all variables [1] in the pattern are renamed to be unique, while tracking equivalence between them. Each pair of such variables, say, `x` and `y`, will conceptually induce a piece of Lean 4 pattern matching code of the form
  ```lean
  | ..., x, ..., y, ... =>
    match x == y with
    | true => ...
  ```
- Then, subsort and collection patterns are recursively abstracted with fresh variables. Each such pair of variable and pattern, say, `x` and `t`, will conceptually induce a piece of Lean 4 pattern matching code of the form
  ```lean
  | ..., x, ... =>
    match f x with
    | t' => ...
  ```
  where `f` is a function that implements the logic of matching `t`, and `t'` is a pattern capturing a successful match of `x` into `t`.
- The pattern matching pieces are topologically ordered based on the data dependencies between them. For example, in a K pattern `listHeadInSet(ListItem(X:KItem) _:List, SetItem(X) _:Set)`, `X` is bound by matching the list and only then looked up in the set, which the nesting of `match` expressions has to reflect (the following snippet simplifies some function names for ease of exposition):
  ```lean
  | p1, p2 =>
    match list_head p1 with
    | some x =>
      match in_set p2 x with
      | true => ...
  ```
The following are examples for the transformation output of each type of complex pattern.
| K | Lean 4 |
|---|---|
|
|
|
|
|
|
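As a self-contained sketch of the processing steps above, the two functions below hand-translate a pattern with a repeated variable and the `listHeadInSet` pattern. Plain `Int` and `List Int` stand in for the K sorts, the set is modeled as a list, and the right-hand sides are invented, so this only illustrates the shape of the nested `match` expressions.

```lean
-- Non-unique variable: a hypothetical K pattern `same(X, X)`.
-- The second occurrence is renamed to `y` and compared with `x`.
def same (p1 p2 : Int) : Option Int :=
  match p1, p2 with
  | x, y =>
    match x == y with
    | true => some x
    | _ => none

-- Collection patterns with a data dependency: `x` has to be bound by
-- the list match before it can be looked up in the set.
def listHeadInSet (p1 : List Int) (p2 : List Int) : Option Bool :=
  match p1.head? with
  | some x =>
    match p2.contains x with
    | true => some true
    | _ => none
  | _ => none

#eval same 4 4                      -- some 4
#eval same 4 5                      -- none
#eval listHeadInSet [1, 2] [3, 1]   -- some true
#eval listHeadInSet [2, 1] [3, 1]   -- none
```

Every non-matching case falls through to `none`, which is what lets the enclosing function move on to the next rule.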
Similarly to sorts, function and function rule definitions are partitioned and topologically ordered with respect to their dependency relation. In the case of mutually recursive definitions, the burden of proving termination (i.e., finding suitable terms for termination_by / decreasing_by) is on the user.
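For a sense of what this involves, the sketch below shows two hypothetical mutually recursive functions with explicit termination measures. It assumes a recent Lean 4 toolchain, where `termination_by` follows each definition; in this simple case the decreasing proof is found automatically, whereas harder cases would need a `decreasing_by` block supplied by the user.

```lean
mutual
  def isEven (n : Nat) : Option Bool :=
    match n with
    | 0 => some true
    | k + 1 => isOdd k
  termination_by n

  def isOdd (n : Nat) : Option Bool :=
    match n with
    | 0 => some false
    | k + 1 => isEven k
  termination_by n
end
```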
Rewrite Relation
The rewrite relation is represented as a dependent type

```lean
inductive Rewrites : SortGeneratedTopCell → SortGeneratedTopCell → Prop
```

where each constructor, except for a special constructor `tran` encoding the transitivity of the relation, represents a rewrite rule. In order to prove a step w.r.t. a given rewrite rule, the `requires` clause and the definedness of subterms have to be proven.
| K | Lean 4 |
|---|---|
|
|
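The sketch below illustrates the idea for a hypothetical rule `rule N => N -Int 1 requires N >Int 0` over a one-cell configuration. The helper functions, the constructor name `COUNTDOWN`, and the hypothesis names are assumptions; only the overall shape (one constructor per rule plus `tran`, with hypotheses for definedness and the `requires` clause) follows the description above.

```lean
abbrev SortInt : Type := Int

-- Hypothetical top cell wrapping a single integer.
structure SortGeneratedTopCell : Type where
  val : SortInt

-- Stand-ins for hooked functions; the result type is `Option`.
def gtInt (a b : SortInt) : Option Bool := some (decide (a > b))
def subInt (a b : SortInt) : Option SortInt := some (a - b)

inductive Rewrites : SortGeneratedTopCell → SortGeneratedTopCell → Prop where
  -- Transitivity of the relation.
  | tran {s1 s2 s3 : SortGeneratedTopCell}
      (t1 : Rewrites s1 s2) (t2 : Rewrites s2 s3) : Rewrites s1 s3
  -- One constructor per rewrite rule.
  | COUNTDOWN {N N' : SortInt} {c : Bool}
      (defn_c : gtInt N 0 = some c)      -- the condition is defined
      (defn_N' : subInt N 1 = some N')   -- the right-hand side is defined
      (req : c = true)                   -- the `requires` clause holds
      : Rewrites { val := N } { val := N' }

-- A single step: each proof obligation is discharged by computation.
example : Rewrites { val := 2 } { val := 1 } :=
  Rewrites.COUNTDOWN (N := 2) (N' := 1) (c := true) rfl rfl rfl

-- Two steps chained with `tran`.
example : Rewrites { val := 2 } { val := 0 } :=
  Rewrites.tran
    (Rewrites.COUNTDOWN (N := 2) (N' := 1) (c := true) rfl rfl rfl)
    (Rewrites.COUNTDOWN (N := 1) (N' := 0) (c := true) rfl rfl rfl)
```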
In the Rewrites type definition, rule priorities are not taken into consideration; hence, the generated relation is an overapproximation of the relation induced by the actual rewrite rules.
Potential Uses
The K-to-Lean generator opens several technical directions beyond equivalence verification, for which it is used in this project. One promising direction is addressing SMT solver limitations: current verification sometimes falls short on nonlinear arithmetic, modular operations with large bit widths, cryptographic primitives, and complex loop invariants that are common in EVM opcodes. Lean's dependent types and proof tactics could handle these cases where SMT solvers struggle, with the K backend providing the interface and the initial problem setup, and Lean filling verification gaps interactively.
Another valuable application is proving the correctness of the lemmas and simplification rules added to speed up K proofs, as well as of the REVM proofs implemented in the scope of this project, to provide stronger mathematical guarantees.
Footnotes
1. To be more precise, all variables that will end up on the left-hand side after the pattern is processed. For example, in a pattern `SetItem(X:KItem) S:Set`, `X` will not be part of the Lean 4 pattern; therefore, no renaming is necessary.