Skip to content

The Lean 4 Backend #74

@tothtamas28

Description

@tothtamas28

The following section presents klean, a tool to generate Lean 4 programs from kompiled K definitions.

https://github.com/runtimeverification/k/blob/40d1aa8505043451a08a7436ed076e7d85311756/pyk/src/pyk/klean/README.md

Code Generation

The following describes the technical details of the code generator.

Code Generation Workflow

The starting point of code generation is a kompiled definition.kore file. After parsing the definition, as an optimization step, the model is minimized to only contain sorts, symbols, and rule axioms that are relevant with respect to rewrite rules. These sorts, symbols, and axioms are then mapped to Lean 4 concepts. To support this, parts of the Lean 4 syntax, in particular modules and declarations, are modeled as dataclass-es whose __str__ method serializes the object to Lean 4 code (implementation). The model is not complete; i.e., not all syntactic elements are modeled, and not all facets of a modeled syntactic element are represented. Notably, terms are not represented structurally.

The rest of the section describes the Lean 4 artifacts generated by klean in detail.

Prelude

The code generator defines a prelude that provides an interpretation (and in some cases, an axiomatic interface) for relevant sorts and hooked functions from K's domains.md. The prelude is included verbatim in each project generated by klean.

Sorts

Each non-primitive K sort is mapped to a Lean 4 type definition.

Cell sorts are mapped to structure-s. If the cell has no subcells (and therefore, wraps a single value), the field name is val.

K Lean 4

configuration
  <statusCode> .StatusCode </statusCode>


structure SortStatusCodeCell : Type where
  val : SortStatusCode

If the cell has subcells, each field name is derived from the sort of a corresponding subcell.

K Lean 4

configuration
  <callState>
    <program>    .Bytes     </program>
    <jumpDests>  .Bytes     </jumpDests>
    <id>         .Account   </id>
    <caller>     .Account   </caller>
    <callData>   .Bytes     </callData>
    <callValue>  0          </callValue>
    <wordStack>  .WordStack </wordStack>
    <localMem>   .Bytes     </localMem>
    <pc>         0          </pc>
    <gas>        0:Gas      </gas>
    <memoryUsed> 0          </memoryUsed>
    <callGas>    0:Gas      </callGas>
    <static>     false      </static>
    <callDepth>  0          </callDepth>
  </callState>


structure SortCallStateCell : Type where
  program : SortProgramCell
  jumpDests : SortJumpDestsCell
  id : SortIdCell
  caller : SortCallerCell
  callData : SortCallDataCell
  callValue : SortCallValueCell
  wordStack : SortWordStackCell
  localMem : SortLocalMemCell
  pc : SortPcCell
  gas : SortGasCell
  memoryUsed : SortMemoryUsedCell
  callGas : SortCallGasCell
  static : SortStaticCell
  callDepth : SortCallDepthCell

List, Set, and Map are special-cased (i.e., syntax is not mapped directly), and are also mapped as structure.

K Lean 4

syntax List ::= List List       [...]
syntax List ::= ".List"         [...]
syntax List ::= ListItem(KItem) [...]


structure SortList : Type where
  coll : List SortKItem


syntax Set ::= Set Set        [...]
syntax Set ::= ".Set"         [...]
syntax Set ::= SetItem(KItem) [...]


structure SortSet : Type where
  coll : List SortKItem


syntax Map ::= Map Map           [...]
syntax Map ::= ".Map"            [...]
syntax Map ::= KItem "|->" KItem [...]


structure SortMap : Type where
  coll : List (SortKItem × SortKItem)

Cell collections follow the same pattern, except the element / key-value types are cell types.

K Lean 4

configuration
  <accounts>
    <account multiplicity="*" type="Map">
      <acctID>           0                  </acctID>
      <balance>          0                  </balance>
      <code>             .Bytes:AccountCode </code>
      <storage>          .Map               </storage>
      <origStorage>      .Map               </origStorage>
      <transientStorage> .Map               </transientStorage>
      <nonce>            0                  </nonce>
    </account>
  </accounts>


structure SortAccountsCell : Type where
  val : SortAccountCellMap
 
structure SortAccountCellMap : Type where
  coll : List (SortAcctIDCell × SortAccountCell)
 
structure SortAccountCell : Type where
  acctID : SortAcctIDCell
  balance : SortBalanceCell
  code : SortCodeCell
  storage : SortStorageCell
  origStorage : SortOrigStorageCell
  transientStorage : SortTransientStorageCell
  nonce : SortNonceCell

All other (non-primitive) sorts are translated as inductive, where the constructors are induced by subsort and symbol productions.

K Lean 4

syntax JSON::= Bool | Int | String
             | "null"           [symbol(JSONnull)]
             | JSONKey ":" JSON [symbol(JSONEntry)]
             | "{" JSONs "}"    [symbol(JSONObject)]
             | "[" JSONs "]"    [symbol(JSONList)]


inductive SortJSON : Type where
  | inj_SortBool (x : SortBool) : SortJSON
  | inj_SortInt (x : SortInt) : SortJSON
  | inj_SortString (x : SortString) : SortJSON
  | JSONnull : SortJSON
  | JSONEntry (x0 : SortJSONKey) (x1 : SortJSON) : SortJSON
  | JSONObject (x0 : SortJSONs) : SortJSON
  | JSONList (x0 : SortJSONs) : SortJSON

Symbol names that contain special characters are quoted using guillemets (« »).

K Lean 4

syntax InternalOp ::= "#next" "[" MaybeOpCode "]" [symbol(#next)]


inductive SortInternalOp : Type where
  | «#next» (x0 : SortMaybeOpCode) : SortInternalOp

In the generated program, declarations are ordered according to the sort dependency relation in two steps.

  1. First, sorts are partitioned so that any two sorts of a given class depend on each other. Each such class induces a mutual command in the generated program.
  2. Classes are then ordered topologically.

Injections

The polymorphic injection and retraction functions are defined in the prelude as follows.

class Inj (From To : Type) : Type where
  inj (x : From) : To
  retr (x : To) : Option From

def inj {From To : Type} [inst : Inj From To] := inst.inj
def retr {From To : Type} [inst : Inj From To] := inst.retr

For each pair of sorts S1, S2 such that S1 is a subsort of S2, an instance Inj S1 S2 is generated. It ensures that inj is transitive, and that inj_* constructors can be retracted into a more specific supersort. This latter is used for subsort matching on the left-hand side of K function rules.

K Lean 4

syntax KItem ::= JSON
syntax JSON::= Bool | Int | String


inductive SortKItem : Type where
  | inj_SortBool (x : SortBool) : SortKItem
  | inj_SortInt (x : SortInt) : SortKItem
  | inj_SortString (x : SortString) : SortKItem
  | inj_SortJSON (x : SortJSON) : SortKItem
 
inductive SortJSON : Type where
  | inj_SortBool (x : SortBool) : SortJSON
  | inj_SortInt (x : SortInt) : SortJSON
  | inj_SortString (x : SortString) : SortJSON
 
instance : Inj SortJSON SortKItem where
  inj
    | SortJSON.inj_SortBool x => SortKItem.inj_SortBool x
    | SortJSON.inj_SortInt x => SortKItem.inj_SortInt x
    | SortJSON.inj_SortString x => SortKItem.inj_SortString x
    | x => SortKItem.inj_SortJSON x
  retr
    | SortKItem.inj_SortBool x => some (SortJSON.inj_SortBool x)
    | SortKItem.inj_SortInt x => some (SortJSON.inj_SortInt x)
    | SortKItem.inj_SortString x => some (SortJSON.inj_SortInt x)
    | SortKItem.inj_SortJSON x => some x
    | _ => none

Functions

For each K function symbol not defined in the prelude, a Lean 4 function is declared. In order to support non-total functions, the result type is Option.

If the symbol has no corresponding function rules, it is mapped as an axiom.

axiom «_^Int_;» (x0 : SortInt) (x1 : SortInt) : Option SortInt

Otherwise, a def is generated that applies the functions generated for each function rule in priority order until the first some result .

K Lean 4

syntax Int ::= Int "up/Int" Int [function, total, symbol(up/Int)]
 
rule _I1 up/Int I2 => 0
  requires I2 <=Int 0
 
rule  I1 up/Int 1  => I1
 
rule _I1 up/Int 0  => 0
 
rule  I1 up/Int I2 => (I1 +Int (I2 -Int 1)) /Int I2
  requires 1 <Int I2


def _091b7da : SortInt → SortInt → Option SortInt
  | _I1, I2 => do
    let _Val0 <- «_<=Int_» I2 0
    guard _Val0
    return 0
 
def _50d266e : SortInt → SortInt → Option SortInt
  | I1, 1 => some I1
  | _, _ => none
 
def _5321d80 : SortInt → SortInt → Option SortInt
  | _I1, 0 => some 0
  | _, _ => none
 
def _e985b28 : SortInt → SortInt → Option SortInt
  | I1, I2 => do
    let _Val0 <- «_<Int_» 1 I2
    let _Val1 <- «_-Int_» I2 1
    let _Val2 <- «_+Int_» I1 _Val1
    let _Val3 <- «_/Int_» _Val2 I2
    guard _Val0
    return _Val3
 
def «up/Int» (x0 : SortInt) (x1 : SortInt) : Option SortInt :=
  (_091b7da x0 x1) <|>
  (_50d266e x0 x1) <|>
  (_5321d80 x0 x1) <|>
  (_e985b28 x0 x1)

The definitions are implemented in the Option monad to handle non-matching patterns on the left-hand side, unsatisfied preconditions (guard _Val0) and undefined subterms (let _Val3 <- «_/Int_» _Val2 I2).

If a definition depends on an axiom, the noncomputable modifier is applied accordingly.

Special handling is necessary to transform complex patterns on the left-hand side of rules supported by the K Framework but not Lean 4, namely, subsort matching, K collection patterns, and for admitting non-unique variables in a pattern. Roughly, processing of such patterns is performed in the following steps.

  1. First, all variables1 in the pattern are renamed to be unique, while tracking equivalence between them. Each pair of such variables, say, x and y, will conceptually induce a piece of Lean 4 pattern matching code of the form
    | ..., x, ..., y, ... => match x == y with
      | true => ...
  2. Then, subsort and collection patterns are recursively abstracted with fresh variables. Each such pair of variable and pattern, say, x and t, will conceptually induce a piece of Lean 4 pattern matching code of the form
    | ..., x, ... => match f x with
      | t' => ...
    where f is a function that implements the logic of matching t, and t' a pattern capturing a successful match of x into t.
  3. The pattern matching pieces are topologically ordered based on the data dependencies between them. For example, in a K pattern listHeadInSet(ListItem(X:KItem) _:List, SetItem(X) _:Set), X is matched in the list, which the nesting of match expressions have to reflect (the following snippet simplifies some function names for the ease of exposition):
    | p1, p2 => match list_head p1 with
      | some x => match in_set p2 x with
        | true => ...

The following are examples for the transformation output of each type of complex pattern.

K Lean 4

syntax Foo  ::= Int | "foo"
syntax Bool ::= "foo?" "(" KItem ")" [function, total, symbol(int?)]
 
rule foo?(_:Foo) => true
rule foo?(_)     => false [owise]


def _950d09f : SortKItem → Option SortBool
  | _Pat0 => match (@retr SortFoo SortKItem) _Pat0 with
    | some _Gen0 => some true
    | _ => none
 
def _a65c46b : SortKItem → Option SortBool
  | _Gen0 => some false
 
def int? (x0 : SortKItem) : Option SortBool :=
  (_950d09f x0) <|> (_a65c46b x0)


syntax KItem ::= head(List) [function, symbol(head)]
 
rule head(ListItem(HEAD) _) => HEAD


noncomputable def _eb9118d : SortList → Option SortKItem
  | _Pat0 => match (ListHook SortKItem).split _Pat0.coll 1 0 with
    | some ([HEAD], _Gen0, []) => some HEAD
    | _ => none
 
noncomputable def head (x0 : SortList) : Option SortKItem := _eb9118d x0


syntax Bool ::= Int "eqInt" Int [function, total, symbol(eqInt)]
 
rule I eqInt I => true
rule _ eqInt _ => false [owise]


def _b5d6a8f : SortInt → SortInt → Option SortBool
  | I, _Uniq0 => match I == _Uniq0 with
    | true => some true
    | _ => none
 
def _48f6b23 : SortInt → SortInt → Option SortBool
  | _Gen0, _Gen1 => some false
 
def eqInt (x0 : SortInt) (x1 : SortInt) : Option SortBool :=
  (_b5d6a8f x0 x1) <|> (_48f6b23 x0 x1)

Similarly to sorts, function and function rule definitions are partitioned and topologically ordered with respect to their dependency relation. In the case of mutually recursive definitions, the burden of proving termination (i.e., finding suitable terms for termination_by / decreasing_by) is on the user.

Rewrite Relation

The rewrite relation is represented as a dependent type

inductive Rewrites : SortGeneratedTopCell → SortGeneratedTopCell → Prop

where each constructor, except for a special constructor tran encoding the transitivity of the relation, represents a rewrite rule. In order to prove a step w.r.t. a given rewrite rule, the requires clause and the definedness of subterms have to be proven.

K Lean 4

module COUNTER
  imports INT
  rule [dec]: <k> X:Int => X -Int 1 </k> requires 0 <Int X
endmodule


inductive Rewrites : SortGeneratedTopCell → SortGeneratedTopCell → Prop where
  | tran
    {s1 s2 s3 : SortGeneratedTopCell}
    (t1 : Rewrites s1 s2)
    (t2 : Rewrites s2 s3)
    : Rewrites s1 s3
  | COUNTER_dec
    {X _Val1 : SortInt}
    {_DotVar0 : SortGeneratedCounterCell}
    {_Val0 : SortBool}
    (defn_Val0 : «_<Int_» 0 X = some _Val0)
    (defn_Val1 : «_-Int_» X 1 = some _Val1)
    (req : _Val0 = true)
    : Rewrites {
      k := {
        val := SortK.kseq ((@inj SortInt SortKItem) X) SortK.dotk
      },
      generatedCounter := _DotVar0
    } {
      k := {
        val := SortK.kseq ((@inj SortInt SortKItem) _Val1) SortK.dotk
      },
      generatedCounter := _DotVar0
    }

In the Rewrites type definition, rule priorities are not taken into consideration; hence, the generated relation is an overapproximation of the relation induced by the actual rewrite rules.

Potential Uses

The K-to-Lean generator opens several technical directions beyond equivalence verification, which it is used for in this project. One of the promising directions is addressing SMT solver limitations — current verification sometimes falls short on nonlinear arithmetic, modular operations with large bit widths, cryptographic primitives, and complex loop invariants that are common in EVM opcodes. Lean's dependent types and proof tactics could handle these cases where SMT solvers struggle, with the K backend providing the interface and the initial problem setup and Lean filling verification gaps interactively.

Another valuable application is proving correctness of the lemmas and simplification rules added to speed up K proofs, as well as the REVM proofs implemented in scope of this project, to provide stronger mathematical guarantees of their correctness.

Footnotes

  1. To be more precise, all variables that after processing the pattern will end up on the left-hand side. For example, in a pattern SetItem(X:KItem) S:Set, X will not be part of the Lean 4 pattern, therefore no renaming is necessary.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions