Add Lean 4 backend [AI-assisted]#40
Open
septract wants to merge 94 commits intorems-project:masterfrom
Open
Conversation
Add a new backend targeting Lean 4, enabling lem to generate .lean files from semantic definitions. This follows the same architecture as the existing Coq backend (custom LeanBackend module via functor). New files: - src/lean_backend.ml: Main backend (~1600 lines) - lean-lib/LemLib.lean: Runtime support library - library/lean_constants: Lean 4 reserved words Target registration across: ast.ml, target.ml, target.mli, parser.mly, main.ml (-lean flag), target_trans.ml, backend.ml, backend.mli, process_file.ml (.lean output with /- -/ comments and import LemLib). Library declarations (declare lean target_rep) added to all standard library files: basic_classes, bool, maybe, num, list, set, map, string, either, relation, sorting, word, machine_word, tuple, function, set_extra, map_extra, set_helpers, assert_extra. Key Lean 4 adaptations: - Bool/List/Nat capitalized types via target_rep type declarations - Constructor patterns use dot notation (.Red, .some, etc.) - Comments converted from (* *) to /- -/ - Records use structure/where syntax - Match arms use | pat => expr (no end keyword) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generate Lean 4 'instance : Inhabited T' for each type definition, mirroring Coq's 'Definition T_default' generation. This ensures default values are available for all user-defined types. Also raise proper errors for Typ_with_sort in pat_typ and typ, matching Coq's behavior instead of silently passing through. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Lean 4 to all documentation: README, manual (introduction, invocation, backends, language grammar, backend linking, typeclasses), and the Ott grammar definition. Create new backend_lean.md manual page. Add missing declare lean target_rep entries for nth (list_extra.lem), ord and chr (string_extra.lem) to complete library parity with Coq. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add lean to target lists for wordFromInteger, wordFromNumeral, wordToHex in machine_word.lem. Add lean target_rep for choose and exclude lean from choose lemmas/asserts in set_extra.lem. Exclude lean from THE_spec lemma in function_extra.lem. Add lean-libs target to library/Makefile and leantests target to tests/backends/Makefile. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n backend - Add Meta_utf8 variant to output.ml to preserve UTF-8 bytes (×, →, etc.) instead of double-encoding through of_latin1 - Fix to_rope_help_block to use of_string for Format output, preventing double-encoding of Unicode characters in block-formatted output - Add flatten_newlines utility to collapse newlines in output trees - Disable block formatting for Lean backend (Lean 4 is whitespace-sensitive) - Replace break_hint_space with explicit spaces in App, Infix, If, Fun, Case - Add 'open TypeName' after inductive types for constructor scoping - Add 'open ClassName' after class definitions for method scoping - Remove dot-prefix on constructors in expression/pattern position - Fix pattern constructor argument spacing (concat emp -> concat space) - Fix 'let' keyword spacing (letx -> let x) - Expand LemLib with set/map operations, ordering, and utility functions - Add Lake project setup for lean-lib (lakefile.lean, lean-toolchain) - Add lean-test Lake project for end-to-end compilation testing - Expand lean_constants with ~30 missing Lean 4 reserved words - Add lean-libs target and Lean paths to Makefile install/distrib/clean 5 of 7 test files now compile: Types, Classes2, Classes3, Pats, Pats3. Remaining issues: Exps (set BEq instances), Coq_test (mutual inductives with varying parameters), Classes3 (target-specific code leaking). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Filter target-specific class methods from Lean output: class bodies
now skip methods annotated for other backends ({hol}, {coq}, etc.)
- Filter corresponding instance methods when class method is not
target-visible for Lean
- Add 'deriving BEq' to inductive types and structures when all
constructor/field types support it (no function-typed args)
- Skip 'deriving BEq' for mutual blocks to avoid cross-reference issues
- Handle mutual inductives with heterogeneous parameter counts by
converting parameters to indices (Type 1 universe), with implicit
bindings in constructors
- Use sorry for Inhabited defaults of mutual recursive types
- Export SetType.setElemCompare in Pervasives_extra for bare usage
All 7 test files now compile: Types, Classes2, Classes3, Pats, Pats3,
Exps, Coq_test (previously 5/7).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace all assert false in lean_backend.ml with descriptive error messages - Add Typ_backend handling in typ and indreln_typ (previously unreachable crash) - Fix sort_by_ordering bug: .EQ => false (mergeSort expects strict <, not <=) - Add @[inline] to 19 trivial wrapper functions in LemLib.lean - Add module-level documentation to LemLib.lean, Pervasives_extra.lean, lakefile.lean - Add 13 missing Lean 4 keywords to lean_constants - Fix flatten_newlines to recurse into Core nodes in output.ml - Add setEqualBy doc comment noting sorted-input precondition Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix lean_backend.ml assert/lemma/theorem emission: - assert -> #eval with Bool check (runtime verification) - lemma/theorem -> by decide (proof-time verification) - Create tests/comprehensive/ directory structure - Add Makefile, run_tests.sh, lakefile.lean, expected_failures.txt - Add Pervasives_extra.lean stub for test compilation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions Test suite covers: let bindings, function patterns, pattern matching edge cases, type features, constructors, expression edge cases, higher-order functions, either/maybe types, set/map operations, comprehensions, modules, type classes, inductive relations, mutual recursion, do notation, target- specific declarations, infix operators, scope/shadowing, strings/chars, numeric formats, assertions/lemmas, records, reserved words, comments, and stress testing. Backend changes: - lean_backend.ml: assert -> #eval Bool check (runtime verification), lemma/theorem -> by decide (proof-time verification) - process_file.ml: auxiliary .lean files now import their main module Results: 21/25 test files compile and pass Lake build, with 103 #eval assertions verifying runtime correctness. 4 files are expected failures (comprehensions, inductive relations, sets/maps, stress - all due to missing BEq instances or syntax issues). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix indreln Prop ascription: remove ': Prop' from constructor antecedents that confused Lean's elaborator for subsequent type references - Add 'export SetType (setElemCompare)' to comprehensive Pervasives_extra - Enable all 4 formerly-expected-failure tests in lakefile (comprehensions, indreln, sets_maps, stress_large) — all now compile and pass - Clear expected_failures.txt (no remaining failures) - Track lake-manifest.json for comprehensive test project - Expand .gitignore: Lean build artifacts, generated .lean files in tests/backends/ and tests/comprehensive/, .claude/, _opam/, lean-lib/.lake/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation - Fix typ dropping Typ_app type arguments (e.g. List Nat → List) - Fix setFromList/setFromListBy reversed output order (foldl → foldr) - Change Comp_binding/Setcomp silent comments to proper errors - Change pattern catch-alls from silent comments to proper errors - Add lean-libs to Makefile libs_phase_2 target - Add Pats3 to backends leantests target with build rule - Add fmapUnion and fmapElements to LemLib - Add test_typ_args.lem regression test (4 assertions, all pass) All tests pass: 57/57 comprehensive jobs, 19/19 backend jobs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…c type vars - Native do-notation: remove pipeline desugaring, emit Lean 4 do blocks with proper indentation and whitespace handling - Vector literals: render L_vector as prefix+bits (e.g. 0b1010) - Vector patterns: P_vector renders as list patterns with .toList on match expression; P_vectorC raises clear error (no backend supports it) - Numeric type variables: fix sorry/errors in class definitions, instance declarations, and type class constraints — all now emit (n : Nat) - Default values: use 'default' instead of 'sorry' for Typ_wild/Typ_var - LemLib: add lowercase 'vector' type alias for Lean's Vector - New test: test_vectors.lem (vector expressions + pattern matching) All tests pass: 27/27 comprehensive (59 lake jobs), 19/19 backends. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…build system
Phase 1 — Backend bugs (lean_backend.ml):
- Fix string literal escaping: escape \, \n, \t, \0, \r (not just quotes)
- Fix let_type_variables: Nvar gets {n : Nat} not {n : Type}
- Fix Indreln: emit removal comment when not targeted for Lean
- Fix VectorSub: correct skips'' whitespace typo
- Fix indreln_typ: space for multi-arg types (ts <> [] not ts = 1)
- Fix theorem: explicit space after keyword
- Fix assert names: escape through lean_string_escape
- Fix Do handler: wrap in (do ...) parens for indentation isolation
- Fix Typ_app/Typ_backend: conditional space for zero-arg types
- Fix P_cons: parenthesize in fun_pattern context
- Fix default_value for Typ_var: use sorry (avoids missing Inhabited)
Phase 2 — LemLib fixes (LemLib.lean):
- Fix setEqualBy: order-independent mutual subset check
- Fix setCompareBy: sort both lists before comparing
- Fix setCase: 4th arg is plain value, not function (matches Lem sig)
- Fix chooseAndSplit: partition by comparison, not just head/tail
- Fix fmapEqualBy: key param from LemOrdering to Bool
- Add apply, integerSqrt, rationalNumerator/Denominator, realSqrt/Floor/Ceiling, intAbs, listGet?/listGet\!
- Add DecidableEq to LemOrdering
- Fix gen_pow_aux: total with termination_by/decreasing_by
- Fix sort_by_ordering: stable (.EQ => true)
Phase 3 — Build system:
- Add Classes2, Classes3, Coq_test to leantests Makefile
- Add nomatch, nofun, infix/infixl/infixr, prefix, postfix to lean_constants
- Fix README: Lean 4.28.0 (not 4.x)
New regression test: test_audit_regressions.lem (string escaping, cons
patterns, set equality — 6 assertions).
All 28 comprehensive tests pass, 19 backend jobs compile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ces, add lean-libs Fix Bool.<-> resolution error that blocked `make lean-libs`: - Add e_env fallback to search_module_suffix in target_binding.ml When typeclass resolution macros synthesize definitions with narrow local environments (missing imported modules in m_env), fall back to looking up module paths directly in the global e_env registry - Fix orderingEqual target_rep: `decide` is wrong (expects Prop), use infix `==` since LemOrdering derives BEq - Revert Comp_binding/Setcomp to comment output (matches Coq backend) Improve Inhabited instance generation (lean_backend.ml): - Use `default` for type variables in Inhabited context (not sorry) - For mutual types, find safe constructors whose args don't reference other mutual types, reducing sorry usage - Collect type/class namespace opens for auxiliary file generation Add lean-lib generated library files (58 files from make lean-libs) Add pairEqual and maybeEqualBy to LemLib.lean Add runtime assertions to 6 existing test files Add test_cross_module.lem regression test (9 assertions) 29/29 comprehensive tests pass, 19/19 backend jobs pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests - Add is_lean_pattern_match in patterns.ml that rejects P_num_add, triggering guard-based desugaring instead of invalid Lean 4 syntax - Add 14 int32/int64 bitwise functions to LemLib with two's complement conversion (int32Lnot/Lor/Lxor/Land/Lsl/Lsr/Asr, same for int64) - Add missing library functions: naturalOfString, integerDiv_t, integerRem_t, integerRem_f, THE with target_reps in .lem files - Fix type/value namespace collision: rename_top_level.ml seeds constant renaming with type names for Lean so functions avoid type names - Fix self-referential Inhabited: generate_default_values detects recursive types without base cases and uses sorry - Add Add/Sub/Mul/Div/Mod/Neg/Pow/Min/Max/Abs/Append to lean_constants to avoid ambiguity with Lean stdlib type classes - Expand backend tests: Record_test, Op, Let_rec, Indreln2 (11 total) - Fix test .lem files: add type annotations for Num.Numeral resolution, convert tabs to spaces in let_rec.lem All 11 backend tests and 29 comprehensive tests pass (90 Lake jobs). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Expand backend_lean.md: add auxiliary files, recursive definitions, inductive relations, BEq derivation, automatic renaming sections - Add Lake project example to compilation instructions - Fix incorrect claim about constructor dot notation (uses open TypeName) - Document Inhabited sorry behavior for recursive types without base cases - Add -auxiliary_level auto mention, matching HOL4/Isabelle docs - Fix introduction.md Lean version: 4.x -> 4.28.0 - Fix README.md Lean library entry to match other backends format Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate unreachable P_record match arms in fun_pattern and def_pattern. Replace silent 'Internal Lem error' comment strings with proper exceptions that surface errors to users. Simplify generate_inhabited_instance by removing dead None branch (single types now always pass through mutual-aware path). Standardize error message format to 'Lean backend: ...' prefix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move generated library files under LemLib/ namespace so imports become `import LemLib.Pervasives` instead of bare `import Pervasives`, avoiding conflicts with Lean stdlib modules (Bool, List, String, etc.). Fix class name collisions (Eq, Ord) with Lean stdlib by making the renaming pipeline handle class types. Previously, class definitions were skipped in add_def_aux_entities (TODO comment), so names in lean_constants like Eq never triggered renaming. Now Eq -> Eq0, Ord -> Ord0 at all output sites: class defs, constraints, and instance declarations. Key changes: - backend_common.ml: LemLib. prefix for library modules; class_path_to_name - process_file.ml: dot-to-path conversion for Lean output files - lean_backend.ml: strip LemLib. prefix from open stmts; use class_path_to_name - types.ml/mli: type_defs_lookup_tc, type_defs_update_class - typed_ast_syntax.ml: collect class paths and methods in add_def_aux_entities - rename_top_level.ml: rename_type handles both Tc_type and Tc_class - target_trans.ml: add_used_entities_to_avoid_names handles Tc_class - lean_constants: add Ord - Pervasives_extra stub moved to lean-lib/LemLib/; test stubs removed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…reps Key changes to make generated library files compile: - Import ordering: collect imports in a ref, emit all at file top before any other content (Lean requires imports before non-import statements) - Import-open: suppress 'open' for LemLib.* modules (generated files have no namespaces; import alone brings definitions into scope) - Class exports: use 'export ClassName (methods)' instead of 'open' after class definitions so methods are visible to importing files. Filter out names that clash with Lean globals (max, min, compare). - Instance constraints: use inst_constraints from type system (fully qualified paths) instead of parsing unqualified Idents from Cs_list AST - BEq bridges: emit 'instance [Eq0 a] : BEq a' after Eq class def, and 'instance [SetType a] : BEq a' after SetType class def, so == works wherever these classes are in scope - Target reps: Ord0.compare for compare method, intAbs for integer abs functions, \!= for unsafe_structural_inequality - Remove pairEqual/maybeEqualBy from LemLib.lean (now in generated code) - Pervasives_extra stub: remove namespace wrapper (matches generated style) - lakefile: use submodules glob for full library discovery 28 of 61 library modules now build successfully (up from 0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g, cpp support - Add explicit `: Type` annotation on all non-indexed inductives to prevent Lean auto-inferring Prop (Sort 0) for single-constructor mutual types - Fix multi-clause mutual function naming to use const_ref_to_name (avoids definition/reference name mismatch e.g. test44 vs test440) - Generate SetType/Eq0/Ord0 instances for all inductive types; skip for Type 1 (heterogeneous mutual blocks) since those classes require Type - Auto-import LemLib.Pervasives_extra when Pervasives is imported, for bridge instances (NumAdd -> Add, etc.) - Include transitive namespace opens in auxiliary files (Lean open is file-local, not exported to importers) - Add MapKeyType compare method to BEq bridge derivation - Add isInequal target_rep (\!=) for basic_classes - Add One/Zero to lean_constants to avoid stdlib collisions - Replace removed List.get?/List.get\! with listGetOpt/listGetBang wrappers - Add Ord instance for Prod, set_tc, boolListFromNatural to LemLib runtime - Update Pervasives_extra stub with Lem numeric class bridges Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ;lean to 13 target group annotations in cmm.lem so the Lean backend can generate output from the same source file used by other backends. Changes are purely additive and don't affect Coq/OCaml/Isabelle output. Also add Lake project files (lakefile.lean, lean-toolchain, lake-manifest) and .gitignore .lake/ globally instead of per-directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pes, constants Backend fixes for real-world Lem projects (ppcmem-model, cpp): - Sanitize tab characters in all generated output (Lean 4 forbids tabs) - Use 'export Type (Ctor1 Ctor2)' instead of 'open Type' for inductives, so constructors are visible in importing files - Parenthesize match/if/let/fun via shared needs_parens helper, applied consistently in function args, if-conditions, and case arm bodies - Fix indreln type signatures to apply target reps (e.g. set -> List) - Resolve wildcards in fun_pattern P_typ to concrete types - Handle unit literal in fun_pattern as (_ : Unit) - Extract is_library_module predicate for OpenImportTarget - Expand lean_constants from 129 to 262 entries covering all Init types, typeclasses, and common functions (id, flip, cast, guard, etc.) - Add gen_lean_constants.lean script for regenerating the list Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- natLnot: panic instead of returning 0 (NOT undefined for Nat) - naturalOfString: panic on invalid input instead of returning 0 - THE: panic instead of returning none (Hilbert choice not computable) - rationalNumerator/Denominator: panic (rationals not supported) - realSqrt/Floor/Ceiling: panic (reals not supported) - Add Nat bitwise ops (natLand, natLor, natLxor, natLsl, natLsr, natAsr) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…compilation - Class method constants: emit @method (Type) _ for bare class methods so Lean can resolve implicit type parameters (fixes Machine_word.lean) - Standalone BEq instances without [Inhabited] constraint, separate from Ord which requires Inhabited for sorry bodies - Termination annotations: use try_termination_proof (like Coq/Isabelle) to emit def instead of partial def when termination is provable - Multi-discriminant match: decompose tuple scrutinees for termination checker visibility (match l1, l2 with instead of match (l1, l2) with) - Library namespace qualification: push namespace before processing so auxiliary file opens get qualified names (Lem_Basic_classes.Eq0) - Bridge instances moved to LemLib/Bridges.lean (survives make lean-libs) - Auto-import LemLib.Bridges for non-library modules - Makefile cleanup: remove auxiliary files after lean-libs generation - Target reps: genlist, last, nat bitwise ops, int32 bitwise ops Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cross-module name collision: rename_top_level.ml includes ALL env type
names in Lean constant-avoid set, not just local ones (fixes thread_trans
type/indreln collision across modules)
- Record literal type ascription: add (({ ... } : Type)) annotation using
exp_to_typ so Lean can resolve record types without context
- setChoose replaces sorry target rep: Set_extra.choose now maps to a real
function in LemLib instead of bare sorry (which can't be applied as fn)
- Propositional equality in indreln: lean_prop_equality flag makes isEqual
output = (Eq) instead of == (BEq) in antecedents; functions lack BEq
- Indreln renamed name output: uses constant_descr_to_name instead of raw
AST name, so renames like thread_trans -> thread_trans0 are reflected
- deriving BEq, Ord: simple types (non-mutual, no fn-typed args) use Lean's
deriving instead of sorry-based instances; adds [BEq a] [Ord a] constraints
on downstream SetType/Eq0/Ord0 instances for parameterized types
- Dynamic library namespace list: replaces hardcoded core_lib_ns with
computation from module environment (e_env), detecting library modules by
Coq rename presence
- String.mk -> String.ofList: fixes deprecation warning in string.lem
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix bug in type_def_indexed: types with 0 params in a heterogeneous mutual block were emitted as Type instead of Type 1, causing a Lean universe mismatch error. All types in such blocks now consistently use Type 1. New test files: - test_case_arm_nesting.lem: match/if/let/fun in case arms, as function args, in if-conditions, in list/tuple constructors (42 assertions) - test_termination.lem: declare termination_argument, multi-discriminant match with 2 and 3 scrutinees, partial def fallback (15 assertions) Enhanced existing tests: - test_pattern_edge_cases.lem: n+k patterns (fib, pred, classify), unit in tuple/let patterns (13 new assertions) - test_indreln.lem: inequality, nested fn application, ordering, and multi-rule relations in antecedents (4 new relations) - test_mutual_recursion.lem: heterogeneous param counts (caught the Type 1 bug), 3-way mutual recursion - test_audit_regressions.lem: tabs in comments, type/record defs Total: 31 tests, 231 assertions, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Honest accounting of every gap: 942 sorry stubs in Machine_word, wrong floating-point types, missing overflow semantics, 18 partial defs, incomplete target rep coverage, indreln \!= edge case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…annotations
Propositional equality in indreln antecedents now handles both the Infix
AST path (direct = / <> syntax) and the App AST path (Lem's <>
decomposition to not(isEqual x y)). Extracted check_beq_target_rep
helper to share logic between both cases. Added regression tests using
(nat -> nat) types which lack BEq and would fail without the fix.
Added {lean}-scoped termination annotations for 10 structurally recursive
library functions (map_tr, count_map, splitAtAcc, mapMaybe, mapiAux,
catMaybes, init, stringFromListAux, concat, integerOfStringHelper),
reducing partial def count from 18 to 8.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…FixedPoint Convert 5 more partial defs to total: - LemLib.lean: boolListFromNatural (n/2 division), bitSeqBinopAux (dual-list recursion) - LemLib.lean: lemStringFromNatHelper, lemStringFromNaturalHelper (n/10 division) - LemLib.lean: lemLeastFixedPoint (bounded countdown) Add Lean-only target reps in string_extra.lem and set.lem to route generated code through the total LemLib implementations. All changes are inherently Lean-scoped (declare lean target_rep / hand-written Lean). Add TODO rems-project#7: audit all pre-existing unscoped termination annotations from upstream to verify they don't affect other backends. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate accreted test files into logically organized structure. Coverage audit: 526 assertions preserved exactly (0 lost). Types: test_types_basic, test_types_advanced, test_mutual_types, test_deriving Patterns: test_patterns, test_case_arm_parsing Functions: test_functions Expressions: test_expressions, test_let_bindings, test_scope_shadowing Records: test_records Classes/Instances: test_classes, test_instances Inductive relations: test_indreln Collections: test_collections, test_either_maybe Modules: test_modules Keywords: test_keywords Target reps: test_target_reps, test_target_specific Numeric: test_numeric, test_mword Strings: test_strings_chars Misc: test_misc, test_termination, test_vectors, test_stress Cross-file: 3 pairs (cross_recup, cross_field_access, cross_module) Cerberus: test_cerberus_patterns, test_cerberus_remaining Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes from Cerberus porting team: 1. Type target_rep on ALL types: declare lean target_rep type now works for both opaque and non-opaque types (e.g., types with constructors that map to external Lean types). Emits abbrev instead of inductive. Skips instance generation for types with target reps. 2. Per-file import collection: backend_common.on_cr_simple_applied callback fires when CR_simple target reps are applied during rendering. Lean backend extracts module prefix and adds import. Library target reps filtered via Coq-rename check — no imports generated for Lean stdlib or LemLib references. Replaces broken global t_env scan that caused circular dependencies. 3. Low-priority sorry instances: all sorry-based BEq/Ord/Eq0/Ord0/ SetType instances now have priority := low. Hand-written instances at default priority cleanly override them. Monomorphic types with deriving BEq/Ord get real implementations at default priority. 4. Abbreviation expansion: src_t_has_fn expands type abbreviations via type_abbrev to detect hidden function types (e.g., stateM). Remaining known issues (from Cerberus team report): - Inhabited sorry on mutual types panics at module init (eager eval) - Stale open in auxiliary files for type-rep'd types - Bogus import Operators from local module alias New tests: - test_instances.lem Section 4: runtime Eq0 assertions - test_target_reps.lem Section 6: opaque type with target_rep - Test_beq_override.lean: hand-written BEq priority override on mutual types (3 runtime assertions proving mechanism works) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion instances Four fixes from Cerberus remaining issues: A. Inhabited on mutual types: use real constructors instead of sorry. New src_t_is_directly_mutual checks only direct type references (not through List/Option containers). Indirect refs are safe because List.default=[], Option.default=none. For mutual records, use TypeName.mk with default for each field. Only truly self- referential types with no safe constructor get sorry. B. Stale open in auxiliary files: types with ANY target_rep (not just opaque) excluded from type_info, preventing open on abbrev types. C. Nested local modules: recursive collect_local_modules finds nested modules like SEU.Operators, preventing bogus imports. D. Abbreviation instance skip: generate_inhabited_instance and generate_beq_ord_instances both skip Te_abbrev types AND any type with a Lean target_rep. Eliminates sorry on type abbreviations like rel_pred, rel_set, rel (they inherit from underlying type). Regenerated LemLib: 0 active sorry (was 3 in Relation.lean + others). New tests: - test_instances.lem Section 5: mutual Inhabited with real constructors - test_target_reps.lem: non-opaque type with target_rep - test_modules.lem Section 6: nested local modules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mutual type blocks with cyclic constructor dependencies have no valid topological order for Inhabited instance emission. Replace the previous nullary-first sorting approach with Lean's mutual def block, which allows forward references between default value definitions: mutual def TypeA.default_inhabited : TypeA := MkA TypeB.default_inhabited def TypeB.default_inhabited : TypeB := BEmpty end instance : Inhabited TypeA where default := TypeA.default_inhabited instance : Inhabited TypeB where default := TypeB.default_inhabited Within mutual def blocks, direct references to other mutual types use TypeName.default_inhabited instead of default (which would require Inhabited instances that don't exist yet). Non-mutual args (List, Option, etc.) still use default since their Inhabited instances exist. Adds three test cases in test_instances.lem Section 6: - Simple 2-type wrapper/payload dependency - 4-type cycle (cycA/cycB/cycC/cycD) that breaks nullary-first sorting - 5-type Cabs-like pattern mimicking Cerberus's gnu_builtin/type_name Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parameterized types with nullary constructors (e.g. forest 'a = FNil | ...) previously got sorry for their Inhabited default. Now the backend checks for nullary constructors first, which need no type variable values and thus no [Inhabited a] constraint. Types without nullary constructors still use sorry. Example: forest.default_inhabited changes from sorry to FNil. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
type_defs_rename_type in typed_ast_syntax.ml now handles both Tc_type
and Tc_class entries, enabling 'declare {lean} rename type' on classes.
Previously it only handled Tc_type, so class renames via declaration
syntax silently failed.
Adds explicit renames in library/basic_classes.lem:
declare {lean} rename type Eq = Eq0
declare {lean} rename type Ord = Ord0
This ensures Lem's Eq/Ord classes are consistently renamed to Eq0/Ord0
in Lean output, avoiding collision with Lean 4's built-in Eq (which is
a Prop, not a typeclass). The {lean} scope ensures other backends are
unaffected.
Adds eq_lookup test in test_expressions.lem verifying [Eq0 a] appears
in generated function signatures with Eq constraints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When rename_top_level doesn't set type_rename/class_rename/target_rename (observed in some build environments), the backend now checks lean_reserved_names as a fallback in type_path_to_name, class_path_to_name, and const_ref_to_name. Names colliding with Lean keywords get a '0' suffix. main.ml sets Backend_common.lean_reserved_names from the lean_constants file at startup, making the fallback available to all output paths. Adds test cases in test_keywords.lem for the three Cerberus-reported patterns: 'prefix' as a type name, 'guard' as a function name, 'show' as an instance method name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lean_reserved_names fallback in backend_common and the explicit
declare {lean} rename type Eq = Eq0 / Ord = Ord0 in basic_classes.lem
were added to work around a Cerberus team build issue (stale object
files). Since rename_top_level already handles keyword escaping via
lean_constants, both mechanisms were redundant.
Keeps: typed_ast_syntax.ml class rename support (genuine infrastructure
fix — declare rename type now works for classes), and the test cases
for prefix/guard/show keyword patterns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
550da7f to
172733f
Compare
- Merge test_cerberus_remaining.lem into test_cerberus_patterns.lem (7 sections covering mutual record updates, equality chaining, Function.const, let chains, SetType on parameterized types) - Move eq_lookup test from test_expressions.lem to test_keywords.lem (it tests Eq→Eq0 class renaming, a keyword/naming concern) - Add 7 missing test files to lakefile.lean: functions, misc, mutual_types, numeric, stress, types_advanced, types_basic (were being generated/symlinked but not compiled by Lake) 35 test files, 102 Lake jobs, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Methods like 'compare := sorry' are eagerly evaluated at module init,
causing INTERNAL PANIC before main runs. Adding explicit arguments
('compare _ _ := sorry') makes them lambda closures that only panic
when actually called at runtime. BEq already had this pattern.
Affects: Ord.compare, SetType.setElemCompare, and all Ord0 methods
(compare, isLess, isLessEqual, isGreater, isGreaterEqual) in sorry
instances for mutual/parameterized types.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… instances
New Lem declaration that tells the Lean backend to skip generating ALL
auto-instances (Inhabited, BEq, Ord, SetType, Eq0, Ord0) for a type.
Users provide instances in a hand-written .lean file instead.
This addresses the root cause of sorry init-time panics: rather than
trying to generate correct instances for arbitrarily complex types
(mutual, parametric, self-referential), users annotate types that need
hand-written instances and the backend simply skips them.
Syntax: declare {lean} skip_instances type my_type
Scoped to {lean} so other backends are unaffected. Stored as
type_skip_instances : Targetset.t in type_descr.
Implementation touches: lexer (new token), parser (new rule), ast (new
variant), types (new field), typecheck (process declaration), backend.ml
(generic output), lean_backend (skip check at 2 sites), convert_relations
(new field in type_descr constructor).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that skip_instances provides an escape hatch, the parametric type
Inhabited logic no longer needs complex safe_indirect analysis or
[Inhabited a] constraint propagation. Parameterized types use nullary
constructors when available (e.g. FNil), otherwise sorry. If sorry
panics at init, user adds 'declare {lean} skip_instances type T'.
Removes ~20 lines of constraint and cross-mutual analysis code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…utual_name_map Merge the _mutual variants into the base functions using an optional ~mutual_name_map parameter. When empty (default), Typ_app returns 'default'. When non-empty (mutual def context), Typ_app checks the map and uses TypeName.default_inhabited for mutual type args. Eliminates 43 lines of duplicated code and dispatch boilerplate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Lem declaration that replaces the auto-generated Inhabited default
with a user-specified Lean expression. This complements skip_instances:
- skip_instances: suppresses ALL instances, user provides them externally
- inhabited: replaces just the Inhabited default inline, no separate file
For parametric types, [Inhabited a] constraints are added automatically
when an override is present, so 'default' works for type-variable args.
Syntax: declare {lean} inhabited type my_type = `MyConstructor default`
Stored as type_inhabited_override in type_descr. The backend checks for
an override before constructor analysis or sorry fallback.
Implementation touches: lexer, parser, ast, types, typecheck, backend.ml,
lean_backend.ml, typed_ast. Test in test_instances.lem Section 9.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When both 'declare {lean} skip_instances' and 'declare {lean} inhabited'
are declared on the same type, the inhabited override must take priority
over the skip. Previously skip_inhabited_for_type returned true for any
skipped type, suppressing the override.
Adds Section 10 test: nd_act/nd_mon with both skip_instances and
inhabited, verifying Inhabited is generated while BEq/Ord are not.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…declare inhabited
Monomorphic types without safe constructors now get:
noncomputable instance (priority := low) : Inhabited T where
default := DAEMON
DAEMON is an axiom (no init code, no panic). noncomputable tells the
code generator to skip the instance. priority := low allows user
overrides. This eliminates all init-time sorry panics for monomorphic
Inhabited instances.
Parametric types keep sorry (safe — parametric instances compile as
functions, so sorry only evaluates when called, not at init).
Removes the 'declare {lean} inhabited type T = expr' feature — DAEMON
handles all monomorphic fallback cases automatically, and parametric
cases are safe with sorry. No user annotation needed.
Net -92 lines across 12 files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All Inhabited fallbacks now use noncomputable instance (priority := low) with DAEMON (or DAEMON1 for Type 1 mutual blocks). No sorry anywhere in Inhabited generation. This is uniform — no parametric vs monomorphic distinction, no special cases. Types with real constructor defaults (tier 1) go in mutual def blocks. Types with DAEMON (tier 2) go as standalone noncomputable instances outside the mutual block. When the DAEMON cascade causes downstream compile errors (a computable type containing a DAEMON type directly), the fix is skip_instances. This is intentional: these types genuinely have no computable default. Adds DAEMON1 axiom to LemLib for Type 1 universe types. Adds skip_instances to bintree-containing test types (cascade example). Updates design note with transitive cascade documentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix stale comments referencing sorry for Inhabited (now DAEMON) - Replace assert false with Reporting_basic.err_general (3 sites) - Remove 16 orphaned .lean files from pre-consolidation test layout - Update backend_lean.md: remove declare inhabited docs (feature removed), document DAEMON-based Inhabited and noncomputable cascade behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…table DAEMON now uses @[implemented_by] with an unsafe unsafeCast() impl, making it computable. This fixes the incompatibility with partial def (which requires computable Inhabited) while remaining safe at the API level (unsafe is hidden inside private impl functions). Removes all noncomputable from Inhabited fallback instances. The cascade issue (downstream types failing due to noncomputable dependency) is eliminated — DAEMON instances are now fully computable. Removes skip_instances workarounds from Classes2.lem and test_classes.lem that were added solely to work around the noncomputable cascade. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simple enums (all nullary constructors) in a mutual block can safely derive BEq/Ord — they have no constructor args so no dependency on other types' BEq instances. Previously, ALL types in mutual blocks got sorry BEq/Ord regardless of whether they could derive. This fixes the Cerberus pattern where types like storage_class_specifier (6 nullary constructors) share a mutual block with cabs_expression_ (recursive). The enum gets real derived BEq; the recursive type keeps sorry. Runtime equality on the enum now works instead of panicking. Adds Section 9 test: my_specifier (all nullary) + my_expr (recursive) in a mutual block, with runtime assertions verifying BEq works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HIGH: Fix missing space separator in Typ_backend with multiple type args — concat emp changed to concat_str " " (2 sites: pat_typ, indreln_typ). MEDIUM: Fix indreln_typ dropping skips for Typ_var. Fix lean_namespace_stack not restored on exception (use Fun.protect). Fix stale comment about deferred abbrevs timing. LOW: Remove dead functions default_type_variables and tyexp_record (~25 lines). Fix stale header comment (open → export). Remove identity option match. Remove duplicate doc comment. Fix unused bindings (_bool, removed src_t). Fix Forbid closures (fun x → fun _). Add 'Lean backend:' prefix to set comprehension sorry messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace defensive catch-all patterns that silently swallow errors with explicit matches and raises, and use the type system to eliminate check-then-re-extract anti-patterns. Silent error masking fixes: - val_def: explicit Let_inline handling instead of catch-all comment - class_method_visible: raise on missing class method (was: assume visible) - val_is_visible: raise on Let_inline in instance body (was: assume visible) - Remove 5 dead catch-alls on src_t_aux/val_def (OCaml now enforces exhaustiveness) - Te_abbrev in generate_beq_ord_instances: raise instead of silent emp - Remove dead catch-all in indreln_typ (was: redundant-case warning) Type system improvements: - is_mutual_record_type (bool) -> mutual_record_path (Path.t option): eliminates re-destructuring Types.Tapp at both Record/Recup call sites and removes two 'unreachable' raises - is_abbreviation/is_record (bool) -> classify at dispatch site: type_def_abbreviation/type_def_record take pre-extracted data instead of re-destructuring Seplist.hd with 'unexpected form' raises - Mutual block abbreviation handling: single-pass fold_right extracts Te_abbrev data during partitioning; render_abbrev takes extracted tuple directly instead of re-matching with 'unreachable' raise Also: remove unused l_unk variable, stray ;; from deleted functions. lean_backend.ml now compiles with zero warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- extra_constraints_for_tyr_subst: replace try/with _ -> None with type_defs_lookup_tc (returns option, no exception swallowing) - src_t_has_fn: replace try/with _ -> false with type_defs_lookup_tc; make all src_t_aux variants explicit (Typ_wild, Typ_var, Typ_len, Typ_backend) instead of catch-all - texp_can_derive_beq: explicit Te_opaque | Te_abbrev instead of catch-all - Tuple match: extract (arity, elements) in one step instead of computing tup_arity, checking is_tuple_match bool, then re-destructuring Tup - Remove unused l_unk variable from extra_constraints_for_tyr_subst Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add tnvar_kind (Tn_A -> "Type", Tn_N -> "Nat") helper next to tnvar_to_string, replacing 4 identical inline matches - src_t_has_fn: replace try/with _ -> false with type_defs_lookup_tc (same fix as extra_constraints_for_tyr_subst in prior commit) - Make src_t_has_fn and texp_can_derive_beq exhaustive on their respective closed types Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Lem declaration that adds an import to the generated Lean file:
declare {lean} extra_import `CerbCtypeInstances`
This makes hand-written Lean files (with real typeclass instances)
visible to generated modules that need them. Solves the compile-time
instance resolution problem: Lean resolves instances when the consuming
module compiles, so the override must be in scope at that point.
Place the declaration in the CONSUMING .lem file (not the type
definition file) to avoid circular imports.
Implementation: lexer token, parser rule, AST variant, typecheck
passthrough, lean_backend adds to lean_collected_imports. Other
backends emit the declaration as-is for human-readable targets.
Adds test: TestExtraImportHelper.lean + test_modules.lem Section 7.
Updates backend_lean.md documentation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lean 4 CSEs pure function calls, breaking target_rep functions that are
secretly effectful (mutable counters, mutable globals). The new
'effectful' declaration tells the backend to wrap each call site in
runEffectful(...), which extracts a BaseIO result and prevents CSE.
Syntax: declare {lean} effectful val fresh_int
The target_rep function should return BaseIO α. runEffectful (defined
in LemLib via axiom + @[implemented_by]) extracts the result safely.
Implementation: new 'effectful' field on const_descr (Targetset.t),
populated by Decl_effectful_decl. Lean backend checks the flag during
function application rendering and wraps in runEffectful when set.
Adds LemLib.runEffectful helper (axiom + unsafe implemented_by).
Adds Section 7 test in test_target_reps.lem with effectful target_rep.
Design note at doc/notes/2026-04-12_effectful_target_reps.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@[extern] and axiom+@[implemented_by] are BOTH CSE'd by Lean when applied to the same constant argument. The fix: runEffectful takes (Unit → BaseIO α) instead of (BaseIO α). Each call site wraps in fun () => ..., creating a fresh lambda closure. Lean does not CSE lambda allocations, so each call is evaluated independently. LemLib stays pure Lean — no C files needed. The runEffectful_impl uses unsafeCast to extract the BaseIO result, hidden behind axiom + @[implemented_by]. Removes the @[extern] declaration and native/ C directory. Updates backend to generate thunk wrapping at effectful call sites. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add a new backend targeting Lean 4, allowing Lem definitions to be exported as Lean 4 code. This brings the set of supported targets to: OCaml, Coq, HOL4, Isabelle/HOL, Lean 4, LaTeX, and HTML.
The backend is structurally modelled on the Coq backend, adapted for Lean 4 syntax and semantics. Tested against Lean 4.28.0 via the Lake build system.
What's included
src/lean_backend.ml(~2400 lines) — main backend translating Lem AST to Lean 4 syntaxlean-lib/—LemLibLean 4 runtime library (~720 lines): sets, maps, comparisons, numeric utilities,BitVecoperations for machine wordslibrary/lean_constants— Lean 4 reserved words and typeclass namesTarget_leanchecks — see "Shared code safety" below)doc/manual/backend_lean.mdand updates toREADME.mdmake lean-libstarget generating Lean library files from Lem's standard libraryscripts/lean_coverage.sh— bisect_ppx coverage script used to identify and close untested codepaths (~87% line coverage onlean_backend.ml)Test suite
BitVec), cross-module imports, and moreexamples/cpp/— C++ concurrency model (Cmm.lean, ~1930 generated lines, 34 Lake jobs, 0 errors)examples/ppcmem-model/— PowerPC memory model (10 .lem files, 10/10 compile, 43 Lake jobs, 0 errors)Notable design decisions
Meta_utf8variant inoutput.mlfor correct encoding of→,×,∀,∃export TypeName (Ctor1 Ctor2 ...)after eachinductive— Lean'sopenis file-local, soexportis needed for importers to see constructorspartial defunless adeclare {lean} termination_argument = automaticannotation is present; 10 library functions were annotated total, and 3 more redirect to total LemLib wrappers. Only 2 genuinely partial defs remain (unfoldr,leastFixedPointUnbounded)deriving BEq, Ord: auto-derived for simple types (non-mutual, no function-typed constructor args);sorry-based instances as fallback for mutual types (Lean'sderivingdoesn't support mutual inductives)mwordmaps to Lean'sBitVecwith 36 operations implemented in LemLib;int32/int64use distinct newtype wrappers (LemInt32/LemInt64)rename_top_level.mlseeds constant renaming with type names to handle collisions. Cross-module names included via fullenv.t_envscanlean_constantsincludesAdd,Sub,Mul, etc. to prevent clashes with Lean stdlib==(BEq) converted to=(Prop) in inductive relation antecedents, handling both Lem AST decomposition pathsis_lean_pattern_matchinpatterns.mltriggers guard-based desugaring instead of unsupported n+k patternsShared code safety
Changes to shared source files are guarded to only affect the Lean backend:
typed_ast_syntax.ml: class path collection inused_types— only consumed by Lean-specific code inrename_top_level.mlrename_top_level.ml:Tc_classrenaming returns early for non-Lean targetstarget_trans.ml: class names added to avoid set only when target isTarget_leanoutput.ml: block token type kept asKwd(preserves Coq/HOL/Isabelle spacing); UTF-8 encoding fix (of_string) is universal but correcttarget_binding.ml:e_envfallback only fires when primary lookup fails (low risk, all targets)Verified: all 6 non-Lean backends (OCaml, Coq, HOL, Isabelle, HTML, LaTeX) produce byte-for-byte identical output on all test files compared to master. Zero regressions.
Known limitations (won't fix)
partial defin generated library:unfoldr(user-supplied termination),leastFixedPointUnbounded(iterates until fixpoint). Correctly partial.sorry-based instances for mutual types: Leanderivingcan't handle mutual inductives. Documented with/- mutual type -/comments.nat/naturalboth map toNat,int/integerboth toInt. Inherent to Lem's design, same in all backends.BEqinstances: fromEq0,SetType,MapKeyType— three paths toBEq a. Lem typeclass design tradeoff.coq_backend_skips.lem: non-positive inductive — fundamental Lean restriction.Test plan
make -C src— compiler builds cleanlylake buildsucceeds (57 Lake jobs)lake buildinlean-lib/— LemLib compiles (33 Lake jobs)make lean-libs— all library files generated successfullyexamples/cpp/—Cmm.leangenerates and compiles (34 Lake jobs)examples/ppcmem-model/— 10/10 files generate and compile (43 Lake jobs)🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com