
Conversation

Stevengre
Contributor

Summary

This PR adds a Python script for verifying Rust functions using KMIR, with support for KCFG minimization and K module generation.

Features

  • Flexible verification entry point: Can verify any function, not just main
  • KCFG minimization: Reduces redundant nodes in the proof graph
  • K module generation: Exports reusable K modules from proofs
  • Comprehensive reporting: Generates proof shows, summaries, and statistics

Test Results

Tested with cse_call_add1_1time.rs:

  • add1 function:
    • Verification result: Failed (expected; there is a stuck branch for a potential overflow)
    • KCFG minimization: 7 nodes/3 edges → 6 nodes/2 edges
    • Generated 80KB K module
  • main function:
    • Verification result: Passed
    • KCFG minimization: 3 nodes/1 edge (already optimal)
    • Generated 53KB K module

Usage

# Verify add1 function
uv --directory kmir run python verify_add1.py add1

# Verify main function  
uv --directory kmir run python verify_add1.py main

Output Files

The script generates the following in add1_proof/:

  • .show - Original proof show output
  • .minimized.show - Minimized proof show output
  • .k - K module for reuse
  • .summary - Verification statistics
  • Proof data in subdirectories

Implementation Details

  1. Uses APRProof.minimize_kcfg() to reduce proof complexity
  2. Generates K modules via KCFG.to_module() with proper defunctionalization
  3. Tracks minimization statistics for optimization analysis
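
A minimal sketch of how these steps might fit together, not the exact verify_add1.py implementation: minimize_kcfg, to_module, and pretty_print mirror the calls named above, while kmir (a KMIR tools object), proof (the APRProof from the verification run), the helper name, and the file layout are assumptions.

from pathlib import Path

def export_proof_artifacts(kmir, proof, name: str) -> None:
    # Assumed inputs: `kmir` provides pretty_print; `proof` is the APRProof
    # produced when verifying the function `name` (e.g. 'add1' or 'main').
    out_dir = Path(f'{name}_proof')
    out_dir.mkdir(parents=True, exist_ok=True)

    # 1. KCFG minimization: drop redundant nodes and edges from the proof graph.
    proof.minimize_kcfg()

    # 2. K module generation: export the minimized KCFG as a reusable K module,
    #    pretty-printed so the .k file contains valid K syntax rather than a
    #    Python repr. (The real to_module() call may take further arguments.)
    module = proof.kcfg.to_module()
    (out_dir / f'{name}.k').write_text(kmir.pretty_print(module))

    # 3. Reporting: record the proof status in the summary file.
    (out_dir / f'{name}.summary').write_text(f'status: {proof.status}\n')

For add1 this corresponds to the .k and .summary files in the add1_proof/ layout listed under Output Files above.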

🤖 Generated with Claude Code

Stevengre and others added 12 commits August 29, 2025 01:18
- Add 5 tests with varying function call counts (1, 10, 100, 1000, 10000)
- Tests evaluate composable symbolic execution with repeated function calls
- All tests use loop-based implementation for scalability
- Set depth limit to 50 for controlled symbolic execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Added pytest-timeout dependency to dev requirements
- Set default 300s timeout for test-integration target in Makefile
- Allows overriding timeout via TEST_ARGS parameter

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

This script demonstrates how to:
- Use KMIR to verify Rust functions with arbitrary entry points
- Generate and save APRProof objects
- Create proof show output for visualization
- Support both add1 and main function verification

The script correctly handles both passing (main with assertions) and
failing/stuck (add1 with potential overflow) verification cases.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Added new features to the verification script:
- KCFG minimization using apr_proof.minimize_kcfg() to reduce redundant nodes
- K module generation from minimized KCFG using to_module()
- Save both original and minimized proof show outputs
- Track minimization statistics in summary file

Results show:
- add1 function: reduced from 7 nodes/3 edges to 6 nodes/2 edges
- main function: already optimal at 3 nodes/1 edge (no reduction)
- Generated K modules saved as .k files for reuse

This enhancement enables more efficient proof representation and
provides reusable K modules for future verification work.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Changed from Python object string representation to proper K syntax:
- Use kmir.pretty_print() to format KFlatModule as valid K code
- Reduced file sizes significantly (80KB→19KB, 53KB→13KB)
- Generated modules now follow K language syntax with proper formatting
- Modules can be directly imported/used in other K specifications

The generated .k files now contain valid K module definitions with:
- Proper module/endmodule structure
- Correctly formatted rules with labels and priorities
- K cell structure with proper XML-like notation
- Valid K term syntax instead of Python repr strings

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
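
A hedged before/after sketch of that change (module, kmir, and the helper are assumed names; kmir.pretty_print is the call named in this commit):

from pathlib import Path

def write_k_module(kmir, module, path: Path) -> None:
    # Before this change the file held the Python repr of the KFlatModule:
    #     path.write_text(str(module))    # not valid K syntax
    # Pretty-printing through KMIR makes the .k file a valid K module that
    # other K specifications can import.
    path.write_text(kmir.pretty_print(module))
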
@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

  • The initial state that prove_rs generates is too concrete for a function call; we need to make it abstract again after generating the rule.
  • The final state that ends program execution is also not suitable for summarizing a function call; we might want something that ends with a Return.
  • The pretty-printer is not good at producing syntactically correct output and should be refactored after the rules are generated.
  • The module structure makes it hard to integrate summary rules.
  • To generate a fast decision tree in the LLVM backend, we need to rekompile the semantics.
  • Maybe we can make function calls purer by evaluating the argument values first and setting them into locals. For now, if we want a function call, we have to inspect the stack first. I want something like RValues ~> #callAux => #call ...; then it is easier to summarize a function call without inspecting the state.
  • We need an option that reports how many steps it took to reach the current state.

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Here is the performance without the summary rule for add1:

300.10s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10000time-haskell]
300.07s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell]
234.22s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-haskell]
25.33s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-haskell]
5.38s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-haskell]
3.23s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
3.09s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
2.96s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
2.33s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]

(18 durations < 0.005s hidden.  Use -vv to show these durations.)
====================================================== short test summary info =======================================================
FAILED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell] - Failed: Timeout (>300.0s) from pytest-timeout.
FAILED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10000time-haskell] - Failed: Timeout (>300.0s) from pytest-timeout.

Setup: MacBook Air, Apple M2, 16 GB, macOS Sequoia 15.6

@Stevengre
Contributor Author

Execution time without the summary rule:

======================================= slowest durations =======================================
3143.37s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell]
174.04s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-haskell]
22.40s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-haskell]
5.33s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-haskell]
3.38s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
3.21s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
2.91s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
2.49s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]

(16 durations < 0.005s hidden.  Use -vv to show these durations.)
==================================== short test summary info ====================================
FAILED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell] - ValueError: Expected ID, found: EOF
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!
=========================== 1 failed, 7 passed in 3149.25s (0:52:29) ============================

@Stevengre
Contributor Author

Execution time with the summary rule:

========================================================= slowest durations =========================================================
2724.73s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell]
146.96s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-haskell]
19.48s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-haskell]
4.99s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-haskell]
3.60s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
3.45s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
3.10s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
2.92s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]

(16 durations < 0.005s hidden.  Use -vv to show these durations.)
================================================== 8 passed in 2730.31s (0:45:30) ===================================================

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Execution result without pretty_print for the LLVM backend (with the add1 summary):

======================================================== test session starts ========================================================
platform darwin -- Python 3.10.17, pytest-8.4.1, pluggy-1.6.0 -- /Users/steven/Desktop/projs/mir-semantics/kmir/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /Users/steven/Desktop/projs/mir-semantics/kmir
configfile: pyproject.toml
plugins: xdist-3.8.0, timeout-2.4.0, kmir-0.3.181, cov-6.2.1, mock-3.14.1, hypothesis-6.136.6, kframework-7.1.280
4 workers [4 items]
scheduling tests via WorkStealingScheduling

src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw1] [ 25%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw2] [ 50%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
[gw3] [ 75%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
[gw0] [100%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]

========================================================= slowest durations =========================================================
0.76s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
0.71s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
0.68s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
0.68s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================================= 4 passed in 2.11s =========================================================

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Execution result without pretty_print for the LLVM backend (without the add1 summary):

======================================================== test session starts ========================================================
platform darwin -- Python 3.10.17, pytest-8.4.1, pluggy-1.6.0 -- /Users/steven/Desktop/projs/mir-semantics/kmir/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /Users/steven/Desktop/projs/mir-semantics/kmir
configfile: pyproject.toml
plugins: xdist-3.8.0, timeout-2.4.0, kmir-0.3.181, cov-6.2.1, mock-3.14.1, hypothesis-6.136.6, kframework-7.1.280
4 workers [4 items]
scheduling tests via WorkStealingScheduling

src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw0] [ 25%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
[gw1] [ 50%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw2] [ 75%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
[gw3] [100%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]

========================================================= slowest durations =========================================================
0.92s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
0.88s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
0.87s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
0.78s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================================= 4 passed in 2.27s =========================================================

@Stevengre
Contributor Author

Saves 59 - 13 = 46 steps for each function call of add1.

@Stevengre
Contributor Author

> rustc cse_call_add1_1000time.rs -o run
> time ./run
./run  0.00s user 0.00s system 70% cpu 0.007 total

@Stevengre
Contributor Author

What if we summarize the main of the 1000time test and generate the KORE before running it with krun?

@Stevengre
Contributor Author

Actually, for the program summarized down to this concrete output value, isn't the time to run it just the time it takes to cat the result?

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Run with the main summary!

(kmir) ➜  mir-semantics git:(jh/cse) ✗ time python3 sum.py 1000 --start-symbol main
python3 sum.py 1000 --start-symbol main  4.50s user 0.16s system 99% cpu 4.668 total

@Stevengre
Contributor Author

Run without the main summary:

(kmir) ➜  mir-semantics git:(jh/cse) ✗ time python3 sum.py 1000 --start-symbol main
python3 sum.py 1000 --start-symbol main  1309.30s user 269.67s system 64% cpu 40:38.26 total
