
Conversation

Stevengre
Contributor

Summary

This PR adds a Python script for verifying Rust functions using KMIR, with support for KCFG minimization and K module generation.

Features

  • Flexible verification entry point: Can verify any function, not just main
  • KCFG minimization: Reduces redundant nodes in the proof graph
  • K module generation: Exports reusable K modules from proofs
  • Comprehensive reporting: Generates proof shows, summaries, and statistics

Test Results

Tested with cse_call_add1_1time.rs:

  • add1 function:
    • Verification result: Failed (expected; there is a stuck branch for a potential overflow)
    • KCFG minimization: 7 nodes/3 edges → 6 nodes/2 edges
    • Generated 80KB K module
  • main function:
    • Verification result: Passed
    • KCFG minimization: 3 nodes/1 edge (already optimal)
    • Generated 53KB K module

Usage

# Verify add1 function
uv --directory kmir run python verify_add1.py add1

# Verify main function  
uv --directory kmir run python verify_add1.py main

Output Files

The script generates the following in add1_proof/:

  • .show - Original proof show output
  • .minimized.show - Minimized proof show output
  • .k - K module for reuse
  • .summary - Verification statistics
  • Proof data in subdirectories

Implementation Details

  1. Uses APRProof.minimize_kcfg() to reduce proof complexity
  2. Generates K modules via KCFG.to_module() with proper defunctionalization
  3. Tracks minimization statistics for optimization analysis
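
A minimal sketch of how these steps might fit together, not the exact verify_add1.py implementation: minimize_kcfg, to_module, and pretty_print mirror the calls named above, while kmir (a KMIR tools object), proof (the APRProof from the verification run), the helper name, and the file layout are assumptions.

from pathlib import Path

def export_proof_artifacts(kmir, proof, name: str) -> None:
    # Assumed inputs: `kmir` provides pretty_print; `proof` is the APRProof
    # produced when verifying the function `name` (e.g. 'add1' or 'main').
    out_dir = Path(f'{name}_proof')
    out_dir.mkdir(parents=True, exist_ok=True)

    # 1. KCFG minimization: drop redundant nodes and edges from the proof graph.
    proof.minimize_kcfg()

    # 2. K module generation: export the minimized KCFG as a reusable K module,
    #    pretty-printed so the .k file contains valid K syntax rather than a
    #    Python repr. (The real to_module() call may take further arguments.)
    module = proof.kcfg.to_module()
    (out_dir / f'{name}.k').write_text(kmir.pretty_print(module))

    # 3. Reporting: record the proof status in the summary file.
    (out_dir / f'{name}.summary').write_text(f'status: {proof.status}\n')

For add1 this corresponds to the .k and .summary files in the add1_proof/ layout listed under Output Files above.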

🤖 Generated with Claude Code

Stevengre and others added 12 commits August 29, 2025 01:18
- Add 5 tests with varying function call counts (1, 10, 100, 1000, 10000)
- Tests evaluate composable symbolic execution with repeated function calls
- All tests use loop-based implementation for scalability
- Set depth limit to 50 for controlled symbolic execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Added pytest-timeout dependency to dev requirements
- Set default 300s timeout for test-integration target in Makefile
- Allows overriding timeout via TEST_ARGS parameter

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

This script demonstrates how to:
- Use KMIR to verify Rust functions with arbitrary entry points
- Generate and save APRProof objects
- Create proof show output for visualization
- Support both add1 and main function verification

The script correctly handles both passing (main with assertions) and
failing/stuck (add1 with potential overflow) verification cases.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Added new features to the verification script:
- KCFG minimization using apr_proof.minimize_kcfg() to reduce redundant nodes
- K module generation from minimized KCFG using to_module()
- Save both original and minimized proof show outputs
- Track minimization statistics in summary file

Results show:
- add1 function: reduced from 7 nodes/3 edges to 6 nodes/2 edges
- main function: already optimal at 3 nodes/1 edge (no reduction)
- Generated K modules saved as .k files for reuse

This enhancement enables more efficient proof representation and
provides reusable K modules for future verification work.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Changed from Python object string representation to proper K syntax:
- Use kmir.pretty_print() to format KFlatModule as valid K code
- Reduced file sizes significantly (80KB→19KB, 53KB→13KB)
- Generated modules now follow K language syntax with proper formatting
- Modules can be directly imported/used in other K specifications

The generated .k files now contain valid K module definitions with:
- Proper module/endmodule structure
- Correctly formatted rules with labels and priorities
- K cell structure with proper XML-like notation
- Valid K term syntax instead of Python repr strings

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
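
A hedged before/after sketch of that change (module, kmir, and the helper are assumed names; kmir.pretty_print is the call named in this commit):

from pathlib import Path

def write_k_module(kmir, module, path: Path) -> None:
    # Before this change the file held the Python repr of the KFlatModule:
    #     path.write_text(str(module))    # not valid K syntax
    # Pretty-printing through KMIR makes the .k file a valid K module that
    # other K specifications can import.
    path.write_text(kmir.pretty_print(module))
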
@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

  • The initial state that prove_rs generates is too concrete for a function call; we need to make it abstract again after generating the rule.
  • The final state that ends program execution is also not suitable for summarizing a function call; we might want something that ends with a Return.
  • The pretty-printer is not good at producing syntactically correct output and should be refactored after the rules are generated.
  • The module structure makes it hard to integrate summary rules.
  • To generate a fast decision tree in the LLVM backend, we need to rekompile the semantics.
  • Maybe we can make function calls purer by evaluating the argument values first and setting them into locals. For now, if we want a function call, we have to inspect the stack first. I want something like RValues ~> #callAux => #call ...; then it is easier to summarize a function call without inspecting the state.
  • We need an option that reports how many steps it took to reach the current state.

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Here is the performance without the summary rule for add1:

300.10s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10000time-haskell]
300.07s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell]
234.22s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-haskell]
25.33s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-haskell]
5.38s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-haskell]
3.23s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
3.09s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
2.96s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
2.33s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]

(18 durations < 0.005s hidden.  Use -vv to show these durations.)
====================================================== short test summary info =======================================================
FAILED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell] - Failed: Timeout (>300.0s) from pytest-timeout.
FAILED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10000time-haskell] - Failed: Timeout (>300.0s) from pytest-timeout.

Setup: MacBook Air, Apple M2, 16 GB, macOS Sequoia 15.6

@Stevengre
Contributor Author

Execution time without the summary rule:

======================================= slowest durations =======================================
3143.37s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell]
174.04s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-haskell]
22.40s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-haskell]
5.33s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-haskell]
3.38s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
3.21s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
2.91s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
2.49s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]

(16 durations < 0.005s hidden.  Use -vv to show these durations.)
==================================== short test summary info ====================================
FAILED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell] - ValueError: Expected ID, found: EOF
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!
=========================== 1 failed, 7 passed in 3149.25s (0:52:29) ============================

@Stevengre
Contributor Author

Execution time with the summary rule:

========================================================= slowest durations =========================================================
2724.73s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-haskell]
146.96s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-haskell]
19.48s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-haskell]
4.99s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-haskell]
3.60s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
3.45s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
3.10s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
2.92s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]

(16 durations < 0.005s hidden.  Use -vv to show these durations.)
================================================== 8 passed in 2730.31s (0:45:30) ===================================================

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Execution result without pretty_print for the LLVM backend (with the add1 summary):

======================================================== test session starts ========================================================
platform darwin -- Python 3.10.17, pytest-8.4.1, pluggy-1.6.0 -- /Users/steven/Desktop/projs/mir-semantics/kmir/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /Users/steven/Desktop/projs/mir-semantics/kmir
configfile: pyproject.toml
plugins: xdist-3.8.0, timeout-2.4.0, kmir-0.3.181, cov-6.2.1, mock-3.14.1, hypothesis-6.136.6, kframework-7.1.280
4 workers [4 items]
scheduling tests via WorkStealingScheduling

src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw1] [ 25%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw2] [ 50%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
[gw3] [ 75%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
[gw0] [100%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]

========================================================= slowest durations =========================================================
0.76s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
0.71s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
0.68s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
0.68s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================================= 4 passed in 2.11s =========================================================

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Execution result without pretty_print for the LLVM backend (without the add1 summary):

======================================================== test session starts ========================================================
platform darwin -- Python 3.10.17, pytest-8.4.1, pluggy-1.6.0 -- /Users/steven/Desktop/projs/mir-semantics/kmir/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /Users/steven/Desktop/projs/mir-semantics/kmir
configfile: pyproject.toml
plugins: xdist-3.8.0, timeout-2.4.0, kmir-0.3.181, cov-6.2.1, mock-3.14.1, hypothesis-6.136.6, kframework-7.1.280
4 workers [4 items]
scheduling tests via WorkStealingScheduling

src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw0] [ 25%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]
[gw1] [ 50%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
[gw2] [ 75%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
[gw3] [100%] PASSED src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]

========================================================= slowest durations =========================================================
0.92s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1000time-llvm]
0.88s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-100time-llvm]
0.87s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-10time-llvm]
0.78s call     src/tests/integration/test_integration.py::test_exec_smir[cse-call-add1-1time-llvm]

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================================= 4 passed in 2.27s =========================================================

@Stevengre
Contributor Author

Saves 59 - 13 = 46 steps for each function call of add1.

@Stevengre
Contributor Author

> rustc cse_call_add1_1000time.rs -o run
> time ./run
./run  0.00s user 0.00s system 70% cpu 0.007 total

@Stevengre
Contributor Author

What if we summarize the main of the 1000time test and generate the KORE before running it with krun?

@Stevengre
Contributor Author

Actually, for the program summarized down to this concrete output value, isn't the time to run it just the time it takes to cat the result?

@Stevengre
Contributor Author

Stevengre commented Aug 29, 2025

Run with the main summary!

(kmir) ➜  mir-semantics git:(jh/cse) ✗ time python3 sum.py 1000 --start-symbol main
python3 sum.py 1000 --start-symbol main  4.50s user 0.16s system 99% cpu 4.668 total

@Stevengre
Contributor Author

Run without the main summary:

(kmir) ➜  mir-semantics git:(jh/cse) ✗ time python3 sum.py 1000 --start-symbol main
python3 sum.py 1000 --start-symbol main  1309.30s user 269.67s system 64% cpu 40:38.26 total
