Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 12 additions & 8 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,10 +97,12 @@ The Python layer (`kmir.py`) bridges between SMIR JSON and K semantics:
3. Executes via K framework's `KProve`/`KRun` interfaces

### Intrinsic Functions
Intrinsic functions (like `black_box`) don't have regular function bodies. They're handled by:
Intrinsic functions (like `black_box`, `raw_eq`) don't have regular function bodies. They're handled by:
1. Python: `_make_function_map` adds `IntrinsicFunction` entries to function map
2. K: Special rules in `kmir.md` execute intrinsics via `#execIntrinsic`

**See `docs/dev/adding-intrinsics.md` for detailed implementation guide.**

## Testing Patterns

### prove-rs Tests
Expand All @@ -118,9 +120,14 @@ Tests in `kmir/src/tests/integration/data/prove-rs/` follow this pattern:
## Development Workflow

### Before Starting Any Task
1. Read README and documentation in docs/ directory first
2. Study existing development patterns and conventions
3. Understand the codebase structure before making changes
1. **Always read relevant documentation first**:
- Check `docs/` directory for guides on specific topics
- Review existing implementations of similar features
- Study test patterns in `kmir/src/tests/`
2. **Understand existing patterns**:
- Look at recent PRs for implementation examples
- Check how similar features are implemented
- Follow established conventions in the codebase

### Modifying K Semantics
1. Edit `.md` files in `kmir/src/kmir/kdist/mir-semantics/`
Expand All @@ -133,10 +140,7 @@ Tests in `kmir/src/tests/integration/data/prove-rs/` follow this pattern:
3. Test with `make test-unit`

### Adding Intrinsic Support
1. Update `_make_function_map` in `kmir.py` to recognize intrinsic
2. Add `IntrinsicFunction` constructor in `mono.md`
3. Add execution rules in `kmir.md` under `#execIntrinsic`
4. Add test in `prove-rs/` directory
See `docs/dev/adding-intrinsics.md` for complete guide with examples.

## Debugging Tips

Expand Down
204 changes: 163 additions & 41 deletions docs/dev/adding-intrinsics.md
Original file line number Diff line number Diff line change
@@ -1,73 +1,195 @@
# Adding Intrinsics

## Overview

This guide explains how to add support for new intrinsic functions in KMIR. Intrinsics are compiler built-in functions that don't have regular MIR bodies and require special semantic rules.

## Architecture

Intrinsics are handled with direct operand passing:
1. **Direct Operand Passing**: `#execIntrinsic(symbol("name"), ARGS, DEST)` receives operands directly
2. **Pattern Matching**: Rules match on specific operand patterns for each intrinsic
3. **Operand Evaluation**: Each intrinsic rule handles its own operand evaluation as needed

See implementation in [kmir.md](../../kmir/src/kmir/kdist/mir-semantics/kmir.md)

## Development Workflow

### Step 1: Create Test File
Create `tests/rust/intrinsic/your_intrinsic.rs`:

```rust
fn main() {
let result = your_intrinsic(args);
assert_eq!(result, expected);
}
Create a Rust test file in `kmir/src/tests/integration/data/exec-smir/intrinsic/` that uses your intrinsic and verifies its behavior with assertions.

### Step 2: Add Test to Integration Suite

Add entry to `EXEC_DATA` in [test_integration.py](../../kmir/src/tests/integration/test_integration.py):

```python
(
'your_intrinsic',
EXEC_DATA_DIR / 'intrinsic' / 'your_intrinsic.smir.json',
EXEC_DATA_DIR / 'intrinsic' / 'your_intrinsic.state',
65, # Start with small depth, increase if needed
),
```

### Step 2: Generate SMIR and Verify Intrinsic Detection
### Step 3: Generate Initial State (Will Show Stuck Point)

```bash
# Generate SMIR JSON
make generate-tests-smir
# Generate initial state showing where execution gets stuck
make test-integration TEST_ARGS="-k 'exec_smir and your_intrinsic' --update-expected-output"

# This will create your_intrinsic.state showing execution stuck at:
# #execIntrinsic(symbol("your_intrinsic"), DEST)

# Update expected outputs and verify intrinsic is detected
make test-unit TEST_ARGS="--update-expected-output"
# Save this for comparison
cp kmir/src/tests/integration/data/exec-smir/intrinsic/your_intrinsic.state \
your_intrinsic.state.initial
```

Check `tests/expected/unit/test_smir/test_function_symbols/your_intrinsic.expected.json` to confirm the intrinsic appears as `IntrinsicSym`.
### Step 4: Implement K Semantics Rule

### Step 3: Run Initial Integration Test
Add rules to [kmir.md](../../kmir/src/kmir/kdist/mir-semantics/kmir.md).

To find implementation patterns for your intrinsic:
```bash
# Run test and update expected output (will show stuck at intrinsic call)
make test-integration TEST_ARGS="-k your_intrinsic --update-expected-output"
# Search for existing intrinsic implementations
grep -A10 "#execIntrinsic(symbol" kmir/src/kmir/kdist/mir-semantics/kmir.md

# Backup the initial state for comparison
cp tests/expected/integration/test_exec_smir/intrinsic_your_intrinsic.state \
tests/expected/integration/test_exec_smir/intrinsic_your_intrinsic.state.backup
# Look for helper functions and evaluation patterns
grep -B2 -A5 "seqstrict" kmir/src/kmir/kdist/mir-semantics/kmir.md
```

### Step 4: Implement K Rule
Edit `kmir/src/kmir/kdist/mir-semantics/kmir.md`:
Study existing intrinsics like `black_box` (simple value operation) and `raw_eq` (reference dereferencing with helper functions) as reference implementations.

```k
rule <k> #execIntrinsic(mirString("your_intrinsic"), ARGS, DEST) =>
/* your implementation */
... </k>
```
### Step 5: Add Documentation

Document your intrinsic in the K semantics file with its implementation.

### Step 6: Rebuild and Test

### Step 5: Rebuild and Test
```bash
# Rebuild K semantics
make build

# Run test again
make test-integration TEST_ARGS="-k your_intrinsic --update-expected-output"
# Run test again and update the state with working implementation
make test-integration TEST_ARGS="-k 'exec_smir and your_intrinsic' --update-expected-output"

# Compare results
diff tests/expected/integration/test_exec_smir/intrinsic_your_intrinsic.state.backup \
tests/expected/integration/test_exec_smir/intrinsic_your_intrinsic.state
# Compare to see the progress
diff your_intrinsic.state.initial \
kmir/src/tests/integration/data/exec-smir/intrinsic/your_intrinsic.state
```

The diff should show progress past the intrinsic call if implementation is correct.
### Step 7: Verify Both Backends

### Step 6: Verify Results
Ensure the test completes successfully and the intrinsic behaves as expected.

## Example: black_box
```bash
# Test with both LLVM and Haskell backends
make test-integration TEST_ARGS="-k 'exec_smir and your_intrinsic'"

Initial state (before rule):
# Both should pass with consistent results
# If not, you may need backend-specific state files
```
#setUpCalleeData ( IntrinsicFunction ( mirString ( "black_box" ) ) , ...)

## Examples

For implementation examples, see existing intrinsics in [kmir.md](../../kmir/src/kmir/kdist/mir-semantics/kmir.md).

## Common Patterns

### Pattern 1: Direct Operand Evaluation
Use when the intrinsic needs to evaluate its operands directly.

### Pattern 2: Reference Dereferencing with #withDeref
Use the `#withDeref` helper function to add dereferencing to operands.

### Pattern 3: seqstrict for Automatic Evaluation
Use `[seqstrict(2,3)]` attribute to automatically evaluate operand arguments.

### Pattern 4: Pattern Matching on Operands
Match specific operand patterns directly in the `#execIntrinsic` rule.

## Testing Best Practices

1. **Start Simple**: Test with primitive types first
2. **Save Initial State**: Keep the stuck state for comparison
3. **Use Correct Test Filter**: Always use `exec_smir and your_intrinsic` to ensure correct test runs
4. **Check Both Backends**: Ensure LLVM and Haskell produce same results
5. **Document Limitations**: Note what cases aren't handled yet
6. **Create Issue for Future Work**: Track enhancements needed (like #666 for `raw_eq`)

## Debugging Tips

### Check Execution State
```bash
# See where execution is stuck with verbose output
make test-integration TEST_ARGS="-k 'exec_smir and your_intrinsic' -vv"

# Look for the K cell content to see what values are present
```

After implementing rule:
### Understanding the State File
The state file shows the complete execution state. Key sections to check:
- `<k>`: Shows current execution point
- `<locals>`: Shows local variable values
- `<functions>`: Should contain `IntrinsicFunction(symbol("your_intrinsic"))`

### Verify Intrinsic Recognition
```bash
# Check SMIR JSON to confirm intrinsic is recognized
cat kmir/src/tests/integration/data/exec-smir/intrinsic/your_intrinsic.smir.json | grep -A5 your_intrinsic
```
Program continues execution with value 11 passed through
```

## Common Issues

### Issue: Wrong test runs
**Solution**: Use `-k 'exec_smir and your_intrinsic'` to ensure `test_exec_smir` runs, not other tests.

### Issue: "Function not found"
**Solution**: The intrinsic should be automatically recognized if it appears in SMIR. Check the SMIR JSON to confirm.

### Issue: Execution stuck at `#execIntrinsic`
**Solution**: Your rule pattern doesn't match. Check:
- The exact intrinsic name in the symbol
- Value types on K cell (use `ListItem(VAL:Value)` to match any value)
- Number of arguments expected

### Issue: Recursion/infinite loop
**Solution**: Use helper functions to separate evaluation stages, avoid calling `#execIntrinsic` within itself.

### Issue: Different backend results
**Solution**:
- Increase execution depth if needed
- Check for backend-specific evaluation order issues
- May need backend-specific expected files (`.llvm.state`, `.haskell.state`)

### Issue: Test timeout
**Solution**:
- Start with smaller depth (e.g., 65 instead of 1000)
- Optimize your rule to avoid unnecessary computation
- Check for infinite loops in your implementation

## Important Notes

### Direct Operand Passing
The current architecture passes operands directly to intrinsic rules:
- **Direct Access**: Rules receive operands directly in `#execIntrinsic`
- **Custom Evaluation**: Each intrinsic controls its own operand evaluation
- **Better Indexing**: K can better index rules with explicit operand patterns

### When to Use Helper Functions
Always use a helper function when:
- You need automatic operand evaluation (with `seqstrict`)
- The logic is complex enough to benefit from separation
- You want to transform operands before evaluation (like with `#withDeref`)

### Testing Strategy
1. Write the test with expected behavior first
2. Generate initial state to see where it gets stuck
3. Implement the minimal rule needed
4. Update state to verify progress
5. Iterate to handle edge cases
6. Document limitations for future work

## References

- Recent intrinsic PRs in repository history
- [Rust Intrinsics Documentation](https://doc.rust-lang.org/std/intrinsics/)
46 changes: 45 additions & 1 deletion kmir/src/kmir/kdist/mir-semantics/kmir.md
Original file line number Diff line number Diff line change
Expand Up @@ -629,7 +629,51 @@ its argument to the destination without modification.

```k
// Black box intrinsic implementation - identity function
rule <k> #execIntrinsic(symbol("black_box"), ARG .Operands, DEST) => #setLocalValue(DEST, ARG) ... </k>
rule <k> #execIntrinsic(symbol("black_box"), ARG:Operand .Operands, DEST)
=> #setLocalValue(DEST, ARG)
... </k>
```

#### Raw Eq (`std::intrinsics::raw_eq`)

The `raw_eq` intrinsic performs byte-by-byte equality comparison of the memory contents pointed to by two references.
It returns a boolean value indicating whether the referenced values are equal. The implementation dereferences the
provided References to access the underlying values, then compares them using K's built-in equality operator.
This intrinsic is typically used for low-level memory comparison operations where type-specific equality methods
are not suitable.

**Current Limitations:**
The current implementation only handles the simple case where References point to values of the same type.
More complex scenarios require additional testing and implementation work:
- References to different types with the same memory representation
- References to composite types (structs, arrays, enums)
- References with different alignments or padding

Handling different types may require converting values to their byte representations before comparison,
which will need to be addressed when such use cases are encountered.

```k
// Raw eq intrinsic - byte-by-byte equality comparison of referenced values

// Helper function to append projectionElemDeref to an operand
syntax Operand ::= #withDeref(Operand) [function, total]
rule #withDeref(operandCopy(place(LOCAL, PROJ)))
=> operandCopy(place(LOCAL, projectionElemDeref PROJ))
rule #withDeref(operandMove(place(LOCAL, PROJ)))
=> operandMove(place(LOCAL, projectionElemDeref PROJ))
rule #withDeref(OP) => OP [owise]
Comment on lines +658 to +664
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should append, not prepend... it has to use the appendP helper function (defined somewhere in data.md)


// Handle raw_eq intrinsic by dereferencing operands
rule <k> #execIntrinsic(symbol("raw_eq"), ARG1:Operand ARG2:Operand .Operands, PLACE)
=> #execRawEq(PLACE, #withDeref(ARG1), #withDeref(ARG2))
... </k>

// Execute raw_eq with operand evaluation via seqstrict
syntax KItem ::= #execRawEq(Place, KItem, KItem) [seqstrict(2,3)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syntax KItem ::= #execRawEq(Place, KItem, KItem) [seqstrict(2,3)]
syntax KItem ::= #execRawEq(Place, Evaluation, Evaluation) [seqstrict(2,3)]

rule <k> #execRawEq(DEST, VAL1:Value, VAL2:Value)
=> #setLocalValue(DEST, BoolVal(VAL1 ==K VAL2))
Copy link
Member

@jberthold jberthold Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a subtlety here that we have to address at some point: The VAL1 and VAL2 may be "equal" in our Value sort but not actually in Rust's low-level representation. For example, if we have

enum MyEnum {
    A(u64),
    B(i64),
}

and

struct MyStruct(u64);

then A(42) and MyStruct(42) would both be Aggregate(variantIdx(0), ListItem(Integer(42, 64, false))) in kmir but the byte layout has to store a tag for MyEnum to decide which variant it is. (or the other way round, we store a redundant variantIdx(0) for structs that is not needed in rustcs byte layout)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could compare the layouts (stable-mir-json is extracting them but they are not currently in the MIR-AST in K), but that will become complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what variantIdx(0) is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To distinguish this case, we need to store the original type of the Aggregate? Maybe I can add a test case for this situation.

As I mentioned this issue #666, should we tackle all the features in this PR? What kind of case can be moved to the furture work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(PS. Apologies. I'm trying to type in English directly without AI rephrasing for faster response. But I may use some wrong grammar and words.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could compare the layouts (stable-mir-json is extracting them but they are not currently in the MIR-AST in K), but that will become complex.

Maybe we should have a way to provide python hooks for K?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this feature will be used in p-token, @dkcumming? If not, maybe we should move this to future work.

And for this case, I'm curious about the real ouput of this raw_eq check. If this will return a false, then what's the purpose when they use raw_eq.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a first version, we could only implement raw_eq for operands whose TypeInfo is the same.
(or maybe not the type ID, but the type when looked up in the type table, but there might be other IDs in that type recursively).
The type ID is stored with the local but we don't have it for values with projection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this issue touches the deep about what our architecture should look like, including:

  1. how to tackle type info? --- If we use @ehildenb 's plan to traverse the types with its real identifier before sending it to the intial state, this problem should be easy to tackle. just store the name of the place that the reference pointed to. And we don't need to handle the type table anymore.
  2. what the reference type should be? --- should we have the information about the place that the reference pointed to, or just deref and check the type?
  3. Is Enum and Struct will be the same structure?

BTW, it looks weird with raw_eq in p-token to me, because it tries to compare things in bytes. Is that a problem caused by the float? (Or just because it uses so many unsafe ...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code in p-token compares arrays of bytes ([u8; 8] is how an "amount" is stored). Just guessing here, though.

... </k>

```

### Stopping on Program Errors
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#![feature(core_intrinsics)]

use std::intrinsics::raw_eq;

// NOTE: This test demonstrates the raw_eq intrinsic implementation.
// Known issue: Haskell backend produces more verbose symbolic execution output
// compared to LLVM backend due to different handling of symbolic branches.
// Both backends correctly implement the intrinsic but output format differs.
Comment on lines +5 to +8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true? Then the test should not pass because both HS backend and LLVM backend are tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just changed the depth to investigate what the problem it is. And the current depth + 1 (or 3) will generate different states from different backends.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the LLVM backend is taking one branch and the HS backend explores more than one branch (non-deterministic branching). That should be investigated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that is a switchBranch. Maybe invetigate this in another PR? What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a problem with one of the new rules. Maybe related to using KItem instead of Evaluation. If it branches a lot we should probably not merge it without understanding why. The p-token proofs would start branching, too


fn main() {
let a = 42;
let b = 42;

// Test raw_eq intrinsic with identical values
let result = unsafe { raw_eq(&a, &b) };

// This assertion should pass as both values are identical
assert!(result);
}

Large diffs are not rendered by default.

Loading