WARP - WASM Analysis & Reconstruction Platform

WARP is currently a Soroban-focused WASM decompiler that transforms compiled smart contracts into human-readable Rust source code.

The broader Analysis & Reconstruction Platform vision remains the long-term roadmap, but decompilation quality and reconstruction fidelity are today's main priorities.

The Problem

Soroban smart contracts compile to WebAssembly (WASM) binaries. Existing generic WASM tools (wasm2wat, wasm-decompile) produce output that is essentially useless for understanding contract logic:

// What wasm2wat gives you:
(func $7 (param i64 i64 i64) (result i64)
  local.get 0
  local.get 1
  i64.const 239356836719118
  call $0
  i64.const 2
  call $3 ...)

// What wasm-decompile gives you:
function transfer(a: long, b: long, c: long): long {
  return f_3(f_0(a, b, 239356836719118), 2);
}

These tools have no understanding of Soroban's type system, host functions, SDK patterns, or the rich metadata embedded in the WASM binary. They can't tell you that 239356836719118 is actually a packed Symbol("Balance"), that call $0 is env.storage().persistent().get(), or that the function transfers tokens between two addresses.
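To make "a packed Symbol" concrete, here is a minimal Python sketch of Soroban's small-symbol packing (`SymbolSmall`). A pass like symbol_decoding.py has to invert roughly this process. The sketch assumes the layout as we understand it from `soroban-env-common`: up to 9 characters from `[_0-9A-Za-z]`, each mapped to a 6-bit code, with the packed body stored above an 8-bit type tag whose value is 14. The function names are hypothetical and are not WARP's API.

```python
from __future__ import annotations

# Assumed layout (our reading of soroban-env-common); verify before relying on it.
TAG_BITS = 8                  # low 8 bits of every Val hold the type tag
SYMBOL_SMALL_TAG = 14         # tag value for SymbolSmall
CODE_BITS = 6                 # each character is a 6-bit code
CODE_MASK = (1 << CODE_BITS) - 1
MAX_SMALL_CHARS = 9           # 9 * 6 = 54 bits of body
U64_MASK = (1 << 64) - 1


def _encode_char(ch: str) -> int:
    """Map one character of [_0-9A-Za-z] to its 6-bit code (code 0 is unused)."""
    if ch == "_":
        return 1
    if "0" <= ch <= "9":
        return 2 + ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return 12 + ord(ch) - ord("A")
    if "a" <= ch <= "z":
        return 38 + ord(ch) - ord("a")
    raise ValueError(f"character not allowed in a small symbol: {ch!r}")


def _decode_code(code: int) -> str:
    """Inverse of _encode_char for codes 1..63."""
    if code == 1:
        return "_"
    if 2 <= code <= 11:
        return chr(ord("0") + code - 2)
    if 12 <= code <= 37:
        return chr(ord("A") + code - 12)
    return chr(ord("a") + code - 38)


def encode_symbol_small(text: str) -> int:
    """Pack `text` into the 64-bit pattern of a SymbolSmall Val."""
    if len(text) > MAX_SMALL_CHARS:
        raise ValueError("small symbols hold at most 9 characters")
    body = 0
    for ch in text:
        body = (body << CODE_BITS) | _encode_char(ch)
    return (body << TAG_BITS) | SYMBOL_SMALL_TAG


def decode_symbol_small(raw: int) -> str | None:
    """Return the symbol text if `raw` looks like a SymbolSmall Val, else None."""
    bits = raw & U64_MASK                      # i64.const immediates may be negative
    if bits & ((1 << TAG_BITS) - 1) != SYMBOL_SMALL_TAG:
        return None
    body = bits >> TAG_BITS
    if body >> (CODE_BITS * MAX_SMALL_CHARS):
        return None                            # more than 9 characters' worth of bits
    chars = []
    while body:
        code = body & CODE_MASK
        if code == 0:
            return None                        # code 0 never encodes a character
        chars.append(_decode_code(code))
        body >>= CODE_BITS
    return "".join(reversed(chars))            # first character sits in the high bits
```

Round-tripping behaves as expected: `decode_symbol_small(encode_symbol_small("Balance"))` returns `"Balance"`, and any value whose low byte is not 14 decodes to `None`. Because an ordinary integer can still happen to end in that byte, a real pass would presumably also use context, such as the call site, before rewriting a constant.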

What WARP Produces

#[contracttype]
pub enum DataKey {
    Admin,
    Balance,
    TxCount,
}

#[contract]
pub struct Contract;

#[contractimpl]
impl Contract {
    pub fn deposit(env: Env, from: Address, amount: i128) {
        from.require_auth();
        let balance = env.storage().instance().get(&symbol_short!("Balance"));
        env.storage().instance().set(&symbol_short!("Balance"), &(balance + amount));
    }
}

Contract spec metadata supplies function names, parameter names and types, and struct/enum definitions. Soroban host calls are mapped back to idiomatic SDK calls, and expression inlining plus dead code elimination keep the output clean and readable.
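The spec-to-signature step can be pictured with a small Python sketch. `Param`, `FnSpec`, and the `RUST_TYPES` table are hypothetical, simplified stand-ins for decoded `contractspecv0` entries (the real entries are XDR structures, decoded here via stellar-sdk). Like the example above, the sketch always emits `env: Env` as the first parameter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Param:
    name: str
    type_name: str           # spec-level type name, e.g. "address" or "i128"


@dataclass
class FnSpec:
    name: str
    inputs: list[Param]
    output: str | None       # None when the function returns nothing


# Illustrative subset of spec type names -> soroban_sdk Rust types (assumed).
RUST_TYPES = {
    "bool": "bool",
    "u32": "u32",
    "i32": "i32",
    "u64": "u64",
    "i64": "i64",
    "u128": "u128",
    "i128": "i128",
    "symbol": "Symbol",
    "address": "Address",
}


def rust_type(type_name: str) -> str:
    # Unknown names (e.g. user-defined structs/enums) pass through unchanged.
    return RUST_TYPES.get(type_name, type_name)


def render_signature(fn: FnSpec) -> str:
    """Render a method signature; this sketch always puts `env: Env` first."""
    params = ["env: Env"] + [f"{p.name}: {rust_type(p.type_name)}" for p in fn.inputs]
    ret = "" if fn.output is None else f" -> {rust_type(fn.output)}"
    return f"pub fn {fn.name}({', '.join(params)}){ret}"
```

For example, `render_signature(FnSpec("deposit", [Param("from", "address"), Param("amount", "i128")], None))` yields `pub fn deposit(env: Env, from: Address, amount: i128)`, matching the signature shown above.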

How It Works

WARP uses xDSL (a Python MLIR framework) to progressively raise the WASM binary through multiple abstraction levels, leveraging Soroban-specific knowledge at each stage:

WASM Binary (.wasm)
    │
    ▼  Parse + lift to SSA
WasmSSA Dialect (MLIR)          ← WASM instructions in SSA form
    │
    ▼  Optimization passes       ← Stack frame hiding, local promotion,
    │                               constant folding, expression inlining,
    │                               dead code elimination
    │
    ▼  Recognize Soroban semantics
Soroban Dialect (MLIR)          ← Host calls, symbol decoding, env recognition
    │
    ▼  Emit Rust
Rust Source Code (.rs)          ← Human-readable output

Why This Works for Soroban

Soroban WASM binaries contain a goldmine of metadata that generic decompilers ignore:

| Source | What It Provides |
|---|---|
| contractspecv0 custom section | Function names, parameter names & types, struct/enum definitions, error variants |
| contractenvmetav0 section | Env interface version → exact host function mapping |
| WASM import table | ~150 known host functions with documented signatures |
| Val tag bits | Runtime type information encoded in every 64-bit value |
| SDK compilation patterns | Recognizable instruction sequences for storage, auth, events, etc. |
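Reading Val tag bits mostly comes down to masking and shifting. The Python sketch below assumes the Soroban Val layout as we understand it from `soroban-env-common`: the tag sits in the low 8 bits, 32-bit payloads sit in the high 32 bits, and small u64/i64 values carry a 56-bit payload. It lists only a subset of tags. The exact set depends on the env interface version, so treat the numbers as assumptions to verify. `describe_val` is a hypothetical helper.

```python
from __future__ import annotations

from enum import IntEnum

U64_MASK = (1 << 64) - 1


class Tag(IntEnum):
    """A subset of Soroban Val tags (low 8 bits), as we read soroban-env-common."""
    FALSE = 0
    TRUE = 1
    VOID = 2
    ERROR = 3
    U32_VAL = 4
    I32_VAL = 5
    U64_SMALL = 6
    I64_SMALL = 7
    SYMBOL_SMALL = 14
    BYTES_OBJECT = 72
    STRING_OBJECT = 73
    VEC_OBJECT = 75
    MAP_OBJECT = 76
    ADDRESS_OBJECT = 77


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def describe_val(raw: int) -> str:
    """Render a 64-bit Val constant as a Rust-like literal or its tag name."""
    bits = raw & U64_MASK                     # i64.const immediates may be negative
    tag_byte, body = bits & 0xFF, bits >> 8
    try:
        tag = Tag(tag_byte)
    except ValueError:
        return f"<unhandled tag {tag_byte}>"
    if tag in (Tag.FALSE, Tag.TRUE, Tag.VOID):
        if body != 0:
            return f"<malformed {tag.name}>"  # these tags carry no payload
        return {Tag.FALSE: "false", Tag.TRUE: "true", Tag.VOID: "()"}[tag]
    if tag is Tag.U32_VAL:
        return f"{bits >> 32}_u32"            # payload in the high 32 bits
    if tag is Tag.I32_VAL:
        return f"{to_signed(bits >> 32, 32)}_i32"
    if tag is Tag.U64_SMALL:
        return f"{body}_u64"                  # 56-bit unsigned payload
    if tag is Tag.I64_SMALL:
        return f"{to_signed(body, 56)}_i64"   # 56-bit signed payload
    # Symbols, errors, and host-object handles need more context to render.
    return tag.name
```

With this sketch, `describe_val((7 << 32) | 4)` gives `7_u32`. A constant whose tag byte is known but whose payload is inconsistent is flagged as malformed rather than guessed at.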

A generic decompiler sees call $3 with i64 arguments. WARP sees env.storage().persistent().set(&key, &value) with full type annotations.
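Import resolution is what lets `call $3` become a named host call. The sketch below relies on one WASM rule and a lookup table. The rule is that imported functions occupy the first indices of the function index space, in import order. The table is a labeled placeholder: its module/field codes are invented, and a real table would be generated from the Soroban env interface for the version recorded in `contractenvmetav0`. The host-function names are given as we recall them from that interface.

```python
from __future__ import annotations

# PLACEHOLDER table: these (module, field) keys are made up for illustration.
# A real table would come from the Soroban env interface definition matching
# the contractenvmetav0 version; only the value side mimics real names.
HOST_FUNCTIONS: dict[tuple[str, str], str] = {
    ("mod_l", "fn_1"): "get_contract_data",
    ("mod_l", "fn_0"): "put_contract_data",
    ("mod_a", "fn_0"): "require_auth",
}


def resolve_host_calls(func_imports: list[tuple[str, str]]) -> dict[int, str]:
    """Map `call $i` targets to host-function names.

    In WASM, imported functions occupy the first slots of the function index
    space, in import order, so every index i < len(func_imports) is an import.
    """
    return {
        index: HOST_FUNCTIONS.get(key, f"unresolved_import_{index}")
        for index, key in enumerate(func_imports)
    }


def render_call(callee: int, args: list[str], names: dict[int, str]) -> str:
    """Print a call by name; defined (non-imported) functions keep a generic name."""
    target = names.get(callee, f"func_{callee}")
    return f"{target}({', '.join(args)})"
```

Unknown imports are kept visible as `unresolved_import_N` instead of being dropped, so gaps in the table show up directly in the output.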

Architecture

The project defines two MLIR dialects and a Rust code emitter:

WasmSSA Dialect (dialects/wasm_ssa.py)

Represents WASM semantics in SSA form. Eliminates the WASM operand stack, making data flow explicit. Covers all integer arithmetic, comparisons, memory ops, locals/globals, control flow (block, loop, if, br, br_if), calls, and conversions.
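Eliminating the operand stack can be illustrated on branch-free code. The idea is to simulate the stack at decompile time, pushing SSA value names instead of runtime values. This toy Python lifter is a sketch, not WARP's implementation. It handles only a few opcodes and assumes every call returns exactly one value. Control flow (block, loop, br) would need block arguments or phi-like merges, which it does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    name: str                      # SSA value name, e.g. "%0" or "%arg1"


@dataclass
class Op:
    result: Value
    opcode: str
    operands: list                 # Values, or an int immediate for consts


def lift(instrs: list[tuple], num_params: int) -> list[Op]:
    """Lift branch-free WASM stack code to SSA (toy subset; every call is
    assumed to return exactly one value)."""
    local_vals = {i: Value(f"%arg{i}") for i in range(num_params)}
    stack: list[Value] = []
    ops: list[Op] = []

    def emit(opcode: str, operands: list) -> None:
        result = Value(f"%{len(ops)}")
        ops.append(Op(result, opcode, operands))
        stack.append(result)

    for ins in instrs:
        kind = ins[0]
        if kind == "local.get":
            stack.append(local_vals[ins[1]])     # reuse the SSA value, emit nothing
        elif kind == "local.set":
            local_vals[ins[1]] = stack.pop()     # the local now names that value
        elif kind == "i64.const":
            emit("const", [ins[1]])
        elif kind in ("i64.add", "i64.sub", "i64.mul"):
            rhs, lhs = stack.pop(), stack.pop()  # top of stack is the right operand
            emit(kind, [lhs, rhs])
        elif kind == "call":
            _, callee, arity = ins
            args = stack[len(stack) - arity:]
            del stack[len(stack) - arity:]
            emit(f"call ${callee}", args)
        else:
            raise NotImplementedError(kind)
    return ops


def render(ops: list[Op]) -> list[str]:
    def show(x) -> str:
        return x.name if isinstance(x, Value) else str(x)
    return [f"{op.result.name} = {op.opcode}({', '.join(map(show, op.operands))})" for op in ops]
```

Run on the instruction sequence from the wasm2wat excerpt (with an arbitrary constant such as 42), it produces `%1 = call $0(%arg0, %arg1, %0)` and `%3 = call $3(%1, %2)`. That is the same nesting wasm-decompile prints, now with explicit data flow.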

Soroban Dialect (dialects/soroban.py)

Models Soroban-specific operations:

  • HostCallOp — Typed host function calls with module/function resolution
  • SymbolConstOp — Decoded Soroban symbol constants (symbol_short!("..."))

Optimization Passes (passes/)

| Pass | What It Does |
|---|---|
| stack_frame.py | Removes the $global_0 stack-pointer allocation/deallocation boilerplate and recovers frame memory to locals |
| local_promotion.py | Promotes local.get/local.set chains to direct SSA value uses |
| constant_folding.py | Folds arithmetic on constants into single constants at decompile time |
| dce.py | Dead code elimination |
| env_recognition.py | Resolves WASM imports to named Soroban host functions |
| symbol_decoding.py | Decodes packed i64 symbol constants to readable strings |
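Two of these passes, constant folding and dead code elimination, are easy to sketch on a straight-line SSA list. The `Op` record and opcode names below are simplified assumptions, not the WasmSSA dialect's operations. The folder wraps results to signed 64 bits because WASM `i64` arithmetic wraps. The single backward sweep for DCE is only valid for straight-line SSA code; loops would need an iterative liveness analysis.

```python
from __future__ import annotations

from dataclasses import dataclass

MASK64 = (1 << 64) - 1


def wrap_i64(x: int) -> int:
    """Reduce to a signed 64-bit value, like WASM's wrapping i64 arithmetic."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


@dataclass
class Op:
    result: str | None   # SSA name, or None for ops without a result
    opcode: str          # "const", "add", "sub", "mul", "call", "return"
    args: list           # SSA names; for "const", a single int value


FOLDERS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
PURE = {"const", *FOLDERS}   # ops with no side effects


def fold_constants(ops: list[Op]) -> list[Op]:
    """Replace arithmetic on known constants with a single const op."""
    known: dict[str, int] = {}
    out: list[Op] = []
    for op in ops:
        if op.opcode == "const":
            known[op.result] = op.args[0]
            out.append(op)
        elif op.opcode in FOLDERS and all(a in known for a in op.args):
            value = wrap_i64(FOLDERS[op.opcode](*(known[a] for a in op.args)))
            known[op.result] = value
            out.append(Op(op.result, "const", [value]))
        else:
            out.append(op)
    return out


def eliminate_dead_code(ops: list[Op]) -> list[Op]:
    """Drop pure ops whose results are never used (one backward sweep)."""
    live: set[str] = set()
    kept: list[Op] = []
    for op in reversed(ops):
        if op.opcode in PURE and op.result not in live:
            continue
        kept.append(op)
        if op.opcode != "const":
            live.update(op.args)
    return kept[::-1]
```

Running folding first exposes more dead code. After `2 * 3` folds to `6`, the two original constants lose their only use and the DCE sweep removes them.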

Rust Emitter (codegen/rust_emitter.py)

Translates the optimized MLIR to Rust source code with:

  • Expression inlining — Collapses chains of constants, arithmetic, and variable reads into compound expressions
  • Dead local elimination — Omits write-only locals and their assignments
  • Boolean condition detection — Omits != 0 tests for comparison results
  • Trailing return elision — Omits redundant return; at end of void functions
  • Assert pattern recovery — Collapses panic_fn(...); unreachable!() into panic!()
  • Soroban SDK idioms: require_auth(), env.storage(), env.events(), etc.
  • Contract spec integration — Uses spec metadata for parameter names, types, struct/enum definitions
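Expression inlining, as described in the list above, can be sketched in two steps. First, count the uses of each SSA temporary. Then substitute every single-use, side-effect-free definition into its use site. Calls are never moved, since reordering them could change observable behavior. Moving pure expressions is safe here because SSA bindings are never reassigned. The tuple-based expression format and helper names are hypothetical. The renderer parenthesizes every nested binary operation rather than tracking Rust operator precedence.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Expressions are nested tuples:
#   ("var", name) | ("lit", value) | (op, lhs, rhs) | ("call", fn_name, [args])
LEAVES = ("var", "lit", "call")


@dataclass
class Let:
    name: str
    expr: tuple


def count_uses(expr: tuple, counts: Counter) -> None:
    kind = expr[0]
    if kind == "var":
        counts[expr[1]] += 1
    elif kind == "call":
        for arg in expr[2]:
            count_uses(arg, counts)
    elif kind != "lit":
        count_uses(expr[1], counts)
        count_uses(expr[2], counts)


def is_pure(expr: tuple) -> bool:
    kind = expr[0]
    if kind in ("var", "lit"):
        return True
    if kind == "call":
        return False          # calls may have side effects; never move them
    return is_pure(expr[1]) and is_pure(expr[2])


def substitute(expr: tuple, env: dict[str, tuple]) -> tuple:
    kind = expr[0]
    if kind == "var":
        return env.get(expr[1], expr)
    if kind == "lit":
        return expr
    if kind == "call":
        return ("call", expr[1], [substitute(a, env) for a in expr[2]])
    return (kind, substitute(expr[1], env), substitute(expr[2], env))


def inline(lets: list[Let], result: tuple) -> tuple[list[Let], tuple]:
    """Fold single-use, side-effect-free SSA temporaries into their use site."""
    counts: Counter = Counter()
    for let in lets:
        count_uses(let.expr, counts)
    count_uses(result, counts)
    env: dict[str, tuple] = {}
    kept: list[Let] = []
    for let in lets:
        expr = substitute(let.expr, env)
        if counts[let.name] == 1 and is_pure(expr):
            env[let.name] = expr      # binding disappears; its one use gets the expr
        else:
            kept.append(Let(let.name, expr))
    return kept, substitute(result, env)


def render(expr: tuple) -> str:
    kind = expr[0]
    if kind == "var":
        return expr[1]
    if kind == "lit":
        return str(expr[1])
    if kind == "call":
        return f"{expr[1]}({', '.join(render(a) for a in expr[2])})"
    # Parenthesize nested binary operations instead of tracking precedence.
    lhs, rhs = (render(e) if e[0] in LEAVES else f"({render(e)})" for e in expr[1:])
    return f"{lhs} {kind} {rhs}"
```

A temporary bound to a call result (for example `let v3 = get_fee();`) stays a separate `let` even when used once, while the surrounding arithmetic collapses into one compound expression.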

Project Status

Current phase: Proof of Concept

This repository is an advanced proof of concept developed in response to the Stellar Community Fund RFP: Soroban Specialized Reverse Engineering Tool. It demonstrates the viability of a Soroban-aware decompiler and the technical approach behind it, with promising early results.

The core pipeline (WASM parsing → SSA lifting → optimization passes → Rust emission) is functional end-to-end. Significant work is planned to bring this to production quality, including stronger type reconstruction, expanded SDK pattern recognition, 90%+ AST reconstruction accuracy, and full ecosystem integration.

Platform expansion remains on the roadmap once Soroban decompilation quality reaches production grade.

Building

Prerequisites

  • Python 3.10+
  • uv (recommended) or pip

Install

# From a clone of this repository, install with uv
uv sync

# Or with pip
pip install -e ".[dev]"

Usage

Preview: hello_world.wasm decompiled output versus original source (../my-demo/contracts/hello-world/src/lib.rs).

Original (lib.rs):

/// Deposit - only admin can
pub fn deposit(env: Env, caller: Address, amount: i64) -> i64 {
    log!(&env, ">>> deposit(caller={}, amount={})", caller, amount);
    caller.require_auth();

    assert!(amount > 0, "amount must be positive");

    _check_admin(&env, &caller);
    let new_bal = _update_balance(&env, amount);
    let tx_num = _increment_tx_count(&env);

    // Application event
    env.events()
        .publish((symbol_short!("deposit"),), (amount, new_bal, tx_num));

    log!(&env, "<<< deposit DONE, balance={}, tx#{}", new_bal, tx_num);
    new_bal
}

WARP-generated output:

pub fn deposit(env: Env, caller: Address, amount: i64) -> i64 {
    let mut local_2: i32;
    let mut local_6: i32;
    let mut local_7: i64;
    default();
    'blk0: {
        'blk1: {
            if local_6 == 1 { break 'blk1; }
            local_0 = local_7;
            if local_6 == 1 { break 'blk1; }
            local_1 = local_7;
            local_0 = local_2 + 24;
            local_7 = local_2 + 32;
            log!(local_2 + 95, ">>> deposit(caller={}, amount={})", local_2 + 24, local_2 + 32);
            local_2 + 24.require_auth();
            if 0 != 0 { break 'blk0; }
            _check_admin(local_2 + 95, local_2 + 24);
            let v28 = _update_balance(local_2 + 95, local_1);
            local_0 = v28;
            let v29 = _increment_tx_count(local_2 + 95);
            local_7 = local_0;
            let v30 = local_2 + 95.events().publish(local_2 + 80, local_2 + 56);
            local_1 = local_2 + 40;
            local_7 = local_2 + 52;
            log!(local_2 + 95, "<<< deposit DONE, balance={}, tx#{}", local_2 + 40, local_2 + 52);
            return local_2 + 80;
        }
        return;
    }
    unreachable!();
}

Command-line usage:

# Decompile to Rust
warp --emit-rust contract.wasm

# Write output to file
warp --emit-rust contract.wasm -o output.rs

# Emit MLIR (for debugging the pipeline)
warp --emit-mlir contract.wasm

# Emit WasmSSA before optimization passes
warp --emit-wasm-ssa contract.wasm

# Inspect contract metadata
warp --emit-spec contract.wasm
warp --emit-imports contract.wasm
warp --emit-exports contract.wasm
warp --emit-stats contract.wasm

# Verbose mode (shows pass pipeline)
warp --emit-rust contract.wasm -v
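To decompile many contracts at once, a small Python driver can wrap the CLI. It uses only the `--emit-rust`, `-o`, and `-v` flags shown above. The directory layout and the assumption that `warp` exits with a non-zero status on failure are ours, not documented behavior. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_command(wasm: Path, out_dir: Path, verbose: bool = False) -> list[str]:
    """Compose a `warp --emit-rust <file> -o <out>` call (flags as documented)."""
    cmd = ["warp", "--emit-rust", str(wasm), "-o", str(out_dir / f"{wasm.stem}.rs")]
    if verbose:
        cmd.append("-v")
    return cmd


def decompile_all(wasm_dir: Path, out_dir: Path, verbose: bool = False) -> list[Path]:
    """Decompile every *.wasm in wasm_dir and return the inputs that failed.

    Assumes warp signals failure with a non-zero exit status.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[Path] = []
    for wasm in sorted(wasm_dir.glob("*.wasm")):
        proc = subprocess.run(build_command(wasm, out_dir, verbose), capture_output=True, text=True)
        if proc.returncode != 0:
            failed.append(wasm)
            print(f"{wasm.name}: {proc.stderr.strip()}", file=sys.stderr)
    return failed
```

For example, `decompile_all(Path("wasm"), Path("decompiled"))` writes one `.rs` file per input and returns the binaries that failed. It raises `FileNotFoundError` if `warp` is not on `PATH`.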

Testing

Install dev dependencies and run the FileCheck test suite:

# With uv (recommended)
uv sync
uv run lit tests/filecheck/ -v

# Or with pip
pip install -e ".[dev]"
lit tests/filecheck/ -v

Tests use a custom WASM text assembler (tests/tools/gen-test-wasm.py) to build .wasm binaries inline, then pipe through warp and verify output with FileCheck patterns.

Multi-Chain Vision

The architecture is designed to be extensible, but multi-chain support is roadmap work that follows once the Soroban decompilation use case matures. The WasmSSA layer, optimization passes, and Rust code emitter are shared infrastructure, so adding a new blockchain target is intended to require only:

  1. A chain-specific MLIR dialect (e.g., stylus, solana)
  2. Chain-specific recognition passes
  3. Chain-specific metadata parsers

                    ┌───────────────────┐
                    │    WASM Binary    │
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │  WasmSSA (shared) │
                    └─────────┬─────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
 ┌────────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
 │ Soroban Dialect │ │     Stylus      │ │     Solana      │  ...
 └────────┬────────┘ └────────┬────────┘ └────────┬────────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              │
                    ┌─────────▼─────────┐
                    │   Rust Emitter    │
                    │     (shared)      │
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │    Rust Source    │
                    └───────────────────┘
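One way to realize the shared-core, pluggable-chain split is a frontend registry. Each chain contributes a detector and its recognition passes, and the shared pipeline picks a frontend per binary. This Python sketch is hypothetical; WARP does not necessarily expose such an API. Only the Soroban detector is filled in, using the `contractspecv0` custom section listed under "Why This Works for Soroban". Modules are plain dicts standing in for MLIR modules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# A "module" is a plain dict here; a real pipeline would pass an MLIR module.
Pass = Callable[[dict], dict]


@dataclass
class ChainFrontend:
    name: str
    detect: Callable[[set[str]], bool]      # decides from custom-section names
    passes: list[Pass] = field(default_factory=list)


FRONTENDS: list[ChainFrontend] = []


def register(frontend: ChainFrontend) -> ChainFrontend:
    FRONTENDS.append(frontend)
    return frontend


def select_frontend(custom_sections: set[str]) -> ChainFrontend | None:
    """Pick the first registered frontend that recognizes the binary."""
    return next((f for f in FRONTENDS if f.detect(custom_sections)), None)


def run_pipeline(module: dict, shared: list[Pass], custom_sections: set[str]) -> dict:
    """Run the shared passes, then the selected chain's recognition passes."""
    frontend = select_frontend(custom_sections)
    if frontend is None:
        raise ValueError("no chain frontend recognizes this binary")
    for p in [*shared, *frontend.passes]:
        module = p(module)
    return module


# Soroban binaries carry a contractspecv0 custom section.
register(ChainFrontend("soroban", lambda sections: "contractspecv0" in sections))
```

Adding a chain then means registering one more `ChainFrontend` with its own detector and passes, leaving the shared stages untouched.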

Technology

| Component | Technology |
|---|---|
| Core framework | xDSL (Python MLIR) |
| WASM parsing | wasm-tob |
| XDR/spec decoding | stellar-sdk |
| Package management / build backend | uv + hatchling |
| Testing | LIT + FileCheck |

License

Apache 2.0

Related Work

  • WABT — WebAssembly Binary Toolkit (wasm2wat, wasm-decompile)
  • xDSL — Python-native MLIR framework
  • Soroban SDK — Stellar's smart contract SDK
  • RELLIC — LLVM IR to C decompiler (similar progressive raising concept)