Skip to content

Latest commit

 

History

History
159 lines (121 loc) · 8.31 KB

File metadata and controls

159 lines (121 loc) · 8.31 KB

FAIR DSL Design Conversation — Summary for Continuation

Project Context

We are building a library to convert Jimple three-address code (as generated by SootUp) into FAIR (Fixpoint-aware Abstract Intermediate Representation), an IR for static analyses (not for the concrete program being analyzed). A FAIR program is the specialized, executable form of an analysis A on a particular program P.

The FAIR paper is dated March 14, 2026, and describes:

  • Core data-flow instructions: copy, set_top, set_bottom, set_const, apply, assume, load, store, goto, call, ret
  • Context statements: ctx_init, ctx_extend, ctx_klimit, ctx_merge
  • Fixpoint control: widen
  • A foreach/fold construct for iteration over sets (e.g. access paths)

What We Designed: A Kotlin DSL

A Kotlin-builder DSL that allows defining the semantics of a static analysis using:

  1. A lattice (induced by an abstract domain and merge operator)
  2. Data-flow functions on the Jimple expression level — how evaluation of a Jimple expression by the static analysis transforms abstract-domain values

Key Design Principle: Expression-Level, Not Statement-Level

The DSL defines flow functions keyed by Jimple expression types, not statement types. The converter handles structural statement→FAIR mapping (e.g., JAssignStmt with an InvokeExpr on the RHS wires the FAIR apply result into the LHS variable). The DSL only says what the expression evaluation means in the abstract domain.

Two-Phase Callback Architecture

Phase 1 — Conversion time (Jimple→FAIR):

  • Converter walks each Jimple statement, finds its expressions
  • Calls analysis.processExpression(expr) — a single, uniform entry point
  • Inside, matcher predicates fire as callbacks (e.g., matching { it.nameIs("getParam") })
  • First match determines which DomainOperations to embed in FAIR apply statements
  • Returns List<FairApply<T>> or empty list if expression is irrelevant

Phase 2 — Interpretation time (FAIR execution):

  • FAIR interpreter executes FairApply.execute(env)
  • This calls back into DomainOperation.execute(dest, args, env)
  • The operation reads/writes the AbstractEnvironment<T> via lattice operations

Expression Handler Type Hierarchy

Mirrors SootUp's Jimple expression tree. Each handler holds a default DomainOperation (generic, lattice-only):

ExprHandler<T>                    — abstract base (holds defaultOp)
├── UnaryExprHandler<T>           — default: IdentityOp (pass through)
│   ├── NegExprHandler<T>         — -x propagates abstract value of x
│   ├── LengthExprHandler<T>      — arr.length propagates from arr
│   ├── CastExprHandler<T>        — (T)x propagates from x
│   └── InstanceOfExprHandler<T>  — x instanceof T propagates from x
├── BinaryExprHandler<T>          — default: JoinOp (join operands)
├── InvokeExprHandler<T>          — matcher-based dispatch
└── AllocationExprHandler<T>      — default: SetTopOp (new = ⊤)
    ├── NewExprHandler<T>
    ├── NewArrayExprHandler<T>
    └── NewMultiArrayExprHandler<T>

Five Generic DomainOperation Building Blocks

All defined purely via Lattice<T> — no analysis-specific knowledge:

Operation What it does Used by default for
IdentityOp env[dest] = env[args[0]] Unary exprs (neg, cast, length, instanceof)
JoinOp env[dest] = join(env[a1], env[a2]) Binary exprs (arithmetic, bitwise, comparison)
SetTopOp env[dest] = lattice.top Allocation exprs (new), Sources
SetBottomOp env[dest] = lattice.bottom Sanitizers
CheckTopOp if leq(top, env[arg]) → report Sinks

Invoke Expression Handling (Matcher-Based)

For InvokeExpr, the handler has a list of rules, each pairing a matcher predicate with domain operations. The DSL provides three categories:

  • source: matched invoke → SetTopOp on return value or argument access path
  • sink: matched invoke → CheckTopOp on argument, with reporter callback
  • sanitizer: matched invoke → SetBottomOp on return value or argument access path
  • defaultPropagation: fallback for unmatched invocations

Taint Analysis Example (the running example)

fun defineTaintAnalysis() = fairAnalysis("TaintAnalysis", TaintLattice) {
    onInvokeExpr {
        source(matching { it.nameIs("getParam") }, sourceId = "HTTP_PARAM") {
            taintsReturnValue()
        }
        source(matching { it.nameIs("getSecret") }, sourceId = "SECRET_FIELD") {
            taintsArgument(0, AccessPath("arg0") / "secretField")
        }
        sink(matching { it.nameIs("sqlQuery") }, sinkId = "SQL") {
            checksArgument(0) { finding ->
                println("!! FINDING: ${finding.kind}${finding.detail}")
            }
        }
        sanitizer(matching { it.nameIs("sanitize") }, sanitizerId = "HTML_SANITIZE") {
            killsTaintOnReturnValue()
        }
    }
    // No onBinopExpr needed — default JoinOp is correct for taint
    // No onNegExpr/onCastExpr needed — default IdentityOp propagates taint
}

Lattice Hierarchy

Lattice<T> interface: bottom, top, join, meet, leq

FlatLattice<E>: sealed class Flat<E> = Bottom | Top | Element<E>
  - join of two distinct elements → ⊤
  - meet of two distinct elements → ⊥

TaintLattice: specializes so bottom = Untainted, top = Tainted
  - Two-point lattice: Untainted ⊑ Tainted

SootUp Jimple Expression Types Covered

Based on SootUp's sootup.core.jimple.common.expr package:

  • Binary: JAddExpr, JSubExpr, JMulExpr, JDivExpr, JRemExpr, JAndExpr, JOrExpr, JXorExpr, JShlExpr, JShrExpr, JUshrExpr, JEqExpr, JNeExpr, JLtExpr, JLeExpr, JGtExpr, JGeExpr, JCmpExpr, JCmpgExpr, JCmplExpr
  • Unary: JNegExpr, JLengthExpr
  • Cast/TypeCheck: JCastExpr, JInstanceOfExpr
  • Invoke: JStaticInvokeExpr, JVirtualInvokeExpr, JInterfaceInvokeExpr, JSpecialInvokeExpr, JDynamicInvokeExpr
  • Allocation: JNewExpr, JNewArrayExpr, JNewMultiArrayExpr

Design Decisions Made

  1. Expression-level, not statement-level: The converter handles statements; the DSL handles expressions.
  2. Callbacks all the way: The converter is oblivious to source/sink/sanitizer categories — it hands expressions to the DSL; matcher predicates fire as callbacks to classify them.
  3. FAIR apply holds a reference to a DomainOperation: At FAIR interpretation time, the interpreter calls back into the operation, which performs the actual lattice mutation.
  4. Generic defaults for all expression types: Unary=identity, binary=join, allocation=set_top. Analyses only override what they need.
  5. No special binop casing for taint: The default JoinOp already does join(left, right), which for the taint lattice means "if either is tainted, result is tainted."
  6. Source/sink/sanitizer builders compose the five generic DomainOperations: taintsReturnValue() = SetTopOp, killsTaintOnReturnValue() = SetBottomOp, checksArgument() = CheckTopOp, etc.

Open Questions / Future Work

From the FAIR paper's own open questions:

  • Modeling heap representations and pointer/callgraph analysis in FAIR
  • Handling exceptional flows
  • Backward and demand-driven analyses
  • Aliasing and strong updates
  • User feedback integration (e.g., "ignore this path")

From our DSL design:

  • Context sensitivity DSL (separate from expression handling): contextSensitivity { callSiteSensitive(k = 2) }
  • Product lattices for multi-domain analyses
  • Interval lattice for constant analysis
  • FlowDroid-compatible source/sink specification import: fromFlowDroidConfig("SourcesAndSinks.txt")
  • Matching on receiver type, not just method name
  • Taint-through for wrapper methods (beyond source/sink/sanitizer)
  • Access path model integration into lattice definition

Files Produced

  • FairDSL.kt — The complete implementation with all of the above, including a main() end-to-end demo.

FAIR Paper Reference

The FAIR paper (attached as PDF) describes the IR syntax, semantics, and examples for taint analysis, constant analysis, call-string/IFDS/IDE approaches, and polymorphic call handling. Sections 4-7 contain the running examples we referenced.