Skip to content

who2: Towards a faster expression builder#351

Draft
langston-barrett wants to merge 41 commits intomasterfrom
builder2
Draft

who2: Towards a faster expression builder#351
langston-barrett wants to merge 41 commits intomasterfrom
builder2

Conversation

@langston-barrett
Copy link
Contributor

@langston-barrett langston-barrett commented Feb 11, 2026

Motivation

When working on w4smt2, I noticed that ExprBuilder added considerable overhead to SMT queries, and the overhead grew with larger queries. Profiling revealed that the culprit was What4's normalizing datastructures such as WeightedSum, which have O(n log n) worst-case construction time. I became concerned that our higher-level tools might be paying an excessive performance penalty by going through What4.

Approach

Who2 provides an alternative expression builder that performs local rewrites and tracks abstract domains, but uses only approximately linear time by default.

Many of Who2's performance-oriented features (e.g., hash-consing) can be controlled via Cabal flags, allowing consumers to try out different features depending on their use-case. When disabled, they result in clearly dead branches that are pruned by the compiler.

Results

Benchmarks show that Who2 offers extremely low overhead but still manages to solve a substantial number of problems without consulting a solver. It would certainly be worthwhile to consider benchmarking if Who2 speeds up our actual workloads from Crucible, SAW, and GREASE.

Testing

Since the correctness of Who2 is crucial to any tools that might be built on it (and validity of benchmarking results), I have put special care into fairly extensive testing.

  • I have run who2 via w2smt2 on many SMT-Lib sample problems and verified its agreement with SMT solvers and What4.
  • Each rewrite has a corresponding unit test, and that statement is itself tested (using -- test: comments in the source). Each rewrite test is itself validated against Z3.
  • Each rewrite has a corresponding Cryptol :prove command establishing its validity, if Cryptol is available these are proved in the test suite.
  • Each constructor has a golden SMT-Lib output test which is validated by Z3, and that statement is itself tested (using -- test-smt: comments in the source).
  • I created a Hedgehog generator for a type ExprBuilderAPI. This is like the existing "template" tests.
    • There is a Hedgehog test stating that Builder reduces expressions without variables to literals.
    • There is a Hedgehog test stating that Builder never builds empty datastructures (e.g., semiring sums)
    • There is a Hedgehog test stating that ExprBuilderAPI, as interpreted by Builder, can generate every constructor of Builder's Expr. This ensures good coverage in the generator.
    • I created a TestBuilder that uses the same SymExpr type as Who2's Builder, but without rewriting and with all-top abstract domains. There is a Hedgehog test stating that these builders produce equisatisfiable predicates. I did some manual mutation testing (changing some rewrites) and confirmed that it found the issues I introduced.
  • There are property-based tests for laws of typeclass instances in the code.

There are, in fact, more test modules than source modules.

Documentation

See especially the Haddocks on:

  • Who2.Builder
  • Who2.Config
  • Who2.Expr.App

Status and future work

Only QF_BV operations are implemented at the moment.

TODO (before merging):

  • Represent or as negated and a la What4
  • Deduplicate a bit of logic between rewrites (re: PolarizedBloomSeq)
  • Rebase after merging w4smt2: A bad SMT solver #343
  • Create a pre-hashed Seq supporting appending
  • Generalize BloomSeq to a map-like datastructure, use for a more comprehensive WeightedSum-like datastructure
  • Overview of datastructures
  • Optional hash-consing
  • Optional AST normalization
  • Optional patricia-tree based datastructures
  • README with implementation status and next steps, especially solver interaction
  • Audit typeclass instances (especially horrible TemplateHaskell instances)
  • Test for generator completeness

Possible extensions:

  • Improve abstract domains
  • Support other SMT-Lib logics (integers, reals)
  • What4-like interface to solvers (currently w2smt2 just serializes to file)

I did use Claude code in building who2.

@langston-barrett
Copy link
Contributor Author

Here are some benchmarking results on a subset of QF_BV samples from SMT-LIB release 2025 (non-incremental benchmarks), collected via w4smt2-bench. In contrast to What4, Who2 adds virtually no overhead, while still solving 3/4 as many problems as What4 without consulting a solver (180 vs. 240)

=========================================
Z3:
  Average time (all): 28.978s ± 72.894s
  Median time (all): 0.127s
  Average time (sat/unsat): 7.663s ± 28.541s
  Median time (sat/unsat): 0.077s
  Total time: 32397.703s
  Timeouts: 102
  Errors: 7

Bitwuzla:
  Average time (all): 5.177s ± 31.434s
  Median time (all): 0.064s
  Average time (sat/unsat): 1.596s ± 8.733s
  Median time (sat/unsat): 0.063s
  Total time: 5788.190s
  Timeouts: 16

W4:
  Average time (all): 0.261s ± 1.775s
  Median time (all): 0.097s
  Average time (sat/unsat): 0.118s ± 0.086s
  Median time (sat/unsat): 0.095s
  Total time: 291.287s
  Unknown: 923
  Unsupported: 2
  Errors: 22

W2:
  Average time (all): 0.142s ± 0.613s
  Median time (all): 0.040s
  Average time (sat/unsat): 0.107s ± 0.145s
  Median time (sat/unsat): 0.051s
  Total time: 158.023s
  Unknown: 983
  Unsupported: 2

W2Z3:
  Average time (all): 29.407s ± 73.799s
  Median time (all): 0.152s
  Average time (sat/unsat): 7.670s ± 28.394s
  Median time (sat/unsat): 0.092s
  Total time: 32877.438s
  Timeouts: 103

W2Bitwuzla:
  Average time (all): 5.214s ± 31.396s
  Median time (all): 0.087s
  Average time (sat/unsat): 1.638s ± 8.783s
  Median time (sat/unsat): 0.086s
  Total time: 5829.549s
  Timeouts: 16

Disagreements: None

Total time: 154m 29.0s (9269.0s)
=========================================

@langston-barrett
Copy link
Contributor Author

langston-barrett commented Feb 12, 2026

I added some datastructures inspired by What4's WeightedSum and SemiRingProduct. I also added a pre-hashed sequence type. I would expect that the former would have a minor performance penality while allowing solving slightly more problems without a solver. I would expect the latter to have a substantial performance improvement. These expectations align with a rerun of the above benchmark with just w2:

W2:
  Average time (all): 0.077s ± 0.086s
  Median time (all): 0.050s
  Average time (sat/unsat): 0.100s ± 0.128s
  Median time (sat/unsat): 0.046s
  Total time: 85.775s
  Unknown: 981
  Unsupported: 2

This fully halves the average time, decides two additional problems, and only very slightly increases the median time (which is dominated by very small problems).

When working on `w4smt2`, I noticed that `ExprBuilder` added
considerable overhead to SMT queries, and the overhead grew with larger
queries. Profiling revealed that the culprit was What4's normalizing
datastructures such as `WeightedSum`, which have `O(n log n)` worst-case
construction time.

Who2 provides an alternative expression builder that performs
local rewrites and tracks abstract domains, but does not include
such heavyweight datastructures. It strives to keep construction
approximately linear. Benchmarks show that it actually slightly speeds
up SMT solvers (!!) when used as a preprocessing pass (via `w2smt2`).
Even if further data doesn't support that surprising conclusion, it
would certainly be worthwhile to consider benchmarking if Who2 speeds up
our actual workloads from Crucible, SAW, and GREASE.

Since the correctness of Who2 is crucial to any tools that might be
built on it (and validity of benchmarking results), I have put special
care into fairly extensive testing.

- I have run `who2` via `w2smt2` on many SMT-Lib sample problems and
  verified its agreement with SMT solvers and What4.
- Each rewrite has a corresponding unit test, and that statement is
  itself tested. Each rewrite test is itself validated against Z3.
- I created a Hedgehog generator for a type `ExprBuilderAPI`. This
  is like the existing "template" tests. I created a `TestBuilder`
  that uses the same `SymExpr` type as Who2's `Builder`, but without
  rewriting and with all-top abstract domains. There is a Hedgehog test
  stating that these builders produce equisatisfiable predicates. I did
  some manual mutation testing (changing some rewrites) and confirmed
  that it found the issues I introduced.
- There are property-based tests for laws of typeclass `instance`s in
  the code.

Possible extensions:

- Improve abstract domains
- Support other SMT-Lib logics (integers, reals)
- What4-like interface to solvers (currently `w2smt2` just serializes
  to file)
- Add a simple `WeightedSum`-like datastructure that has a `Seq` of
  non-constant terms, together with a single constant term
- Generalize `BloomSeq` to a map-like datastructure, use for a more
  comprehensive `WeightedSum`-like datastructure
Remove an indirect call (through `bAllocator`) on a fast path
Benchmarking show this not making too much of a difference either way,
and it's good for memory usage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant