who2: Towards a faster expression builder by langston-barrett · Pull Request #351 · GaloisInc/what4

langston-barrett · 2026-02-11T19:30:53Z

Motivation

When working on w4smt2, I noticed that ExprBuilder added considerable overhead to SMT queries, and the overhead grew with larger queries. Profiling revealed that the culprit was What4's normalizing datastructures such as WeightedSum, which have O(n log n) worst-case construction time. I became concerned that our higher-level tools might be paying an excessive performance penalty by going through What4.

Approach

Who2 provides an alternative expression builder that performs local rewrites and tracks abstract domains, but uses only approximately linear time by default.

Many of Who2's performance-oriented features (e.g., hash-consing) can be controlled via Cabal flags, allowing consumers to try out different features depending on their use-case. When disabled, they result in clearly dead branches that are pruned by the compiler.

Results

Benchmarks show that Who2 offers extremely low overhead but still manages to solve a substantial number of problems without consulting a solver. It would certainly be worthwhile to consider benchmarking if Who2 speeds up our actual workloads from Crucible, SAW, and GREASE.

Testing

Since the correctness of Who2 is crucial to any tools that might be built on it (and validity of benchmarking results), I have put special care into fairly extensive testing.

I have run who2 via w2smt2 on many SMT-Lib sample problems and verified its agreement with SMT solvers and What4.
Each rewrite has a corresponding unit test, and that statement is itself tested (using -- test: comments in the source). Each rewrite test is itself validated against Z3.
Each rewrite has a corresponding Cryptol :prove command establishing its validity, if Cryptol is available these are proved in the test suite.
Each constructor has a golden SMT-Lib output test which is validated by Z3, and that statement is itself tested (using -- test-smt: comments in the source).
I created a Hedgehog generator for a type ExprBuilderAPI. This is like the existing "template" tests.
- There is a Hedgehog test stating that Builder reduces expressions without variables to literals.
- There is a Hedgehog test stating that Builder never builds empty datastructures (e.g., semiring sums)
- There is a Hedgehog test stating that ExprBuilderAPI, as interpreted by Builder, can generate every constructor of Builder's Expr. This ensures good coverage in the generator.
- I created a TestBuilder that uses the same SymExpr type as Who2's Builder, but without rewriting and with all-top abstract domains. There is a Hedgehog test stating that these builders produce equisatisfiable predicates. I did some manual mutation testing (changing some rewrites) and confirmed that it found the issues I introduced.
There are property-based tests for laws of typeclass instances in the code.

There are, in fact, more test modules than source modules.

Documentation

See especially the Haddocks on:

Who2.Builder
Who2.Config
Who2.Expr.App

Status and future work

Only QF_BV operations are implemented at the moment.

TODO (before merging):

Possible extensions:

Improve abstract domains
Support other SMT-Lib logics (integers, reals)
What4-like interface to solvers (currently w2smt2 just serializes to file)

I did use Claude code in building who2.

langston-barrett · 2026-02-12T13:44:09Z

Here are some benchmarking results on a subset of QF_BV samples from SMT-LIB release 2025 (non-incremental benchmarks), collected via w4smt2-bench. In contrast to What4, Who2 adds virtually no overhead, while still solving 3/4 as many problems as What4 without consulting a solver (180 vs. 240)

=========================================
Z3:
  Average time (all): 28.978s ± 72.894s
  Median time (all): 0.127s
  Average time (sat/unsat): 7.663s ± 28.541s
  Median time (sat/unsat): 0.077s
  Total time: 32397.703s
  Timeouts: 102
  Errors: 7

Bitwuzla:
  Average time (all): 5.177s ± 31.434s
  Median time (all): 0.064s
  Average time (sat/unsat): 1.596s ± 8.733s
  Median time (sat/unsat): 0.063s
  Total time: 5788.190s
  Timeouts: 16

W4:
  Average time (all): 0.261s ± 1.775s
  Median time (all): 0.097s
  Average time (sat/unsat): 0.118s ± 0.086s
  Median time (sat/unsat): 0.095s
  Total time: 291.287s
  Unknown: 923
  Unsupported: 2
  Errors: 22

W2:
  Average time (all): 0.142s ± 0.613s
  Median time (all): 0.040s
  Average time (sat/unsat): 0.107s ± 0.145s
  Median time (sat/unsat): 0.051s
  Total time: 158.023s
  Unknown: 983
  Unsupported: 2

W2Z3:
  Average time (all): 29.407s ± 73.799s
  Median time (all): 0.152s
  Average time (sat/unsat): 7.670s ± 28.394s
  Median time (sat/unsat): 0.092s
  Total time: 32877.438s
  Timeouts: 103

W2Bitwuzla:
  Average time (all): 5.214s ± 31.396s
  Median time (all): 0.087s
  Average time (sat/unsat): 1.638s ± 8.783s
  Median time (sat/unsat): 0.086s
  Total time: 5829.549s
  Timeouts: 16

Disagreements: None

Total time: 154m 29.0s (9269.0s)
=========================================

langston-barrett · 2026-02-12T20:28:06Z

I added some datastructures inspired by What4's WeightedSum and SemiRingProduct. I also added a pre-hashed sequence type. I would expect that the former would have a minor performance penality while allowing solving slightly more problems without a solver. I would expect the latter to have a substantial performance improvement. These expectations align with a rerun of the above benchmark with just w2:

W2:
  Average time (all): 0.077s ± 0.086s
  Median time (all): 0.050s
  Average time (sat/unsat): 0.100s ± 0.128s
  Median time (sat/unsat): 0.046s
  Total time: 85.775s
  Unknown: 981
  Unsupported: 2

This fully halves the average time, decides two additional problems, and only very slightly increases the median time (which is dominated by very small problems).

When working on `w4smt2`, I noticed that `ExprBuilder` added considerable overhead to SMT queries, and the overhead grew with larger queries. Profiling revealed that the culprit was What4's normalizing datastructures such as `WeightedSum`, which have `O(n log n)` worst-case construction time. Who2 provides an alternative expression builder that performs local rewrites and tracks abstract domains, but does not include such heavyweight datastructures. It strives to keep construction approximately linear. Benchmarks show that it actually slightly speeds up SMT solvers (!!) when used as a preprocessing pass (via `w2smt2`). Even if further data doesn't support that surprising conclusion, it would certainly be worthwhile to consider benchmarking if Who2 speeds up our actual workloads from Crucible, SAW, and GREASE. Since the correctness of Who2 is crucial to any tools that might be built on it (and validity of benchmarking results), I have put special care into fairly extensive testing. - I have run `who2` via `w2smt2` on many SMT-Lib sample problems and verified its agreement with SMT solvers and What4. - Each rewrite has a corresponding unit test, and that statement is itself tested. Each rewrite test is itself validated against Z3. - I created a Hedgehog generator for a type `ExprBuilderAPI`. This is like the existing "template" tests. I created a `TestBuilder` that uses the same `SymExpr` type as Who2's `Builder`, but without rewriting and with all-top abstract domains. There is a Hedgehog test stating that these builders produce equisatisfiable predicates. I did some manual mutation testing (changing some rewrites) and confirmed that it found the issues I introduced. - There are property-based tests for laws of typeclass `instance`s in the code. Possible extensions: - Improve abstract domains - Support other SMT-Lib logics (integers, reals) - What4-like interface to solvers (currently `w2smt2` just serializes to file) - Add a simple `WeightedSum`-like datastructure that has a `Seq` of non-constant terms, together with a single constant term - Generalize `BloomSeq` to a map-like datastructure, use for a more comprehensive `WeightedSum`-like datastructure

Remove an indirect call (through `bAllocator`) on a fast path

Benchmarking show this not making too much of a difference either way, and it's good for memory usage.

langston-barrett self-assigned this Feb 11, 2026

langston-barrett force-pushed the builder2 branch from 6bbf720 to 865bd25 Compare February 11, 2026 21:23

langston-barrett force-pushed the builder2 branch from 865bd25 to 5b0b41d Compare February 12, 2026 14:39

langston-barrett added 19 commits February 13, 2026 15:54

w4smt2: Rotations

7e679da

who2: Remove Allocator

6e07346

Remove an indirect call (through `bAllocator`) on a fast path

who2: Add tests for PolarizedBloomSeq rewrites

ea6f911

who2: Datastructures for semiring operations

a4ae0e7

who2: Sequence with precomputed hash

b95d4ed

who2: Fix warnings

62e95d1

who2: Remove unused operations

bb91749

who2: Monomorphize

d7632ce

who2: Pretty-printing

8a575a8

who2: Docs

abaee56

who2: Property test about abstract domains

482ddb3

who2: Optionally emit known-bits to solver

b52a111

who2: One more INLINE

518a71c

who2: Optional hash-consing

c3b5e55

Benchmarking show this not making too much of a difference either way, and it's good for memory usage.

who2: Unsupported exception

25c9d75

who2: Unified configuration module with Cabal flags

6008b9a

who2: Bloom* module refactoring

58e6c96

who2: Configurable AST normalization, patricia-tree data structures

e5bfab3

langston-barrett force-pushed the builder2 branch from 0c39cf1 to e5bfab3 Compare February 13, 2026 21:14

langston-barrett added 5 commits February 13, 2026 16:14

who2: README

8c3559c

who2: Instances

025db98

who2: Module hierarchy

6229346

who2: More property tests, more module renaming

e5dbedb

who2: Annotations for instances with property tests

cfd4f7e

langston-barrett added 6 commits February 14, 2026 00:01

who2: Minor improvements to property tests

746b7db

who2: Tests for semiring identities, fix semiring bug

f7593a7

who2: Test another semiring property

1e65486

who2: Recompute bloom map filter after deletion

ea735df

who2: Tests for completeness of SMT-Lib golden tests

55cc474

who2: Tests for Expr invariants

315889c

langston-barrett force-pushed the builder2 branch from b08e0cd to 315889c Compare February 14, 2026 18:22

langston-barrett added 11 commits February 14, 2026 13:53

who2: Cryptol proofs for rewrites

56ba1c1

who2: Improve property tests, test generator coverage

b26830a

who2: Reorganize tests

b9754f2

who2: Add singleton abstract domain optimization to alloc

c7e77b5

who2: Fix rol/ror bug, reinstate property tests

f310eda

who2: Test that assertions are enabled

b811d6d

who2: Assert that abstract values are singletons iff values are literals

3b7f6b0

who2: re-enable property test

bc776d2

who2: Put a TestTree in each test module

04cdbb4

who2: Remove orphan instance

fad1912

who2: Rework typeclass property tests

0fd191b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

who2: Towards a faster expression builder#351

who2: Towards a faster expression builder#351
langston-barrett wants to merge 41 commits intomasterfrom
builder2

langston-barrett commented Feb 11, 2026 •

edited

Loading

Uh oh!

langston-barrett commented Feb 12, 2026

Uh oh!

langston-barrett commented Feb 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

langston-barrett commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Approach

Results

Testing

Documentation

Status and future work

Uh oh!

langston-barrett commented Feb 12, 2026

Uh oh!

langston-barrett commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

langston-barrett commented Feb 11, 2026 •

edited

Loading

langston-barrett commented Feb 12, 2026 •

edited

Loading