[ZIPT Benchmark] Z3 c3 branch — 2026-03-18 #9031
Closed
Replies: 1 comment
-
|
This discussion has been marked as outdated by Qf S Benchmark. A newer discussion is available at Discussion #9049. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
ZIPT Benchmark Report — Z3 c3 branch
Date: 2026-03-18
Branch: c3
Benchmark set: QF_S (50 randomly selected files from
tests/QF_S.tar.zst, total 22,172 available)Timeout: 10 seconds per benchmark (
-T:5with-tr:seqfor seq;-T:10for nseq;-t:10000for ZIPT)Build: Debug mode (CMake + Ninja,
CMAKE_BUILD_TYPE=Debug)Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 7
Notable Issues
All 7 disagreements follow the same pattern: nseq returns
unsatwhile seq and ZIPT returnsat. This strongly suggests a soundness bug in thenseqsolver.The consistent pattern — nseq claims unsat, both seq and ZIPT claim sat — points to a false unsatisfiability bug in the
nseq(Nielsen-graph-based) solver. Theslog_strangerfiles are real-world web security string constraints; theinstance*files are from the AutomataArk benchmark suite. Checking one representative file (instance02344.smt2, statussatper seq and ZIPT) against nseq would confirm the bug.🐛 Crashes / Bugs — 5 files (all in ZIPT)
ZIPT fails with an error/unsupported-operation on all 5 PCP-encoding benchmarks. These files use
str.replace_allchained through multiple variables to encode Post Correspondence Problem instances. ZIPT apparently does not supportstr.replace_all.🐢 Slow Benchmarks (> 8s for any solver)
🔍 Trace Analysis: seq-fast / nseq-slow Hypotheses
Six files satisfied the criterion seq_time < 1.0s AND nseq_time > 3×seq_time AND nseq_time > 0.5s.
pcp_instance_252.smt2, pcp_instance_160.smt2, pcp_instance_347.smt2, unsolved_pcp_instance_409.smt2, unsolved_pcp_instance_381.smt2
(seq: ~0.22s (unknown), nseq: 10.011s (unknown))
These five benchmarks are PCP instances encoded with chained
str.replace_alloperations (e.g.,x_1 = str.replace_all(x_0, "2", Top_0),x_2 = str.replace_all(x_1, "3", Top_1), etc.). The seq trace shows that seq immediately recognises and simplifies the equations usingmk_eq_core(rewriter atseq_rewriter.cpp:5196), reducing eachreplace_allwith a concrete replacement string into an explicit equation within a few hundred trace entries. Although seq ultimately returnsunknown(it cannot conclude sat/unsat for the undecidable PCP), it does so quickly because its rewriter handlesstr.replace_alleagerly.The nseq solver, by contrast, hits the 10s timeout on all five. The likely explanation is that nseq's Nielsen-graph engine does not natively reduce
str.replace_allat the rewrite level — it probably expands it into an auxiliary sequence of equations or recursive constraints, producing a large Nielsen graph that exhausts the depth budget. The absence of specialisedreplace_allnormalisation in nseq means it falls back to a more expensive general-purpose enumeration, rather than the direct symbolic fixpoint that seq's rewriter employs.slog_stranger_2780_sink.smt2
(seq: 0.673s (sat), nseq: 10.011s (unknown))
The seq trace for this file (5,926 lines) shows that seq works through a regex membership constraint of the form
x_6 = " <b>" ++ sigmaStar_3and resolves it by normalising concrete string literals (the trace repeatedly simplifies" <b>","</b> "and"\SCRIPT"strings throughmk_eq_core), then callsenque_axiomfor each individual character unit, and eventually constructs amk_valuefor the satisfying assignment (the trace ends with the full model value being built). The approach is essentially symbolic character-level unfolding combined with automata alignment (seq.align.l,seq.align.rprimitives visible in the trace).nseq times out at 10s without producing a result. The benchmark contains a
str.in_reconstraint with a complex regular expression involvingstr.replaceandre.*/re.+operators encoding HTML tag sanitisation. The nseq Nielsen-graph solver appears to lack the automata-intersection machinery that seq uses to quickly determine membership; instead it likely tries to unroll there.*into word equations, generating an exponential Nielsen-graph depth.instance08497.smt2
(seq: 1.114s (sat), nseq: 10.012s (unknown))
This benchmark (from AutomataArk, status
sat) asserts that a stringXbelongs tore.++ "\n"— i.e., an HTML comment containing arbitrary characters — while simultaneously being excluded from two other patterns. The seq trace (19,780 lines) shows it working throughmk_eq_coreequations whereXis immediately collapsed to the concrete value"\n\r replacement string---->\n\n"(the rewriter identifies the only satisfying value that passes the complement-range membership tests), followed by extensiveenque_axiomcalls for individual character units.nseq times out. The complement operator
re.compin the regular expression requires computing the complement automaton; nseq's word-equation engine appears not to reduce range complements into a finite character-class representation early in search, instead branching on character membership decisions and generating a combinatorially large search tree. In contrast, seq's automata-theoretic backbone handlesre.compnatively via DFA complementation duringmk_eq_coresimplification.📋 Per-File Results (all 50 benchmarks)
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions