[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9052
Closed
This discussion has been marked as outdated by Qf S Benchmark. A newer discussion is available at Discussion #9054.
ZIPT Benchmark Report — Z3 c3 branch
Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst, 22,172 total)
Timeout: seq uses -T:5 (with -tr:seq tracing) / 7 s outer; nseq uses -T:10 / 12 s outer; ZIPT uses -t:10000 / 12 s outer
Build: Debug mode (CMAKE_BUILD_TYPE=Debug) with .NET bindings, Z3 version 4.17.0
ZIPT: branch parikh, built against freshly compiled Microsoft.Z3.dll (netstandard2.1 target)
Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 1
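The disagreement count above can be illustrated with a minimal Python sketch. It assumes per-file verdicts are available as plain strings; the `results` layout, the `disagreements` helper, and the sample file names are hypothetical and are not taken from the benchmark workflow. Only definite `sat`/`unsat` answers are compared, so timeouts and unknowns never count as a conflict.

```python
from itertools import combinations

# Only definite answers can conflict; "timeout", "unknown", etc. are ignored.
DEFINITE = {"sat", "unsat"}

def disagreements(results):
    """Return (file, solver_a, solver_b) for every conflicting pair.

    `results` maps a file name to a dict of solver name -> verdict string.
    This layout is an assumption made for illustration.
    """
    found = []
    for name, verdicts in results.items():
        for (s1, v1), (s2, v2) in combinations(sorted(verdicts.items()), 2):
            if v1 in DEFINITE and v2 in DEFINITE and v1 != v2:
                found.append((name, s1, s2))
    return found

# Hypothetical rows, not actual benchmark data.
sample = {
    "a.smt2": {"seq": "timeout", "nseq": "unsat", "zipt": "sat"},
    "b.smt2": {"seq": "sat", "nseq": "sat", "zipt": "unknown"},
}

conflicts = disagreements(sample)
# Count files rather than pairs, since one file can produce several pairs.
files_with_conflict = len({name for name, _, _ in conflicts})
print(conflicts, files_with_conflict)
```

Counting distinct files keeps a single three-way conflict from being reported more than once.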
Notable Issues
🔴 Soundness Disagreement (Critical)
Lehmann-Rabin_sat_non_incre_equiv_trans_14_1.smt2 (benchmark row 8)
Both Z3 solvers agree on unsat. ZIPT claims sat, but before doing so it throws:
The deriv assertion at seq_nielsen.cpp:3717 fires on this small 4-assertion problem involving regex constraints over the alphabet {1, 2}. The seq solver returns timeout (5 s). This assertion violation should be investigated in the seq_parikh / derivative computation path.
The instance:
🟡 ZIPT Unsupported Feature (str.replace_all)
ZIPT currently does not support the str.replace_all function. Four files hit this:
🟡 Slow Benchmarks (> 8 s for any solver)
Both hard instances are from the 20250411-negated-predicates set (disequality benchmarks). All three solvers struggle with these: seq hits its 5 s cap, nseq hits 10 s, and ZIPT hits 12 s.
Trace Analysis: seq-slow / nseq-fast Pattern
No seq-fast / nseq-slow cases were observed (no file where seq < 1 s and nseq > 3× seq time and nseq > 0.5 s). Instead, the benchmark reveals a prominent opposite pattern: 7 files where seq timed out at 5 s but nseq returned sat in < 0.15 s.
These files (instance13608, instance09536, instance06147, slog_stranger_654_sink, instance12094, 01_track_55, instance11759) are all regex-heavy instances from the automatark-lu and similar sets.
Hypothesis for seq-timeout / nseq-fast pattern: The seq traces show tens of thousands of enque_axiom / add_axiom / add_length steps (e.g., 86 K lines for instance13608) in which seq decomposes complex regex patterns into individual character axioms (seq.unit Char[N]) and generates separation lemmas for each character position. This axiom-enumeration approach grows combinatorially with the number of distinct characters mentioned in the regexes and the length of the string constants, and does not converge within 5 s.
By contrast, nseq uses the Nielsen-graph approach combined with Parikh-constraint generation and minterm-based character-set analysis. For regex-membership queries, the Parikh image (counting occurrences of each character class) can immediately rule out or confirm membership without exploring individual character assignments. The minterm decomposition collapses the full Unicode alphabet into a small set of equivalence classes, so the Nielsen graph remains compact. This is why nseq solves regex-heavy instances in 30–120 ms while seq enumerates character axioms for seconds.
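To make the minterm idea concrete, here is a minimal Python sketch of minterm decomposition over a small alphabet. It is not the nseq implementation: the `minterms` function, the explicit per-character loop, and the printable-ASCII alphabet are assumptions chosen for readability, and a real solver would more likely work on character ranges or predicates than enumerate characters one by one.

```python
def minterms(char_sets, alphabet):
    """Partition `alphabet` into equivalence classes (minterms).

    Two characters fall into the same class when they belong to exactly
    the same subset of `char_sets`, so no regex built from those sets
    can tell them apart.
    """
    classes = {}
    for ch in alphabet:
        # The membership signature identifies the minterm of this character.
        signature = tuple(ch in s for s in char_sets)
        classes.setdefault(signature, set()).add(ch)
    return classes

# Hypothetical character classes, e.g. from regexes such as [0-9] and [0-9a-f].
digits = set("0123456789")
hex_digits = set("0123456789abcdef")

# Printable ASCII stands in for the full alphabet in this sketch.
alphabet = [chr(c) for c in range(32, 127)]

classes = minterms([digits, hex_digits], alphabet)
for signature, members in sorted(classes.items()):
    print(signature, len(members))
```

With two character sets, the 95 printable characters collapse into three classes (digits, the letters a to f, and everything else), which illustrates why reasoning per class can stay compact where reasoning per character does not.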
📋 Full Per-File Results (50 benchmarks)
† = likely measurement artifact (Z3 internal "timeout" text not matched by verdict parser); noodles-unsat-9 is the confirmed real nseq bug.
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.