[ZIPT Benchmark] Z3 c3 branch — 2026-03-19 #9049
Closed
Replies: 1 comment
-
|
This discussion has been marked as outdated by Qf S Benchmark. A newer discussion is available at Discussion #9050. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-03-19
Branch: c3
Benchmark set: QF_S (50 randomly selected files from
tests/QF_S.tar.zst)Timeout: 10 seconds per benchmark (
-T:10for Z3;-t:10000for ZIPT)Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 1
ZIPT is substantially faster on average (0.300 s vs seq's 1.095 s and nseq's 1.636 s) and gives more definitive answers (46/50 vs seq's 39/50 and nseq's 36/50). The
bugcases for seq and nseq are benchmarks containing(get-model)or(get-value ...)after anunknownresult, which causes Z3 to emit a model-unavailability error. The nseq solver times out on 8 files that seq solves — all are AutomataArk regex-heavy instances (see Trace Analysis below).Notable Issues
Soundness Disagreements (Critical)
Lehmann-Rabin_sat_non_incre_equiv_trans_16_1.smt2(20250411-hornstr-equiv): seq=unsat, nseq=unsat, ZIPT=SATBoth Z3 solvers agree on unsat but ZIPT returns SAT. Importantly, ZIPT also emitted
Exception (Final): Specified method is not supported.before printing SAT, which strongly suggests a ZIPT bug rather than a Z3 soundness error. The benchmark source marks:status unknown, so the ground truth is unverified. The formula involves a string variablevaroutconstrained by a complex regex intersection from a Lehmann-Rabin protocol CHC encoding. Verdict: likely a ZIPT bug (exception in final validation triggers a spurious SAT).Crashes / Bugs
not-contains-1-3-5-130.smt2benchmark_0358.smt2(get-value)afterunknowntriggers errorinstance14127.smt2(get-model)afterunknowntriggers errorpcp_instance_183.smt2The three seq/nseq "bug" cases are actually well-formed benchmark files that call
(get-model)/(get-value)after(check-sat), and when the solver returnsunknown, Z3 correctly errors on the subsequent model query. These are more accurately "benchmark harness" issues (the files request models that aren't available) than solver bugs, though the error exit code causes our script to classify them asbug.Slow Benchmarks (> 8s)
All 8 slow cases are nseq timeouts (nseq hits the 10 s wall-clock limit):
Trace Analysis: seq-fast / nseq-slow Hypotheses
Four files matched the criterion seq_time < 1.0 s AND nseq_time > 3 × seq_time AND nseq_time > 0.5 s. All four are from the AutomataArk benchmark family (
20230329-automatark-lu), and all involve a single string variableXconstrained by multiplestr.in_reandnot (str.in_re)assertions over complex regular expressions.Common pattern: seq solves these quickly by exploiting its automata-product intersection engine. The trace shows
propagate_in_recalls (inseq_regex.cpp:154) that build DFA/NFA products for the intersection of the positive regex constraints. When the intersection automaton becomes empty — e.g., because two regex constraints constrain the string to different lengths or character classes — seq immediately derivesunsatvia anadd_axiomcall (intheory_seq.cpp:2976) asserting the literal false. This is a constant number of deterministic automaton operations, completing in 0.168–0.297 s.Why nseq is slower: nseq (
theory_nseq) operates via the Nielsen graph, which decomposes string equations character by character and explores equation extensions. It does not have a dedicated regex-intersection procedure. For these AutomataArk instances — which are primarily regex membership constraints with no explicit concatenation equations — nseq must encode the regex constraints indirectly and explore a large Nielsen graph search space without the early-termination provided by automaton emptiness checking. The nseq solver times out (10 s) on all four files, returningunknown.Per-file hypotheses:
instance06151.smt2(seq: 0.297 s unsat, nseq: 10.010 s unknown): The formula constrainsXto simultaneously match a phone-number regex usingre.loop(4-digit blocks) and a 500-character repeated pattern\xf6\xec\xd9..., plus several negative constraints. The seq trace shows earlypropagate_in_renarrowing the length ofX: the 500-repetition pattern demands |X| ≥ ~3500, but the phone-number regex bounds |X| ≤ ~20. The length inconsistency is detected by seq's length propagation within the regex automaton product, yieldingunsatin ~10 automaton steps. nseq lacks this length-from-regex inference and exhausts its budget exploring equation splits.instance13438.smt2(seq: 0.204 s unsat, nseq: 10.010 s unknown): A URL-format regex (Host:Port...) with alphanumeric ranges must simultaneously satisfy a hostname regex and several negated patterns. The seq trace showspropagate_in_redetecting that the required first character ofX(determined by the positive regex) conflicts with a negated pattern's forced character — a character-class disjointness proof completed via the automaton product's initial states. nseq cannot efficiently reason about character-level constraints from regex membership without full equation unfolding.instance13864.smt2(seq: 0.206 s unsat, nseq: 10.010 s unknown): The positive regex demandsXstarts with specific literal text ("welcomeforToolbarHost:\n"), and the negated regex usesre.++with a quoted-string escape pattern. seq'spropagate_in_repeels the constant prefix ofXfrom the regex, collapsing the remaining regex constraints to a shorter suffix, and quickly finds an empty intersection. nseq cannot apply prefix-stripping optimizations to regex constraints.instance15521.smt2(seq: 0.168 s unsat, nseq: 10.010 s unknown): The formula hasstr.in_re X (re.++ "<" (re.* (re.comp ">")) ">")— an XML-tag pattern — combined withstr.in_re X (re.++ "/filename=" (re.* (re.comp "\n")) ".wmx/i")(URL path). seq's trace showspropagate_in_redetecting that the first character ofXmust be simultaneously<(from regex 1) and/(from regex 3): a trivial character-class conflict resolved in the first automaton step. nseq's equation-splitting approach doesn't exploit this structural first-character conflict.Per-File Results
Click to expand all 50 results
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions