[ZIPT Benchmark] ZIPT Benchmark — Z3 c3 branch (2026-03-14) #8991
Closed
Replies: 1 comment
-
|
This discussion has been marked as outdated by Qf S Benchmark. A newer discussion is available at Discussion #9031. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-03-14
Branch: c3 (commit
4c64c82cef62dfe37e2d98a02fdb70d0e69defa4)Benchmark set: QF_S — 50 randomly selected files from
tests/QF_S.tar.zst(22 172 total files)Timeout: 10 s per benchmark (
-T:10for Z3 nseq;-T:5+ outer 7 s for seq with tracing;-t:10000for ZIPT)Build: Debug (
-DCMAKE_BUILD_TYPE=Debug); ZIPT built against freshly compiledMicrosoft.Z3.dll(net10.0)Summary
Soundness disagreements (any two solvers return conflicting sat / unsat): 0
Notable Issues
Soundness Disagreements (Critical)
None. No benchmark produced a contradicting definitive verdict between any two solvers.
Crashes / Bugs
unsolved_pcp_instance_335.smt2Unsupported feature: str.replace_allpcp_instance_187.smt2Unsupported feature: str.replace_allstr.replace_allis not yet implemented in ZIPT's constraint propagator. Both files come from the20250403-pcp-stringcollection.nseq Coverage Gap (Major Performance Finding)
nseq timed out on 33 of 50 benchmarks (returning
unknownat the 10 s wall-clock limit). It produced a definitive answer on only 12 files — most of which are simple string-equality or short-regex benchmarks from theslog/trackfamilies. In contrast, seq solved 34 and ZIPT solved 47.The gap is almost entirely explained by the AutomataArk (
20230329-automatark-lu) family, which dominates the random sample: these are single-variable regex-membership problems of the form(not (str.in_re X (complex-re))). seq's mature regex-automaton engine (theory_seq::propagate_in_re) handles them quickly; nseq appears to lack equivalent regex specialisation and exhausts its budget on general Nielsen-graph search.Slow Benchmarks (> 8 s for any solver)
All 33 slow entries are nseq hitting the 10 s ceiling. The worst seq time is
instance14766.smt2at 4.086 s.Trace Analysis: seq-fast / nseq-slow Hypotheses
13 files satisfy seq_time < 1.0 s AND nseq_time > 3 × seq_time AND nseq_time > 0.5 s.
Pattern common to all 13 candidates
Every candidate is an AutomataArk-derived regex satisfiability query: a single string variable
Xconstrained by one or morestr.in_re(ornot (str.in_re ...)) assertions over large regexes encoding real-world patterns (HTML tags, HTTP headers, file paths, date formats, etc.).The seq solver trace consistently shows:
[seq] enque_axiom/deque_axiom— initial character-set axioms for the regex alphabet (dozens ofseq.unit Char[n]facts) added in the first milliseconds.[seq] propagate_in_re— automaton-intersection propagation immediately fires and emits strong length lower-bounds (assert:(>= (str.len ...) k)) and character-range constraints, sometimes directly contradicting the negated regex in a single propagation step.[seq] propagate_lit— the derived length facts are asserted at scope 0, allowing the SAT solver to close the search without backtracking.Selected file analyses
instance00166.smt2— seq=0.103 s (sat), nseq=10.010 s (unknown)The problem is
(not (str.in_re X (HTML-tag-regex))). The trace shows seq immediately enqueues character axioms for the 20+ characters in the regex alphabet (Char[60]=<,Char[62]=>, etc.), then firespropagate_in_reto derive the existence of a short satisfying string. The entire resolution takes ≈100 ms. nseq has no equivalentpropagate_in_repath; it must enumerate possible string lengths and character combinations via Nielsen-graph expansion, which does not converge within 10 s on this lexically large regex.instance04207.smt2— seq=0.578 s (sat), nseq=10.008 s (unknown)Problem:
(not (str.in_re X (FTP-header-regex))). The trace (~7 200 lines) showsmk_eq_coreestablishing the substr suffix structure (e.g.str.substr X (str.len X - 28) 28 == "…reaction.txt"…\n"), thenpropagate_in_rewith a regex that fixes a 28-char suffix. seq recognises this as unconditionally satisfiable via a short witness that avoids thewowokay[0-9]+FTP…prefix pattern. nseq cannot exploit the prefix-exclusion without a Parikh/length argument that is apparently not generated.instance13483.smt2— seq=0.829 s (unsat), nseq=10.008 s (unknown)HTTP-header problem with
str.substrconstraints (suffix must be":/\n", earlier substr must be"\r\n/smi\n"). The trace showspropagate_in_rederiving(>= (str.len X) 18)and(>= (str.len substr) 32)at scope 0 from a hex-UUID regex (re.loop[32:32] (re.union [a-f] [0-9])). These length facts combined with the fixed suffixes produce an arithmetic conflict that seq discharges in < 1 s. nseq does not generate the length bounds from the regex loop constraint and is left with under-constrained arithmetic.instance13846.smt2— seq=0.351 s (unsat), nseq=10.008 s (unknown)Similar HTTP-header structure. seq derives a lower-bound length (from a
re.loop[n:n]) that is incompatible with the fixed prefix/suffix, producing a fast refutation. nseq again appears to miss the loop-induced length inference.Lehmann-Rabin_sat_non_incre_equiv_bad_0_1.smt2— seq=0.583 s (sat), nseq=10.009 s (unknown)CHC-derived benchmark with two string variables
varout,varinboth in(re.* (re.union a b c d H E))(alphabet-Kleene-star) plus a complex boolean formula involving length and substring equality. seq handles Kleene-star membership trivially (any string over the alphabet satisfies it) and quickly finds a model. nseq's Nielsen-graph must unfold the concatenation equalities without the simplification thatre.*over the full alphabet imposes no constraint.Summary hypothesis: The root cause of all 13 slowdowns is the absence in nseq of a dedicated
propagate_in_re/automaton-intersection engine. Seq converts eachstr.in_reconstraint to a symbolic automaton, intersects automata, and propagates character/length facts directly into the arithmetic layer. nseq instead relies on Nielsen graph expansion and Parikh-based arithmetic abstraction, which are powerful for string-equality problems but do not extract the strong structural constraints that regex automata yield. Until nseq implements regex-to-automaton propagation (or integrates seq'sseq_regex.cpproutines), it will systematically time out on AutomataArk-class benchmarks.Per-File Results (50 benchmarks)
instance12064.smt2instance08907.smt2instance15675.smt2instance14766.smt2instance12646.smt2unsolved_pcp_instance_335.smt201_track_120.smt2instance13483.smt2instance12188.smt2instance01616.smt2slog_stranger_1706_sink.smt2instance01258.smt203_track_129.smt2instance11770.smt2instance08827.smt2instance13553.smt2instance00166.smt2instance02021.smt2instance10989.smt2instance07639.smt2instance04207.smt2instance13309.smt2instance13182.smt2pcp_instance_187.smt2instance04761.smt2instance07676.smt2slog_stranger_418_sink.smt2instance05341.smt2instance07992.smt2instance04793.smt2instance12730.smt2instance11360.smt2slog_stranger_4135_sink.smt2Lehmann-Rabin_sat_non_incre_equiv_bad_0_1.smt2instance02418.smt2instance13118.smt2instance13474.smt2instance13846.smt2instance15325.smt203_track_128.smt2instance09378.smt2instance15893.smt2instance00151.smt2instance11552.smt2instance07312.smt2instance01990.smt2query8824.smt2instance11865.smt2instance15451.smt2instance14922.smt2Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions