[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9050
This discussion has been marked as outdated by Qf S Benchmark. A newer discussion is available at Discussion #9052.
Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst, drawn from 22,172 total)
Timeout: 10 seconds per benchmark (-T:10 for Z3; -t:10000 for ZIPT)
Build: CMake Debug (-DCMAKE_BUILD_TYPE=Debug) + Z3 .NET bindings; ZIPT parikh branch compiled against the freshly built Microsoft.Z3.dll
Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 0
ZIPT is the fastest solver on average and solved the most instances (38 definitive answers vs 34 for seq and 32 for nseq). It did produce 5 "bug" results, all due to unsupported features (see below).
Notable Issues
Soundness Disagreements (Critical)
None detected. All solvers that gave a definitive answer agreed on sat/unsat.
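The check behind this result can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function name, the verdict strings, and the two file names in the demo data are hypothetical. It treats only sat and unsat as definitive, so unknown, timeout, and bug results never count as a disagreement.

```python
from itertools import combinations

# Verdicts that count as a definitive answer (assumption: the harness
# reports everything else as "unknown", "timeout", or "bug").
DEFINITIVE = {"sat", "unsat"}


def soundness_disagreements(results):
    """Return (file, solver_a, solver_b) for every pair of solvers that
    gave conflicting definitive verdicts on the same file.

    `results` maps file name -> {solver name -> verdict string}.
    """
    conflicts = []
    for name, verdicts in sorted(results.items()):
        definitive = {s: v for s, v in verdicts.items() if v in DEFINITIVE}
        for (a, va), (b, vb) in combinations(sorted(definitive.items()), 2):
            if va != vb:
                conflicts.append((name, a, b))
    return conflicts


# Hypothetical demo data, not taken from the benchmark run.
demo = {
    "hypothetical_a.smt2": {"seq": "sat", "nseq": "unknown", "zipt": "sat"},
    "hypothetical_b.smt2": {"seq": "sat", "nseq": "unsat", "zipt": "unknown"},
}
print(soundness_disagreements(demo))
```

In the run reported here, the equivalent check over all 50 files found no conflicting pair.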
Crashes / Bugs
Both seq and nseq (6 files; anomalous, likely debug-build transients):
These 6 files produced "bug" at ~7 ms during the benchmark run, but individually re-running each file produces the expected correct verdict (sat or unsat). The 7 ms execution time suggests Z3 exited immediately, possibly because of a debug-build assertion triggered by accumulated test harness state, or a false positive from the error-pattern grep catching a transient debug-mode warning. Flagged for further investigation.
- instance12017.smt2
- instance01785.smt2
- slog_stranger_3017_sink.smt2
- instance00984.smt2
- instance10004.smt2
- slog_stranger_4846_sink.smt2
ZIPT (5 files, unsupported features):
All 5 ZIPT bugs are "Unsupported feature" errors, not crashes:
- benchmark_0474.smt2, benchmark_0264.smt2, benchmark_0139.smt2: ZIPT does not support str.replace_all (RNA benchmark family)
- unsolved_pcp_instance_337.smt2, unsolved_pcp_instance_68.smt2: PCP-string instances with unsupported constructs
Slow Benchmarks (> 8 s for any solver)
- instance08199.smt2
- instance09020.smt2
- instance11183.smt2
- instance12133.smt2
- sub-matching-sat-31.smt2
- instance15564.smt2
Interesting: nseq faster than seq
Two files where seq reported unknown (hit the 5 s internal timeout) but nseq solved quickly:
- instance10493.smt2
- instance12876.smt2
These two cases hint that nseq's Nielsen-graph approach handles certain satisfiable string-equation instances more efficiently than seq's automata-based approach.
Interesting: ZIPT uniquely solves
Two files where both Z3 solvers returned unknown but ZIPT found the answer:
- slog_stranger_5144_sink.smt2
- instance15564.smt2
Trace Analysis: seq-fast / nseq-slow Hypotheses
Three files met the criterion seq_time < 1.0 s AND nseq_time > 3 × seq_time AND nseq_time > 0.5 s:
instance08199.smt2: seq 0.356 s (sat), nseq 10.012 s (unknown)
The problem asserts X ∉ "Mirar_KeywordContent\u{13}\u{a}" and X ∉ re.loop[4..6](digit)\u{a}. The seq trace (from seq_rewriter.cpp:5196 mk_eq_core) shows that seq immediately normalises the regex membership constraints into character-unit axioms for the concrete string value "Mirar_KeywordContent\u{13}\u{a}". The sequence solver's automata engine intersects the two negated-membership automata and finds a satisfying string in its first exploration. By contrast, nseq lacks a built-in automata-based regex engine; it delegates regex reasoning to the underlying seq framework only at refutation points, so it must iterate through its Nielsen-graph derivations, with no bound on how long, before arriving at a model. The absence of early length and regex-intersection propagation in nseq is the likely bottleneck here.
instance11183.smt2: seq 0.318 s (unsat), nseq 10.010 s (unknown)
The problem involves five str.in_re constraints, including complex loops and character ranges, plus one concrete-string non-membership assertion. The seq trace begins (at mk_eq_core) by enumerating the full character expansion of the concrete candidate "name=Emailbadurl.grandstreetinteractive.comHost:stepwww.kornputers.com\u{a}" and quickly derives a character-unit axiom set. Seq's product-automaton intersection then determines that the intersection of all required regex languages is empty (unsat) within its first DPLL conflict. nseq has no equivalent automata-product step; it relies on the Nielsen graph to enumerate equation solutions, and complex regex constraints are only checked lazily. With multiple overlapping regex constraints and a 40+-character domain string, nseq's graph exploration space is too large to terminate within 10 s.
instance12133.smt2: seq 0.251 s (unsat), nseq 10.010 s (unknown)
The problem requires X to match a regex anchored at "4" followed by a 12–15 digit loop and \u{a}, while simultaneously matching a /GET prefix and a 152-repetition hexadecimal suffix. The seq trace shows rapid character-unit axiom derivation for the concrete witness "Searchdata2.activshopper.comdll?productsCUSTOMSAccwww.locators.com\u{a}". Seq's regex engine detects the length incompatibility between the 1 + 12–15 + 1 = 14–17 character constraint (the single character "4", the digit loop, and \u{a}) and the /GET + 152-hex + "/m\u{a}" = 157+ character constraint through length arithmetic on the automata intersection, producing an unsat result very quickly. nseq has no equivalent length-from-regex inference; it must explore equation splits until it exhaustively rules out all solutions, which exceeds the 10 s budget.
Per-File Results
Click to expand all 50 results
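The selection criterion used in the trace analysis (seq_time < 1.0 s AND nseq_time > 3 × seq_time AND nseq_time > 0.5 s) can be applied to per-file rows like these. The following is a minimal sketch under assumptions: the function name and the tuple layout are hypothetical, the first three rows reuse the timings quoted in the trace analysis, and the last row is a made-up control that should not be selected.

```python
def is_seq_fast_nseq_slow(seq_time, nseq_time):
    """Criterion from the trace analysis; both times are in seconds."""
    return seq_time < 1.0 and nseq_time > 3 * seq_time and nseq_time > 0.5


# (file, seq_time, nseq_time) in seconds.
rows = [
    ("instance08199.smt2", 0.356, 10.012),
    ("instance11183.smt2", 0.318, 10.010),
    ("instance12133.smt2", 0.251, 10.010),
    ("hypothetical_control.smt2", 0.400, 0.900),  # made-up row, not from the run
]

flagged = [name for name, seq_t, nseq_t in rows
           if is_seq_fast_nseq_slow(seq_t, nseq_t)]
print(flagged)
```

The control row fails the second condition (0.9 s is not more than 3 × 0.4 s), so only the three instances discussed in the trace analysis are selected.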
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.