[ZIPT Benchmark] Z3 c3 branch — 2026-03-19 #9049

2026-03-19T23:47:54Z

github-actions[bot]
bot Mar 19, 2026

Date: 2026-03-19
Branch: c3
Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst)
Timeout: 10 seconds per benchmark (-T:10 for Z3; -t:10000 for ZIPT)

Summary

Metric	seq solver	nseq solver	ZIPT solver
sat	21	21	28
unsat	18	15	18
unknown	8	11	3
timeout	0	0	0
bug/crash	3	3	1
Total time (s)	54.735	81.785	15.015
Avg time/benchmark (s)	1.095	1.636	0.300

Soundness disagreements (any two solvers return conflicting sat/unsat): 1

ZIPT is substantially faster on average (0.300 s vs seq's 1.095 s and nseq's 1.636 s) and gives more definitive answers (46/50 vs seq's 39/50 and nseq's 36/50). The bug cases for seq and nseq are benchmarks that call (get-model) or (get-value ...) after an unknown result, which causes Z3 to emit a model-unavailability error. The nseq solver hits the 10 s limit on 8 files, 7 of which seq solves (seq itself returns unknown on slog_stranger_4071_sink.smt2). These runs are reported as unknown, which is why the timeout row shows 0. The four files examined under Trace Analysis below are AutomataArk regex-heavy instances.
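The derived figures quoted above (definitive answers and average time) can be recomputed from the summary table. The following is a minimal sketch, not the workflow's actual script; the dictionary layout and function names are illustrative assumptions, while the numbers are copied from the table.

```python
# Counts and total times copied from the Summary table.
SUMMARY = {
    "seq":  {"sat": 21, "unsat": 18, "unknown": 8,  "timeout": 0, "bug": 3, "total_time": 54.735},
    "nseq": {"sat": 21, "unsat": 15, "unknown": 11, "timeout": 0, "bug": 3, "total_time": 81.785},
    "ZIPT": {"sat": 28, "unsat": 18, "unknown": 3,  "timeout": 0, "bug": 1, "total_time": 15.015},
}

VERDICT_KEYS = ("sat", "unsat", "unknown", "timeout", "bug")


def benchmark_count(row):
    """Number of benchmarks covered by one solver's row."""
    return sum(row[key] for key in VERDICT_KEYS)


def definitive_answers(row):
    """Only sat and unsat count as definitive answers."""
    return row["sat"] + row["unsat"]


def average_time(row):
    """Average wall-clock time per benchmark, rounded like the report."""
    return round(row["total_time"] / benchmark_count(row), 3)


for solver, row in SUMMARY.items():
    print(solver, definitive_answers(row), benchmark_count(row), average_time(row))
```

Each row sums to 50 benchmarks, which is a quick consistency check on the table.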

Notable Issues

Soundness Disagreements (Critical)

Lehmann-Rabin_sat_non_incre_equiv_trans_16_1.smt2 (20250411-hornstr-equiv): seq=unsat, nseq=unsat, ZIPT=SAT

Both Z3 solvers agree on unsat but ZIPT returns SAT. Importantly, ZIPT also emitted Exception (Final): Specified method is not supported. before printing SAT, which strongly suggests a ZIPT bug rather than a Z3 soundness error. The benchmark source marks :status unknown, so the ground truth is unverified. The formula involves a string variable varout constrained by a complex regex intersection from a Lehmann-Rabin protocol CHC encoding. Verdict: likely a ZIPT bug (exception in final validation triggers a spurious SAT).
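For reference, the disagreement rule stated in the summary (any two solvers return conflicting sat/unsat) can be sketched as follows. The function name and the dictionary shape are assumptions made for illustration; the benchmark script itself is not shown in this report.

```python
def is_soundness_disagreement(verdicts):
    """Return True if one solver answers sat while another answers unsat.

    `verdicts` maps a solver name to its verdict string. Verdicts such as
    unknown or bug never count as a conflict on their own.
    """
    answers = {verdict.lower() for verdict in verdicts.values()}
    return "sat" in answers and "unsat" in answers


# Verdicts reported for Lehmann-Rabin_sat_non_incre_equiv_trans_16_1.smt2.
lehmann_rabin = {"seq": "unsat", "nseq": "unsat", "ZIPT": "SAT"}
# Verdicts reported for muzzle_sat_non_incre_equiv_trans_0_13.smt2.
muzzle = {"seq": "unknown", "nseq": "unsat", "ZIPT": "unsat"}

print(is_soundness_disagreement(lehmann_rabin))
print(is_soundness_disagreement(muzzle))
```

Note that such a check only flags a conflict; deciding which solver is wrong still needs the kind of manual inspection described above.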

Crashes / Bugs

File	seq	nseq	ZIPT	Notes
`not-contains-1-3-5-130.smt2`	bug	bug	unknown	seq/nseq exit within 0.006 s with an error (model unavailable after unknown)
`benchmark_0358.smt2`	bug	bug	unknown	Same pattern: `(get-value)` after `unknown` triggers error
`instance14127.smt2`	bug	bug	unknown	Same: `(get-model)` after `unknown` triggers error
`pcp_instance_183.smt2`	unknown	unknown	bug	ZIPT crashes (unsupported operation) on a PCP-string instance

The three seq/nseq "bug" cases come from well-formed benchmark files that call (get-model) / (get-value) after (check-sat). When the solver returns unknown, Z3 correctly reports an error on the subsequent model query. These are more accurately benchmark-harness issues (the files request models that aren't available) than solver bugs, but the error exit code causes our script to classify them as bug.
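One way the harness could separate these cases from genuine crashes is to look at the verdict line before looking at the exit code. The sketch below is hypothetical: the function name is invented, and the error text "model is not available" is an assumption about Z3's message rather than something quoted in this report.

```python
VERDICTS = ("sat", "unsat", "unknown")
# Assumed wording of Z3's model-unavailability error; adjust if it differs.
MODEL_UNAVAILABLE = "model is not available"


def classify(stdout, exit_code):
    """Classify one solver run from its standard output and exit code."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    # The first line that is exactly a verdict is taken as the check-sat result.
    verdict = next((line for line in lines if line in VERDICTS), None)
    errors = [line for line in lines if line.startswith("(error")]

    # unknown followed only by model-unavailability errors: keep the verdict.
    if verdict == "unknown" and errors and all(MODEL_UNAVAILABLE in e for e in errors):
        return "unknown"
    # Any other error, a non-zero exit code, or a missing verdict is a bug.
    if exit_code != 0 or errors or verdict is None:
        return "bug"
    return verdict


sample = 'unknown\n(error "line 12 column 10: model is not available")\n'
print(classify(sample, 1))
```

With this rule the three files would be counted as unknown for seq and nseq, matching how ZIPT's results on the same files are reported.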

Slow Benchmarks (> 8s)

All 8 slow cases are nseq timeouts (nseq hits the 10 s wall-clock limit):

File	seq time	nseq time	ZIPT time
instance06437.smt2	1.369 s (sat)	10.010 s (unknown)	0.367 s (sat)
instance06151.smt2	0.297 s (unsat)	10.010 s (unknown)	0.340 s (unsat)
instance13438.smt2	0.204 s (unsat)	10.010 s (unknown)	0.310 s (unsat)
instance13864.smt2	0.206 s (unsat)	10.010 s (unknown)	0.284 s (unsat)
instance15521.smt2	0.168 s (unsat)	10.010 s (unknown)	0.304 s (unsat)
instance15977.smt2	1.525 s (sat)	10.011 s (unknown)	0.348 s (sat)
instance10648.smt2	2.792 s (sat)	10.010 s (unknown)	0.474 s (sat)
slog_stranger_4071_sink.smt2	5.012 s (unknown)	10.008 s (unknown)	0.762 s (sat)

Trace Analysis: seq-fast / nseq-slow Hypotheses

Four files matched the criterion seq_time < 1.0 s AND nseq_time > 3 × seq_time AND nseq_time > 0.5 s. All four are from the AutomataArk benchmark family (20230329-automatark-lu), and all involve a single string variable X constrained by multiple str.in_re and not (str.in_re) assertions over complex regular expressions.
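The selection criterion can be written down directly. This is a small sketch using the timings from the Slow Benchmarks table; the function and variable names are illustrative, not taken from the workflow.

```python
def seq_fast_nseq_slow(seq_time, nseq_time):
    """Criterion from the report: seq under 1 s, nseq over 3x seq and over 0.5 s."""
    return seq_time < 1.0 and nseq_time > 3 * seq_time and nseq_time > 0.5


# (seq time, nseq time) in seconds, copied from the Slow Benchmarks table.
timings = {
    "instance06437.smt2": (1.369, 10.010),
    "instance06151.smt2": (0.297, 10.010),
    "instance13438.smt2": (0.204, 10.010),
    "instance13864.smt2": (0.206, 10.010),
    "instance15521.smt2": (0.168, 10.010),
    "instance15977.smt2": (1.525, 10.011),
    "instance10648.smt2": (2.792, 10.010),
    "slog_stranger_4071_sink.smt2": (5.012, 10.008),
}

matched = sorted(
    name for name, (seq_t, nseq_t) in timings.items()
    if seq_fast_nseq_slow(seq_t, nseq_t)
)
print(matched)
```

The other four slow files drop out because seq needs more than 1.0 s on them.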

Common pattern: seq solves these quickly by exploiting its automata-product intersection engine. The trace shows propagate_in_re calls (in seq_regex.cpp:154) that build DFA/NFA products for the intersection of the positive regex constraints. When the intersection automaton becomes empty — e.g., because two regex constraints constrain the string to different lengths or character classes — seq immediately derives unsat via an add_axiom call (in theory_seq.cpp:2976) asserting the literal false. This is a constant number of deterministic automaton operations, completing in 0.168–0.297 s.

Why nseq is slower: nseq (theory_nseq) operates via the Nielsen graph, which decomposes string equations character by character and explores equation extensions. It does not have a dedicated regex-intersection procedure. For these AutomataArk instances, which are primarily regex membership constraints with no explicit concatenation equations, nseq must encode the regex constraints indirectly and explore a large Nielsen graph search space without the early termination provided by automaton emptiness checking. The nseq solver times out (10 s) on all four files, returning unknown.

Per-file hypotheses:

instance06151.smt2 (seq: 0.297 s unsat, nseq: 10.010 s unknown): The formula constrains X to simultaneously match a phone-number regex using re.loop (4-digit blocks) and a pattern \xf6\xec\xd9... repeated 500 times, plus several negative constraints. The seq trace shows early propagate_in_re narrowing the length of X: the 500-repetition pattern demands |X| ≥ ~3500, but the phone-number regex bounds |X| ≤ ~20. The length inconsistency is detected by seq's length propagation within the regex automaton product, yielding unsat in ~10 automaton steps. nseq lacks this length-from-regex inference and exhausts its budget exploring equation splits.
instance13438.smt2 (seq: 0.204 s unsat, nseq: 10.010 s unknown): A URL-format regex (Host:Port...) with alphanumeric ranges must simultaneously satisfy a hostname regex and several negated patterns. The seq trace shows propagate_in_re detecting that the required first character of X (determined by the positive regex) conflicts with a negated pattern's forced character — a character-class disjointness proof completed via the automaton product's initial states. nseq cannot efficiently reason about character-level constraints from regex membership without full equation unfolding.
instance13864.smt2 (seq: 0.206 s unsat, nseq: 10.010 s unknown): The positive regex demands X starts with specific literal text ("welcomeforToolbarHost:\n"), and the negated regex uses re.++ with a quoted-string escape pattern. seq's propagate_in_re peels the constant prefix of X from the regex, collapsing the remaining regex constraints to a shorter suffix, and quickly finds an empty intersection. nseq cannot apply prefix-stripping optimizations to regex constraints.
instance15521.smt2 (seq: 0.168 s unsat, nseq: 10.010 s unknown): The formula has str.in_re X (re.++ "<" (re.* (re.comp ">")) ">") — an XML-tag pattern — combined with str.in_re X (re.++ "/filename=" (re.* (re.comp "\n")) ".wmx/i") (URL path). seq's trace shows propagate_in_re detecting that the first character of X must be simultaneously < (from regex 1) and / (from regex 3): a trivial character-class conflict resolved in the first automaton step. nseq's equation-splitting approach doesn't exploit this structural first-character conflict.
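The first-character conflict described for instance15521.smt2 can be illustrated with a toy product-automaton emptiness check. This is a sketch under simplifying assumptions, not Z3's seq_regex.cpp code: the DFAs are hand-built over a four-character alphabet, and the second one only models a string starting with "/" as a stand-in for the "/filename=" regex.

```python
from collections import deque

ALPHABET = ["<", ">", "/", "a"]  # tiny stand-in alphabet

# DFA for "<" followed by any non-">" characters followed by ">".
tag_dfa = {
    "start": 0,
    "accept": {2},
    "delta": {(0, "<"): 1, (1, "<"): 1, (1, "/"): 1, (1, "a"): 1, (1, ">"): 2},
}

# Simplified DFA for strings that start with "/".
path_dfa = {
    "start": 0,
    "accept": {1},
    "delta": {(0, "/"): 1, **{(1, ch): 1 for ch in ALPHABET}},
}


def product_is_empty(dfa_a, dfa_b, alphabet):
    """Breadth-first search over state pairs; True if no pair accepts in both."""
    start = (dfa_a["start"], dfa_b["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a in dfa_a["accept"] and b in dfa_b["accept"]:
            return False  # some string is accepted by both automata
        for ch in alphabet:
            next_a = dfa_a["delta"].get((a, ch))
            next_b = dfa_b["delta"].get((b, ch))
            if next_a is None or next_b is None:
                continue  # at least one automaton rejects this character
            if (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                queue.append((next_a, next_b))
    return True


print(product_is_empty(tag_dfa, path_dfa, ALPHABET))
```

Here the search stops after inspecting only the initial state pair, because no character is allowed by both automata as a first symbol. That mirrors the "first automaton step" conflict hypothesized above.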

Per-File Results


#	File	seq verdict	seq time (s)	nseq verdict	nseq time (s)	ZIPT verdict	ZIPT time (s)	Notes
1	instance00797.smt2	sat	0.112	sat	0.028	sat	0.281
2	instance07229.smt2	unsat	0.202	unsat	0.040	unsat	0.305
3	muzzle_sat_non_incre_equiv_trans_0_13.smt2	unknown	5.008	unsat	0.064	unsat	0.293
4	instance02989.smt2	sat	0.124	sat	0.029	sat	0.366
5	01_track_51.smt2	sat	0.062	sat	0.023	sat	0.355
6	instance15068.smt2	unsat	0.155	unsat	0.031	unsat	0.288
7	instance13984.smt2	sat	0.945	sat	0.068	sat	0.295
8	slog_stranger_599_sink.smt2	unsat	0.028	unsat	0.022	unsat	0.228
9	instance08251.smt2	sat	2.179	sat	0.046	sat	0.335
10	instance06437.smt2	sat	1.369	unknown	10.010	sat	0.367
11	instance06151.smt2	unsat	0.297	unknown	10.010	unsat	0.340
12	instance13868.smt2	sat	0.882	sat	0.052	sat	0.242
13	instance04480.smt2	sat	0.069	sat	0.029	sat	0.217
14	instance04720.smt2	unknown	5.009	sat	0.056	sat	0.273
15	slog_stranger_2857_sink.smt2	unknown	5.009	sat	0.068	sat	0.854
16	instance07431.smt2	unsat	0.174	unsat	0.043	unsat	0.596
17	instance14839.smt2	unsat	0.037	unsat	0.023	unsat	0.328
18	instance06227.smt2	unsat	0.125	unsat	0.046	unsat	0.264
19	instance13438.smt2	unsat	0.204	unknown	10.010	unsat	0.310
20	instance00198.smt2	sat	0.068	sat	0.028	sat	0.211
21	Lehmann-Rabin_sat_non_incre_equiv_trans_16_1.smt2	unsat	0.027	unsat	0.025	sat	0.250	SOUNDNESS_DISAGREEMENT
22	02_track_1.smt2	unknown	5.008	sat	0.216	sat	0.312
23	instance01216.smt2	sat	0.059	sat	0.028	sat	0.207
24	instance08482.smt2	unsat	0.035	unsat	0.024	unsat	0.364
25	instance14014.smt2	unsat	0.192	unsat	0.039	unsat	0.317
26	instance03118.smt2	sat	0.072	sat	0.029	sat	0.230
27	slog_stranger_2053_sink.smt2	unsat	0.045	unsat	0.024	unsat	0.372
28	instance00372.smt2	sat	0.079	sat	0.028	sat	0.214
29	instance14789.smt2	unsat	0.201	unsat	0.057	unsat	0.348
30	instance09339.smt2	unknown	5.011	sat	0.091	sat	0.298
31	instance13864.smt2	unsat	0.206	unknown	10.010	unsat	0.284
32	instance03401.smt2	sat	0.076	sat	0.029	sat	0.220
33	query4399.smt2	sat	0.629	unknown	0.042	sat	0.227
34	not-contains-1-3-5-130.smt2	bug	0.006	bug	0.006	unknown	0.054	seq/nseq error on (get-value) after unknown
35	slog_stranger_3064_sink.smt2	sat	3.977	sat	0.040	sat	0.300
36	instance08975.smt2	unknown	5.010	sat	0.048	sat	0.254
37	instance14559.smt2	unsat	0.037	unsat	0.025	unsat	0.318
38	query8824.smt2	sat	0.788	unknown	0.045	sat	0.228
39	benchmark_0358.smt2	bug	0.007	bug	0.007	unknown	0.050	seq/nseq error on (get-value) after unknown
40	instance15521.smt2	unsat	0.168	unknown	10.010	unsat	0.304
41	instance10367.smt2	unsat	0.184	unsat	0.032	unsat	0.274
42	instance15977.smt2	sat	1.525	unknown	10.011	sat	0.348
43	instance02506.smt2	sat	1.077	sat	0.052	sat	0.286
44	instance10648.smt2	sat	2.792	unknown	10.010	sat	0.474
45	slog_stranger_4071_sink.smt2	unknown	5.012	unknown	10.008	sat	0.762
46	pcp_instance_183.smt2	unknown	0.213	unknown	0.029	bug	0.135	ZIPT crash on PCP-string instance
47	instance14127.smt2	bug	0.006	bug	0.006	unknown	0.045	seq/nseq error on (get-model) after unknown
48	instance11558.smt2	sat	0.143	sat	0.036	sat	0.256
49	instance04623.smt2	sat	0.056	sat	0.028	sat	0.199
50	instance13318.smt2	unsat	0.036	unsat	0.024	unsat	0.337

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.

AI generated by Qf S Benchmark

expires on Mar 26, 2026, 11:47 PM UTC

2026-03-20T01:17:43Z

github-actions[bot]
bot Mar 20, 2026
Author

This discussion has been marked as outdated by Qf S Benchmark.

A newer discussion is available at Discussion #9050.
