[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9050

2026-03-20T01:17:42Z

github-actions[bot]
bot Mar 20, 2026

Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst, drawn from 22,172 total)
Timeout: 10 seconds per benchmark (-T:10 for Z3; -t:10000 for ZIPT)
Build: CMake Debug (-DCMAKE_BUILD_TYPE=Debug) + Z3 .NET bindings; ZIPT parikh branch compiled against the freshly built Microsoft.Z3.dll

Summary

Metric	`seq` solver	`nseq` solver	ZIPT solver
sat	20	20	24
unsat	14	12	14
unknown	10	12	6
timeout	0	0	1
bug/crash	6	6	5
Total time (s)	43.239	64.196	24.003
Avg time/benchmark (s)	0.865	1.284	0.480

Soundness disagreements (any two solvers return conflicting sat/unsat): 0

ZIPT is the fastest solver on average and solved the most instances (38 definitive answers vs 34 for seq and 32 for nseq). It did produce 5 "bug" results, all due to unsupported features (see below).
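As a sanity check on the figures quoted above, the following minimal sketch recomputes the definitive-answer counts and per-benchmark averages from the Summary table. The dictionary layout and the helper names `definitive` and `benchmark_count` are illustrative assumptions and are not part of the benchmark workflow.

```python
# Verdict counts and total times copied from the Summary table.
summary = {
    "seq":  {"sat": 20, "unsat": 14, "unknown": 10, "timeout": 0, "bug": 6, "total_time": 43.239},
    "nseq": {"sat": 20, "unsat": 12, "unknown": 12, "timeout": 0, "bug": 6, "total_time": 64.196},
    "ZIPT": {"sat": 24, "unsat": 14, "unknown": 6, "timeout": 1, "bug": 5, "total_time": 24.003},
}

VERDICTS = ("sat", "unsat", "unknown", "timeout", "bug")


def definitive(counts):
    """Only sat and unsat settle an instance."""
    return counts["sat"] + counts["unsat"]


def benchmark_count(counts):
    """Number of benchmarks covered by the verdict columns."""
    return sum(counts[v] for v in VERDICTS)


for solver, counts in summary.items():
    n = benchmark_count(counts)
    avg = round(counts["total_time"] / n, 3)
    print(f"{solver}: {definitive(counts)} definitive of {n}, avg {avg} s")
```

Each verdict column sums to 50, so the averages are simply the total times divided by the benchmark count.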

Notable Issues

Soundness Disagreements (Critical)

None detected. All solvers that gave a definitive answer agreed on sat/unsat.
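The disagreement check can be sketched as follows. This is an assumption about how such a comparison might be written, not the workflow's actual code; `conflicting_pairs` is a hypothetical helper, and the sample rows are copied from the Per-File Results.

```python
def conflicting_pairs(verdicts):
    """Return solver pairs where one reports sat and the other unsat.

    unknown, timeout and bug are not definitive, so they never conflict.
    """
    definitive = {s: v for s, v in verdicts.items() if v in ("sat", "unsat")}
    names = sorted(definitive)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if definitive[a] != definitive[b]
    ]


# A few rows from the Per-File Results table.
rows = {
    "instance08199.smt2": {"seq": "sat", "nseq": "unknown", "ZIPT": "sat"},
    "instance11183.smt2": {"seq": "unsat", "nseq": "unknown", "ZIPT": "unsat"},
    "sub-matching-sat-31.smt2": {"seq": "unknown", "nseq": "unknown", "ZIPT": "timeout"},
}

disagreements = {}
for name, verdicts in rows.items():
    pairs = conflicting_pairs(verdicts)
    if pairs:
        disagreements[name] = pairs

print(len(disagreements))
```

A row such as `{"seq": "sat", "nseq": "unsat"}` would be reported as a conflict, whereas `sat` against `unknown` would not.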

Crashes / Bugs

Both seq and nseq — 6 files (anomalous, likely debug-build transients):

These 6 files produced bug at ~7 ms during the benchmark run, but individually re-running each file did not reproduce the bug: five files returned a definitive verdict (sat or unsat) and slog_stranger_4846_sink.smt2 timed out. The 7 ms execution time suggests Z3 exited immediately, possibly because of a debug-build assertion triggered by accumulated test harness state, or because the error-pattern grep produced a false positive on a transient debug-mode warning. Flagged for further investigation.

File	Retested seq	Retested nseq	Status
`instance12017.smt2`	sat	sat	Likely spurious
`instance01785.smt2`	sat	sat	Likely spurious
`slog_stranger_3017_sink.smt2`	unsat	unsat	Likely spurious
`instance00984.smt2`	sat	sat	Likely spurious
`instance10004.smt2`	sat	sat	Likely spurious
`slog_stranger_4846_sink.smt2`	timeout	timeout	Likely spurious

ZIPT — 5 files (unsupported features):

All 5 ZIPT bugs are Unsupported feature errors, not crashes:

benchmark_0474.smt2, benchmark_0264.smt2, benchmark_0139.smt2: ZIPT does not support str.replace_all (RNA benchmark family)
unsolved_pcp_instance_337.smt2, unsolved_pcp_instance_68.smt2: PCP-string instances with unsupported constructs

Slow Benchmarks (> 8 s for any solver)

File	seq	nseq	ZIPT	Notes
`instance08199.smt2`	0.356 s sat	10.012 s unknown	0.239 s sat	nseq timeout; regex-intersection instance
`instance09020.smt2`	3.983 s sat	10.010 s unknown	0.348 s sat	nseq timeout; complex regex
`instance11183.smt2`	0.318 s unsat	10.010 s unknown	0.423 s unsat	nseq timeout; regex-intersection instance
`instance12133.smt2`	0.251 s unsat	10.010 s unknown	0.380 s unsat	nseq timeout; regex-intersection instance
`sub-matching-sat-31.smt2`	5.009 s unknown	10.009 s unknown	12.010 s timeout	All three solvers fail; hardest file in run
`instance15564.smt2`	5.009 s unknown	10.011 s unknown	0.279 s sat	ZIPT uniquely solves

Interesting: nseq faster than seq

Two files where seq reported unknown (hit the 5 s internal timeout) but nseq solved quickly:

File	seq	nseq	ZIPT
`instance10493.smt2`	unknown 5.009 s	sat 0.045 s	sat 0.237 s
`instance12876.smt2`	unknown 5.008 s	sat 0.847 s	sat 0.318 s

These two cases hint that nseq's Nielsen-graph approach handles certain satisfiable string-equation instances more efficiently than seq's automata-based approach.

Interesting: ZIPT uniquely solves

Two files where both Z3 solvers returned unknown but ZIPT found the answer:

File	seq	nseq	ZIPT
`slog_stranger_5144_sink.smt2`	unknown 5.012 s	unknown 2.049 s	sat 0.968 s
`instance15564.smt2`	unknown 5.009 s	unknown 10.011 s	sat 0.279 s
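The selection behind this table can be expressed as a small filter. The sketch below is illustrative only: `sole_solver` is a hypothetical helper, and the rows are a subset of the Per-File Results.

```python
DEFINITIVE = ("sat", "unsat")


def sole_solver(verdicts):
    """Return the only solver with a definitive verdict, or None otherwise."""
    winners = [s for s, v in verdicts.items() if v in DEFINITIVE]
    return winners[0] if len(winners) == 1 else None


# Subset of the Per-File Results table.
rows = {
    "instance10493.smt2": {"seq": "unknown", "nseq": "sat", "ZIPT": "sat"},
    "slog_stranger_5144_sink.smt2": {"seq": "unknown", "nseq": "unknown", "ZIPT": "sat"},
    "instance15564.smt2": {"seq": "unknown", "nseq": "unknown", "ZIPT": "sat"},
    "sub-matching-sat-31.smt2": {"seq": "unknown", "nseq": "unknown", "ZIPT": "timeout"},
    "benchmark_0474.smt2": {"seq": "unknown", "nseq": "unknown", "ZIPT": "bug"},
}

unique = {}
for name, verdicts in rows.items():
    solver = sole_solver(verdicts)
    if solver is not None:
        unique[name] = solver

print(unique)
```

Files where ZIPT reported bug or timeout are excluded because no solver gave a definitive answer for them.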

Trace Analysis: seq-fast / nseq-slow Hypotheses

Three files met the criterion seq_time < 1.0 s AND nseq_time > 3 × seq_time AND nseq_time > 0.5 s:
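The criterion can be written as a small predicate. In the sketch below, the function name `seq_fast_nseq_slow` is hypothetical, and the timings are a subset of the Per-File Results that includes every file with an nseq time above 0.5 s.

```python
def seq_fast_nseq_slow(seq_time, nseq_time):
    """seq_time < 1.0 s AND nseq_time > 3 x seq_time AND nseq_time > 0.5 s."""
    return seq_time < 1.0 and nseq_time > 3 * seq_time and nseq_time > 0.5


# (seq time, nseq time) in seconds, copied from the Per-File Results table.
times = {
    "instance05365.smt2": (0.975, 0.030),
    "benchmark_0474.smt2": (0.901, 0.035),
    "instance08199.smt2": (0.356, 10.012),
    "instance09020.smt2": (3.983, 10.010),
    "instance11183.smt2": (0.318, 10.010),
    "instance12133.smt2": (0.251, 10.010),
    "slog_stranger_5144_sink.smt2": (5.012, 2.049),
    "instance12876.smt2": (5.008, 0.847),
    "sub-matching-sat-31.smt2": (5.009, 10.009),
    "instance15564.smt2": (5.009, 10.011),
}

selected = [name for name, (s, n) in times.items() if seq_fast_nseq_slow(s, n)]
print(selected)
```

`instance09020.smt2` is excluded because its seq time (3.983 s) is not below 1.0 s, even though nseq timed out on it.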

`instance08199.smt2` — seq: 0.356 s (sat), nseq: 10.012 s (unknown)

The problem asserts X ∉ "Mirar_KeywordContent\u{13}\u{a}" and X ∉ re.loop[4..6](digit)\u{a}. The seq trace (from seq_rewriter.cpp:5196 mk_eq_core) shows that seq immediately normalises the regex membership constraints into character-unit axioms for the concrete string value "Mirar_KeywordContent\u{13}\u{a}". Seq's automata engine intersects the two negated-membership automata and finds a satisfying string in its first exploration. By contrast, nseq lacks a built-in automata-based regex engine: it delegates regex reasoning to the underlying seq framework only at refutation points, so it keeps iterating through its Nielsen-graph derivations and did not arrive at a model within the 10 s budget. The absence of early length and regex-intersection propagation in nseq is the likely bottleneck here.

`instance11183.smt2` — seq: 0.318 s (unsat), nseq: 10.010 s (unknown)

The problem involves five str.in_re constraints including complex loops and character-ranges, plus one concrete-string non-membership assertion. The seq trace begins (at mk_eq_core) by enumerating the full character expansion of the concrete candidate "name=Emailbadurl.grandstreetinteractive.comHost:stepwww.kornputers.com\u{a}" and quickly derives a character-unit axiom set. Seq's product-automaton intersection then determines that the intersection of all required regex languages is empty (unsat) within its first DPLL conflict. nseq has no equivalent automata-product step; it relies on the Nielsen graph to enumerate equation solutions, but complex regex constraints are only checked lazily. With multiple overlapping regex constraints and a 40+-character domain string, nseq's graph exploration space is too large to terminate within 10 s.

`instance12133.smt2` — seq: 0.251 s (unsat), nseq: 10.010 s (unknown)

The problem requires X to match a regex anchored at "4" followed by a 12–15 digit loop and \u{a}, while simultaneously matching a /GET prefix and a 152-repetition hexadecimal suffix. The seq trace shows rapid character-unit axiom derivation for the concrete witness "Searchdata2.activshopper.comdll?productsCUSTOMSAccwww.locators.com\u{a}". Seq's regex engine detects the length incompatibility between the two constraints through length arithmetic on the automata intersection: the first admits 1 + (12–15) + 1 = 14–17 characters (the literal "4" is a single character), whereas the second (/GET + 152-hex + "/m\u{a}") needs 157+ characters. This produces an unsat result very quickly. nseq has no equivalent length-from-regex inference; it must explore equation splits until it has ruled out all solutions, which exceeds the 10 s budget.
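The length argument for `instance12133.smt2` can be illustrated with plain interval arithmetic. This is a sketch of the idea only, not how seq implements it: `concat_length` and `compatible` are hypothetical helpers, the literal "4" is counted as a single character, the 157-character lower bound is taken from the text as given, and the open upper bound is an assumption.

```python
import math


def concat_length(*parts):
    """Add up (min, max) length bounds of concatenated regex parts."""
    lo = sum(p[0] for p in parts)
    hi = sum(p[1] for p in parts)
    return lo, hi


def compatible(a, b):
    """Two length intervals share a length iff they overlap."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# First constraint: the character "4", a loop of 12 to 15 digits, a newline.
short = concat_length((1, 1), (12, 15), (1, 1))

# Second constraint: lower bound as quoted in the text, upper bound assumed open.
long = (157, math.inf)

print(short, compatible(short, long))
```

Because the two intervals do not overlap, no single string can satisfy both membership constraints, which is consistent with the unsat verdict reported by seq and ZIPT.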

Per-File Results


#	File	seq verdict	seq time (s)	nseq verdict	nseq time (s)	ZIPT verdict	ZIPT time (s)
1	instance05365.smt2	sat	0.975	sat	0.030	sat	0.284
2	benchmark_0474.smt2	unknown	0.901	unknown	0.035	bug	0.107
3	instance12017.smt2	bug	0.007	bug	0.007	unknown	0.036
4	instance01487.smt2	sat	0.086	sat	0.030	sat	0.216
5	instance02001.smt2	sat	0.476	sat	0.034	sat	0.184
6	instance08853.smt2	sat	0.186	sat	0.040	sat	0.259
7	benchmark_0264.smt2	unknown	0.918	unknown	0.036	bug	0.107
8	instance01785.smt2	bug	0.007	bug	0.006	unknown	0.036
9	instance10493.smt2	unknown	5.009	sat	0.045	sat	0.237
10	instance08199.smt2	sat	0.356	unknown	10.012	sat	0.239
11	slog_stranger_5418_sink.smt2	unsat	0.027	unsat	0.024	unsat	0.214
12	instance01627.smt2	sat	0.083	sat	0.031	sat	0.196
13	instance01442.smt2	sat	0.139	sat	0.037	sat	0.216
14	instance15800.smt2	unsat	0.036	unsat	0.023	unsat	0.253
15	instance09020.smt2	sat	3.983	unknown	10.010	sat	0.348
16	slog_stranger_2129_sink.smt2	unsat	0.034	unsat	0.026	unsat	0.278
17	instance10482.smt2	sat	0.122	sat	0.042	sat	0.214
18	unsolved_pcp_instance_337.smt2	unknown	0.218	unknown	0.030	bug	0.120
19	slog_stranger_3017_sink.smt2	bug	0.007	bug	0.006	unknown	0.035
20	instance11775.smt2	unsat	0.123	unsat	0.038	unsat	0.261
21	slog_stranger_574_sink.smt2	unsat	0.057	unsat	0.024	unsat	0.533
22	instance01180.smt2	sat	0.049	sat	0.028	sat	0.186
23	instance11183.smt2	unsat	0.318	unknown	10.010	unsat	0.423
24	instance11288.smt2	unsat	0.222	unsat	0.035	unsat	0.345
25	instance12133.smt2	unsat	0.251	unknown	10.010	unsat	0.380
26	instance07400.smt2	unsat	0.115	unsat	0.052	unsat	0.232
27	instance15991.smt2	unsat	0.041	unsat	0.030	unsat	0.339
28	04_track_129.smt2	unsat	1.352	unsat	0.040	unsat	0.491
29	slog_stranger_2767_sink.smt2	unsat	0.042	unsat	0.025	unsat	0.357
30	instance15832.smt2	sat	1.100	sat	0.041	sat	0.282
31	instance10528.smt2	unsat	0.131	unsat	0.043	unsat	0.245
32	slog_stranger_5144_sink.smt2	unknown	5.012	unknown	2.049	sat	0.968
33	instance00984.smt2	bug	0.007	bug	0.007	unknown	0.036
34	instance00902.smt2	sat	0.076	sat	0.030	sat	0.216
35	instance08586.smt2	unsat	0.034	unsat	0.024	unsat	0.285
36	instance07930.smt2	sat	0.042	sat	0.030	sat	0.316
37	instance00888.smt2	sat	0.107	sat	0.034	sat	0.198
38	instance00301.smt2	sat	0.093	sat	0.031	sat	0.244
39	instance12876.smt2	unknown	5.008	sat	0.847	sat	0.318
40	instance01358.smt2	sat	0.082	sat	0.029	sat	0.221
41	01_track_39.smt2	sat	0.374	sat	0.044	sat	0.277
42	instance07585.smt2	sat	0.082	sat	0.029	sat	0.202
43	unsolved_pcp_instance_68.smt2	unknown	0.207	unknown	0.031	bug	0.125
44	instance10004.smt2	bug	0.007	bug	0.007	unknown	0.037
45	sub-matching-sat-31.smt2	unknown	5.009	unknown	10.009	timeout	12.010
46	instance00686.smt2	sat	0.031	sat	0.023	sat	0.247
47	instance15564.smt2	unknown	5.009	unknown	10.011	sat	0.279
48	instance00164.smt2	sat	3.760	sat	0.038	sat	0.221
49	benchmark_0139.smt2	unknown	0.921	unknown	0.036	bug	0.112
50	slog_stranger_4846_sink.smt2	bug	0.007	bug	0.007	unknown	0.038

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.

AI generated by Qf S Benchmark · history

expires on Mar 27, 2026, 1:17 AM UTC

2026-03-20T07:52:52Z

github-actions[bot]
bot Mar 20, 2026
Author

This discussion has been marked as outdated by Qf S Benchmark.

A newer discussion is available at Discussion #9052.
