[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9052

2026-03-20T07:52:51Z

github-actions[bot]
bot Mar 20, 2026

## ZIPT Benchmark Report — Z3 c3 branch

- **Date:** 2026-03-20
- **Branch:** c3
- **Benchmark set:** QF_S (50 files randomly selected from the 22,172 in `tests/QF_S.tar.zst`)
- **Timeout:** seq uses `-T:5` (with `-tr:seq` tracing) and a 7 s outer limit; nseq uses `-T:10` and a 12 s outer limit; ZIPT uses `-t:10000` and a 12 s outer limit
- **Build:** Debug mode (`CMAKE_BUILD_TYPE=Debug`) with .NET bindings, Z3 version 4.17.0
- **ZIPT:** branch parikh, built against a freshly compiled `Microsoft.Z3.dll` (netstandard2.1 target)

### Summary

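| Metric | seq solver | nseq solver | ZIPT solver |
|---|---:|---:|---:|
| sat | 17 | 22 | 25 |
| unsat | 14 | 14 | 13 |
| unknown | 13 | 8 | 6 |
| timeout | 0 | 0 | 2 |
| bug/crash | 6 | 6 | 4 |
| Total time (s) | 54.912 | 21.820 | 35.795 |
| Avg time/benchmark (s) | 1.098 | 0.436 | 0.716 |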

Soundness disagreements (any two solvers return conflicting sat/unsat): 1
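
The summary rows and the disagreement count can be recomputed from the per-file table. The sketch below assumes a hypothetical `Row` record (one per benchmark file, holding per-solver verdict and time maps); it is an illustration, not the workflow's actual code. Avg time/benchmark is simply Total time divided by 50, e.g. 54.912 / 50 ≈ 1.098 s for seq.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Row:
    """One benchmark file (hypothetical record, not the workflow's format)."""
    file: str
    verdicts: dict = field(default_factory=dict)  # solver name -> verdict string
    times: dict = field(default_factory=dict)     # solver name -> wall-clock seconds


def summarize(rows, solver):
    """Verdict counts, total time and average time for one solver."""
    counts = Counter(row.verdicts[solver] for row in rows)
    total = sum(row.times[solver] for row in rows)
    return counts, round(total, 3), round(total / len(rows), 3)


def soundness_disagreement(row):
    """True when some solver answers sat and another answers unsat."""
    answers = set(row.verdicts.values())
    return "sat" in answers and "unsat" in answers


row8 = Row(
    "Lehmann-Rabin_sat_non_incre_equiv_trans_14_1.smt2",
    {"seq": "unsat", "nseq": "unsat", "ZIPT": "sat"},
    {"seq": 0.029, "nseq": 0.026, "ZIPT": 0.250},
)
print(soundness_disagreement(row8))  # True
```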

**Note on seq/nseq "bug" rows:** 6 files are recorded as bug at ~0.007 s for both seq and nseq. On re-run, the same files produce valid verdicts (unsat, sat, unknown, or timeout). The most likely cause is that Z3 prints the literal text `timeout` when its internal `-T:N` limit fires. That string matches none of the `^sat`/`^unsat`/`^unknown` checks and is not caught by the exit-code-124 guard, so the row apparently falls through to the bug label even though the `grep -qi error|assertion|...` branch never matched. These rows therefore reflect a parser artifact rather than a solver crash. Because the recorded runtimes (~0.007 s) are far below the `-T` limits, this explanation remains a hypothesis. The one real exception is nseq on `noodles-unsat-9.smt2`, which triggers a genuine ASSERTION VIOLATION (see Notable Issues below).
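
A verdict parser that recognizes Z3's `timeout` text would avoid this artifact. The sketch below is a hypothetical Python rendering of the checks described above (first-line `sat`/`unsat`/`unknown`, exit code 124 from GNU `timeout`, an `error|assertion` grep), not the workflow's real shell code; it assumes that unrecognized output should mean "no verdict" rather than "bug".

```python
import re

# Hypothetical Python rendering of the workflow's verdict checks; the real
# workflow is a shell script whose exact logic is not shown in this report.
ERROR_RE = re.compile(r"error|assertion", re.IGNORECASE)


def classify_verdict(stdout: str, exit_code: int) -> str:
    """Map one solver run (stdout, exit code) to a table verdict."""
    if exit_code == 124:
        return "timeout"      # GNU `timeout` killed the run at the outer limit
    lines = stdout.strip().splitlines()
    first = lines[0].strip() if lines else ""
    if first in ("sat", "unsat", "unknown"):
        return first
    if first == "timeout":
        return "timeout"      # Z3's own -T:N limit fired and printed "timeout"
    if ERROR_RE.search(stdout):
        return "bug"          # genuine error or assertion output
    return "unknown"          # unrecognized output: no verdict, not a crash


print(classify_verdict("timeout\n", 0))  # timeout
```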

### Notable Issues

#### 🔴 Soundness Disagreement (Critical)

**`Lehmann-Rabin_sat_non_incre_equiv_trans_14_1.smt2`** (benchmark row 8)

| Solver | Verdict | Time |
|---|---|---:|
| seq | unsat | 0.029 s |
| nseq | unsat | 0.026 s |
| ZIPT | sat | 0.250 s |

Both Z3 solvers agree on unsat. ZIPT claims sat, but before doing so it throws:

```
Exception (Final): Specified method is not supported.
SAT
```
The exception indicates ZIPT hit an unsupported code path and then returned a fallback answer. The ZIPT result is therefore **not trustworthy** for this instance — this is a ZIPT bug, not a Z3 soundness issue. The benchmark file's declared status is `unknown`, so neither side is provably correct, but the agreement between seq and nseq is reassuring.
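
One way to keep such fallback answers out of the sat/unsat tallies is to distrust any ZIPT verdict printed after an exception. A minimal sketch, assuming ZIPT prints its exception on a line starting with `Exception` and its verdict (`SAT`/`UNSAT`) on the last line, as in the output above; `zipt_verdict` is a hypothetical helper, not part of ZIPT or the workflow.

```python
def zipt_verdict(stdout: str) -> str:
    """Hypothetical guard: distrust a ZIPT verdict printed after an exception."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if any(line.startswith("Exception") for line in lines):
        # Fallback answer after an unsupported code path: report as a ZIPT bug.
        return "bug"
    last = lines[-1].lower() if lines else ""
    return last if last in ("sat", "unsat", "unknown") else "unknown"


print(zipt_verdict("Exception (Final): Specified method is not supported.\nSAT\n"))  # bug
```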

---

#### 🔴 nseq Assertion Violation (Real Bug)

**`noodles-unsat-9.smt2`** — word equations + regex constraints (status: unknown)

Running `z3 smt.string_solver=nseq -T:10 noodles-unsat-9.smt2` produces:

```
ASSERTION VIOLATION
File: /home/runner/work/z3/z3/src/smt/seq/seq_nielsen.cpp
Line: 3717
deriv
(C)ontinue, (A)bort, (S)top, (T)hrow exception, Invoke (G)DB, Invoke (L)LDB
```

The `deriv` assertion at `seq_nielsen.cpp:3717` fires on this small problem (three string variables, four assertions) whose regex constraints range over the alphabet {1, 2}. The seq solver times out at its 5 s limit on the same file. This assertion violation should be investigated in the `seq_parikh` / derivative computation path.

The instance:

```
(declare-fun x () String)
(declare-fun y () String)
(declare-fun z () String)
(assert (= (str.++ z y x) (str.++ x x z)))
(assert (str.in_re x (re.++ (str.to_re "2") (re.* (re.union (str.to_re "1") (re.* (str.to_re "2")))))))
(assert (str.in_re y (re.++ (str.to_re "1") (re.* (re.union (str.to_re "1") (re.* (str.to_re "2")))))))
(assert (str.in_re z (re.++ (str.to_re "2") (re.* (re.union (str.to_re "1") (re.* (str.to_re "2")))))))
```
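
Because the instance is tiny, a bounded brute-force search is a cheap independent sanity check while the assertion is investigated. The hypothetical sketch below enumerates strings over {1, 2} up to a fixed length; finding no model is consistent with the `-unsat` in the file name but is not a proof. The length equation alone forces |y| = |x|, which prunes most candidates.

```python
import itertools
import re

# The three regex constraints, transcribed into Python syntax. Each SMT-LIB
# tail (re.* (re.union "1" (re.* "2"))) accepts every string over {1, 2}, so
# it is written as [12]*; each constraint only fixes the first character.
X_RE = re.compile(r"2[12]*")
Y_RE = re.compile(r"1[12]*")
Z_RE = re.compile(r"2[12]*")


def strings_over_12(max_len):
    """All strings over {1, 2} of length 0..max_len (illustrative helper)."""
    for n in range(max_len + 1):
        for chars in itertools.product("12", repeat=n):
            yield "".join(chars)


def find_model(max_len):
    """Return (x, y, z) satisfying all four assertions, or None.

    Only strings up to max_len are tried, so None is evidence, not a proof.
    """
    words = list(strings_over_12(max_len))
    xs = [w for w in words if X_RE.fullmatch(w)]
    ys = [w for w in words if Y_RE.fullmatch(w)]
    zs = [w for w in words if Z_RE.fullmatch(w)]
    for x in xs:
        for y in ys:
            # |z| + |y| + |x| = |x| + |x| + |z| forces |y| = |x|.
            if len(y) != len(x):
                continue
            for z in zs:
                if z + y + x == x + x + z:
                    return x, y, z
    return None


print(find_model(6))  # None
```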

#### 🟡 ZIPT Unsupported Feature (`str.replace_all`)

ZIPT currently does not support the str.replace_all function. Four files hit this:

| File | seq | nseq | ZIPT error |
|---|---|---|---|
| pcp_instance_53.smt2 | unknown | unknown | Unsupported feature: str.replace_all |
| unsolved_pcp_instance_496.smt2 | unknown | unknown | Unsupported feature: str.replace_all |
| benchmark_0168.smt2 | unknown | unknown | Unsupported feature: str.replace_all |
| benchmark_0202.smt2 | unknown | unknown | Unsupported feature: str.replace_all |
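
Until ZIPT supports `str.replace_all`, the workflow could flag such files before dispatch so they are reported as unsupported rather than as bugs. A minimal sketch under that assumption; `unsupported_symbols` and `ZIPT_UNSUPPORTED` are hypothetical names, and the token scan skips string literals and `;` comments but is not a full SMT-LIB parser (quoted `|...|` symbols, for example, are not handled).

```python
import re

# Hypothetical list: symbols ZIPT reported as unsupported in this run.
ZIPT_UNSUPPORTED = {"str.replace_all"}

STRING_LITERAL = re.compile(r'"(?:[^"]|"")*"')   # SMT-LIB 2.6: "" escapes a quote
LINE_COMMENT = re.compile(r";[^\n]*")
# SMT-LIB simple symbols: letters, digits and ~ ! @ $ % ^ & * _ - + = < > . ? /
SIMPLE_SYMBOL = re.compile(r"[A-Za-z0-9~!@$%^&*_\-+=<>.?/]+")


def unsupported_symbols(smt2_text: str, unsupported=ZIPT_UNSUPPORTED) -> set:
    """Return the unsupported function symbols used in an SMT-LIB script."""
    text = STRING_LITERAL.sub(" ", smt2_text)  # ignore symbols inside strings
    text = LINE_COMMENT.sub(" ", text)         # ignore comments
    return {tok for tok in SIMPLE_SYMBOL.findall(text) if tok in unsupported}


print(unsupported_symbols('(assert (= y (str.replace_all x "a" "b")))'))  # {'str.replace_all'}
```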

#### 🟡 Slow Benchmarks (> 8 s for any solver)

| File | Slow solver | Time | Verdict |
|---|---|---:|---|
| diseq-1-4-5-1.smt2 | nseq | 10.008 s | unknown |
| diseq-1-4-5-1.smt2 | ZIPT | 12.015 s | timeout |
| diseq-None-5-6-106.smt2 | nseq | 10.008 s | unknown |
| diseq-None-5-6-106.smt2 | ZIPT | 12.017 s | timeout |

Both hard instances come from the 20250411-negated-predicates set (disequality benchmarks). All three solvers struggle with them: seq hits its 5 s `-T` limit and nseq its 10 s `-T` limit (both reported as unknown), while ZIPT is stopped at the 12 s outer limit (timeout).

### Trace Analysis: seq-slow / nseq-fast Pattern

No seq-fast / nseq-slow cases were observed (no file where seq < 1 s, nseq > 3× seq time, and nseq > 0.5 s). Instead, the benchmark shows a prominent opposite pattern: on 7 files seq hit its 5 s limit (reported as unknown) while nseq finished in under 0.15 s, returning sat on six of them and unknown on instance11759 (which ZIPT solves as sat).

These files (instance13608, instance09536, instance06147, slog_stranger_654_sink, instance12094, 01_track_55, instance11759) are all regex-heavy instances from the automatark-lu and similar sets.
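
Both patterns reduce to simple filters over the per-file timings. The sketch below encodes the seq-fast / nseq-slow criterion exactly as stated above, plus a hypothetical companion filter for the opposite pattern (seq unknown at its 5 s limit, nseq under 0.15 s); the `Result` record and function names are illustrative. Applied to all 50 rows of the per-file table, the second filter selects the seven files listed above and the first selects none.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical per-file record holding the seq and nseq columns."""
    file: str
    seq_verdict: str
    seq_time: float
    nseq_verdict: str
    nseq_time: float


def seq_fast_nseq_slow(r: Result) -> bool:
    # Criterion from the report: seq < 1 s, nseq > 3x seq, and nseq > 0.5 s.
    return r.seq_time < 1.0 and r.nseq_time > 3 * r.seq_time and r.nseq_time > 0.5


def seq_timeout_nseq_fast(r: Result, seq_limit: float = 5.0, fast: float = 0.15) -> bool:
    # seq gave up at its -T:5 limit (reported as unknown) while nseq finished
    # in under `fast` seconds, whatever verdict nseq returned.
    return (r.seq_verdict == "unknown" and r.seq_time >= seq_limit
            and r.nseq_time < fast)


rows = [
    Result("diseq-1-4-5-1.smt2", "unknown", 5.012, "unknown", 10.008),
    Result("instance13608.smt2", "unknown", 5.010, "sat", 0.067),
    Result("query7481.smt2", "sat", 2.287, "unknown", 0.067),
    Result("instance11759.smt2", "unknown", 5.011, "unknown", 0.101),
]
print([r.file for r in rows if seq_fast_nseq_slow(r)])     # []
print([r.file for r in rows if seq_timeout_nseq_fast(r)])  # ['instance13608.smt2', 'instance11759.smt2']
```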

**Hypothesis for the seq-timeout / nseq-fast pattern:** The seq traces show tens of thousands of `enque_axiom` / `add_axiom` / `add_length` steps (e.g., about 86K trace lines for instance13608), in which seq decomposes complex regex patterns into individual character axioms (`seq.unit Char[N]`) and generates separation lemmas for each character position. This axiom enumeration grows combinatorially with the number of distinct characters mentioned in the regexes and the length of the string constants, and does not converge within 5 s.

By contrast, nseq uses the Nielsen-graph approach combined with Parikh-constraint generation and minterm-based character-set analysis. For regex-membership queries, the Parikh image (the number of occurrences of each character class) can rule out many candidate assignments without exploring individual characters. The minterm decomposition collapses the full Unicode alphabet into a small set of equivalence classes, so the Nielsen graph stays compact. This would explain why nseq solves these regex-heavy instances in roughly 40–120 ms while seq spends seconds enumerating character axioms.
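
To make the minterm idea concrete: given the character classes mentioned in a problem, the alphabet splits into regions on which every class's membership is constant. The sketch below computes such minterms over Unicode code-point intervals and then a Parikh vector (occurrence count per minterm) for a word. It is an illustrative interval-based version, not nseq's implementation. For the noodles-unsat-9 alphabet {1, 2} it yields three classes: `1`, `2`, and every other character.

```python
MAX_CODE_POINT = 0x10FFFF


def minterms(predicates):
    """Hypothetical sketch: split [0, MAX_CODE_POINT] into minterms.

    `predicates` is a list of character classes, each a list of inclusive
    (lo, hi) code-point ranges. The result maps a membership signature (one
    bool per predicate) to the intervals sharing that signature.
    """
    # Collect every code point where some class can start or stop.
    cuts = {0, MAX_CODE_POINT + 1}
    for ranges in predicates:
        for lo, hi in ranges:
            cuts.add(lo)
            cuts.add(hi + 1)
    bounds = sorted(cuts)
    classes = {}
    for start, end in zip(bounds, bounds[1:]):
        # Membership is constant on [start, end - 1]; test one representative.
        sig = tuple(any(lo <= start <= hi for lo, hi in ranges)
                    for ranges in predicates)
        classes.setdefault(sig, []).append((start, end - 1))
    return classes


def parikh_vector(word, classes):
    """Count the characters of `word` falling into each minterm."""
    counts = {sig: 0 for sig in classes}
    for ch in word:
        cp = ord(ch)
        for sig, intervals in classes.items():
            if any(lo <= cp <= hi for lo, hi in intervals):
                counts[sig] += 1
                break
    return counts


alphabet = minterms([[(ord("1"), ord("1"))], [(ord("2"), ord("2"))]])
print(len(alphabet))  # 3
print(parikh_vector("2112", alphabet))
```

Applied to the noodles-unsat-9 equation, requiring equal Parikh vectors on both sides of zyx = xxz forces y and x to contain the same number of `1`s and of `2`s.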

### 📋 Full Per-File Results (50 benchmarks)

| # | File | seq verdict | seq time (s) | nseq verdict | nseq time (s) | ZIPT verdict | ZIPT time (s) | Notes |
|---:|---|---|---:|---|---:|---|---:|---|
| 1 | diseq-1-4-5-1.smt2 | unknown | 5.012 | unknown | 10.008 | timeout | 12.015 | |
| 2 | instance13608.smt2 | unknown | 5.010 | sat | 0.067 | sat | 0.305 | |
| 3 | instance09536.smt2 | unknown | 5.009 | sat | 0.063 | sat | 0.329 | |
| 4 | slog_stranger_5516_sink.smt2 | unsat | 0.093 | unsat | 0.051 | unsat | 0.235 | |
| 5 | query7481.smt2 | sat | 2.287 | unknown | 0.067 | sat | 0.247 | |
| 6 | instance10677.smt2 | bug† | 0.007 | bug† | 0.006 | unknown | 0.047 | |
| 7 | instance07200.smt2 | sat | 0.038 | sat | 0.024 | sat | 0.364 | |
| 8 | Lehmann-Rabin_sat_non_incre_equiv_trans_14_1.smt2 | unsat | 0.029 | unsat | 0.026 | sat | 0.250 | SOUNDNESS_DISAGREEMENT |
| 9 | instance09399.smt2 | unsat | 0.088 | unsat | 0.038 | unsat | 0.234 | |
| 10 | instance05229.smt2 | sat | 0.034 | sat | 0.022 | sat | 0.291 | |
| 11 | instance08199.smt2 | sat | 0.357 | sat | 0.033 | sat | 0.247 | |
| 12 | instance13353.smt2 | sat | 0.133 | sat | 0.031 | sat | 0.282 | |
| 13 | instance03967.smt2 | sat | 0.317 | sat | 0.035 | sat | 0.203 | |
| 14 | instance06147.smt2 | unknown | 5.013 | sat | 0.070 | sat | 0.273 | |
| 15 | instance08992.smt2 | unsat | 0.034 | unsat | 0.024 | unsat | 0.327 | |
| 16 | instance13181.smt2 | sat | 0.645 | sat | 0.036 | sat | 0.299 | |
| 17 | instance03831.smt2 | sat | 0.502 | sat | 0.039 | sat | 0.205 | |
| 18 | pcp_instance_53.smt2 | unknown | 0.231 | unknown | 0.030 | bug | 0.135 | ZIPT: str.replace_all unsupported |
| 19 | instance08494.smt2 | bug† | 0.007 | bug† | 0.007 | unknown | 0.051 | |
| 20 | instance11759.smt2 | unknown | 5.011 | unknown | 0.101 | sat | 0.350 | |
| 21 | query5197.smt2 | bug† | 0.006 | bug† | 0.006 | unknown | 0.048 | |
| 22 | unsolved_pcp_instance_496.smt2 | unknown | 0.220 | unknown | 0.030 | bug | 0.137 | ZIPT: str.replace_all unsupported |
| 23 | sub-matching-unsat-27.smt2 | bug† | 0.007 | bug† | 0.007 | unknown | 0.051 | |
| 24 | instance06902.smt2 | unsat | 0.159 | unsat | 0.044 | unsat | 0.266 | |
| 25 | benchmark_0168.smt2 | unknown | 0.874 | unknown | 0.033 | bug | 0.124 | ZIPT: str.replace_all unsupported |
| 26 | pcp_instance_149.smt2 | bug† | 0.007 | bug† | 0.006 | unknown | 0.049 | |
| 27 | instance08559.smt2 | unsat | 0.152 | unsat | 0.039 | unsat | 0.339 | |
| 28 | instance00724.smt2 | sat | 0.106 | sat | 0.035 | sat | 0.201 | |
| 29 | instance06360.smt2 | unsat | 0.051 | unsat | 0.022 | unsat | 0.361 | |
| 30 | instance14212.smt2 | unsat | 0.347 | unsat | 0.088 | unsat | 0.410 | |
| 31 | instance05983.smt2 | sat | 0.060 | sat | 0.029 | sat | 0.209 | |
| 32 | instance08572.smt2 | sat | 0.132 | sat | 0.032 | sat | 0.270 | |
| 33 | instance13399.smt2 | unsat | 0.040 | unsat | 0.027 | unsat | 0.381 | |
| 34 | benchmark_0202.smt2 | unknown | 0.922 | unknown | 0.034 | bug | 0.129 | ZIPT: str.replace_all unsupported |
| 35 | noodles-unsat-9.smt2 | bug† | 0.008 | bug† | 0.007 | unknown | 0.051 | nseq: ASSERTION VIOLATION (seq_nielsen.cpp:3717) |
| 36 | instance10050.smt2 | unsat | 0.035 | unsat | 0.024 | unsat | 0.315 | |
| 37 | diseq-None-5-6-106.smt2 | unknown | 5.009 | unknown | 10.008 | timeout | 12.017 | |
| 38 | instance14502.smt2 | unsat | 0.353 | unsat | 0.049 | unsat | 0.315 | |
| 39 | slog_stranger_654_sink.smt2 | unknown | 5.008 | sat | 0.038 | sat | 0.350 | |
| 40 | instance11499.smt2 | unsat | 0.093 | unsat | 0.031 | unsat | 0.274 | |
| 41 | instance05365.smt2 | sat | 0.969 | sat | 0.029 | sat | 0.297 | |
| 42 | instance14737.smt2 | unsat | 0.037 | unsat | 0.027 | unsat | 0.352 | |
| 43 | instance04098.smt2 | sat | 0.056 | sat | 0.028 | sat | 0.202 | |
| 44 | instance03390.smt2 | sat | 0.096 | sat | 0.035 | sat | 0.217 | |
| 45 | instance12094.smt2 | unknown | 5.009 | sat | 0.102 | sat | 0.302 | |
| 46 | instance00291.smt2 | sat | 0.065 | sat | 0.028 | sat | 0.216 | |
| 47 | instance04764.smt2 | sat | 0.072 | sat | 0.029 | sat | 0.233 | |
| 48 | 01_track_55.smt2 | unknown | 5.008 | sat | 0.120 | sat | 0.402 | |
| 49 | instance02325.smt2 | sat | 0.106 | sat | 0.032 | sat | 0.233 | |
| 50 | instance12390.smt2 | unsat | 0.048 | unsat | 0.023 | unsat | 0.355 | |

† = likely measurement artifact (Z3 internal "timeout" text not matched by verdict parser); noodles-unsat-9 is the confirmed real nseq bug.

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.

AI generated by Qf S Benchmark

expires on Mar 27, 2026, 7:52 AM UTC

2026-03-20T12:55:40Z

github-actions[bot]
bot Mar 20, 2026

This discussion has been marked as outdated by Qf S Benchmark.

A newer discussion is available at Discussion #9054.
