[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9054

2026-03-20T12:55:38Z

github-actions[bot]
bot Mar 20, 2026

Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst, total pool: 22,172 files)
Timeout: 10 seconds per benchmark for Z3 nseq (-T:10) and ZIPT (-t:10000); 5 seconds for seq, which ran with tracing enabled (-T:5)
Build: Debug (CMAKE_BUILD_TYPE=Debug, Z3 4.17.0)

Summary

| Metric | seq solver | nseq solver | ZIPT solver |
|---|---|---|---|
| sat | 26 | 27 | 28 |
| unsat | 17 | 18 | 18 |
| unknown | 7 | 5 | 0 |
| timeout | 0 | 0 | 0 |
| bug/crash | 0 | 0 | 4 |
| Total time (s) | 36.931 | 3.669 | 13.279 |
| Avg time/benchmark (s) | 0.739 | 0.073 | 0.266 |

Soundness disagreements (any two solvers return conflicting sat/unsat): 0

Headline: nseq is ~10× faster than seq overall (3.67 s vs 36.93 s total wall-clock time) and solves 2 more instances (45 vs 43 definitive answers). Of seq's total, 15.03 s comes from the three instances that ran into its 5 s cap. No soundness disagreements were found between any pair of solvers.
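The headline figures follow from the summary table. A quick arithmetic check, with the values copied from the table and the variable names chosen only for illustration:

```python
# Totals copied from the Summary table; each solver ran the same 50 benchmarks.
total_time = {"seq": 36.931, "nseq": 3.669, "ZIPT": 13.279}
definitive = {"seq": 26 + 17, "nseq": 27 + 18, "ZIPT": 28 + 18}  # sat + unsat
n_benchmarks = 50

# Average time per benchmark, rounded to the precision used in the table.
avg_time = {solver: round(t / n_benchmarks, 3) for solver, t in total_time.items()}

# Overall speedup of nseq relative to seq, based on total wall-clock time.
speedup = total_time["seq"] / total_time["nseq"]

print(avg_time)
print(f"nseq speedup over seq: {speedup:.2f}x")
print(f"extra instances solved by nseq: {definitive['nseq'] - definitive['seq']}")
```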

Notable Issues

Soundness Disagreements (Critical)

None found. Every pair of solvers agrees on every instance where both return a definitive answer.
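The check behind this statement can be sketched as follows. This is a minimal illustration, not the workflow's actual script: the function name and the dictionary layout are assumptions, and the three sample rows are copied from the per-file results.

```python
from itertools import combinations


def soundness_disagreements(results):
    """Return (file, solver_a, solver_b) triples where one solver says sat and the other unsat.

    `results` maps a file name to a {solver: verdict} dictionary (hypothetical layout).
    Verdicts such as "unknown" or "bug" never count as a disagreement.
    """
    conflicts = []
    for name, verdicts in results.items():
        for a, b in combinations(sorted(verdicts), 2):
            if {verdicts[a], verdicts[b]} == {"sat", "unsat"}:
                conflicts.append((name, a, b))
    return conflicts


# Three rows copied from the per-file table.
results = {
    "instance11200.smt2": {"seq": "unsat", "nseq": "unsat", "ZIPT": "unsat"},
    "query3188.smt2": {"seq": "sat", "nseq": "unknown", "ZIPT": "sat"},
    "benchmark_0438.smt2": {"seq": "unknown", "nseq": "unknown", "ZIPT": "bug"},
}
print(soundness_disagreements(results))
```

Only a sat/unsat pair is treated as a conflict, so an unknown or unsupported-feature verdict next to a definitive one is not flagged.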

Crashes / Bugs

All 4 ZIPT bug verdicts are unsupported-feature errors, not crashes. ZIPT outputs:

```
Unsupported feature: Function (declare-fun str.replace_all (String String String) String) currently not supported
```
Affected files (all from `20250403-rna/rna-sat/` and `20250403-pcp-string/`):
- `benchmark_0438.smt2`
- `benchmark_0013.smt2`
- `benchmark_0190.smt2`
- `pcp_instance_418.smt2`

**Latent nseq assertion violation** (`query3188.smt2`, `2020-sygus-qgen` suite):  
When nseq is forced to build a model (by removing the time limit), it triggers:
```
ASSERTION VIOLATION
File: src/smt/seq_model.cpp
Line: 317
UNEXPECTED CODE WAS REACHED.
```

During the timed benchmark run nseq returned unknown in 35 ms without hitting the assertion. seq correctly solves this instance as sat in 845 ms. ZIPT also returns sat in 207 ms. This indicates a bug in nseq's model-construction path for the type of constraints in query3188.smt2.

Slow Benchmarks (> 8s)

No benchmark exceeded 8 s for any solver. seq's worst was 5.009 s, which is its -T:5 hard cap; nseq's worst was 0.916 s; ZIPT's worst was 0.523 s.

Cases Where nseq Solved But seq Did Not

Three instances where nseq (and ZIPT) found a definitive answer but seq hit its 5 s cap and reported unknown:

| File | seq | nseq | ZIPT |
|---|---|---|---|
| `dining-cryptographers_sat_non_incre_equiv_init_0_7.smt2` | unknown (5.009s) | unsat (0.497s) | unsat (0.335s) |
| `03_track_76.smt2` | unknown (5.009s) | sat (0.916s) | sat (0.394s) |
| `slog_stranger_3309_sink.smt2` | unknown (5.009s) | sat (0.243s) | sat (0.407s) |

Case Where seq Solved But nseq Did Not

One instance where seq returned sat but nseq returned unknown:

| File | seq | nseq | ZIPT |
|---|---|---|---|
| `query3188.smt2` | sat (0.845s) | unknown (0.035s) | sat (0.207s) |

As noted above, nseq has a latent assertion violation in its model-construction path for this instance.
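Both comparison tables list the files on which one solver gave a definitive answer and the other did not. A minimal sketch of that filter, assuming a hypothetical `rows` dictionary keyed by file name (the verdicts are copied from the per-file results):

```python
DEFINITIVE = {"sat", "unsat"}


def solved_only_by(rows, winner, loser):
    """Files where `winner` returned sat/unsat and `loser` did not."""
    return [
        name
        for name, verdicts in rows.items()
        if verdicts[winner] in DEFINITIVE and verdicts[loser] not in DEFINITIVE
    ]


# Verdicts copied from the per-file table (seq and nseq columns only).
rows = {
    "dining-cryptographers_sat_non_incre_equiv_init_0_7.smt2": {"seq": "unknown", "nseq": "unsat"},
    "03_track_76.smt2": {"seq": "unknown", "nseq": "sat"},
    "slog_stranger_3309_sink.smt2": {"seq": "unknown", "nseq": "sat"},
    "query3188.smt2": {"seq": "sat", "nseq": "unknown"},
    "benchmark_0438.smt2": {"seq": "unknown", "nseq": "unknown"},
}

print(solved_only_by(rows, "nseq", "seq"))
print(solved_only_by(rows, "seq", "nseq"))
```

Files where neither solver is definitive, such as `benchmark_0438.smt2`, appear in neither list.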

Trace Analysis: seq-fast / nseq-slow Hypotheses

No seq-fast / nseq-slow cases were observed in this run.

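The trend is the opposite: nseq is faster than seq on every instance where both solvers produce the same verdict, although the margin varies widely, from about 1.1× (`instance12921.smt2`, 0.026 s vs 0.024 s) to about 38× (`instance06416.smt2`, 4.769 s vs 0.124 s). The speedup holds for both sat and unsat instances. The benchmark families tested were automatark-lu, woorpje-lu, slog, sygus-qgen, rna-sat, and pcp-string; on the four rna-sat and pcp-string files neither seq nor nseq produced a definitive answer, so the difference there only reflects how quickly each solver gave up.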
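The per-instance ratio can be computed directly from the timing columns. A small sketch over four same-verdict rows copied from the per-file results (the selection of rows is arbitrary and only for illustration):

```python
# (seq seconds, nseq seconds) for four rows where both solvers agree on the verdict.
timings = {
    "instance12921.smt2": (0.026, 0.024),
    "instance11200.smt2": (0.038, 0.028),
    "03_track_110.smt2": (0.152, 0.080),
    "instance06416.smt2": (4.769, 0.124),
}

# Ratio > 1 means nseq was faster than seq on that file.
ratios = {name: seq / nseq for name, (seq, nseq) in timings.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x")
```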

The seq trace for dining-cryptographers_sat_non_incre_equiv_init_0_7.smt2 (which seq could not solve within 5 s) shows the solver spending most of its time in:

- Repeated `mk_eq_core` equality normalisations (thousands of `X == varout` entries)
- Character-level axiom enqueuing (`enque_axiom` for `seq.unit` characters)
- Regex propagation (`propagate_in_re`) for complex union/concatenation regexes
- Length-limit assertions (`add_length`, `seq.length_limit[1:varout]`)

The seq solver appears to expand the regex structure into character-level axioms rather than reasoning about it holistically. nseq likely uses a more direct Parikh/automata-based argument to refute the constraints without character-level case splitting.
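A rough way to quantify where a trace spends its entries is to count how often each of the names above appears. The sketch below assumes the trace is plain text with one event per line; the sample lines are invented to exercise the function and do not reproduce real trace output.

```python
from collections import Counter

# Names taken from the trace analysis above.
TAGS = ("mk_eq_core", "enque_axiom", "propagate_in_re", "add_length")


def count_tags(lines, tags=TAGS):
    """Count the lines that mention each tag (one line may count for several tags)."""
    counts = Counter()
    for line in lines:
        for tag in tags:
            if tag in line:
                counts[tag] += 1
    return counts


# Invented sample lines; a real trace will look different.
sample = [
    "mk_eq_core: X == varout",
    "mk_eq_core: X == varout",
    "enque_axiom (seq.unit c)",
    "propagate_in_re ...",
]

for tag, n in count_tags(sample).most_common():
    print(tag, n)
```

To run it on a real trace, pass an open file object instead of `sample`, since iterating a text file yields its lines.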

Per-File Results

| # | File | seq verdict | seq time (s) | nseq verdict | nseq time (s) | ZIPT verdict | ZIPT time (s) | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | `instance11200.smt2` | unsat | 0.038 | unsat | 0.028 | unsat | 0.314 | |
| 2 | `instance05074.smt2` | sat | 0.075 | sat | 0.031 | sat | 0.201 | |
| 3 | `instance08489.smt2` | unsat | 0.169 | unsat | 0.059 | unsat | 0.239 | |
| 4 | `instance15382.smt2` | sat | 0.106 | sat | 0.033 | sat | 0.219 | |
| 5 | `dining-cryptographers_sat_non_incre_equiv_init_0_7.smt2` | unknown | 5.009 | unsat | 0.497 | unsat | 0.335 | |
| 6 | `benchmark_0438.smt2` | unknown | 0.914 | unknown | 0.035 | bug | 0.110 | ZIPT: str.replace_all unsupported |
| 7 | `instance06903.smt2` | sat | 0.049 | sat | 0.033 | sat | 0.225 | |
| 8 | `instance05630.smt2` | sat | 0.078 | sat | 0.031 | sat | 0.232 | |
| 9 | `instance01549.smt2` | sat | 0.623 | sat | 0.041 | sat | 0.209 | |
| 10 | `instance15219.smt2` | unsat | 0.267 | unsat | 0.054 | unsat | 0.341 | |
| 11 | `instance13864.smt2` | unsat | 0.206 | unsat | 0.055 | unsat | 0.279 | |
| 12 | `instance04058.smt2` | sat | 0.820 | sat | 0.034 | sat | 0.194 | |
| 13 | `instance10667.smt2` | unsat | 0.572 | unsat | 0.047 | unsat | 0.292 | |
| 14 | `instance01181.smt2` | sat | 0.104 | sat | 0.033 | sat | 0.250 | |
| 15 | `instance12719.smt2` | unsat | 0.039 | unsat | 0.027 | unsat | 0.313 | |
| 16 | `instance04793.smt2` | sat | 3.061 | sat | 0.089 | sat | 0.254 | |
| 17 | `instance06019.smt2` | unsat | 0.038 | unsat | 0.024 | unsat | 0.333 | |
| 18 | `instance15262.smt2` | unsat | 0.142 | unsat | 0.033 | unsat | 0.307 | |
| 19 | `instance02487.smt2` | sat | 0.125 | sat | 0.031 | sat | 0.271 | |
| 20 | `benchmark_0013.smt2` | unknown | 0.949 | unknown | 0.036 | bug | 0.112 | ZIPT: str.replace_all unsupported |
| 21 | `instance06027.smt2` | sat | 0.154 | sat | 0.040 | sat | 0.286 | |
| 22 | `instance11858.smt2` | unsat | 0.036 | unsat | 0.025 | unsat | 0.316 | |
| 23 | `instance06416.smt2` | sat | 4.769 | sat | 0.124 | sat | 0.312 | |
| 24 | `instance08037.smt2` | unsat | 0.265 | unsat | 0.057 | unsat | 0.328 | |
| 25 | `instance01652.smt2` | sat | 0.076 | sat | 0.030 | sat | 0.215 | |
| 26 | `benchmark_0190.smt2` | unknown | 0.890 | unknown | 0.034 | bug | 0.107 | ZIPT: str.replace_all unsupported |
| 27 | `instance08668.smt2` | sat | 0.095 | sat | 0.034 | sat | 0.322 | |
| 28 | `instance12920.smt2` | sat | 1.249 | sat | 0.110 | sat | 0.270 | |
| 29 | `instance01206.smt2` | sat | 0.098 | sat | 0.030 | sat | 0.215 | |
| 30 | `instance15469.smt2` | sat | 0.897 | sat | 0.046 | sat | 0.311 | |
| 31 | `query3188.smt2` | sat | 0.845 | unknown | 0.035 | sat | 0.207 | nseq: latent assertion violation in seq_model.cpp:317 |
| 32 | `instance12921.smt2` | unsat | 0.026 | unsat | 0.024 | unsat | 0.289 | |
| 33 | `instance04285.smt2` | sat | 0.136 | sat | 0.035 | sat | 0.247 | |
| 34 | `instance02474.smt2` | sat | 0.117 | sat | 0.031 | sat | 0.283 | |
| 35 | `instance02273.smt2` | sat | 0.035 | sat | 0.024 | sat | 0.292 | |
| 36 | `slog_stranger_4971_sink.smt2` | unsat | 0.027 | unsat | 0.025 | unsat | 0.211 | |
| 37 | `03_track_76.smt2` | unknown | 5.009 | sat | 0.916 | sat | 0.394 | |
| 38 | `instance01832.smt2` | sat | 0.103 | sat | 0.030 | sat | 0.220 | |
| 39 | `instance00006.smt2` | sat | 0.102 | sat | 0.040 | sat | 0.234 | |
| 40 | `pcp_instance_418.smt2` | unknown | 0.220 | unknown | 0.031 | bug | 0.123 | ZIPT: str.replace_all unsupported |
| 41 | `instance01431.smt2` | sat | 0.036 | sat | 0.026 | sat | 0.186 | |
| 42 | `slog_stranger_3309_sink.smt2` | unknown | 5.009 | sat | 0.243 | sat | 0.407 | |
| 43 | `instance06168.smt2` | unsat | 0.179 | unsat | 0.039 | unsat | 0.274 | |
| 44 | `instance09484.smt2` | unsat | 0.040 | unsat | 0.029 | unsat | 0.345 | |
| 45 | `03_track_110.smt2` | sat | 0.152 | sat | 0.080 | sat | 0.239 | |
| 46 | `instance10830.smt2` | unsat | 0.146 | unsat | 0.039 | unsat | 0.258 | |
| 47 | `instance01565.smt2` | sat | 1.307 | sat | 0.072 | sat | 0.272 | |
| 48 | `instance10351.smt2` | unsat | 0.189 | unsat | 0.061 | unsat | 0.310 | |
| 49 | `instance04659.smt2` | sat | 0.032 | sat | 0.024 | sat | 0.253 | |
| 50 | `instance13803.smt2` | unsat | 1.308 | unsat | 0.084 | unsat | 0.523 | |

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.

AI generated by Qf S Benchmark · history

expires on Mar 27, 2026, 12:55 PM UTC

2026-03-20T13:45:07Z

github-actions[bot]
bot Mar 20, 2026
Author

This discussion has been marked as outdated by Qf S Benchmark.

A newer discussion is available at Discussion #9057.
