You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Date: 2026-03-27 Branch: c3 (HEAD ebd35bc — Avoid doing extensions steps twice, as the bounds are recognized too late) Benchmark set: QF_S — 200 randomly selected files from tests/QF_S.tar.zst (22 172 total available) Timeouts: seq -T:5 (5 s Z3-internal, 7 s wall); nseq -T:5 (5 s Z3-internal, 12 s wall); ZIPT -t:5000 (5 s, 12 s wall) Build: Debug (CMAKE_BUILD_TYPE=Debug), Z3 4.17.0; ZIPT built from parikh branch (CEisenhofer/ZIPT)
Summary
Metric
seq solver
nseq solver
ZIPT solver
sat
78
102
105
unsat
67
70
71
unknown
51
22
4
timeout
0
0
3
bug/crash
4
6
17
Total time (s)
261.547
32.763
107.328
Avg time/benchmark (s)
1.308
0.164
0.537
nseq is ~8× faster than seq on average and solves 27 more benchmarks definitively (172 vs 145). ZIPT solves the most definitively (176) but has 17 feature-limitation failures and 3 timeouts.
Both Z3 string solvers agree the formula is SAT, but ZIPT returns UNSAT. This is the highest-priority finding: either ZIPT has a soundness bug on this instance, or (less likely) both Z3 solvers share a common bug. The file is from the 2019-Jiang/slog benchmark family. Recommended action: manually verify the model produced by seq/nseq against the formula.
🟠 Crashes / Assertion Violations
nseq — assertion violation (2 files)
Both not-contains-1-5-6-106.smt2 and not-contains-1-4-5-114.smt2 trigger a hard assertion failure in nseq:
ASSERTION VIOLATION
File: src/smt/seq/seq_nielsen.cpp
Line: 1505
Failed to verify: ext
```
These files are from the `not-contains` benchmark family. The `ext` assertion in `seq_nielsen.cpp:1505` is violated, indicating a bug in the Nielsen graph extension step on these inputs. Both files are classified SAT by ZIPT.
**seq / nseq — 4 apparent startup-warning false positives**
`instance12638.smt2`, `instance07009.smt2`, `instance06371.smt2`, `slog_stranger_500_sink.smt2` were classified `bug` by the harness (pattern match on "error"/"assertion" in stderr). Re-running these files individually confirms clean `sat`/`unsat` output. These were likely caused by transient Z3 debug-mode startup warnings being captured during the initial run and should be treated as **false positives** in the bug count above.
**ZIPT — 17 unsupported-feature failures**
All 17 ZIPT `bug` classifications are `Unsupported feature: Function (declare-fun str.replace_all …) currently not supported`. ZIPT does not implement `str.replace_all`, so any benchmark using it produces an error exit. Affected benchmark families: `20250403-rna`, `20250403-pcp-string`, `wildcard-matching-regex`. These are feature gaps, not logic bugs.
---
#### 🟡 Slow Benchmarks (any solver > 8 s wall-clock)
| File | Solver | Time (s) | Verdict |
|------|--------|---------|---------|
| `wildcard-matching-regex-16.smt2` | ZIPT | 12.010 | timeout |
| `wildcard-matching-regex-73.smt2` | ZIPT | 12.010 | timeout |
| `sub-matching-sat-41.smt2` | ZIPT | 12.010 | timeout |
| `wildcard-matching-regex-07.smt2` | ZIPT | 9.282 | bug (`str.replace_all`) |
All slow cases are ZIPT on wildcard-matching and sub-matching benchmarks. seq and nseq both return `unknown` on the wildcard-matching files (seq within 5 s, nseq within 0.7 s), suggesting these are genuinely hard instances for all string solvers in this benchmark set.
---
#### 🟡 seq Timeouts (instances where nseq succeeded)
seq returned `unknown` (timed out at 5 s) on **51** files. Of these, nseq solved **29 as sat** and **0 as unsat** — meaning nseq found witnesses that seq could not confirm within the time limit. These files tend to come from the `instance*` family (likely symbolic execution / path constraint benchmarks). seq's rewrite-based approach appears to struggle with satisfiable instances that require discovering a long witness string, whereas nseq's Parikh/Nielsen approach finds the solution much faster.
One notable case: `slog_stranger_2973_sink.smt2` — seq timed out (5.011 s), nseq solved in **3.217 s** (its second-longest runtime), ZIPT solved in 0.746 s. This suggests nseq was also working hard on this instance, unlike the typical 40–100 ms it spends on most sat instances.
---
#### Trace Analysis: seq-fast / nseq-slow Hypothesis
**One candidate matched the criterion** (seq < 1.0 s, nseq > 3× seq time, nseq > 0.5 s):
**`slog_stranger_1385_sink.smt2`** — seq: 0.218 s (unsat), nseq: 0.678 s (unsat)
The seq trace (`-tr:seq`) shows a rapid sequence of `mk_eq_core` rewriting steps that immediately expose the contradiction. The key entries are:
```
x_16 == literal_8 ("report" — 6 chars)
x_16 == literal_12 ("penalties" — 9 chars)
x_16 == literal_5 ("" — 0 chars)
Variable x_16 is simultaneously constrained to three ground string constants of different lengths. The seq rewriter detects the length clash at the mk_eq_core level (file seq_rewriter.cpp:5217) during the very first propagation wave, requiring no search at all — the contradiction is structurally visible without any case splitting.
The nseq solver (3.1× slower at 0.678 s) must first internalize these equations into its Nielsen graph, then run at least one round of simplification and Parikh-constraint generation before the length inconsistency becomes apparent to the SAT solver. The Parikh-based length propagation is more general than seq's immediate ground-term clash detection, but incurs a fixed overhead even on trivially inconsistent instances. For benchmarks where ground-string equalities immediately reveal a contradiction by length or content, seq's rewriter has a structural advantage over nseq's graph-based pipeline.
Per-File Results
Click to expand full per-file table (200 rows)
#
File
seq verdict
seq time (s)
nseq verdict
nseq time (s)
ZIPT verdict
ZIPT time (s)
Notes
1
instance15518.smt2
sat
.313
sat
.078
sat
.414
2
instance09527.smt2
sat
.720
sat
.045
sat
.254
3
instance07965.smt2
unsat
.352
unsat
.048
unsat
.335
4
slog_stranger_430_sink.smt2
unsat
.031
unsat
.022
unsat
.327
5
instance05065.smt2
sat
.334
sat
.071
sat
.243
6
instance14715.smt2
sat
3.384
sat
.040
sat
.472
7
instance09475.smt2
unsat
.236
unsat
.044
unsat
.438
8
instance08497.smt2
sat
1.137
sat
.039
sat
.298
9
slog_stranger_3942_sink.smt2
unknown
5.009
sat
.074
sat
.379
10
instance04073.smt2
sat
.032
sat
.022
sat
.300
11
instance01651.smt2
sat
.087
sat
.030
sat
.225
12
slog_stranger_2595_sink.smt2
unsat
.089
unsat
.025
unsat
1.016
13
coffee-can_sat_non_incre_equiv_bad_0_2.smt2
unknown
5.008
unsat
.051
unsat
.290
14
slog_stranger_242_sink.smt2
sat
.473
sat
.030
unsat
.226
SOUNDNESS_DISAGREEMENT
15
instance13106.smt2
unsat
.304
unsat
.048
unsat
.408
16
instance13034.smt2
sat
1.085
sat
.046
sat
.358
17
wildcard-matching-regex-16.smt2
unknown
5.008
unknown
.534
timeout
12.010
18
instance12638.smt2
bug
.007
bug
.006
unknown
.042
19
instance03744.smt2
sat
.071
sat
.028
sat
.230
20
instance07986.smt2
unsat
.189
unsat
.042
unsat
.379
21
instance12016.smt2
unsat
.313
unsat
.049
unsat
.353
22
instance05845.smt2
sat
.939
sat
.028
sat
.303
23
03_track_111.smt2
sat
.908
sat
.051
sat
.237
24
slog_stranger_197_sink.smt2
sat
.526
sat
.033
sat
.284
25
instance10079.smt2
sat
.456
sat
.056
sat
.289
26
instance06746.smt2
unsat
.037
unsat
.026
unsat
.329
27
instance02536.smt2
sat
.127
sat
.030
sat
.255
28
not-contains-1-5-6-106.smt2
unknown
5.008
bug
.249
sat
.182
29
instance09913.smt2
unknown
5.017
sat
.097
sat
.393
30
herman-linear_lstar_non_incre_equiv_bad_0_0.smt2
unknown
5.009
unsat
.031
unsat
.228
31
instance14231.smt2
sat
.041
sat
.025
sat
.441
32
instance06290.smt2
unsat
.031
unsat
.023
unsat
.289
33
query3429.smt2
sat
.551
sat
.034
sat
.233
34
instance06295.smt2
sat
.406
sat
.067
sat
.290
35
unsolved_pcp_instance_83.smt2
unknown
.373
unknown
.191
bug
.138
36
instance12099.smt2
sat
.460
sat
.047
sat
.318
37
instance10412.smt2
sat
.790
sat
.049
sat
.280
38
slog_stranger_2537_sink.smt2
unsat
.036
unsat
.022
unsat
.336
39
benchmark_0157.smt2
unknown
1.041
unknown
.199
bug
.130
40
instance13789.smt2
unsat
2.762
unsat
.047
unsat
.241
41
instance06012.smt2
unsat
.378
unsat
.043
unsat
.298
42
instance09005.smt2
unsat
.122
unsat
.039
unsat
.266
43
instance12897.smt2
unsat
4.054
unsat
.080
unsat
.270
44
unsolved_pcp_instance_486.smt2
unknown
.362
unknown
.190
bug
.199
45
slog_stranger_361_sink.smt2
unsat
.031
unsat
.022
unsat
.389
46
instance12106.smt2
unsat
.054
unsat
.030
unsat
.571
47
instance04515.smt2
unknown
5.011
sat
.040
sat
.189
48
mux-array_sat_non_incre_equiv_trans_0_0.smt2
unknown
5.009
unsat
.038
unsat
.267
49
instance14206.smt2
unsat
.167
unsat
.032
unsat
.213
50
instance05972.smt2
sat
.306
sat
.037
sat
.230
51
benchmark_0066.smt2
unknown
1.019
unknown
.203
bug
.118
52
instance14523.smt2
sat
.182
sat
.033
sat
.298
53
slog_stranger_3785_sink.smt2
unknown
5.010
unknown
5.008
sat
.601
54
wildcard-matching-regex-07.smt2
unknown
5.008
unknown
.296
bug
9.282
55
instance07009.smt2
bug
.006
bug
.006
unknown
.044
56
instance13116.smt2
unsat
.043
unsat
.028
unsat
.454
57
slog_stranger_1056_sink.smt2
unsat
.030
unsat
.021
unsat
.263
58
instance13891.smt2
unsat
.044
unsat
.023
unsat
.442
59
instance08819.smt2
sat
.161
sat
.031
sat
.401
60
instance07571.smt2
sat
.092
sat
.035
sat
.247
61
instance06147.smt2
unknown
5.013
sat
.068
sat
.283
62
instance13167.smt2
unsat
.028
unsat
.024
unsat
.439
63
instance11348.smt2
unsat
.068
unsat
.030
unsat
.218
64
pcp_instance_281.smt2
unknown
.379
unknown
.188
bug
.198
65
unsolved_pcp_instance_144.smt2
unknown
.593
unknown
.258
bug
.265
66
instance03078.smt2
unknown
5.014
sat
.055
sat
.227
67
query4617.smt2
sat
4.097
sat
.037
sat
.247
68
instance09976.smt2
sat
2.407
sat
.041
sat
.450
69
benchmark_0257.smt2
unknown
.992
unknown
.193
bug
.169
70
instance02988.smt2
unknown
5.010
sat
.042
sat
.229
71
instance14571.smt2
sat
.110
sat
.030
sat
.303
72
instance11139.smt2
unsat
.145
unsat
.036
unsat
.272
73
instance14828.smt2
unsat
.138
unsat
.033
unsat
.329
74
instance02930.smt2
sat
.442
sat
.032
sat
.187
75
instance15592.smt2
unsat
.033
unsat
.023
unsat
.345
76
instance07970.smt2
unsat
.543
unsat
.045
unsat
.371
77
instance05462.smt2
sat
.035
sat
.021
sat
.316
78
instance12446.smt2
sat
1.509
unknown
5.008
sat
.248
79
instance02768.smt2
sat
.440
sat
.033
sat
.197
80
instance11280.smt2
unsat
.143
unsat
.034
unsat
.300
81
instance12567.smt2
unsat
.046
unsat
.026
unsat
.452
82
instance04153.smt2
sat
.153
sat
.036
sat
.198
83
instance04760.smt2
sat
.259
sat
.047
sat
.393
84
slog_stranger_4772_sink.smt2
unknown
5.009
sat
.041
sat
.428
85
benchmark_0031.smt2
unknown
1.009
unknown
.193
bug
.182
86
instance13402.smt2
unsat
.359
unsat
.050
unsat
.347
87
instance10060.smt2
unknown
5.012
sat
.063
sat
.278
88
instance14550.smt2
unsat
.198
unsat
.036
unsat
.369
89
instance07237.smt2
sat
.839
sat
.053
sat
.327
90
instance13345.smt2
sat
.171
sat
.033
sat
.278
91
instance04148.smt2
unknown
5.009
sat
.056
sat
.266
92
instance05733.smt2
sat
1.686
sat
.036
sat
.207
93
04_track_178.smt2
unsat
.187
unsat
.023
unsat
.257
94
instance01783.smt2
unknown
5.009
sat
.041
sat
.244
95
slog_stranger_4677_sink.smt2
unknown
5.008
sat
.103
sat
.293
96
query7698.smt2
sat
.623
sat
.035
sat
.226
97
instance05245.smt2
sat
.097
sat
.029
sat
.183
98
instance03453.smt2
sat
.825
sat
.036
sat
.232
99
instance09794.smt2
unknown
5.010
sat
.069
sat
.367
100
instance05864.smt2
sat
.177
sat
.052
sat
.222
101
instance09657.smt2
unknown
5.009
sat
.113
sat
.423
102
instance12147.smt2
unsat
1.452
unsat
.031
unsat
.252
103
instance14222.smt2
unsat
.799
unsat
.139
unsat
.531
104
instance08777.smt2
unsat
3.149
unsat
.091
unsat
.466
105
instance04858.smt2
sat
.048
sat
.026
sat
.181
106
instance15185.smt2
sat
.837
sat
.037
sat
.225
107
instance00438.smt2
sat
.089
sat
.029
sat
.177
108
pcp_instance_49.smt2
unknown
.369
unknown
.192
bug
.137
109
instance14410.smt2
unsat
.042
unsat
.027
unsat
.353
110
instance15646.smt2
sat
1.378
sat
.111
sat
.487
111
instance09999.smt2
sat
.305
sat
.048
sat
.361
112
instance03161.smt2
sat
2.118
sat
.028
sat
.384
113
instance14857.smt2
unsat
.167
unsat
.041
unsat
.367
114
instance09106.smt2
unknown
5.008
sat
.046
sat
.246
115
slog_stranger_2211_sink.smt2
unsat
.034
unsat
.022
unsat
.318
116
wildcard-matching-regex-108.smt2
unknown
5.008
unknown
.334
bug
2.871
117
instance06371.smt2
bug
.007
bug
.006
unknown
.043
118
instance11514.smt2
unsat
.139
unsat
.033
unsat
.298
119
benchmark_0130.smt2
unknown
1.034
unknown
.203
bug
.170
120
instance11212.smt2
unknown
5.011
sat
.140
sat
.334
121
instance00872.smt2
sat
.147
sat
.033
sat
.270
122
instance00516.smt2
sat
.095
sat
.033
sat
.224
123
instance02012.smt2
sat
.046
sat
.028
sat
.201
124
instance02904.smt2
sat
.063
sat
.028
sat
.203
125
slog_stranger_69_sink.smt2
unknown
5.008
sat
.069
sat
.366
126
unsolved_pcp_instance_242.smt2
unknown
.374
unknown
.190
bug
.198
127
instance09823.smt2
unsat
.381
unsat
.056
unsat
.556
128
instance03356.smt2
sat
.216
sat
.044
sat
.600
129
instance13529.smt2
unsat
.150
unsat
.030
unsat
.330
130
instance02519.smt2
sat
.287
sat
.065
sat
.256
131
instance01425.smt2
sat
.045
sat
.022
sat
.445
132
instance01163.smt2
sat
.065
sat
.028
sat
.201
133
slog_stranger_779_sink.smt2
unsat
.039
unsat
.022
unsat
.309
134
instance09390.smt2
unsat
.180
unsat
.038
unsat
.353
135
instance09060.smt2
unsat
.450
unsat
.075
unsat
.385
136
wildcard-matching-regex-73.smt2
unknown
5.008
unknown
.658
timeout
12.010
137
slog_stranger_500_sink.smt2
bug
.006
bug
.006
unknown
.056
138
instance03271.smt2
sat
.087
sat
.030
sat
.239
139
benchmark_0124.smt2
unknown
1.034
unknown
.194
bug
.119
140
instance00824.smt2
unknown
5.009
sat
.069
sat
.285
141
instance08880.smt2
unsat
2.340
unsat
.043
unsat
.336
142
instance06663.smt2
unsat
.355
unsat
.048
unsat
.276
143
slog_stranger_2628_sink.smt2
unsat
.034
unsat
.023
unsat
.305
144
instance10185.smt2
unsat
.100
unsat
.030
unsat
.259
145
instance15418.smt2
unknown
5.009
sat
.088
sat
.393
146
instance05989.smt2
sat
1.348
sat
.027
sat
.335
147
instance06420.smt2
unsat
.299
unsat
.038
unsat
.357
148
instance15406.smt2
unsat
.040
unsat
.022
unsat
.377
149
instance05295.smt2
sat
.051
sat
.028
sat
.169
150
pcp_instance_263.smt2
unknown
.365
unknown
.186
bug
.128
151
instance02172.smt2
sat
1.147
sat
.037
sat
.367
152
not-contains-1-4-5-114.smt2
unknown
5.009
bug
.088
sat
.177
153
instance13600.smt2
sat
.113
sat
.029
sat
.218
154
instance01742.smt2
unknown
5.011
sat
.376
sat
.357
155
slog_stranger_2780_sink.smt2
sat
.668
sat
.033
sat
.243
156
instance10675.smt2
unknown
5.014
sat
.096
sat
.370
157
instance11773.smt2
unsat
.120
unsat
.036
unsat
.353
158
pcp_instance_191.smt2
unknown
.366
unknown
.189
bug
.194
159
instance02788.smt2
sat
.201
sat
.040
sat
.575
160
sub-matching-sat-41.smt2
unknown
5.010
unknown
5.008
timeout
12.010
161
slog_stranger_2973_sink.smt2
unknown
5.011
sat
3.217
sat
.746
162
instance14812.smt2
unsat
.188
unsat
.066
unsat
.302
163
instance12874.smt2
unsat
.172
unsat
.036
unsat
.343
164
slog_stranger_1520_sink.smt2
unsat
.030
unsat
.022
unsat
.265
165
03_track_159.smt2
sat
.429
sat
.062
sat
.250
166
instance06759.smt2
sat
.151
sat
.038
sat
.221
167
instance11215.smt2
sat
.041
sat
.021
sat
.313
168
benchmark_0150.smt2
unknown
1.009
unknown
.190
bug
.169
169
instance08236.smt2
sat
.172
sat
.043
sat
.357
170
instance07961.smt2
unknown
5.012
sat
.068
sat
.303
171
instance11178.smt2
unsat
.122
unsat
.034
unsat
.260
172
instance07864.smt2
unsat
.281
unsat
.059
unsat
.359
173
instance02205.smt2
unknown
5.011
sat
.051
sat
.212
174
slog_stranger_1385_sink.smt2
unsat
.218
unsat
.678
unsat
.438
175
query3568.smt2
sat
.964
sat
.067
sat
.226
176
instance00711.smt2
sat
.117
sat
.028
sat
.181
177
instance06544.smt2
unsat
.047
unsat
.024
unsat
.473
178
instance11973.smt2
unsat
.187
unsat
.052
unsat
.351
179
instance11602.smt2
sat
.430
sat
.039
sat
.275
180
01_track_10.smt2
sat
1.630
sat
.052
sat
.251
181
instance02044.smt2
sat
.136
sat
.032
sat
.227
182
instance09549.smt2
unsat
.548
unsat
.149
unsat
.479
183
instance11715.smt2
unsat
.036
unsat
.021
unsat
.336
184
instance12338.smt2
unknown
5.011
sat
.111
sat
.298
185
slog_stranger_642_sink.smt2
unsat
.028
unsat
.023
unsat
.258
186
instance00358.smt2
sat
.720
sat
.060
sat
.235
187
instance04470.smt2
unknown
5.017
sat
.061
sat
.338
188
instance11761.smt2
unsat
.050
unsat
.033
unsat
.402
189
instance07072.smt2
unsat
.179
unsat
.053
unsat
.316
190
query3336.smt2
sat
1.725
sat
.052
sat
.225
191
instance09375.smt2
unsat
.168
unsat
.047
unsat
.327
192
instance09778.smt2
unsat
.046
unsat
.028
unsat
.419
193
instance12569.smt2
sat
.478
sat
.848
sat
.311
194
instance06410.smt2
unsat
.107
unsat
.032
unsat
.369
195
instance07700.smt2
sat
.107
sat
.032
sat
.214
196
instance08970.smt2
sat
.218
sat
.040
sat
.367
197
instance07079.smt2
sat
1.239
sat
.100
sat
.355
198
instance15771.smt2
sat
.844
sat
.038
sat
.234
199
instance06068.smt2
unsat
.677
unsat
.047
unsat
.500
200
instance01077.smt2
sat
.084
sat
.028
sat
.217
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-03-27
Branch: c3 (HEAD
ebd35bc— Avoid doing extensions steps twice, as the bounds are recognized too late)Benchmark set: QF_S — 200 randomly selected files from
tests/QF_S.tar.zst(22 172 total available)Timeouts: seq
-T:5(5 s Z3-internal, 7 s wall); nseq-T:5(5 s Z3-internal, 12 s wall); ZIPT-t:5000(5 s, 12 s wall)Build: Debug (
CMAKE_BUILD_TYPE=Debug), Z3 4.17.0; ZIPT built fromparikhbranch (CEisenhofer/ZIPT)Summary
seqsolvernseqsolverSoundness disagreements (two solvers return conflicting sat/unsat): 1
Notable Issues
🔴 Soundness Disagreement (Critical)
slog_stranger_242_sink.smt2—seq→ sat,nseq→ sat, ZIPT → unsatBoth Z3 string solvers agree the formula is SAT, but ZIPT returns UNSAT. This is the highest-priority finding: either ZIPT has a soundness bug on this instance, or (less likely) both Z3 solvers share a common bug. The file is from the 2019-Jiang/slog benchmark family. Recommended action: manually verify the model produced by seq/nseq against the formula.
🟠 Crashes / Assertion Violations
nseq — assertion violation (2 files)
Both
not-contains-1-5-6-106.smt2andnot-contains-1-4-5-114.smt2trigger a hard assertion failure in nseq:Variable
x_16is simultaneously constrained to three ground string constants of different lengths. The seq rewriter detects the length clash at themk_eq_corelevel (fileseq_rewriter.cpp:5217) during the very first propagation wave, requiring no search at all — the contradiction is structurally visible without any case splitting.The nseq solver (3.1× slower at 0.678 s) must first internalize these equations into its Nielsen graph, then run at least one round of simplification and Parikh-constraint generation before the length inconsistency becomes apparent to the SAT solver. The Parikh-based length propagation is more general than seq's immediate ground-term clash detection, but incurs a fixed overhead even on trivially inconsistent instances. For benchmarks where ground-string equalities immediately reveal a contradiction by length or content, seq's rewriter has a structural advantage over nseq's graph-based pipeline.
Per-File Results
Click to expand full per-file table (200 rows)
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions