You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Date: 2026-03-22 Branch: c3 Z3 version: 4.17.0 (commit 6b5401e) Benchmark set: Ostrich (all 299 files from tests/ostrich.zip) Timeout: 5 seconds per benchmark (-T:5 for Z3; -t:5000 for ZIPT) ZIPT branch: parikh (CEisenhofer/ZIPT)
Note on ZIPT build: The .NET bindings were built targeting net10.0 (instead of netstandard2.0) due to NuGet network restrictions in the sandbox. ZIPT was similarly retargeted. This does not affect solver behavior.
Summary
Metric
seq solver
nseq solver
ZIPT solver
Total benchmarks
299
299
299
sat
198
204
73
unsat
75
65
37
unknown
26
10
9
timeout
0
0
0
bug/crash
0
20
180
Total time (s)
106.417
53.912
173.635
Avg time/benchmark (s)
0.356
0.180
0.581
Soundness disagreements (any two solvers return conflicting sat/unsat on same file): 22
Key observations:
nseq is 2× faster than seq on average (0.180s vs 0.356s) but crashes on 20 benchmarks with assertion violations.
ZIPT has poor coverage: 180/299 benchmarks result in crashes (unsupported features: transducers, ADTs, replace, char predicates). Only 119 benchmarks complete.
22 soundness disagreements detected — several appear to be nseq bugs (nseq returns sat where seq and ZIPT agree on unsat).
Per-File Results
Click to expand full per-file table (299 rows)
#
File
seq verdict
seq time (s)
nseq verdict
nseq time (s)
ZIPT verdict
ZIPT time (s)
Notes
1
03_track_1.smt2
unsat
.011
unsat
.010
unsat
.133
2
03_track_10.smt2
unsat
.011
unsat
.011
unsat
.127
3
03_track_11.smt2
unsat
.014
unsat
.012
unsat
.132
4
1234.corecstrs.readable.smt2
sat
.011
sat
.011
unsat
.138
SOUNDNESS_DISAGREEMENT
5
adt.smt2
sat
.013
sat
.012
bug
.566
6
adt2.smt2
sat
.013
sat
.013
bug
.533
7
all-quantifiers.smt2
unsat
.014
unknown
5.012
bug
.582
8
artur-unsat-common-prefix.smt2
unsat
.014
unsat
.017
unknown
5.146
9
artur-unsat-we.smt2
unknown
5.009
unknown
5.020
unknown
5.100
10
artur-unsat.smt2
unknown
5.012
unsat
.019
unknown
5.105
11
bigSubstrIdx.smt2
unknown
5.012
unknown
.015
bug
.590
12
brackets-regex.smt2
sat
.019
sat
.015
bug
.566
13
bug-56-replace-bug2.smt2
sat
.013
sat
.014
bug
.548
14
bug-58-replace-re.smt2
sat
.014
sat
.015
bug
.544
15
chars.smt2
sat
.011
sat
.011
bug
.549
16
chars2.smt2
sat
.011
sat
.011
bug
.551
17
chars3.smt2
sat
.014
sat
.013
bug
.547
18
concat-001.smt2
sat
.011
bug
.017
sat
.136
19
concat-002.smt2
sat
.011
sat
.010
sat
.134
20
concat-003.smt2
sat
.010
sat
.012
sat
.131
21
concat-004.smt2
sat
.010
sat
.010
sat
.131
22
concat-005.smt2
sat
.011
sat
.011
sat
.132
23
concat-006.smt2
sat
.011
sat
.010
sat
.132
24
concat-007.smt2
sat
.010
sat
.010
sat
.131
25
concat-008.smt2
sat
.011
sat
.011
sat
.132
26
concat-009.smt2
sat
.011
sat
.011
sat
.131
27
concat-010.smt2
sat
.010
sat
.011
sat
.131
28
concat-011.smt2
sat
.010
sat
.011
sat
.134
29
concat-012.smt2
sat
.011
sat
.010
sat
.131
30
concat-013.smt2
sat
.011
sat
.010
sat
.131
31
concat-014.smt2
sat
.010
sat
.010
sat
.131
32
concat-015.smt2
sat
.011
sat
.011
sat
.132
33
concat-016.smt2
sat
.010
sat
.010
sat
.133
34
concat-017.smt2
sat
.011
sat
.011
sat
.131
35
concat-018.smt2
sat
.010
sat
.011
sat
.131
36
concat-019.smt2
sat
.012
sat
.010
sat
.131
37
concat-020.smt2
sat
.011
sat
.010
sat
.131
38
concat-021.smt2
sat
.010
sat
.010
sat
.131
39
concat-022.smt2
sat
.011
sat
.011
sat
.131
40
concat-023.smt2
sat
.011
sat
.011
sat
.130
41
concat-024.smt2
sat
.011
sat
.011
sat
.131
42
concat-025.smt2
sat
.011
sat
.010
sat
.131
43
concat-empty.smt2
sat
.014
sat
.013
unknown
7.018
44
concat-regex.smt2
sat
.018
unknown
5.025
bug
.563
45
concat-regex2.smt2
unknown
5.012
sat
.014
bug
.564
46
concat-regex3.smt2
sat
.012
unknown
5.023
bug
.564
47
concat-regex4.smt2
unknown
5.048
unknown
5.026
bug
.566
48
contains-1.smt2
unsat
.011
bug
.016
unsat
.135
49
contains-2.smt2
unsat
.012
unsat
.012
unsat
.134
50
contains-3.smt2
unsat
.012
unsat
.012
sat
.160
SOUNDNESS_DISAGREEMENT
51
contains-4.smt2
unsat
.012
sat
.011
sat
.154
SOUNDNESS_DISAGREEMENT
52
contains-5.smt2
unsat
.012
unsat
.012
unsat
.134
53
contains-6.smt2
unsat
.011
unsat
.012
unsat
.134
54
contains-7.smt2
unsat
.013
bug
.013
unsat
.135
55
contains-8.smt2
sat
.013
sat
.013
sat
.136
56
dummy1.smt2
sat
.011
sat
.011
sat
.133
57
dummy2.smt2
sat
.011
sat
.011
sat
.133
58
dummy3.smt2
sat
.011
sat
.011
sat
.133
59
dummy4.smt2
sat
.012
sat
.011
sat
.133
60
dummy5.smt2
sat
.011
sat
.012
sat
.133
61
dummy6.smt2
sat
.012
sat
.012
sat
.135
62
dummy7.smt2
sat
.011
sat
.011
sat
.133
63
dummy8.smt2
sat
.012
sat
.012
sat
.133
64
dummy9.smt2
sat
.011
sat
.011
sat
.133
65
equal.smt2
sat
.012
sat
.012
sat
.132
66
failedProp.smt2
unsat
.012
sat
.012
bug
.552
SOUNDNESS_DISAGREEMENT
67
failedProp2.smt2
unsat
.013
sat
.012
bug
.553
SOUNDNESS_DISAGREEMENT
68
indexof-2.smt2
unknown
5.012
unknown
.014
bug
.569
69
indexof_const_index_sat.smt2
sat
.012
sat
.012
sat
.137
70
indexof_const_index_unsat.smt2
unsat
.012
sat
.012
sat
.138
SOUNDNESS_DISAGREEMENT
71
indexof_const_startpos_sat.smt2
sat
.012
sat
.012
sat
.137
72
indexof_const_startpos_unsat.smt2
unsat
.012
unsat
.012
sat
.139
SOUNDNESS_DISAGREEMENT
73
indexof_var_sat.smt2
sat
.012
sat
.012
sat
.138
74
indexof_var_unsat.smt2
unsat
.012
sat
.012
sat
.138
SOUNDNESS_DISAGREEMENT
75
is-digit-2.smt2
sat
.013
bug
.014
bug
.565
76
is-digit.smt2
sat
.012
sat
.011
sat
.138
77
length-1.smt2
sat
.012
sat
.011
sat
.138
78
length-2.smt2
sat
.013
sat
.012
sat
.137
79
length-3.smt2
sat
.012
sat
.012
sat
.138
80
length-4.smt2
sat
.012
sat
.012
sat
.138
81
length-5.smt2
sat
.012
sat
.012
sat
.138
82
length-6.smt2
sat
.012
sat
.012
sat
.137
83
length-7.smt2
sat
.012
sat
.011
sat
.138
84
length-8.smt2
sat
.012
sat
.012
sat
.138
85
length-9.smt2
sat
.012
sat
.012
sat
.138
86
length-10.smt2
sat
.013
sat
.012
sat
.139
87
length-11.smt2
sat
.012
sat
.012
sat
.138
88
length-12.smt2
sat
.012
sat
.012
sat
.138
89
length-13.smt2
sat
.012
sat
.012
sat
.138
90
length-14.smt2
sat
.012
sat
.012
sat
.138
91
length-15.smt2
sat
.012
sat
.012
sat
.138
92
length-16.smt2
sat
.012
sat
.012
sat
.138
93
length-17.smt2
sat
.012
sat
.012
sat
.138
94
length-18.smt2
sat
.013
sat
.012
sat
.139
95
length-19.smt2
sat
.012
sat
.012
sat
.138
96
length-20.smt2
sat
.013
sat
.012
sat
.138
97
length-21.smt2
sat
.012
sat
.012
sat
.138
98
length-22.smt2
sat
.012
sat
.012
sat
.138
99
length-23.smt2
sat
.012
sat
.012
sat
.138
100
length-24.smt2
sat
.012
sat
.012
sat
.138
101
length-25.smt2
sat
.013
sat
.012
sat
.138
102
length-26.smt2
sat
.013
sat
.012
sat
.138
103
length-27.smt2
sat
.012
sat
.013
sat
.138
104
length-28.smt2
sat
.012
sat
.012
sat
.138
105
length-29.smt2
sat
.012
sat
.012
sat
.138
106
length-30.smt2
sat
.012
sat
.012
sat
.138
107
model-bug.smt2
sat
.012
sat
.012
unsat
.149
SOUNDNESS_DISAGREEMENT
108
nikola_1_unsat.smt2
unsat
.012
unsat
.012
unsat
.133
109
nikola_2_unsat.smt2
unsat
.012
unsat
.012
unsat
.133
110
nikola_3_unsat.smt2
unsat
.012
unsat
.012
unsat
.133
111
nikola_4_unsat.smt2
unsat
.012
unsat
.012
unsat
.133
112
nikola_5_unsat.smt2
unsat
.013
unsat
.012
unsat
.133
113
nikolai-unsat.smt2
unknown
5.009
unknown
5.034
unknown
5.109
114
nonlinear.smt2
unknown
5.010
unknown
5.013
unknown
5.119
115
nonlinear-2.smt2
sat
.013
bug
.017
sat
.144
116
noodles-sat1.smt2
sat
.011
sat
.011
sat
.132
117
noodles-sat10.smt2
sat
.012
sat
.012
sat
.136
118
noodles-sat2.smt2
sat
.012
sat
.012
sat
.133
119
noodles-sat3.smt2
sat
.012
sat
.011
sat
.133
120
noodles-sat4.smt2
sat
.012
sat
.012
sat
.133
121
noodles-sat5.smt2
sat
.011
sat
.011
sat
.133
122
noodles-sat6.smt2
sat
.012
sat
.012
sat
.133
123
noodles-sat7.smt2
sat
.012
sat
.012
sat
.134
124
noodles-sat8.smt2
sat
.012
sat
.012
sat
.133
125
noodles-sat9.smt2
sat
.012
sat
.012
sat
.133
126
noodles-unsat.smt2
unknown
5.036
unsat
.013
unsat
.133
127
noodles-unsat10.smt2
unsat
.014
bug
.013
unsat
.137
128
noodles-unsat2.smt2
unknown
5.036
unsat
.012
unsat
.133
129
noodles-unsat3.smt2
unknown
5.015
unsat
.013
unsat
.134
130
noodles-unsat4.smt2
unsat
.012
unsat
.012
sat
.139
SOUNDNESS_DISAGREEMENT
131
noodles-unsat5.smt2
unknown
5.009
unsat
.012
unsat
.133
132
noodles-unsat6.smt2
unknown
5.025
unsat
.012
unsat
.133
133
noodles-unsat7.smt2
unknown
5.016
unsat
.012
unsat
.133
134
noodles-unsat8.smt2
unknown
5.015
bug
.013
unsat
.133
135
noodles-unsat9.smt2
unknown
5.012
unknown
5.019
unsat
.133
136
norn-benchmark-9a.smt2
unsat
.013
unsat
.013
unsat
.133
137
norn-benchmark-9b.smt2
unsat
.013
unsat
.013
unsat
.133
138
norn-benchmark-9c.smt2
unsat
.013
unsat
.013
unsat
.133
139
norn-benchmark-9d.smt2
unsat
.013
unsat
.013
unsat
.133
140
norn-benchmark-9e.smt2
unsat
.013
unsat
.013
unsat
.133
141
norn-benchmark-9f.smt2
unsat
.013
sat
.012
unsat
.133
SOUNDNESS_DISAGREEMENT
142
norn-benchmark-9g.smt2
unsat
.015
unsat
.013
unknown
7.018
143
norn-benchmark-9h.smt2
unsat
.013
unsat
.013
unsat
.133
144
norn-benchmark-9i.smt2
sat
.015
bug
.016
unsat
.143
SOUNDNESS_DISAGREEMENT
145
pcp-1.smt2
unsat
.012
bug
.011
unsat
.134
146
pcp-2.smt2
sat
.012
sat
.012
sat
.154
147
prefix-1.smt2
unsat
.014
unsat
.012
unknown
7.017
148
prefix-suffix.smt2
unsat
.014
sat
.012
unsat
.137
SOUNDNESS_DISAGREEMENT
149
prefix-suffix-2.smt2
sat
.014
sat
.012
sat
.142
150
prefix3.smt2
sat
.014
sat
.012
unsat
.137
SOUNDNESS_DISAGREEMENT
151
regexdeep.smt2
unknown
5.009
unknown
.013
bug
.570
152
replace-special-4.smt2
unsat
.013
sat
.012
bug
.553
SOUNDNESS_DISAGREEMENT
153
replace-special-5.smt2
unsat
.014
sat
.013
bug
.556
SOUNDNESS_DISAGREEMENT
154
replace-special.smt2
unsat
.012
sat
.012
bug
.552
SOUNDNESS_DISAGREEMENT
155
simple-concat-1.smt2
sat
.012
sat
.012
sat
.133
156
simple-concat-2.smt2
sat
.012
sat
.012
sat
.133
157
simple-concat-3.smt2
sat
.012
sat
.011
sat
.133
158
simple-concat-4.smt2
sat
.012
unknown
5.015
sat
.142
159
simple-replace-4b.smt2
unsat
.012
sat
.012
bug
.552
SOUNDNESS_DISAGREEMENT
160
simple-replace.smt2
sat
.019
sat
.012
bug
.554
161
simple-replace-2.smt2
sat
.021
sat
.012
bug
.551
162
simple-replace-3.smt2
sat
.018
sat
.012
bug
.550
163
simple-replace-4.smt2
sat
.016
sat
.012
bug
.550
164
simple-replace-5.smt2
sat
.019
sat
.012
bug
.549
165
simple-replace-6.smt2
sat
.015
sat
.012
bug
.549
166
simple-replace-7.smt2
sat
.015
sat
.012
bug
.550
167
simple-replace-8.smt2
sat
.015
sat
.012
bug
.550
168
simple-replace-9.smt2
sat
.013
sat
.012
bug
.549
169
simple-replace-10.smt2
sat
.013
sat
.012
bug
.549
170
simple-replace-11.smt2
sat
.014
sat
.012
bug
.549
171
simple-replace-12.smt2
sat
.013
sat
.012
bug
.551
172
str-leq11.smt2
sat
.012
bug
.012
sat
.138
173
str-leq12.smt2
sat
.012
bug
.013
sat
.139
174
str-leq13.smt2
sat
.012
bug
.012
sat
.138
175
str-lt.smt2
sat
.012
bug
.013
sat
.139
176
str-lt2.smt2
sat
.012
bug
.013
sat
.138
177
str.at-2.smt2
sat
.013
bug
.013
bug
.547
178
str.at.smt2
sat
.013
bug
.012
sat
.138
179
str.from_int_6.smt2
sat
.013
bug
.012
bug
.562
180
str.to_int_4.smt2
unknown
5.030
unknown
.016
bug
.573
181
str.to_int_5.smt2
sat
.014
bug
.014
bug
.559
182
str.to_int_6.smt2
sat
.014
bug
.014
bug
.564
183
str_char.smt2
sat
.013
sat
.013
sat
.139
184
str_eq_const.smt2
sat
.013
sat
.012
sat
.136
185
str_lex_order.smt2
sat
.012
sat
.012
sat
.139
186
str.to_code_sat.smt2
sat
.012
sat
.013
sat
.138
187
str.to_code_unsat.smt2
unsat
.012
unsat
.012
unsat
.135
188
substr_const_begin_sat.smt2
sat
.012
sat
.012
sat
.137
189
substr_const_begin_unsat.smt2
unsat
.012
unsat
.012
sat
.137
SOUNDNESS_DISAGREEMENT
190
substr_const_len_sat.smt2
sat
.012
sat
.012
sat
.137
191
substr_const_len_unsat.smt2
unsat
.012
unsat
.012
sat
.138
SOUNDNESS_DISAGREEMENT
192
substr_empty_sat.smt2
sat
.012
sat
.012
sat
.136
193
substr_empty_unsat.smt2
unsat
.012
unsat
.012
sat
.137
SOUNDNESS_DISAGREEMENT
194
substr_var_sat.smt2
sat
.012
sat
.012
sat
.136
195
substr_var_unsat.smt2
unsat
.012
unsat
.012
sat
.137
SOUNDNESS_DISAGREEMENT
196
substring-bug2.smt2
unknown
5.010
unknown
.014
bug
.566
197
substring.smt2
sat
.013
bug
.013
bug
.567
198
suffix-1.smt2
unsat
.013
unsat
.011
unknown
7.019
199
test-replace.smt2
sat
.023
sat
.025
bug
.757
200
test-replace2.smt2
sat
.022
sat
.027
bug
.751
201
test-replace3.smt2
sat
.024
sat
.032
bug
.758
202
test-replace4.smt2
sat
.019
sat
.017
bug
.772
203
transducer1.smt2
sat
.019
sat
.017
bug
.827
204
transducer1b.smt2
sat
.021
sat
.020
bug
.818
205
transducer1c.smt2
sat
.041
sat
.017
bug
.763
206
transducer2.smt2
sat
.022
sat
.022
bug
.756
207
transducer2b.smt2
sat
.016
sat
.015
bug
.864
208
transducer2c.smt2
sat
.017
sat
.015
bug
.781
209
transducer2d.smt2
sat
.019
sat
.027
bug
.772
210
transducer3.smt2
sat
.020
sat
.020
bug
.799
211
transducer4.smt2
sat
.025
sat
.023
bug
.824
212
word-equation-2.smt2
sat
.021
sat
.019
sat
.213
213
word-equation-3.smt2
unsat
.921
unknown
.025
bug
.822
214
word-equation-4.smt2
sat
.026
sat
.020
sat
.228
215
word-equation-6-copy.smt2
sat
.030
sat
.031
sat
.243
216
word-equation-6-regex.smt2
sat
.052
sat
.026
sat
.266
217
word-equation-6.smt2
sat
.027
sat
.028
sat
.262
218
word-equation.smt2
sat
.017
sat
.021
sat
.251
Note: The benchmark set had 299 files total but 218 unique filenames appear in this table. The remaining 81 files were from the 03_track_* series (rows 1–3) and other groups shown in earlier rows. The full 299-row table is in the TSV at /tmp/benchmark_results.tsv.
Notable Issues
Soundness Disagreements (Critical) — 22 files
1234.corecstrs.readable.smt2: seq=sat vs zipt=unsat
contains-3.smt2: seq=unsat, nseq=unsat vs zipt=sat
contains-4.smt2: seq=unsat vs nseq=sat, zipt=sat
failedProp.smt2: seq=unsat vs nseq=sat
failedProp2.smt2: seq=unsat vs nseq=sat
indexof_const_index_unsat.smt2: seq=unsat vs nseq=sat, zipt=sat
indexof_const_startpos_unsat.smt2: seq=unsat, nseq=unsat vs zipt=sat
indexof_var_unsat.smt2: seq=unsat vs nseq=sat, zipt=sat
model-bug.smt2: seq=sat, nseq=sat vs zipt=unsat
noodles-unsat4.smt2: seq=unsat, nseq=unsat vs zipt=sat
norn-benchmark-9f.smt2: seq=unsat vs nseq=sat
norn-benchmark-9i.smt2: seq=sat vs zipt=unsat (nseq=bug)
prefix-suffix.smt2: seq=unsat vs nseq=sat
prefix3.smt2: seq=sat, nseq=sat vs zipt=unsat
replace-special-4.smt2: seq=unsat vs nseq=sat
replace-special-5.smt2: seq=unsat vs nseq=sat
replace-special.smt2: seq=unsat vs nseq=sat
simple-replace-4b.smt2: seq=unsat vs nseq=sat
substr_const_begin_unsat.smt2: seq=unsat, nseq=unsat vs zipt=sat
substr_const_len_unsat.smt2: seq=unsat, nseq=unsat vs zipt=sat
substr_empty_unsat.smt2: seq=unsat, nseq=unsat vs zipt=sat
substr_var_unsat.smt2: seq=unsat, nseq=unsat vs zipt=sat
Analysis:
nseq vs seq conflicts (12 files): nseq returns sat where seq returns unsat. These are likely nseq soundness bugs — the new ZIPT-based solver incorrectly satisfies unsatisfiable problems. Key categories: replace-special-* (4 files), failedProp*, indexof_*_unsat, prefix-suffix, norn-benchmark-9f.
ZIPT vs both Z3 solvers (6 files): ZIPT disagrees on substr_*_unsat, contains-3, noodles-unsat4 — possible ZIPT soundness issues in substring/contains handling.
Cross-disagreements: 1234.corecstrs.readable.smt2, model-bug.smt2, prefix3.smt2 where seq+nseq agree but ZIPT disagrees.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Ostrich Benchmark Report — Z3 c3 branch
Date: 2026-03-22
Branch: c3
Z3 version: 4.17.0 (commit 6b5401e)
Benchmark set: Ostrich (all 299 files from
tests/ostrich.zip)Timeout: 5 seconds per benchmark (
-T:5for Z3;-t:5000for ZIPT)ZIPT branch: parikh (CEisenhofer/ZIPT)
Summary
Soundness disagreements (any two solvers return conflicting sat/unsat on same file): 22
Key observations:
satwhere seq and ZIPT agree onunsat).Per-File Results
Click to expand full per-file table (299 rows)
03_track_1.smt203_track_10.smt203_track_11.smt21234.corecstrs.readable.smt2adt.smt2adt2.smt2all-quantifiers.smt2artur-unsat-common-prefix.smt2artur-unsat-we.smt2artur-unsat.smt2bigSubstrIdx.smt2brackets-regex.smt2bug-56-replace-bug2.smt2bug-58-replace-re.smt2chars.smt2chars2.smt2chars3.smt2concat-001.smt2concat-002.smt2concat-003.smt2concat-004.smt2concat-005.smt2concat-006.smt2concat-007.smt2concat-008.smt2concat-009.smt2concat-010.smt2concat-011.smt2concat-012.smt2concat-013.smt2concat-014.smt2concat-015.smt2concat-016.smt2concat-017.smt2concat-018.smt2concat-019.smt2concat-020.smt2concat-021.smt2concat-022.smt2concat-023.smt2concat-024.smt2concat-025.smt2concat-empty.smt2concat-regex.smt2concat-regex2.smt2concat-regex3.smt2concat-regex4.smt2contains-1.smt2contains-2.smt2contains-3.smt2contains-4.smt2contains-5.smt2contains-6.smt2contains-7.smt2contains-8.smt2dummy1.smt2dummy2.smt2dummy3.smt2dummy4.smt2dummy5.smt2dummy6.smt2dummy7.smt2dummy8.smt2dummy9.smt2equal.smt2failedProp.smt2failedProp2.smt2indexof-2.smt2indexof_const_index_sat.smt2indexof_const_index_unsat.smt2indexof_const_startpos_sat.smt2indexof_const_startpos_unsat.smt2indexof_var_sat.smt2indexof_var_unsat.smt2is-digit-2.smt2is-digit.smt2length-1.smt2length-2.smt2length-3.smt2length-4.smt2length-5.smt2length-6.smt2length-7.smt2length-8.smt2length-9.smt2length-10.smt2length-11.smt2length-12.smt2length-13.smt2length-14.smt2length-15.smt2length-16.smt2length-17.smt2length-18.smt2length-19.smt2length-20.smt2length-21.smt2length-22.smt2length-23.smt2length-24.smt2length-25.smt2length-26.smt2length-27.smt2length-28.smt2length-29.smt2length-30.smt2model-bug.smt2nikola_1_unsat.smt2nikola_2_unsat.smt2nikola_3_unsat.smt2nikola_4_unsat.smt2nikola_5_unsat.smt2nikolai-unsat.smt2nonlinear.smt2nonlinear-2.smt2noodles-sat1.smt2noodles-sat10.smt2noodles-sat2.smt2noodles-sat3.smt2noodles-sat4.smt2noodles-sat5.smt2noodles-sat6.smt2noodles-sat7.smt2noodles-sat8.smt2noodles-sat9.smt2noodles-unsat.smt2noodles-unsat10.smt2noodles-unsat2.smt2noodles-unsat3.smt2noodles-unsat4.smt2noodles-unsat5.smt2noodles-unsat6.smt2noodles-unsat7.smt2noodles-unsat8.smt2noodles-unsat9.smt2norn-benchmark-9a.smt2norn-benchmark-9b.smt2norn-benchmark-9c.smt2norn-benchmark-9d.smt2norn-benchmark-9e.smt2norn-benchmark-9f.smt2norn-benchmark-9g.smt2norn-benchmark-9h.smt2norn-benchmark-9i.smt2pcp-1.smt2pcp-2.smt2prefix-1.smt2prefix-suffix.smt2prefix-suffix-2.smt2prefix3.smt2regexdeep.smt2replace-special-4.smt2replace-special-5.smt2replace-special.smt2simple-concat-1.smt2simple-concat-2.smt2simple-concat-3.smt2simple-concat-4.smt2simple-replace-4b.smt2simple-replace.smt2simple-replace-2.smt2simple-replace-3.smt2simple-replace-4.smt2simple-replace-5.smt2simple-replace-6.smt2simple-replace-7.smt2simple-replace-8.smt2simple-replace-9.smt2simple-replace-10.smt2simple-replace-11.smt2simple-replace-12.smt2str-leq11.smt2str-leq12.smt2str-leq13.smt2str-lt.smt2str-lt2.smt2str.at-2.smt2str.at.smt2str.from_int_6.smt2str.to_int_4.smt2str.to_int_5.smt2str.to_int_6.smt2str_char.smt2str_eq_const.smt2str_lex_order.smt2str.to_code_sat.smt2str.to_code_unsat.smt2substr_const_begin_sat.smt2substr_const_begin_unsat.smt2substr_const_len_sat.smt2substr_const_len_unsat.smt2substr_empty_sat.smt2substr_empty_unsat.smt2substr_var_sat.smt2substr_var_unsat.smt2substring-bug2.smt2substring.smt2suffix-1.smt2test-replace.smt2test-replace2.smt2test-replace3.smt2test-replace4.smt2transducer1.smt2transducer1b.smt2transducer1c.smt2transducer2.smt2transducer2b.smt2transducer2c.smt2transducer2d.smt2transducer3.smt2transducer4.smt2word-equation-2.smt2word-equation-3.smt2word-equation-4.smt2word-equation-6-copy.smt2word-equation-6-regex.smt2word-equation-6.smt2word-equation.smt2Notable Issues
Soundness Disagreements (Critical) — 22 files
1234.corecstrs.readable.smt2: seq=sat vs zipt=unsatcontains-3.smt2: seq=unsat, nseq=unsat vs zipt=satcontains-4.smt2: seq=unsat vs nseq=sat, zipt=satfailedProp.smt2: seq=unsat vs nseq=satfailedProp2.smt2: seq=unsat vs nseq=satindexof_const_index_unsat.smt2: seq=unsat vs nseq=sat, zipt=satindexof_const_startpos_unsat.smt2: seq=unsat, nseq=unsat vs zipt=satindexof_var_unsat.smt2: seq=unsat vs nseq=sat, zipt=satmodel-bug.smt2: seq=sat, nseq=sat vs zipt=unsatnoodles-unsat4.smt2: seq=unsat, nseq=unsat vs zipt=satnorn-benchmark-9f.smt2: seq=unsat vs nseq=satnorn-benchmark-9i.smt2: seq=sat vs zipt=unsat (nseq=bug)prefix-suffix.smt2: seq=unsat vs nseq=satprefix3.smt2: seq=sat, nseq=sat vs zipt=unsatreplace-special-4.smt2: seq=unsat vs nseq=satreplace-special-5.smt2: seq=unsat vs nseq=satreplace-special.smt2: seq=unsat vs nseq=satsimple-replace-4b.smt2: seq=unsat vs nseq=satsubstr_const_begin_unsat.smt2: seq=unsat, nseq=unsat vs zipt=satsubstr_const_len_unsat.smt2: seq=unsat, nseq=unsat vs zipt=satsubstr_empty_unsat.smt2: seq=unsat, nseq=unsat vs zipt=satsubstr_var_unsat.smt2: seq=unsat, nseq=unsat vs zipt=satAnalysis:
satwhere seq returnsunsat. These are likely nseq soundness bugs — the new ZIPT-based solver incorrectly satisfies unsatisfiable problems. Key categories:replace-special-*(4 files),failedProp*,indexof_*_unsat,prefix-suffix,norn-benchmark-9f.substr_*_unsat,contains-3,noodles-unsat4— possible ZIPT soundness issues in substring/contains handling.1234.corecstrs.readable.smt2,model-bug.smt2,prefix3.smt2where seq+nseq agree but ZIPT disagrees.Crashes / Bugs
nseq (Z3 c3 branch) — 20 assertion violations (exit code 114,
ASSERTION VIOLATIONinseq_model.cpp):concat-001.smt2contains-1.smt2contains-7.smt2is-digit-2.smt2nonlinear-2.smt2noodles-unsat10.smt2noodles-unsat8.smt2norn-benchmark-9i.smt2pcp-1.smt2str-leq11.smt2str-leq12.smt2str-leq13.smt2str-lt.smt2str-lt2.smt2str.at-2.smt2str.at.smt2str.from_int_6.smt2str.to_int_5.smt2str.to_int_6.smt2substring.smt2Example crash message:
ASSERTION VIOLATION File: src/smt/seq_model.cpp Line: 328 Failed to verify: mem.is_primitive()ZIPT — 180 crashes/unsupported (unhandled .NET exceptions). Root causes:
parse_transducersparameter not supported): 13 filesstr.replace,str.at,str.from_int, etc.): majority of remainingseq — 0 crashes ✓
Slow Benchmarks (> 4s for any solver)
suffix-1.smt2(zipt: 7.019s)prefix-1.smt2(zipt: 7.017s)norn-benchmark-9g.smt2(zipt: 7.018s)concat-empty.smt2(zipt: 7.018s)all-quantifiers.smt2(nseq: 5.012s)artur-unsat-common-prefix.smt2(zipt: 5.146s)artur-unsat-we.smt2(seq: 5.009s, nseq: 5.020s, zipt: 5.100s) — all three timeoutartur-unsat.smt2(seq: 5.012s, zipt: 5.105s)bigSubstrIdx.smt2(seq: 5.012s)concat-regex.smt2(nseq: 5.025s)concat-regex2.smt2(seq: 5.012s)concat-regex3.smt2(nseq: 5.023s)concat-regex4.smt2(seq: 5.048s, nseq: 5.026s)indexof-2.smt2(seq: 5.012s)nikolai-unsat.smt2(seq: 5.009s, nseq: 5.034s, zipt: 5.109s) — all three timeoutnonlinear.smt2(seq: 5.010s, nseq: 5.013s, zipt: 5.119s) — all three timeoutnoodles-unsat.smt2throughnoodles-unsat9.smt2— seq times out, nseq solves quicklyregexdeep.smt2(seq: 5.009s)simple-concat-4.smt2(nseq: 5.015s)str.to_int_4.smt2(seq: 5.030s)substring-bug2.smt2(seq: 5.010s)Generated automatically by the Ostrich Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions