You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Date: 2026-03-20 Branch: c3 Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst; 22 172 total available) Timeout: 10 seconds per benchmark (-T:5 + outer 7 s for seq with tracing; -T:10 / outer 12 s for nseq; -t:10000 ms for ZIPT) Z3 build: Debug, CMake+Ninja, commit 88ef8c7 (v4.17.0) ZIPT build: parikh branch, Release, .NET 8, linked against freshly built Microsoft.Z3.dll (netstandard2.1)
Summary
Metric
seq solver
nseq solver
ZIPT solver
sat
21
25
21
unsat
12
16
15
unknown
17
8
5
timeout
0
0
3
bug/crash
0
1
6
Total time (s)
80.264
32.297
47.536
Avg time/benchmark (s)
1.605
0.646
0.951
Soundness disagreements (any two solvers return conflicting sat/unsat): 0
nseq is the fastest solver on this sample (avg 0.646 s), solves 41/50 instances definitively (vs. 33/50 for seq and 36/50 for ZIPT). seq returns unknown on 17 instances, of which 8 are definitively answered by nseq (and 7 also by ZIPT).
Notable Issues
Soundness Disagreements (Critical)
✅ None found. All three solvers agreed on every instance where at least two of them produced a definitive answer.
Crashes / Bugs
nseq — 1 assertion violation
File
Error
pcp_instance_251.smt2
ASSERTION VIOLATION in src/ast/rewriter/seq_axioms.cpp:1108: NOT IMPLEMENTED YET! — triggered by str.replace_all
ZIPT — 6 unsupported-feature crashes
All six ZIPT crashes share the same root cause: ZIPT does not implement str.replace_all.
File
ZIPT output
pcp_instance_160.smt2
Unsupported feature: str.replace_all currently not supported
pcp_instance_125.smt2
same
pcp_instance_188.smt2
same
pcp_instance_251.smt2
same
benchmark_0089.smt2
same
benchmark_0153.smt2
same
Note: seq and nseq return unknown (not a crash) for the pcp/rna files — they do parse str.replace_all but cannot decide the instances within the timeout.
Slow Benchmarks (> 8 s)
Three diseq-* benchmarks caused nseq and ZIPT to hit their full timeouts; seq returned unknown quickly (≈5 s internal limit):
File
seq
nseq
ZIPT
diseq-1-3-6-100.smt2
unknown 5.010 s
unknown 10.009 s
timeout 12.018 s
diseq-1-3-5-106.smt2
unknown 5.009 s
unknown 10.009 s
timeout 12.018 s
diseq-1-5-6-100.smt2
unknown 5.009 s
unknown 10.009 s
timeout 12.018 s
These are disequality-heavy benchmarks; none of the three solvers could decide them within the allotted time.
seq Regressions vs. nseq: Instances Where Only seq Fails
Eight instances were solved definitively by nseq (and in 7 cases also by ZIPT) but seq timed out at 5 seconds:
File
seq
nseq
ZIPT
Burns_sat_non_incre_equiv_trans_28_0.smt2
unknown 5.010 s
unsat 0.054 s
unsat 0.293 s
eqdist_sat_non_incre_equiv_init_0_3.smt2
unknown 5.011 s
unsat 0.037 s
unsat 0.272 s
two_token_pass_sat_non_incre_equiv_init_0_7.smt2
unknown 5.010 s
unsat 0.037 s
unsat 0.263 s
instance13779.smt2
unknown 5.011 s
unsat 0.115 s
unsat 0.349 s
instance02438.smt2
unknown 5.018 s
sat 0.058 s
sat 0.322 s
slog_stranger_3748_sink.smt2
unknown 5.010 s
sat 0.210 s
sat 0.353 s
instance07678.smt2
unknown 5.017 s
sat 0.204 s
sat 0.326 s
instance09958.smt2
unknown 5.012 s
sat 0.069 s
unknown 0.052 s
Trace Analysis: seq-fast / nseq-slow Hypotheses
No seq-fast / nseq-slow cases were observed in this run. In every instance, nseq was equal to or faster than seq. The trend was strongly in the opposite direction: nseq outperformed seq on 8 instances that seq could not decide at all.
However, the trace data for the seq-slow / nseq-fast cases reveals a clear pattern worth noting for development purposes:
Burns / eqdist / hornstr-equiv class (seq times out, nseq solves in < 0.1 s)
The seq solver's .z3-trace for Burns_sat_non_incre_equiv_trans_28_0.smt2 (94 375 trace lines) and eqdist_sat_non_incre_equiv_init_0_3.smt2 (165 032 trace lines) show the same pathological pattern:
The dominant trace event is [seq] mk_eq_core (9 120 calls for Burns, 3 151 for eqdist), all of the form X == reg1, Y == reg1, X == varin, Y == varout — the same small set of variable-equality assertions repeated thousands of times.
This is interleaved with 6 276 / 8 656 [seq] assign_eh calls that repeatedly re-trigger propagate_in_re for the same membership predicates.
The [seq] solve_ne / [seq] reduce_ne pattern (108 occurrences each for Burns) indicates the disequality-handling path is iterated many times without closure.
The seq solver appears to be caught in a fixpoint loop: it expands membership predicates into automaton-accept form, generates new character-level axioms (e.g., (seq.unit Char[49]) through Char[99]), and re-simplifies equations without making net progress.
Hypothesis: These benchmarks involve str.in_re membership over Kleene-star languages combined with string equations and disequalities. The seq solver's tactic of converting membership to automaton transitions generates an ever-growing set of character-split axioms, but the disequality constraints prevent early pruning — seq has no Parikh-constraint or length-bound shortcut for such instances. The nseq solver, by contrast, likely applies Nielsen-graph reductions or Parikh-based length reasoning that immediately derives a contradiction (for unsat cases) or a valid assignment (for sat cases) without enumerating character-level witnesses. This is consistent with nseq's dramatically lower call counts for these problem types.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from
tests/QF_S.tar.zst; 22 172 total available)Timeout: 10 seconds per benchmark (
-T:5+ outer 7 s for seq with tracing;-T:10/ outer 12 s for nseq;-t:10000ms for ZIPT)Z3 build: Debug, CMake+Ninja, commit
88ef8c7(v4.17.0)ZIPT build:
parikhbranch, Release, .NET 8, linked against freshly builtMicrosoft.Z3.dll(netstandard2.1)Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 0
nseq is the fastest solver on this sample (avg 0.646 s), solves 41/50 instances definitively (vs. 33/50 for seq and 36/50 for ZIPT). seq returns
unknownon 17 instances, of which 8 are definitively answered by nseq (and 7 also by ZIPT).Notable Issues
Soundness Disagreements (Critical)
✅ None found. All three solvers agreed on every instance where at least two of them produced a definitive answer.
Crashes / Bugs
nseq — 1 assertion violation
pcp_instance_251.smt2ASSERTION VIOLATIONinsrc/ast/rewriter/seq_axioms.cpp:1108:NOT IMPLEMENTED YET!— triggered bystr.replace_allZIPT — 6 unsupported-feature crashes
All six ZIPT crashes share the same root cause: ZIPT does not implement
str.replace_all.pcp_instance_160.smt2Unsupported feature: str.replace_all currently not supportedpcp_instance_125.smt2pcp_instance_188.smt2pcp_instance_251.smt2benchmark_0089.smt2benchmark_0153.smt2Note: seq and nseq return
unknown(not a crash) for the pcp/rna files — they do parsestr.replace_allbut cannot decide the instances within the timeout.Slow Benchmarks (> 8 s)
Three
diseq-*benchmarks caused nseq and ZIPT to hit their full timeouts; seq returnedunknownquickly (≈5 s internal limit):diseq-1-3-6-100.smt2diseq-1-3-5-106.smt2diseq-1-5-6-100.smt2These are disequality-heavy benchmarks; none of the three solvers could decide them within the allotted time.
seq Regressions vs. nseq: Instances Where Only seq Fails
Eight instances were solved definitively by nseq (and in 7 cases also by ZIPT) but seq timed out at 5 seconds:
Burns_sat_non_incre_equiv_trans_28_0.smt2eqdist_sat_non_incre_equiv_init_0_3.smt2two_token_pass_sat_non_incre_equiv_init_0_7.smt2instance13779.smt2instance02438.smt2slog_stranger_3748_sink.smt2instance07678.smt2instance09958.smt2Trace Analysis: seq-fast / nseq-slow Hypotheses
No seq-fast / nseq-slow cases were observed in this run. In every instance, nseq was equal to or faster than seq. The trend was strongly in the opposite direction: nseq outperformed seq on 8 instances that seq could not decide at all.
However, the trace data for the seq-slow / nseq-fast cases reveals a clear pattern worth noting for development purposes:
Burns / eqdist / hornstr-equiv class (seq times out, nseq solves in < 0.1 s)
The seq solver's
.z3-traceforBurns_sat_non_incre_equiv_trans_28_0.smt2(94 375 trace lines) andeqdist_sat_non_incre_equiv_init_0_3.smt2(165 032 trace lines) show the same pathological pattern:[seq] mk_eq_core(9 120 calls for Burns, 3 151 for eqdist), all of the formX == reg1,Y == reg1,X == varin,Y == varout— the same small set of variable-equality assertions repeated thousands of times.[seq] assign_ehcalls that repeatedly re-triggerpropagate_in_refor the same membership predicates.[seq] solve_ne/[seq] reduce_nepattern (108 occurrences each for Burns) indicates the disequality-handling path is iterated many times without closure.(seq.unit Char[49])throughChar[99]), and re-simplifies equations without making net progress.Hypothesis: These benchmarks involve
str.in_remembership over Kleene-star languages combined with string equations and disequalities. The seq solver's tactic of converting membership to automaton transitions generates an ever-growing set of character-split axioms, but the disequality constraints prevent early pruning — seq has no Parikh-constraint or length-bound shortcut for such instances. The nseq solver, by contrast, likely applies Nielsen-graph reductions or Parikh-based length reasoning that immediately derives a contradiction (forunsatcases) or a valid assignment (forsatcases) without enumerating character-level witnesses. This is consistent with nseq's dramatically lower call counts for these problem types.Per-File Results
Click to expand the full 50-row results table
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions