Changes from all commits
61 commits
90cd20d
Add generation of parallel benchmarks and test runs.
Jul 6, 2025
5452dd7
Do not use walltime killer anymore.
Jul 6, 2025
bca1f08
Use --continue-after-unknown.
Jul 6, 2025
6258316
Better status reporting.
Jul 6, 2025
c635d8e
Show contacts only of competitive solvers.
Jul 6, 2025
680a97a
Generate more meaningful YML names.
Jul 6, 2025
ce5b4e7
Update removed benchmarks.
Jul 6, 2025
e12e8a1
Remove benchmarks also for non-AWS tracks.
Jul 6, 2025
0402477
Revert "Do not use walltime killer anymore."
Jul 6, 2025
8800f20
Parse the single query/incremental results without scramble_id.
Jul 7, 2025
b7c46ac
bitwuzla: Change options for parallel track. (#201)
mpreiner Jul 15, 2025
4fddcda
submission: add additional configurations for lower CPU core counts. …
zmylinxi99 Jul 15, 2025
dde119e
Remove unused AWS infrastructure.
Jul 7, 2025
2c57e14
fix: Use LazyFrame during selection of incremental benchmarks.
Jul 16, 2025
f95717c
feat: Do not declare required machines for the parallel track.
Jul 18, 2025
76076e2
feat: Memory limit for parallel track.
Jul 18, 2025
f75b731
Change names of reduced-core parallel solvers.
Jul 18, 2025
96256c6
fix: Correctly recognize out of memory for incremental track.
Jul 23, 2025
8297d7f
feat: Adapt parsing of Incremental results to new YML names.
Jul 23, 2025
9ebce05
fix: Add nb_answers to all results.
Jul 23, 2025
24a718f
feat: Update year.
Jul 23, 2025
f4beace
feat: Better structure of generated result directories.
Jul 23, 2025
f4fa309
present derived solvers' increment over base solver
wintered Jul 30, 2025
9d64aee
cosmetics
wintered Jul 30, 2025
8f1c054
interim
wintered Jul 30, 2025
b338b90
minor
wintered Jul 30, 2025
8b1ec77
fixing CI issues
wintered Jul 30, 2025
509927e
SMTS submission 2025 update (bug fix + 2 more versions) (#203)
Tomaqa Jul 20, 2025
04d8bff
feat: Do not fail on missing runs, only report errors.
Jul 28, 2025
66c5d56
feat: Adapt unsat core and model result parsing to new YML names.
Jul 28, 2025
fc197d5
fix: Unify columns for all tracks to get consistent data set.
Jul 28, 2025
d79ee2b
fix: Drop benchmark_yml only during result processing.
Jul 28, 2025
3640e5d
fix: Initialize unsat core column.
Jul 28, 2025
af6462e
fix: Move previous year.
Jul 28, 2025
5b7bfd4
feat: Allow flushing cache in benchexec conversion.
Jul 31, 2025
811a16b
feat: Model validation with new YML names.
Jul 31, 2025
b0f0ab3
feat: Add 2025 benchmarks
Jul 31, 2025
090e76e
feat: Add preliminary results of all tracks except UnsatCore
Jul 31, 2025
a0a3b52
fix: Result processing.
Jul 31, 2025
1b48b5d
fix: Recognize lexing-error error from Dolmen.
Jul 31, 2025
6f098c1
fix: Progress computation.
Jul 31, 2025
c28a84b
fix: Apply black.
Jul 31, 2025
faf235c
feat: Change unsat core validation resource limits.
Jul 31, 2025
3abd97e
feat: New unsat core validation rules.
Jul 31, 2025
68c0bd6
Merge remote-tracking branch 'origin/2025_final_execution' into prese…
wintered Aug 2, 2025
68cc10a
establishing consistency among submissions with multiple cores
wintered Aug 2, 2025
ba23708
feat: Add preliminary results of UnsatCore track
Aug 2, 2025
12c9977
feat: Adapt unsat core validation to new YML format.
Aug 3, 2025
e1e67d1
fix: Remove join as there are no removed results.
Aug 3, 2025
eb0a5d3
chore: Apply black.
Aug 3, 2025
a17c198
feat: Remove non-competing divisions from results.
Aug 3, 2025
62cf124
fix: Remove debug prints.
Aug 3, 2025
4dd7dfd
feat: Generate results of Parallel track.
Aug 3, 2025
5aef59f
feat: Add BestOverall ranking to certificates.
Aug 3, 2025
424c571
feat: Generate solver data for certificates automatically.
Aug 3, 2025
643a6e7
chore: Update year in certificates.
Aug 3, 2025
7468375
2024 -> 2025
wintered Aug 4, 2025
f4efacf
2024 -> 2025
wintered Aug 4, 2025
7ef410f
target for cleaning up web results
wintered Aug 4, 2025
8bc62ae
Merge remote-tracking branch 'origin/2025_final_execution' into prese…
wintered Aug 4, 2025
71e5365
feat: show legends for base and non-competing only when at least one
wintered Aug 4, 2025
8 changes: 6 additions & 2 deletions Makefile
@@ -31,6 +31,10 @@ build: clean-build ## Build wheel file using poetry
clean-build: ## clean build artifacts
@rm -rf dist

.PHONY: clean-web-results
clean-web-results: ## clean web results
@rm -rf web/content/results

.PHONY: help
help:
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'
@@ -76,8 +80,8 @@ results-generation:
@poetry run smtcomp export-results-pages data Incremental
# @echo "🚀 Generating results to web/content/results for Cloud"
# @poetry run smtcomp export-results-pages data Cloud
# @echo "🚀 Generating results to web/content/results for Parallel"
# @poetry run smtcomp export-results-pages data Parallel
@echo "🚀 Generating results to web/content/results for Parallel"
@poetry run smtcomp export-results-pages data Parallel

cache:
@echo "🚀 Generating cache"
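With the Parallel track now generated like the other tracks, stale pages from earlier runs should be cleared before regenerating; a minimal sketch of driving both targets from Python (the subprocess wrapper is illustrative, the targets are the ones defined above):

```python
import subprocess

# Illustrative driver: drop previously generated web results, then rebuild
# them with the Parallel track now enabled in results-generation.
subprocess.run(["make", "clean-web-results"], check=True)
subprocess.run(["make", "results-generation"], check=True)
```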
2 changes: 1 addition & 1 deletion README.md
@@ -257,7 +257,7 @@ The step from Single Query can be followed with the model validation results whe
Tasks for unsat core validation can be generated by:

```
smtcomp generate-unsatcore-validation-files ../tmp/execution SCRAMBLER_EXECUTABLE UNSAT_CORE_RESULT_DIR/*/*
smtcomp generate-unsatcore-validation-files ../tmp/execution SCRAMBLER_EXECUTABLE UNSAT_CORE_RESULT_DIR
```

---
Binary file added data/benchmarks-2025.json.gz
Binary file not shown.
2 changes: 1 addition & 1 deletion data/latex-certificates/gen_certificates_ornament.tex
@@ -136,7 +136,7 @@
\node [circular drop shadow={shadow scale=1.05},
decorate, decoration=zigzag,
fill=blue!20,draw,thick,circle,text width=3.5cm,align=center,xshift=2cm]
{\large\textbf{SMT-COMP\\ 2024}};
{\large\textbf{SMT-COMP\\ 2025}};
\end{tikzpicture}
\end{minipage}

20 changes: 0 additions & 20 deletions data/latex-certificates/solvers_pretty_name.csv

This file was deleted.

Binary file added data/results-inc-2025.json.gz
Binary file not shown.
Binary file added data/results-mv-2025.json.gz
Binary file not shown.
Binary file added data/results-parallel-2025.json.gz
Binary file not shown.
Binary file added data/results-sq-2025.json.gz
Binary file not shown.
Binary file added data/results-uc-2025.json.gz
Binary file not shown.
50 changes: 35 additions & 15 deletions smtcomp/benchexec.py
@@ -24,8 +24,10 @@ def get_suffix(track: defs.Track) -> str:
return "_unsatcorevalidation"
case defs.Track.SingleQuery:
return ""
case _:
raise ValueError("No Cloud or Parallel")
case defs.Track.Parallel:
return "_parallel"
case defs.Track.Cloud:
return "_cloud"


def get_xml_name(s: defs.Submission, track: defs.Track, division: defs.Division) -> str:
@@ -44,8 +46,9 @@ class CmdTask(BaseModel):
taskdirs: List[str]


def generate_benchmark_yml(benchmark: Path, expected_result: Optional[bool], orig_file: Optional[Path]) -> None:
ymlfile = benchmark.with_suffix(".yml")
def generate_benchmark_yml(
ymlfile: Path, benchmark: Path, expected_result: Optional[bool], orig_file: Optional[Path]
) -> None:
with ymlfile.open("w") as f:
f.write("format_version: '2.0'\n\n")

@@ -103,24 +106,38 @@ def generate_tool_modules(s: defs.Submission, cachedir: Path) -> None:
generate_tool_module(s, cachedir, False)


def generate_xml(config: defs.Config, cmdtasks: List[CmdTask], file: Path, tool_module_name: str) -> None:
def generate_xml(
config: defs.Config, cmdtasks: List[CmdTask], file: Path, tool_module_name: str, track: defs.Track, test: bool
) -> None:
doc, tag, text = Doc().tagtext()

doc.asis('<?xml version="1.0"?>')
doc.asis(
'<!DOCTYPE benchmark PUBLIC "+//IDN sosy-lab.org//DTD BenchExec benchmark 2.3//EN"'
' "https://www.sosy-lab.org/benchexec/benchmark-2.2.3dtd">'
)

timelimit = config.timelimit_s_test if test else config.timelimit_s
cpuCores = config.cpuCores_parallel if track == defs.Track.Parallel else config.cpuCores
memlimit = config.memlimit_M_parallel if track == defs.Track.Parallel else config.memlimit_M

if test:
cpuCores = config.cpuCores_parallel_test if track == defs.Track.Parallel else config.cpuCores_test
memlimit = config.memlimit_M_parallel_test if track == defs.Track.Parallel else config.memlimit_M_test
with tag(
"benchmark",
tool=f"tools.{tool_module_name}",
timelimit=f"{config.timelimit_s * config.cpuCores}s",
walltimelimit=f"{config.timelimit_s}s",
memlimit=f"{config.memlimit_M} MB",
cpuCores=f"{config.cpuCores}",
timelimit=f"{timelimit * cpuCores}s",
walltimelimit=f"{timelimit}s",
memlimit=f"{memlimit} MB",
cpuCores=f"{cpuCores}",
):
with tag("require", cpuModel="Intel Xeon E3-1230 v5 @ 3.40 GHz"):
text()

if track != defs.Track.Parallel:
# we run the test jobs on different machines (main machines are used)
used_cpuModel = "Intel Core i7" if test else "Intel Xeon E3-1230 v5 @ 3.40 GHz"
with tag("require", cpuModel=used_cpuModel):
text()

with tag("resultfiles"):
text("**/error.log")
@@ -222,7 +239,7 @@ def cmdtask_for_submission(
return res


def generate(s: defs.Submission, cachedir: Path, config: defs.Config) -> None:
def generate(s: defs.Submission, cachedir: Path, config: defs.Config, test: bool) -> None:
generate_tool_modules(s, cachedir)

dst = cachedir / "benchmarks"
@@ -241,7 +258,6 @@ def generate(s: defs.Submission, cachedir: Path, config: defs.Config) -> None:
# cloud and parallel tracks are not executed via benchexec
if target_track in (
defs.Track.Cloud,
defs.Track.Parallel,
defs.Track.UnsatCoreValidation,
defs.Track.ProofExhibition,
):
@@ -258,6 +274,8 @@ def generate(s: defs.Submission, cachedir: Path, config: defs.Config) -> None:
cmdtasks=res,
file=file,
tool_module_name=tool_module_name(s, target_track == defs.Track.Incremental),
track=target_track,
test=test,
)
generated_divisions.append(division)

@@ -273,9 +291,11 @@ def generate(s: defs.Submission, cachedir: Path, config: defs.Config) -> None:
out("set -x")
out(f"for DIVISION in {division_list}")
out(" do\n")
out(f' TARGET="../final_results{track_suffix}/$DIVISION/{tool}"')
out(f' TARGET="../results/results{track_suffix}/$DIVISION/{tool}"')
out(" rm -rf $TARGET")
out(" mkdir -p $TARGET")

extra_args = ""
out(
f" PYTHONPATH=$(pwd) benchexec/contrib/vcloud-benchmark.py run_definitions/{tool}{track_suffix}_$DIVISION.xml --read-only-dir / --overlay-dir . --overlay-dir /home --vcloudClientHeap 500 --vcloudPriority URGENT --cgroupAccess -o $TARGET"
)
@@ -319,6 +339,6 @@ def generate_unsatcore_validation(s: defs.Submission, cachedir: Path, config: de
out(" rm -rf $TARGET")
out(" mkdir -p $TARGET")
out(
f" PYTHONPATH=$(pwd) benchexec/contrib/vcloud-benchmark.py run_definitions/{tool}_unsatcorevalidation_$DIVISION.xml --read-only-dir / --overlay-dir . --overlay-dir /home --vcloudClientHeap 500 --vcloudPriority URGENT --cgroupAccess --tryLessMemory -o $TARGET"
f" PYTHONPATH=$(pwd) benchexec/contrib/vcloud-benchmark.py run_definitions/{tool}_unsatcorevalidation_$DIVISION.xml --read-only-dir / --overlay-dir . --overlay-dir /home --vcloudClientHeap 500 --vcloudPriority URGENT --cgroupAccess -o $TARGET"
)
out("done")
26 changes: 16 additions & 10 deletions smtcomp/certificates.py
@@ -10,6 +10,7 @@
import smtcomp.defs as defs
import smtcomp.results as results
import smtcomp.generate_website_page as page
import smtcomp.submission as submission

show_experimental = False

@@ -70,11 +71,13 @@ def withtrack(l: list[str], name: str, category: category) -> None:
class overall:

def __init__(self) -> None:
self.best = category()
self.biggest = category()
self.largest = category()

def latex(self) -> str:
l: list[str] = []
withtrack(l, "Best Overall", self.best)
withtrack(l, "Biggest Lead", self.biggest)
withtrack(l, "Largest Contribution", self.largest)
return ", ".join(l)
@@ -83,7 +86,7 @@ def __str__(self) -> str:
return self.latex()

def isNotEmpty(self) -> bool:
return self.biggest.isNotEmpty() and self.largest.isNotEmpty()
return self.best.isNotEmpty() or self.biggest.isNotEmpty() or self.largest.isNotEmpty()


class info:
@@ -132,7 +135,7 @@ def __repr__(self) -> str:
def update(
solvers: defaultdict[str, info],
select: Callable[[info, str], None],
podium: page.PodiumDivision | page.PodiumBiggestLead | page.PodiumLargestContribution,
podium: page.PodiumDivision | page.PodiumBestOverall | page.PodiumBiggestLead | page.PodiumLargestContribution,
) -> None:
if podium.track == defs.Track.SingleQuery:
select(solvers[podium.winner_seq], "sq_seq")
@@ -173,12 +176,10 @@ def add_logic(logics: dict[Tuple[str, defs.Track], bool], list: dict[str, int],
logics[v, track] = True


def parse_pretty_names(solvers: defaultdict[str, info], pretty_names: Path) -> None:
with open(pretty_names, newline="") as input:
input = csv.DictReader(input) # type: ignore

for row in input:
solvers[row["Solver Name"]].members = int(row["Members"]) # type: ignore
def process_submissions(solvers: defaultdict[str, info], submissions: list[defs.Submission]) -> None:
for s in submissions:
if s.competitive:
solvers[s.name].members = len(s.contributors)


def parse_experimental_division(solvers: Any, experimental_division: Path) -> dict[str, bool]:
@@ -192,11 +193,14 @@ def parse_experimental_division(solvers: Any, experimental_division: Path) -> di


def generate_certificates(
website_results: Path, input_for_certificates: Path, pretty_names: Path, experimental_division: Path
website_results: Path, input_for_certificates: Path, submission_dir: Path, experimental_division: Path
) -> None:
solvers: defaultdict[str, info] = defaultdict(info)

parse_pretty_names(solvers, pretty_names)
submissions = [submission.read_submission_or_exit(f)
for f in submission_dir.glob("*.json")]

process_submissions(solvers, submissions)
solvers["-"].members = 0

# Remove experimental division
@@ -223,6 +227,8 @@
continue
case page.PodiumCrossDivision():
match result.root:
case page.PodiumBestOverall():
update(solvers, (lambda x, k: x.overall.best.update(k, True)), result.root)
case page.PodiumBiggestLead():
update(solvers, (lambda x, k: x.overall.biggest.update(k, True)), result.root)
case page.PodiumLargestContribution():
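The net effect in certificates.py is that team sizes now come from the submission JSON files rather than the deleted solvers_pretty_name.csv. A hedged sketch of the new flow (read_submission_or_exit and the Submission fields are as used in the diff; the submissions directory path is an assumption):

```python
from pathlib import Path

import smtcomp.submission as submission

# Illustrative path; the real caller passes submission_dir into
# generate_certificates.
submission_dir = Path("submissions")
subs = [submission.read_submission_or_exit(f) for f in sorted(submission_dir.glob("*.json"))]

# Mirrors process_submissions: only competitive solvers get member counts.
members = {s.name: len(s.contributors) for s in subs if s.competitive}
```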
53 changes: 36 additions & 17 deletions smtcomp/defs.py
@@ -15,6 +15,17 @@
U = TypeVar("U")


baseMapSMTLIB2025 = {
"Bitwuzla-MachBV": "Bitwuzla-MachBV-base",
"Z3-Inc-Z3++": "Z3-Inc-Z3++-base",
"Z3-Noodler-Mocha": "Z3-Noodler-Mocha-base",
"Z3-Owl": "Z3-Owl-base",
"Z3-Noodler": "Z3-Noodler",
"z3siri": "z3siri-base",
"Z3-alpha": "Z3-alpha-base",
}


class EnumAutoInt(Enum):
"""
Normal enum with strings, but each enum is associated to an int
@@ -135,7 +146,7 @@ def name_is_default_field(cls, data: Any) -> Any:

class SolverType(EnumAutoInt):
wrapped = "wrapped"
derived = "derived"
derived = "derived" # TODO: put a datatype information on base solver
standalone = "Standalone"
portfolio = "Portfolio"

@@ -1317,6 +1328,7 @@ class Submission(BaseModel, extra="forbid"):
website: HttpUrl
system_description: HttpUrl
solver_type: SolverType
# TODO add field base_solver?
participations: Participations
seed: int | None = None
competitive: bool = True
@@ -1325,6 +1337,7 @@
description="Must be set for the final version of the submission. An archive on zenodo is needed in this case.",
)

# TODO: model validator to check the sanity of the new base_solver field
@model_validator(mode="after")
def check_archive(self) -> Submission:
if self.archive is None and not all(p.archive for p in self.participations.root):
@@ -1446,13 +1459,20 @@ class Results(BaseModel):
## Parameters that can change each year
class Config:
__next_id__: ClassVar[int] = 0
current_year = 2024
current_year = 2025
oldest_previous_results = 2018
timelimit_s = 60 * 20
timelimit_s_test = 60 # time limit for test runs
memlimit_M = 1024 * 30
memlimit_M_test = 1024 * 8 # memory limit for test runs
memlimit_M_parallel = 1024 * 1000
memlimit_M_parallel_test = 1024 * 8
cpuCores = 4
unsatcore_validation_timelimit_s = 60 * 5
unsatcore_validation_memlimit_M = 1024 * 30
cpuCores_test = 2
cpuCores_parallel = 128
cpuCores_parallel_test = 8
unsatcore_validation_timelimit_s = 60 * 20
unsatcore_validation_memlimit_M = 1024 * 15
unsatcore_validation_cpuCores = 4
min_used_benchmarks = 300
ratio_of_used_benchmarks = 0.5
@@ -1494,22 +1514,21 @@ class Config:

removed_benchmarks = [
{
"logic": int(Logic.QF_LIA),
"family": "20210219-Dartagnan/ConcurrencySafety-Main",
"name": "39_rand_lock_p0_vs-O0.smt2",
} # scrambler segfault (perhaps stack limit)
"logic": int(Logic.UFDTNIA),
"family": "20241211-verus/verismo",
"name": "tspec__math__nonlinearverismo_tspec.math.nonlinear.proof_mul_pos_neg_rel._01.smt2",
}, # reported by Mathias Preiner as syntactically invalid
{
"logic": int(Logic.UFDTNIA),
"family": "20241211-verus/verismo",
"name": "tspec__math__nonlinearverismo_tspec.math.nonlinear.proof_div_pos_neg_rel._01.smt2",
}, # reported by Mathias Preiner as syntactically invalid
]
"""
Benchmarks to remove before selection (currently just for aws)
Benchmarks to remove before selection
"""

removed_results = [
{
"logic": int(Logic.QF_BV),
"family": "20230221-oisc-gurtner",
"name": "SLL-NESTED-8-32-sp-not-excluded.smt2",
} # wrong status in SMTLIB
]
removed_results = []
"""
Benchmarks to remove after running the solvers. Can be used when the selection has already been done.
"""
Expand Down Expand Up @@ -1537,7 +1556,7 @@ def data(self) -> Path:

@functools.cached_property
def previous_years(self) -> list[int]:
return list(range(self.oldest_previous_results, self.current_year))
return list(range(self.oldest_previous_results, self.current_year - 1))

@functools.cached_property
def previous_results(self) -> list[tuple[int, Path]]:
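Two of the Config changes interact and are easy to misread: current_year moves to 2025, while previous_years now ends at current_year - 1, so the immediately preceding year is excluded (consistent with the "fix: Move previous year" commit). A quick worked check under those defaults:

```python
# Assumes the 2025 Config defaults shown above.
oldest_previous_results = 2018
current_year = 2025

# range() is half-open, so current_year - 1 == 2024 is itself excluded:
previous_years = list(range(oldest_previous_results, current_year - 1))
assert previous_years == [2018, 2019, 2020, 2021, 2022, 2023]

# Per-run CPU-time budget benchexec receives for the Parallel track:
assert (60 * 20) * 128 == 153_600  # timelimit_s * cpuCores_parallel, seconds
```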