Conversation

@jerhard (Member) commented Aug 11, 2022

This PR imports benchmarks that were taken from https://gitlab.com/sosy-lab/software/regression-verification-tasks/-/tree/new-svcomp-spec. Further information on these benchmarks can be found here.

The benchmarks were processed with incremental-ldv.py to generate the patch files and to create the benchmark sets in index/sets/ldv/. The file index/sets/ldv/combined.yaml contains all imported benchmarks; the other yaml files in index/sets/ldv/ contain the benchmarks per directory.

Problems

There are some significant problems with this benchmark set. In particular:

  • Most of the benchmarks contain syntax errors that prevent Goblint from parsing them.
  • The benchmarks Goblint does run on have very little live code. I checked ten benchmarks, and the reported amount of live code varies between 20 and 230 lines. @michael-schwarz and I inspected one benchmark manually and confirmed that very little code is indeed live.
  • Because so little code is live, run times on many of the working benchmarks are under ~3 seconds.
  • The patches tend to be large, and they are patches of CIL files. While there are some patches for which Goblint detects only 1-2 changed functions, a considerable number of the patches Goblint can run on result in more than 50 changed functions.

@sim642 sim642 marked this pull request as draft August 11, 2022 11:27
@sim642 sim642 added the new benchmark New benchmark to analyze label Aug 13, 2022
@sim642 (Member) commented Aug 13, 2022

  • Most of the benchmarks contain syntax errors that prevent Goblint from parsing them.

It would be useful to have a CIL issue about this parsing error.

@michael-schwarz (Member) commented
IIRC they were actual errors (e.g., the type of a function's implementation not matching the prototype given in the same file) that GCC also flags. If we want to do something here, I don't think we should patch CIL to accept invalid programs, but rather fix the benchmarks.
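
A minimal sketch of this kind of error (a hypothetical file, not taken from the benchmark set): a definition whose return type conflicts with an earlier prototype in the same translation unit. Both GCC and CIL reject this as invalid C.

```shell
# Hypothetical reproduction of the error class described above:
# a prototype and a definition with conflicting types in one file.
cat > mismatch.c <<'EOF'
int f(int x);          /* prototype: returns int */

double f(int x) {      /* definition: returns double -> conflicting types */
    return x * 2.0;
}
EOF

# GCC refuses to compile this ("error: conflicting types for 'f'").
if gcc -c mismatch.c -o mismatch.o 2> gcc-err.txt; then
    echo "compiled"
else
    echo "rejected"
fi
```

Fixing the benchmarks would mean making the prototype and definition agree, rather than teaching CIL to accept the mismatch.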

