STALAGMITE: Inferring Input Grammars from Code with Symbolic Parsing

STALAGMITE is a technique for mining input grammars from recursive descent parsers. Unlike existing techniques, STALAGMITE does not require sample inputs; instead, it uses symbolic execution to analyze the parser code. The mined input grammars have various applications, including fuzzing, debugging, documentation, and reverse engineering.

Prototype

This repository contains our STALAGMITE prototype, which is based on the KLEE symbolic execution engine.

Research paper

STALAGMITE is detailed and evaluated in our TOSEM research paper.

Reproducibility package

We provide a Dockerfile to reproduce our results.

A quick peek at the grammar artifacts

If you just want to have a look at the grammars STALAGMITE mined from our evaluation subjects, see paper_evaluation_data/.

Prerequisites

To build and run the Docker container, we recommend at least the following system specifications:

x86 CPU (we used an AMD Ryzen Threadripper 3960X 24-core processor)
Linux (we used Ubuntu 22.04)
At least 16GB RAM per parallel experiment (we used 256GB RAM)

Important files

File	Description
./eval/eval.py	Evaluation script
./klee.patch	KLEE changes
./config.py	Central definition of parameters
./system_level_grammar/traces_to_grammars.py	Execution traces to grammar conversion
./system_level_grammar/generalize_tokens.py	Token generalization
./system_level_grammar/reduce_overapproximation.py	Overapproximation reduction
./subjects/	Evaluation subjects

Running the experiments

The experiments can be run as follows:

make docker-build

make docker-run subject=tinyc
make docker-run subject=lisp
make docker-run subject=mjs
make docker-run subject=json
make docker-run subject=cjson
make docker-run subject=parson
make docker-run subject=calc
make docker-run subject=simplearithmeticparser
make docker-run subject=cgi_decode
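To queue all subjects in one go, the commands above can be generated from a list. The following is a minimal sketch, not part of the repository: the helper names `docker_run_command` and `run_all` are hypothetical, and it assumes the script is started from the repository root after `make docker-build` has completed. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Subject names exactly as they appear in the make commands above.
SUBJECTS = [
    "tinyc", "lisp", "mjs", "json", "cjson", "parson",
    "calc", "simplearithmeticparser", "cgi_decode",
]


def docker_run_command(subject):
    """Build the `make docker-run` invocation for one subject (hypothetical helper)."""
    if subject not in SUBJECTS:
        raise ValueError(f"unknown subject: {subject}")
    return ["make", "docker-run", f"subject={subject}"]


def run_all(subjects=SUBJECTS, dry_run=True):
    """Run the subjects one after another.

    With dry_run=True the commands are only printed, nothing is executed.
    """
    commands = [docker_run_command(s) for s in subjects]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop as soon as one experiment fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(dry_run=False)` executes the experiments sequentially. Since each experiment may use up to the configured time and memory limits, running them in parallel is only advisable on a machine with enough RAM per experiment.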

Results will be copied to ./output_docker. For example:

./output_docker/subjects/cgi_decode/1/eval/ will contain accuracy.csv and readability.csv.
./output_docker/subjects/cgi_decode/1/grammars/ will contain the mined grammars (initial and refined).
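Once an experiment has finished, the evaluation files can be inspected with a few lines of Python. This is a sketch under assumptions: the helper names `eval_dir`, `load_csv`, and `load_eval` are hypothetical, the directory layout is taken from the example paths above (including the `1` path component, whose exact meaning is not specified here), and no particular column layout is assumed, so rows are returned as plain lists of strings.

```python
import csv
from pathlib import Path


def eval_dir(subject, run="1", root="./output_docker"):
    """Return the eval directory for a subject, following the example paths above."""
    return Path(root) / "subjects" / subject / run / "eval"


def load_csv(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def load_eval(subject, run="1", root="./output_docker"):
    """Load accuracy.csv and readability.csv if they exist; missing files are skipped."""
    directory = eval_dir(subject, run, root)
    results = {}
    for name in ("accuracy.csv", "readability.csv"):
        path = directory / name
        if path.exists():
            results[name] = load_csv(path)
    return results
```

For example, `load_eval("cgi_decode")` returns a dictionary keyed by file name, which can then be printed or passed on to further analysis.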

Changing the limits

By default, all experiments are configured with a 16GB memory limit and a 24h time limit. To use different limits, e.g., a time limit of 4h and a memory limit of 8GB, create an environment file config.env:

MAX_TIME="240min"
MAX_MEMORY=8000

Now run an experiment with this config:

make docker-run-env subject=calc envfile=config.env
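The environment file can also be generated programmatically. The sketch below is an illustration only: the helper names `make_env` and `write_env` are hypothetical, and the unit conventions are inferred solely from the example above (4h becomes `"240min"`, 8GB becomes `8000`). Other value formats may be accepted but are not covered here.

```python
def make_env(hours, memory_gb):
    """Render the contents of an environment file.

    Assumption (inferred from the 4h / 8GB example): MAX_TIME is given in
    minutes with a "min" suffix, and MAX_MEMORY uses 1000 units per GB.
    """
    if hours <= 0 or memory_gb <= 0:
        raise ValueError("limits must be positive")
    minutes = round(hours * 60)
    memory = round(memory_gb * 1000)
    return f'MAX_TIME="{minutes}min"\nMAX_MEMORY={memory}\n'


def write_env(path, hours, memory_gb):
    """Write the rendered settings to the given path, e.g. config.env."""
    with open(path, "w") as f:
        f.write(make_env(hours, memory_gb))
```

Calling `write_env("config.env", 4, 8)` produces the two-line file shown above, which can then be passed via `envfile=config.env`.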
