STALAGMITE is a technique to mine input grammars from recursive descent parsers. In contrast to existing techniques, STALAGMITE does not require sample inputs. Instead, STALAGMITE utilizes symbolic execution to analyze parsers. Input grammars have various applications, including fuzzing, debugging, documentation and reverse engineering.
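To make the fuzzing use case concrete, here is a minimal sketch of grammar-based input generation. The grammar below is a hand-written toy example for arithmetic expressions, not a grammar mined by STALAGMITE, and the `GRAMMAR` and `generate` names are hypothetical and not part of this repository.

```python
import random

# Hypothetical toy grammar, written by hand for illustration only.
# It is NOT output produced by STALAGMITE.
GRAMMAR = {
    "<start>": [["<expr>"]],
    "<expr>": [["<term>"], ["<term>", "+", "<expr>"], ["<term>", "-", "<expr>"]],
    "<term>": [["<digit>"], ["(", "<expr>", ")"]],
    "<digit>": [[d] for d in "0123456789"],
}


def generate(symbol="<start>", rng=None, depth=0, max_depth=8):
    """Expand `symbol` into a random string derived from GRAMMAR."""
    if rng is None:
        rng = random.Random()
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit it unchanged
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth budget, always take the first alternative. For every
        # rule above it is the non-recursive one, so expansion terminates.
        alternatives = alternatives[:1]
    chosen = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in chosen)


# Print a few random inputs that a parser for this grammar should accept.
rng = random.Random(0)
for _ in range(3):
    print(generate(rng=rng))
```

Every generated string is syntactically valid by construction, which is what makes a grammar useful as a fuzzing seed source.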
This repository contains our STALAGMITE prototype, which is based on the KLEE symbolic execution engine.
STALAGMITE is detailed and evaluated in our TOSEM research paper.
We provide a Dockerfile to reproduce our results.
If you just want to have a look at the grammars STALAGMITE mined from our evaluation subjects, see paper_evaluation_data/.
To build and run the Docker container, we recommend the following minimum system specifications:
- x86 CPU (We used AMD Ryzen Threadripper 3960X 24-Core Processor)
- Linux (We used Ubuntu 22.04)
- At least 16GB RAM per parallel experiment (We used 256GB RAM)
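The RAM requirement above gives a rough upper bound on how many experiments can run side by side. The following sketch does that arithmetic; the helper name is hypothetical, and the calculation assumes the whole machine is available to the experiments, ignoring memory used by the operating system and Docker itself.

```python
RAM_PER_EXPERIMENT_GB = 16  # minimum per parallel experiment, from the list above


def max_parallel_experiments(total_ram_gb, per_experiment_gb=RAM_PER_EXPERIMENT_GB):
    """Return how many experiments fit into `total_ram_gb` at the same time."""
    if total_ram_gb < 0 or per_experiment_gb <= 0:
        raise ValueError("RAM sizes must be non-negative and the per-experiment size positive")
    return total_ram_gb // per_experiment_gb


print(max_parallel_experiments(256))  # 16
print(max_parallel_experiments(32))   # 2
```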
| File | Description |
|---|---|
| ./eval/eval.py | Evaluation script |
| ./klee.patch | KLEE changes |
| ./config.py | Central definition of parameters |
| ./system_level_grammar/traces_to_grammars.py | Execution traces to grammar conversion |
| ./system_level_grammar/generalize_tokens.py | Token generalization |
| ./system_level_grammar/reduce_overapproximation.py | Overapproximation reduction |
| ./subjects/ | Evaluation subjects |
The experiments can be run as follows:
make docker-build
make docker-run subject=tinyc
make docker-run subject=lisp
make docker-run subject=mjs
make docker-run subject=json
make docker-run subject=cjson
make docker-run subject=parson
make docker-run subject=calc
make docker-run subject=simplearithmeticparser
make docker-run subject=cgi_decode

Results will be copied to ./output_docker.
For example:
- ./output_docker/subjects/cgi_decode/1/eval/ will contain accuracy.csv and readability.csv.
- ./output_docker/subjects/cgi_decode/1/grammars/ will contain the mined grammars (initial and refined).
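The output layout described above can be read back with a few lines of Python. This is a minimal sketch with hypothetical helper names. It assumes the `1` path component is a run index and makes no assumption about the CSV columns, so rows are returned as plain lists of strings.

```python
import csv
from pathlib import Path


def eval_dir(subject, run=1, root="output_docker"):
    """Build the eval directory path for a subject (the run index is an assumption)."""
    return Path(root) / "subjects" / subject / str(run) / "eval"


def read_csv_rows(path):
    """Return all rows of a CSV file as lists of strings, header row included."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def load_eval_results(subject, run=1, root="output_docker"):
    """Load accuracy.csv and readability.csv for one subject, skipping missing files."""
    directory = eval_dir(subject, run, root)
    results = {}
    for name in ("accuracy.csv", "readability.csv"):
        path = directory / name
        if path.is_file():
            results[name] = read_csv_rows(path)
    return results


# Prints an empty dict if the experiment has not been run yet.
print(load_eval_results("cgi_decode"))
```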
By default, all experiments are configured with a 16GB memory limit and a 24h time limit.
To use different limits, e.g., a time limit of 4h and a memory limit of 8GB, create an environment file config.env:
MAX_TIME="240min"
MAX_MEMORY=8000

Now run an experiment with this config:
make docker-run-env subject=calc envfile=config.env
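If you vary the limits often, the environment file can be generated instead of written by hand. The sketch below reproduces the two lines from the example above. The `render_env` helper is hypothetical, and the conversion assumes that MAX_TIME takes minutes with a `min` suffix and that MAX_MEMORY is given in megabytes, which is inferred from the 8GB = 8000 example rather than documented here.

```python
def render_env(hours, memory_gb):
    """Render config.env contents for a time limit in hours and a memory limit in GB."""
    minutes = int(hours * 60)
    # Assumption: MAX_MEMORY is in megabytes (the example maps 8GB to 8000).
    memory_mb = int(memory_gb * 1000)
    return f'MAX_TIME="{minutes}min"\nMAX_MEMORY={memory_mb}\n'


print(render_env(4, 8), end="")
# MAX_TIME="240min"
# MAX_MEMORY=8000
```

Write the returned string to config.env and pass that file via `envfile=` as in the command above.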