Commit 51f60b9

Improve documentation (#381)
This commit expands the project documentation to cover:

* New AFL++ integration support
* grammarinator-decode functionality in the C++ backend
* Overview of key features
* libFuzzer CLI parameter reference
* Updated WeightedModel documentation
* Full list and descriptions of mutation/recombination operators ("creators")
* CLI options for controlling operator selection (--allowlist, --blocklist)
  and memoization behavior (--memo-size, --unique-attempts)
1 parent a2c4528 commit 51f60b9

8 files changed

+337
-22
lines changed

README.rst

Lines changed: 45 additions & 2 deletions
@@ -22,12 +22,55 @@ grammar-based approach is to leverage the large variety of publicly
2222
available `ANTLR v4 grammars`_. It includes both a Python-based and a
2323
high-performance C++ backend for generation.
2424

25-
The `trophy page`_ of the found issues is available from the wiki.
26-
2725
.. _ANTLR: http://www.antlr.org
2826
.. _`ANTLR v4 grammars`: https://github.com/antlr/grammars-v4
2927
.. _`trophy page`: https://github.com/renatahodovan/grammarinator/wiki
3028

29+
+--------------------------------------------------------------------------+
30+
| **TL;DR - KEY FEATURES** |
31+
+--------------------------------------------------------------------------+
32+
| *Quick overview of the most important capabilities* |
33+
+==========================================================================+
34+
| |
35+
| * **Generate** test cases from scratch based on `ANTLR v4 grammars`_ or |
36+
| **mutate/recombine** existing test cases after they have been parsed. |
37+
| |
38+
| * Besides blackbox test generation, supports guided fuzzing through |
39+
| native integration with `libFuzzer`_ and `AFL++`_. |
40+
| |
41+
| * The AFL++ integration also enables **grammar-aware test case |
42+
| minimization** via the ``afl-tmin`` utility. |
43+
| |
44+
| * **Grammar-aware mutation and recombination** without slowing down the |
45+
| fuzzing with parsing (using pre-parsed input seeds). |
46+
| |
47+
| * Fine-grained **probabilistic generation control** via inline grammar |
48+
| weights or external JSON-based weight configurations (for alternatives |
49+
| and quantifiers). |
50+
| |
51+
| * Support for inline **semantic predicates** in grammars to dynamically |
52+
| enable or disable grammar alternatives during generation. |
53+
| |
54+
| * Multiple **size-control strategies**, including maximum recursion depth|
55+
| and maximum token count limits. |
56+
| |
57+
| * Built-in **caching** to filter out duplicate generated inputs. |
58+
| |
59+
| * Both **grammar-aware and grammar-unaware mutators**, with selective |
60+
| enablement and disabling support. |
61+
| |
62+
| * Extensible **serialization** pipeline with custom serializers for |
63+
| formatting tree-based outputs into concrete test inputs. |
64+
| |
65+
| * Advanced customization hooks: |
66+
| |
67+
| * **custom models** for programmatic decision guidance |
68+
| * **custom listeners** for information collection during generation |
69+
| * **custom transformers** for post-generation tree transformations |
70+
+--------------------------------------------------------------------------+
71+
72+
.. _libFuzzer: https://llvm.org/docs/LibFuzzer.html
73+
.. _AFL++: https://aflplus.plus
3174

3275
Requirements
3376
============

docs/guide/aflpp_integration.rst

Lines changed: 147 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,147 @@
1+
.. _aflpp integration:
2+
3+
=================
4+
AFL++ Integration
5+
=================
6+
7+
The C++ backend of *Grammarinator* provides seamless integration with
8+
AFL++ via its custom mutator interface. This allows *Grammarinator*
9+
to be used not only as a blackbox test case generator, but also as an
10+
**in-process input synthesizer**, where its internal derivation trees are
11+
evolved and mutated during fuzzing runs. The mutator operates on serialized
12+
``.grt*`` trees and performs grammar-aware transformations based on the compiled
13+
ANTLR grammar.
14+
15+
Overview
16+
--------
17+
18+
The integration is built on the AFL++ `custom mutator hooks`_.
19+
AFL++ loads a shared library implementing these hooks and delegates mutation
20+
and related workflow operations to Grammarinator.
21+
22+
This enables grammar-aware, structure-preserving mutation and recombination
23+
of test cases at runtime -- improving coverage and syntactic correctness
24+
compared to purely byte-level fuzzing.
25+
26+
.. _`custom mutator hooks`: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/custom_mutators.md
27+
28+
Building the AFL++-Compatible Mutator
29+
-------------------------------------
30+
31+
To enable this integration in a real AFL++ fuzzing setup, a specialized
32+
shared library must be generated from the C++ generator class produced by
33+
:ref:`grammarinator-process<grammarinator-process>`. This can be compiled
34+
by :ref:`the build script<cpp_compilation>` using the ``--grafl`` flag
35+
(short for *grammarinator-afl*).
36+
37+
Example using the `HTML grammar`_::
38+
39+
python3 grammarinator-cxx/dev/build.py --clean \
40+
--generator HTMLGenerator \
41+
--includedir <dir-to-HTMLGenerator> \
42+
--afl-includedir <AFLplusplus-root>/include \
43+
--serializer SimpleSpaceSerializer \
44+
--grafl
45+
46+
This command produces a shared library::
47+
48+
grammarinator-cxx/build/lib/libgrafl-html.so
49+
50+
AFL++ will load this ``.so`` as the custom mutator library through the
51+
``AFL_CUSTOM_MUTATOR_LIBRARY`` environment variable.
52+
53+
Test inputs are expected to be encoded as ``.grt*`` trees
54+
(e.g., FlatBuffer-encoded). During fuzzing, mutations will occur in a
55+
**grammar-aware** manner, resulting in:
56+
57+
- higher syntactic validity of inputs,
58+
- better exploration of the structured input space,
59+
- and potentially deeper semantic bugs found in the target.
60+
61+
Note that only ``.grt*``-style inputs (e.g., ``.grtf`` for FlatBuffer-encoded
62+
trees) are supported by the AFL++ integration.
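Because only ``.grt*`` files are accepted, a quick pre-flight check of the corpus directory can catch stray source files before a session starts. The following is a minimal sketch and not part of Grammarinator: the helper name ``non_tree_files`` is a hypothetical choice, and the check looks only at file extensions, so a file that passes may still hold an invalid tree.

```python
from pathlib import Path


def non_tree_files(corpus_dir):
    """Return corpus files whose extension does not start with '.grt'.

    Hypothetical helper: it inspects file names only and never opens
    the files, so it cannot tell whether a tree is actually decodable.
    """
    return sorted(
        path
        for path in Path(corpus_dir).iterdir()
        if path.is_file() and not path.suffix.startswith(".grt")
    )
```

Calling ``non_tree_files("html-trees")`` on the corpus directory used in this guide would list every entry that still needs to be converted into tree format or removed.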
63+
64+
Fuzzing Configuration
65+
---------------------
66+
67+
Unlike the :ref:`grammarinator-generate<grammarinator-generate>` utility, the
68+
AFL++ custom mutator integration cannot be configured through command-line
69+
arguments. Instead, the behavior of the mutator can be controlled via
70+
environment variables prefixed with ``GRAFL_``.
71+
72+
The following options are currently supported:
73+
74+
* **GRAFL_MAX_DEPTH**: Equivalent to ``--max-depth`` (integer)
75+
* **GRAFL_MAX_TOKENS**: Equivalent to ``--max-tokens`` (integer)
76+
* **GRAFL_MEMO_SIZE**: Equivalent to ``--memo-size`` (integer)
77+
* **GRAFL_RANDOM_MUTATORS**: Enables random mutators; inverse of
78+
``--disable-random-mutators`` (boolean; accepts ``1``, ``true``, or ``yes``
79+
case-insensitively)
80+
* **GRAFL_WEIGHTS**: Equivalent to ``--weights`` (path to a JSON file)
81+
* **GRAFL_MAX_TRIM_STEPS**: Maximum number of mutation steps performed during
82+
trimming of a single test input (integer)
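When a session is started from a script, the ``GRAFL_`` variables can be collected in one place. The following is a minimal sketch under stated assumptions: the function name ``grafl_environment`` and all example values are illustrative, and writing ``0`` for a disabled boolean assumes that any value other than ``1``, ``true``, or ``yes`` is treated as false.

```python
import os


def grafl_environment(max_depth=None, max_tokens=None, memo_size=None,
                      random_mutators=None, weights=None,
                      max_trim_steps=None):
    """Build the GRAFL_* variables for the options that were given.

    Options left as None are omitted, so the mutator's defaults apply.
    """
    options = {
        "GRAFL_MAX_DEPTH": max_depth,
        "GRAFL_MAX_TOKENS": max_tokens,
        "GRAFL_MEMO_SIZE": memo_size,
        "GRAFL_WEIGHTS": weights,
        "GRAFL_MAX_TRIM_STEPS": max_trim_steps,
    }
    env = {name: str(value)
           for name, value in options.items() if value is not None}
    if random_mutators is not None:
        # Assumption: "0" is read as false because it is not one of the
        # documented truthy spellings (1, true, yes).
        env["GRAFL_RANDOM_MUTATORS"] = "1" if random_mutators else "0"
    return env


# Merge with the current environment before handing it to a subprocess.
# The numbers are placeholders, not recommended settings.
launch_env = {**os.environ, **grafl_environment(max_depth=20, max_tokens=500)}
```

The resulting ``launch_env`` dictionary can be passed as the ``env`` argument of ``subprocess.run`` when starting ``afl-fuzz``.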
83+
84+
Verifying the Setup
85+
-------------------
86+
87+
To run a fuzzing session with AFL++ equipped with Grammarinator, a compiler
88+
wrapper (e.g., ``afl-clang-fast``) and the ``afl-fuzz`` utility must first be
89+
obtained. Both can be installed or built by following the instructions in the
90+
official `AFL++ documentation`_.
91+
92+
Once the target application is compiled with the AFL++ compiler wrapper, the
93+
required instrumentation is automatically injected into the binary. This
94+
instrumentation is later used by ``afl-fuzz`` to guide the fuzzing process.
95+
96+
Next, select or create a grammar that describes the expected input format (e.g.,
97+
`HTML grammar`_), then :ref:`build<cpp_compilation>` the required binaries with
98+
``--grafl``, and optionally also with ``--generate`` and ``--decode`` flags.
99+
100+
The next step is to prepare an initial tree corpus that serves as the starting
101+
point for the fuzzing session. One option is to generate this corpus from
102+
scratch using the :ref:`grammarinator-generate<grammarinator-generate>`
103+
utility. For example::
104+
105+
grammarinator-generate-html \
106+
-n 100 \
107+
-o html-src/%d.html \
108+
--population html-trees/ \
109+
--keep-trees
110+
111+
Alternatively, an initial tree corpus can be created by converting existing
112+
source files (e.g., HTML documents) into tree format using the
113+
:ref:`grammarinator-parse<grammarinator-parse>` utility. For example::
114+
115+
grammarinator-parse html-src \
116+
-o html-trees \
117+
-g HTMLLexer.g4 HTMLParser.g4 \
118+
--tree-format flatbuffers
119+
120+
To test the integration, run AFL++ in custom-mutator-only mode and point it to
121+
the generated shared library::
122+
123+
AFL_CUSTOM_MUTATOR_ONLY=1 \
124+
AFL_CUSTOM_MUTATOR_LIBRARY=grammarinator-cxx/build/lib/libgrafl-html.so \
125+
afl-fuzz -i html-trees -o outdir -- ./target_app @@
126+
127+
Setting ``AFL_CUSTOM_MUTATOR_ONLY=1`` is **mandatory**. Without this flag,
128+
AFL++ would apply its built-in byte-level mutators to the test cases, which
129+
would corrupt the encoded tree representation used by Grammarinator.
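Since forgetting the variable silently corrupts the corpus, it can help to launch ``afl-fuzz`` through a small wrapper that always sets it. The sketch below mirrors the command shown in this section; the function name ``afl_fuzz_invocation`` is a hypothetical choice, and the paths are whatever the caller supplies.

```python
import os


def afl_fuzz_invocation(mutator_lib, corpus_dir, out_dir, target):
    """Return (argv, env) for a custom-mutator-only afl-fuzz run."""
    env = dict(os.environ)
    # Always set, so AFL++'s byte-level mutators never touch the trees.
    env["AFL_CUSTOM_MUTATOR_ONLY"] = "1"
    env["AFL_CUSTOM_MUTATOR_LIBRARY"] = str(mutator_lib)
    argv = [
        "afl-fuzz",
        "-i", str(corpus_dir),
        "-o", str(out_dir),
        "--", str(target), "@@",
    ]
    return argv, env
```

The returned pair can be handed to ``subprocess.run(argv, env=env, check=True)`` to start the session.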
130+
131+
**Note 1:** When using AFL++ with Grammarinator integration, both the input
132+
and output corpora must be in tree format. Therefore, any existing input corpus
133+
must first be converted into trees using the
134+
:ref:`grammarinator-parse<grammarinator-parse>` utility. After the fuzzing
135+
session, the resulting tree corpus can be converted back into source-level test
136+
cases using the :ref:`grammarinator-decode<grammarinator-decode-cpp>` utility.
137+
138+
**Note 2:** The items of a tree corpus can be minimized using the ``afl-tmin``
139+
tool in a grammar-aware manner by providing the appropriate custom
140+
mutator-related environment variables. For example::
141+
142+
AFL_CUSTOM_MUTATOR_ONLY=1 \
143+
AFL_CUSTOM_MUTATOR_LIBRARY=grammarinator-cxx/build/lib/libgrafl-html.so \
144+
afl-tmin -i html-trees -o html-trimmed -e -- ./target_app @@
145+
146+
.. _AFL++ documentation: https://aflplus.plus/docs/install/
147+
.. _`HTML grammar`: https://github.com/antlr/grammars-v4/tree/master/html

docs/guide/fuzzer_building.rst

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ The following sections describe each step in detail.
2020
Generator Creation from ANTLR Grammar
2121
-------------------------------------
2222

23+
.. _grammarinator-process:
24+
2325
In both Python and C++ backends, the first step is to convert the ANTLR grammar
2426
into a generator class. This generator encapsulates the logic for producing
2527
derivation trees from the grammar rules.
@@ -34,7 +36,8 @@ header-only C++ file (e.g., ``HTMLGenerator.hpp``). While this class does not
3436
yet have dedicated API documentation, it mirrors the structure and behavior of
3537
the Python generator and is used by the compiled fuzzing tools
3638
(e.g., the ``grammarinator-generate-html`` binary and
37-
:ref:`libFuzzer integration<libfuzzer integration>`).
39+
:ref:`libFuzzer integration<libfuzzer integration>` or
40+
:ref:`AFL++ integration<aflpp integration>`).
3841

3942
The generator class -- whether Python or C++ -- is automatically produced by
4043
the ``grammarinator-process`` command line utility. This tool loads and
@@ -125,7 +128,7 @@ For example, if using a custom serializer and transformer, your config file
125128

126129
Depending on the build flags, the following outputs may be generated:
127130

128-
- With ``--tools``:
131+
- With ``--generate``:
129132

130133
- ``grammarinator-generate-<name>``: standalone blackbox generator
131134

@@ -135,10 +138,20 @@ Depending on the build flags, the following outputs may be generated:
135138
``LLVMFuzzerCustomMutator`` or ``LLVMFuzzerCustomCrossover`` (useful for
136139
:ref:`libFuzzer integration<libfuzzer integration>`)
137140

141+
- With ``--grafl``:
142+
143+
- ``libgrafl-<name>.so``: shared library to define various hooks for
144+
:ref:`AFL++ integration<aflpp integration>`
145+
138146
- With ``--fuzznull``:
139147

140148
- ``fuzznull-<name>``: dummy libFuzzer binary for integration testing
141149

150+
- With ``--decode``:
151+
152+
- ``grammarinator-decode-<name>``: standalone tool to convert tests from
153+
tree to source format with the chosen serializer
154+
142155
All outputs are written to the ``build/<Release|Debug>/bin`` and
143156
``build/<Release|Debug>/lib`` directories.
144157

docs/guide/libfuzzer_integration.rst

Lines changed: 30 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,27 @@ mutator. Test inputs are expected to be serialized ``.grt*`` trees
5959
Note that only ``.grt*``-style inputs (e.g., ``.grtf`` for FlatBuffer-encoded
6060
trees) are supported by the libFuzzer integration.
6161

62+
Fuzzing Configuration
63+
---------------------
64+
65+
The libFuzzer mutator integration can be configured through command-line
66+
options, similarly to :ref:`grammarinator-generate<grammarinator-generate>`.
67+
These arguments **must** be passed after the ``-ignore_remaining_args=1`` flag,
68+
so that libFuzzer forwards them to Grammarinator.
69+
70+
The following options are supported:
71+
72+
* **-max_depth**: Equivalent to ``--max-depth`` (integer)
73+
* **-max_tokens**: Equivalent to ``--max-tokens`` (integer)
74+
* **-memo_size**: Equivalent to ``--memo-size`` (integer)
75+
* **-random_mutators**: Enable random mutators; equivalent to the
76+
inverse of ``--disable-random-mutators`` (0 or 1)
77+
* **-weights**: Equivalent to ``--weights`` (path to a JSON file)
78+
* **-allowlist**: Equivalent to ``--allowlist`` (comma-separated list of
79+
enabled creators)
80+
* **-blocklist**: Equivalent to ``--blocklist`` (comma-separated list of
81+
disabled creators)
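The ordering requirement is easy to get wrong when a command line is assembled by hand, so the sketch below builds it programmatically. It is a rough illustration under stated assumptions: the function name ``libfuzzer_argv`` is hypothetical, the fuzzer binary and creator names in the usage note are placeholders, and the ``-name=value`` spelling follows libFuzzer's own flag convention rather than anything confirmed here for the forwarded options.

```python
def libfuzzer_argv(binary, corpus_dir, **grammarinator_options):
    """Build an argument list with Grammarinator options placed last.

    libFuzzer ignores everything after -ignore_remaining_args=1, so the
    corpus directory has to come before that flag.
    """
    argv = [str(binary), str(corpus_dir), "-ignore_remaining_args=1"]
    for name, value in grammarinator_options.items():
        if isinstance(value, (list, tuple)):
            # Allowlist and blocklist are comma-separated lists.
            value = ",".join(value)
        # Assumption: forwarded options use the -name=value spelling.
        argv.append(f"-{name}={value}")
    return argv
```

For example, ``libfuzzer_argv("./fuzzer", "html-trees", max_depth=20, allowlist=["creator_a", "creator_b"])`` yields a list in which ``-max_depth=20`` and ``-allowlist=creator_a,creator_b`` follow ``-ignore_remaining_args=1``.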
82+
6283
Verifying the Setup
6384
-------------------
6485

@@ -74,5 +95,12 @@ This will create a ``fuzznull-html`` binary under
7495
``grammarinator-cxx/build/Release/bin/``, which can be invoked directly to
7596
verify the setup and test input processing.
7697

77-
Note: clang++ must be used in this case, since other compilers don't support
78-
libFuzzer.
98+
**Note 1:** clang++ must be used in this case, since other compilers don't
99+
support libFuzzer.
100+
101+
**Note 2:** When using libFuzzer with Grammarinator integration, both the input
102+
and output corpora must be in tree format. Therefore, any existing input corpus
103+
must first be converted into trees using the
104+
:ref:`grammarinator-parse<grammarinator-parse>` utility. After the fuzzing
105+
session, the resulting tree corpus can be converted back into source-level test
106+
cases using the :ref:`grammarinator-decode<grammarinator-decode-cpp>` utility.

docs/guide/models.rst

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -108,10 +108,12 @@ models are:
108108
109109
3. :class:`grammarinator.runtime.WeightedModel`: This model modifies the
110110
behavior of another model by adjusting (pre-multiplying) the weights of
111-
alternatives. By default, the multiplier of each alternative starts from 1,
112-
unless custom values are assigned to specific alternatives. This assignment
113-
can happen through the constructor of WeightedModel (when using the API)
114-
or with the ``--weigths`` CLI option of the
111+
alternatives and by setting the probability of repeating a quantified
112+
subexpression. By default, the multiplier of each alternative starts from
113+
1 and the probability of each quantifier is 0.5, unless custom values are
114+
assigned to specific alternatives or quantifiers. This assignment can
115+
happen through the constructor of WeightedModel (when using the API) or
116+
with the ``--weights`` CLI option of the
115117
:ref:`grammarinator-generate<grammarinator-generate>` utility by providing
116118
a file containing the weights.
117119

@@ -124,4 +126,7 @@ models are:
124126

125127
.. code-block:: text
126128
127-
{ "ruleName_A": {"alternation_B_idx": {"alternative_C_idx": weight_ABC, ...}, ...}, ... }
129+
{
130+
"alts": { "ruleName_A": {"alternation_B_idx": {"alternative_C_idx": weight_ABC, ...}, ...}, ... },
131+
"quants": { "ruleName_C": {"quant_D_idx": probability_CD, ...}, ... }
132+
}
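A weights file in this layout can also be produced from a script. The following is a minimal sketch, assuming that a section may be left out when it has no entries; the function name ``write_weights`` and the rule names in the usage note are hypothetical, and the indices are written as strings because JSON object keys are always strings.

```python
import json


def write_weights(path, alts=None, quants=None):
    """Write a weights file with optional 'alts' and 'quants' sections."""
    config = {}
    if alts:
        # rule name -> alternation index -> alternative index -> multiplier
        config["alts"] = alts
    if quants:
        # rule name -> quantifier index -> repeat probability
        config["quants"] = quants
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return config
```

For instance, ``write_weights("weights.json", alts={"ruleA": {"0": {"1": 3.0}}}, quants={"ruleC": {"0": 0.8}})`` would triple the weight of the second alternative of the first alternation in the placeholder rule ``ruleA`` and set a repeat probability of 0.8 for the first quantifier of ``ruleC``.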

docs/guide/population.rst

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -23,12 +23,7 @@ Supported Tree Formats
2323
- Compact and fast to read/write.
2424
- Cross-language compatible (e.g., usable from Python, C++, etc. with
2525
FlatBuffer_ bindings).
26-
- Supported natively by:
27-
28-
- ``grammarinator-generate`` (Python)
29-
- C++ backend generators (e.g., ``grammarinator-generate-html``)
30-
- libFuzzer integration (``libgrlf-html.a``)
31-
26+
- Supported by both Python and C++ components.
3227
- Default format when tree codec is not explicitly selected.
3328

3429
2. **JSON-encoded trees** (``.grtj``):
@@ -127,3 +122,14 @@ specified by ``--tree-format``. The resulting trees are then serialized using
127122
the function defined by ``--serializer`` (or :class:`str` by default). The
128123
serialized tests are saved into the ``--out`` directory with the ``--ext``
129124
extension and encoded with ``--encoding``.
125+
126+
.. _grammarinator-decode-cpp:
127+
128+
The decoder functionality can be created not only in Python, but also
129+
in C++ using serializers written in C++. For this, the ``--decode`` argument
130+
has to be provided to :ref:`the build script<cpp_compilation>`. When
131+
converting an output corpus generated by either the
132+
:ref:`libFuzzer integration<libfuzzer integration>` or the
133+
:ref:`AFL++ integration<aflpp integration>`, it is recommended to use
134+
these C++ decoders. When built with the same configuration, they will reproduce
135+
exactly the same test cases that were observed during fuzzing.
