Commit 67a9e88

ChenDu-Meta authored and chemag committed
ISOBMFF: add fuzzing support for Parser
Add fuzzing support to ISOBMFF with libfuzzer. Add fuzzing to Parser
1 parent 1dba7f1 commit 67a9e88

7 files changed: +390 -0 lines changed


Build/fuzz/Parser_fuzzer

2.42 MB
Binary file not shown.

fuzz/Makefile

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# Copyright (c) Meta Platforms, Inc. and its affiliates.

# Include the main project's Makefile configuration
include ../Submodules/makelib/Common.mk

# Fuzzing specific configuration
DIR_SRC := ../ISOBMFF/source/
DIR_INC := ../ISOBMFF/include/
DIR_BUILD_PRODUCTS := ../Build/Release/Products/
DIR_BUILD_FUZZ := ../Build/fuzz/

# Compiler and flags
CC := clang
CXX := clang++
CXXFLAGS := -std=c++14 -I$(DIR_INC) -I$(DIR_SRC) -fsanitize=address,fuzzer,undefined -g -O1
LDFLAGS := -fsanitize=address,fuzzer,undefined $(DIR_BUILD_PRODUCTS)x86_64/libISOBMFF.a -lstdc++ -lpthread

# Fuzzer targets
FUZZERS := Parser_fuzzer
FUZZ_TARGETS := $(patsubst %,$(DIR_BUILD_FUZZ)%,$(FUZZERS))

# Default target
all: $(FUZZ_TARGETS)

# Create build directory
$(DIR_BUILD_FUZZ):
	@mkdir -p $(DIR_BUILD_FUZZ)

# Rule to build fuzzers
$(DIR_BUILD_FUZZ)%_fuzzer: %_fuzzer.cpp | $(DIR_BUILD_FUZZ)
	@echo "Building fuzzer: $<"
	$(CXX) $(CXXFLAGS) $< $(LDFLAGS) -o $@
	@echo "Fuzzer built successfully: $@"

# Rule to run fuzzers
RUNS?=100
fuzz: $(FUZZ_TARGETS)
	@echo "Running fuzzers..."
	@for fuzzer in $(FUZZERS); do \
		mkdir -p $(DIR_BUILD_FUZZ)corpus/$$fuzzer; \
		$(DIR_BUILD_FUZZ)$$fuzzer -artifact_prefix=corpus/$$fuzzer/ corpus/$$fuzzer -runs=$(RUNS); \
	done

.PHONY: all fuzz

fuzz/Parser_fuzzer.cpp

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
/*
 * Copyright (c) Meta Platforms, Inc. and its affiliates.
 */

// This file was auto-generated using fuzz/converter.py from
// Parser_unittest.cpp.
// Do not edit directly.

#include <ISOBMFF.hpp> // for various
#include <ISOBMFF/Parser.hpp> // for Parser

// libfuzzer infra to test the fuzz target
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  const std::vector<uint8_t> buffer_vector = {data, data + size};
  {
    ISOBMFF::Parser parser;
    parser.Parse(buffer_vector);
  }
  return 0;
}

fuzz/README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# Auto-Generating Fuzzers and Corpus from Unittests

We use the unittests (which show how to test each of the
parsers, and even provide some interesting input cases) to auto-generate
both the fuzzers (the cpp files that contain the fuzzing glue) and the
corpus.

We provide a script (`converter.py`) that parses unittests in order to
auto-generate both the fuzzers (in libfuzzer's case this means the
`LLVMFuzzerTestOneInput()` function), and the corpus used as fuzzing
seed. The idea is to let the unittest writer mark the parts of the unittest
that are useful for fuzzing: the code that needs to be fuzzed
(used to generate the fuzzers), and the unittest buffers (used for the
corpus).

* To mark the code that needs to be fuzzed, the unittest writer
can use the following syntax:

```
$ cat ISOBMFF-Tests/Parser_unittest.cpp
...
// fuzzer::conv: begin
ISOBMFF::Parser parser;
parser.Parse(buffer);
// fuzzer::conv: end
...
```

This tells the autogen script to put all the code between the
"fuzzer::conv: begin" and "fuzzer::conv: end" tags into a fuzzer
function (libfuzzer's `LLVMFuzzerTestOneInput()` function). The
script also makes a few other fixes: it copies the `#include` lines
(except the gtest and gmock ones), renames the `buffer` parameter to
`buffer_vector`, and does some minor namespace
adjustments. The latter should be formalizable if there is interest
in reusing the infra for another project.
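
The tag handling described above can be sketched in a few lines of Python. This is a simplified illustration rather than the code in `converter.py`: the function name `extract_marked_code` and the sample test body are hypothetical, and the sketch leaves out the include copying and renaming steps.

```python
BEGIN_TAG = "fuzzer::conv: begin"
END_TAG = "fuzzer::conv: end"


def extract_marked_code(lines):
    """Return the lines found between the begin and end tags (tags excluded)."""
    selected = []
    in_code = False
    for line in lines:
        if BEGIN_TAG in line:
            in_code = True
            continue
        if END_TAG in line:
            in_code = False
            continue
        if in_code:
            selected.append(line)
    return selected


# Hypothetical unittest fragment, used only to exercise the sketch.
sample = [
    "TEST(Parser, Basic) {\n",
    "  // fuzzer::conv: begin\n",
    "  ISOBMFF::Parser parser;\n",
    "  parser.Parse(buffer);\n",
    "  // fuzzer::conv: end\n",
    "}\n",
]
print("".join(extract_marked_code(sample)), end="")
```

Only the two statements between the tags are kept; everything outside them, including the gtest scaffolding, is dropped.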

* To mark the buffer(s) that can be used as seeds, the unittest
writer can use the following syntax:

```
$ cat ISOBMFF-Tests/Parser_unittest.cpp
...
// fuzzer::conv: data
const std::vector<uint8_t> &buffer = {
0x00, 0x00, 0x00, 0x18,
0x66, 0x74, 0x79, 0x70,
...
};
...
```

The script will look for C++-defined buffers, extract all the hexadecimal
constants, and put them into a binary file.
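
As a rough sketch of that extraction step, the snippet below pulls every two-digit `0xNN` literal out of a C++ initializer and packs the values into a `bytes` object. The names `HEX_BYTE` and `extract_seed_bytes` are hypothetical, and the sketch assumes each literal encodes exactly one byte.

```python
import re

# Two-digit hexadecimal literals such as 0x66 (assumption: one byte each).
HEX_BYTE = re.compile(r"0x([0-9a-fA-F]{2})")


def extract_seed_bytes(cpp_text):
    """Collect every 0xNN literal in cpp_text into a bytes object."""
    return bytes.fromhex("".join(HEX_BYTE.findall(cpp_text)))


snippet = """
const std::vector<uint8_t> &buffer = {
    0x00, 0x00, 0x00, 0x18,
    0x66, 0x74, 0x79, 0x70,
};
"""
seed = extract_seed_bytes(snippet)
print(seed)  # b'\x00\x00\x00\x18ftyp'
```

Writing `seed` to a file opened in `"wb"` mode would then give one corpus entry.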


# Operation

(1) To regenerate the fuzzing files, use the converter script:

```
$ ./converter.py ../ISOBMFF-Tests/Parser_unittest.cpp ./
```

The script will parse the input file (`../ISOBMFF-Tests/Parser_unittest.cpp`)
and generate a similarly-named fuzzer (`Parser_fuzzer.cpp`) in the
output directory (`./`). It will also generate seed cases in the per-fuzzer
corpus directory (`corpus/Parser_fuzzer/unittest.*.bin`).
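
The naming convention above can be illustrated with a short sketch. The helper `derive_output_paths` is hypothetical; it only mirrors the "unittest" to "fuzzer" renaming and the corpus file pattern described here, and it assumes POSIX-style paths.

```python
import os.path


def derive_output_paths(infile, outdir):
    """Map a unittest source path to (fuzzer source path, corpus file template)."""
    name = os.path.basename(infile).replace("unittest", "fuzzer")
    fuzzer_path = os.path.join(outdir, name)
    corpus_dir = os.path.join(outdir, "corpus", name.replace(".cpp", ""))
    corpus_template = os.path.join(corpus_dir, "unittest.%02i.bin")
    return fuzzer_path, corpus_template


fuzzer_path, corpus_template = derive_output_paths(
    "../ISOBMFF-Tests/Parser_unittest.cpp", "./"
)
print(fuzzer_path)          # path of the generated fuzzer source
print(corpus_template % 0)  # path of the first seed file
```

Each buffer found in the unittest gets its own zero-padded index in the template, so the first seed ends in `unittest.00.bin`, the second in `unittest.01.bin`, and so on.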

(2) Build the executable. After building the main ISOBMFF project:

```
$ cd fuzz
$ make
```

(3) To run the fuzzers, use:

```
$ cd fuzz
$ make fuzz
...
```

This will run each of the generated fuzzers for a fixed number of runs
(`RUNS`, 100 by default; override with `make fuzz RUNS=<n>`), and
produce new cases in the per-fuzzer corpus directory.
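
For reference, the command line that the `fuzz` target in `fuzz/Makefile` assembles for each fuzzer can be sketched as follows. The helper `fuzzer_command` is hypothetical and only builds the argument list; the directory layout and the `-artifact_prefix` and `-runs` libFuzzer options are the ones used in the Makefile.

```python
def fuzzer_command(build_dir, fuzzer, runs=100):
    """Build the argv list for one fuzzer, mirroring the `fuzz` Makefile target."""
    corpus_dir = "corpus/%s" % fuzzer
    return [
        build_dir + fuzzer,  # fuzzer binary produced by `make`
        "-artifact_prefix=%s/" % corpus_dir,  # where crash files are written
        corpus_dir,  # seed corpus directory
        "-runs=%d" % runs,  # stop after this many executions
    ]


cmd = fuzzer_command("../Build/fuzz/", "Parser_fuzzer", runs=100)
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run` from inside the `fuzz` directory would be one way to launch a single fuzzer without going through `make`.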

fuzz/converter.py

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
#!/usr/bin/env python3

# Copyright (c) Meta Platforms, Inc. and its affiliates.

"""
converter.py - A tool to generate fuzzers from unit tests.

This script parses unit test files to extract code that should be fuzzed and
test data that can be used as corpus for fuzzing. It generates fuzzer files
and corpus files based on the extracted information.
"""

import argparse
import os.path
import pathlib
import re
import string
import sys

default_values = {
    "debug": 0,
    "infile": None,
    "outdir": "./",
}


CODE_TEMPLATE = """\
/*
 * Copyright (c) Meta Platforms, Inc. and its affiliates.
 */

// This file was auto-generated using fuzz/converter.py from
// $infile.
// Do not edit directly.

$include

// libfuzzer infra to test the fuzz target
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
$code
  return 0;
}
"""

# two-digit hex byte literals; accept upper-case digits too (e.g. 0xFF)
DATA_REGEXP = r"0x[0-9a-fA-F][0-9a-fA-F]"


def do_something(options):
    # read and parse the input file
    lines = read_input(options.infile, options.debug)
    output = parse_input(lines, options.debug)
    # get the output file names
    if options.infile == sys.stdin:
        # stdin mode: write the fuzzer to stdout and skip the corpus files
        infile = "stdin"
        outfile_code = "stdout"
        fout_code = sys.stdout
        outfile_data_dir = None
        outfile_data_template = None
    else:
        infile = os.path.basename(options.infile)
        outfile_code = infile.replace("unittest", "fuzzer")
        outfile_code = os.path.join(options.outdir, outfile_code)
        fout_code = open(outfile_code, "w")
        outfile_data_dir = infile.replace("unittest", "fuzzer")
        outfile_data_dir = outfile_data_dir.replace(".cpp", "")
        outfile_data_dir = os.path.join(options.outdir, "corpus", outfile_data_dir)
        outfile_data_template = os.path.join(outfile_data_dir, "unittest.%02i.bin")

    # compose the output file
    output["infile"] = infile
    code_template = string.Template(CODE_TEMPLATE)
    code = code_template.substitute(output)
    fout_code.write(code)
    if fout_code is not sys.stdout:
        fout_code.close()
    if options.debug > 0:
        print("outfile_code: %s" % outfile_code)

    # compose the data files (nothing to write in stdin mode)
    if outfile_data_dir is None:
        return
    pathlib.Path(outfile_data_dir).mkdir(parents=True, exist_ok=True)
    for i, data_bytes in enumerate(output["data"]):
        outfile_data = outfile_data_template % i
        with open(outfile_data, "wb") as fout_data:
            fout_data.write(data_bytes)
        if options.debug > 0:
            print("outfile_data: %s" % outfile_data)


def read_input(infile, debug):
    # open infile
    if infile != sys.stdin:
        try:
            # read-only: the unittest source is never modified
            fin = open(infile, "r")
        except IOError:
            print('Error: cannot open file "%s"' % infile, file=sys.stderr)
            sys.exit(1)
    else:
        fin = sys.stdin.buffer

    # process infile
    lines = fin.readlines()
    fin.close()
    return lines


def parse_input(lines, debug):
    lines_include = []
    lines_data = []
    lines_code = []
    output = {}

    # extract the includes, data, and code
    data_mode = False
    code_mode = False
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        # skip blank lines
        if len(line.strip()) == 0:
            continue
        # need include lines, buffers, code lines
        if line.startswith("#include "):
            lines_include.append(line)
        elif "fuzzer::conv: data" in line:
            # data mode
            data_mode = True
            lines_data.append([])
            continue
        elif line == "};\n":
            data_mode = False
        elif "fuzzer::conv: begin" in line:
            # begin code mode
            data_mode = False
            code_mode = True
            lines_code.append("  {\n")
            continue
        elif "fuzzer::conv: end" in line:
            # end code mode
            code_mode = False
            lines_code.append("  }\n")
            continue
        if data_mode:
            lines_data[-1].append(line)
        if code_mode:
            lines_code.append(line)

    # clean the includes
    output["include"] = ""
    for line in lines_include:
        # remove gmock and gtest lines
        if "gmock" in line or "gtest" in line:
            continue
        output["include"] += line

    # clean the data
    output["data"] = []
    for ll in lines_data:
        data_bytes = re.findall(DATA_REGEXP, "".join(ll))
        output["data"].append(bytes.fromhex("".join(b[2:] for b in data_bytes)))

    # clean the code
    output["code"] = ""
    for line in lines_code:
        # add namespace if needed
        if "ISOBMFF::" not in line and "ISOBMFF " not in line:
            line = line.replace("Parser", "ISOBMFF::Parser")
        line = line.replace("buffer", "buffer_vector")
        output["code"] += line
    output["code"] = output["code"].strip("\n")

    # Add buffer_vector definition
    output["code"] = (
        "  const std::vector<uint8_t> buffer_vector = {data, data + size};\n"
        + output["code"]
    )

    return output


def get_options(argv):
    """Generic option parser.

    Args:
        argv: list containing arguments

    Returns:
        Namespace - An argparse.ArgumentParser-generated option object
    """
    # init parser
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "-d",
        "--debug",
        action="count",
        dest="debug",
        default=default_values["debug"],
        help="Increase verbosity (use multiple times for more)",
    )
    parser.add_argument(
        "--quiet",
        action="store_const",
        dest="debug",
        const=-1,
        help="Zero verbosity",
    )
    parser.add_argument(
        "infile",
        type=str,
        default=default_values["infile"],
        metavar="input-file",
        help="input file",
    )
    parser.add_argument(
        "outdir",
        type=str,
        default=default_values["outdir"],
        metavar="output-dir",
        help="output directory",
    )
    # do the parsing
    options = parser.parse_args(argv[1:])
    return options


def main(argv):
    # parse options
    options = get_options(argv)
    # get infile/outdir
    if options.infile == "-":
        options.infile = sys.stdin
    # print results
    if options.debug > 0:
        print(options)
    # do something
    do_something(options)


if __name__ == "__main__":
    # at least the CLI program name: (CLI) execution
    main(sys.argv)

fuzz/corpus/Parser_fuzzer/crash-da39a3ee5e6b4b0d3255bfef95601890afd80709

Whitespace-only changes.
3.15 KB
Binary file not shown.
