Commit d26ad82

Generalize extract_wasms.py (WebAssembly#7254)
Rather than pattern-match the very specific form we emit in ClusterFuzz testcases, support any Uint8Array that contains what look like wasm contents. This allows us to also process Fuzzilli testcases.
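In this commit, "what look like wasm contents" comes down to the module preamble. Every WebAssembly binary starts with the magic bytes 0x00 0x61 0x73 0x6D ("\0asm"), followed by a 4-byte little-endian version number, which is currently 1. Below is a minimal Python sketch of that test. `looks_like_wasm` mirrors the four-byte check the script performs. `has_wasm_preamble` is a hypothetical stricter variant that is not part of the commit, and both names are illustrative.

```python
import struct

# Every WebAssembly binary module begins with these four magic bytes ("\0asm"),
# followed by a 4-byte little-endian version number (currently 1).
WASM_MAGIC = b'\0asm'


def looks_like_wasm(values):
    """Return True if the byte values start with the wasm magic number.

    Mirrors the commit's heuristic: all values must fit in a byte, and only
    the first four bytes are inspected.
    """
    try:
        data = bytes(values)  # bytes() raises ValueError for values outside 0..255
    except ValueError:
        return False
    return data[:4] == WASM_MAGIC


def has_wasm_preamble(values):
    """Hypothetical stricter check (not in the commit): magic plus version 1."""
    try:
        data = bytes(values)
    except ValueError:
        return False
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        return False
    (version,) = struct.unpack('<I', data[4:8])
    return version == 1
```

The script uses the looser check, and the new lit test depends on that. Its good inputs are only five bytes long ([0x00, 0x61, 0x73, 0x6D, 0x01]), so the stricter variant would reject them.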
1 parent e7abb04

2 files changed (+60, -12 lines)

scripts/clusterfuzz/extract_wasms.py

Lines changed: 32 additions & 12 deletions
@@ -14,13 +14,16 @@
 # limitations under the License.
 
 '''
-Wasm extractor for testcases generated by the ClusterFuzz run.py script. Usage:
+Wasm extractor for testcases generated by the ClusterFuzz run.py script. This is
+general enough to also handle Fuzzilli output.
+
+Usage:
 
 extract_wasms.py INFILE.js OUTFILE
 
 That will find embedded wasm files in INFILE.js, of the form
 
-var .. = new Uint8Array([..wasm_contents..]);
+new Uint8Array([..wasm_contents..]);
 
 and extract them into OUTFILE.0.wasm, OUTFILE.1.wasm, etc. It also emits
 OUTFILE.js which will no longer contain the embedded contents, after which the
@@ -50,24 +53,41 @@ def get_wasm_filename():
     js = f.read()
 
 
-def repl(text):
+def repl(match):
+    text = match.group(0)
+
     # We found something of the form
     #
-    # var binary = new Uint8Array([..binary data as numbers..]);
+    # new Uint8Array([..binary data as numbers..]);
     #
-    # Parse out the numbers into a binary wasm file.
-    numbers = text.groups()[0]
+    # See if the numbers are the beginnings of a wasm file, "\0asm". If so, we
+    # assume it is wasm. (We are careful here because Fuzzilli output can
+    # contain normal JavaScript Typed Arrays, which we do not want to touch.)
+    numbers = match.groups()[0]
     numbers = numbers.split(',')
-    numbers = [int(n) for n in numbers]
+
+    try:
+        # Handle both base 10 and 16 by passing in base 0.
+        parsed = [int(n, 0) for n in numbers]
+        binary = bytes(parsed)
+    except ValueError:
+        # Not wasm; return the existing text.
+        return text
+
+    if binary[:4] != b'\0asm':
+        return text
+
+    # It is wasm. Parse out the numbers into a binary wasm file.
     with open(get_wasm_filename(), 'wb') as f:
-        f.write(bytes(numbers))
+        f.write(binary)
 
-    # Replace it with nothing.
-    return ''
+    # Replace the Uint8Array with undefined + a comment.
+    return 'undefined /* extracted wasm */'
 
 
-# Replace the wasm files and write them out.
-js = re.sub(r'var \w+ = new Uint8Array\(\[([\d,]+)\]\)', repl, js)
+# Replace the wasm files and write them out. We investigate any new Uint8Array
+# on an array of values like [100, 200] or [0x61, 0x6D, 0x6a] etc.
+js = re.sub(r'new Uint8Array\(\[([\d,x a-fA-F]+)\]\)', repl, js)
 
 # Write out the new JS.
 with open(f'{out}.js', 'w') as f:
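You can exercise the rewritten `repl` and regular expression without touching the filesystem. The following is a minimal in-memory sketch. The pattern and the `\0asm` check are copied from the diff above, but `extract_wasms_from_js` and its `(new_js, blobs)` return value are hypothetical. The real script writes each binary to OUTFILE.N.wasm through `get_wasm_filename()` rather than collecting them in a list.

```python
import re

# Same pattern as the commit: a Uint8Array built from a literal list made of
# digits, commas, spaces, 'x' and hex letters.
UINT8_ARRAY_RE = re.compile(r'new Uint8Array\(\[([\d,x a-fA-F]+)\]\)')
WASM_MAGIC = b'\0asm'


def extract_wasms_from_js(js):
    """Hypothetical in-memory variant of the script's substitution.

    Returns (new_js, blobs): the rewritten JS, and the extracted wasm
    binaries in the order they appeared.
    """
    blobs = []

    def repl(match):
        text = match.group(0)
        try:
            # Base 0 lets int() accept both "97" and "0x61".
            binary = bytes(int(n, 0) for n in match.group(1).split(','))
        except ValueError:
            # An unparseable element or a value outside 0..255: not wasm.
            return text
        if binary[:4] != WASM_MAGIC:
            # A normal typed array (e.g. from Fuzzilli): leave it alone.
            return text
        blobs.append(binary)
        return 'undefined /* extracted wasm */'

    return UINT8_ARRAY_RE.sub(repl, js), blobs
```

The pattern's character class admits only digits, commas, `x`, spaces, and `a-f`/`A-F`. As a result, a Uint8Array literal whose elements are split across lines (or separated by tabs) is not matched at all and stays in the JS unchanged.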

test/lit/scripts/extract_wasms.lit

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+;; Test extracting wasm files from JS.
+
+;; A proper wasm start sequence (\0asm), so we will extract it.
+;; RUN: echo "good1(new Uint8Array([0x00, 0x61, 0x73, 0x6D, 0x01]));" > %t.js
+
+;; A difference in the second byte, so we won't.
+;; RUN: echo "bad1(new Uint8Array([0x00, 0xff, 0x73, 0x6D, 0x01]));" >> %t.js
+
+;; The last byte is unparseable as an integer, so we won't.
+;; RUN: echo "bad2(new Uint8Array([0x00, 0x61, 0x73, 0x6D, 6Dx0]));" >> %t.js
+
+;; This is not a Uint8Array, so we do nothing.
+;; RUN: echo "bad3(new Uint16Array([0x00, 0x61, 0x73, 0x6D, 0x01]));" >> %t.js
+
+;; Another proper one. Note the second number is in base 10, which works too,
+;; & there is various odd whitespace which we also ignore.
+;; RUN: echo "good2(new Uint8Array([0x00,97, 0x73, 0x6D,0x01]));" >> %t.js
+
+;; RUN: python %S/../../../scripts/clusterfuzz/extract_wasms.py %t.js %t.out
+;; RUN: cat %t.out.js | filecheck %s
+;;
+;; We extracted the good but not the bad.
+;; CHECK: good1(undefined /* extracted wasm */)
+;; CHECK: bad1(new Uint8Array
+;; CHECK: bad2(new Uint8Array
+;; CHECK: bad3(new Uint16Array
+;; CHECK: good2(undefined /* extracted wasm */)
+
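The lit test depends on Python's `int(n, 0)`. With base 0, `int()` reads each element as a Python integer literal, so hexadecimal `0x6D` and decimal `97` both parse and surrounding whitespace is ignored, while `6Dx0` raises `ValueError`. The sketch below isolates that step. `parse_elements` is a hypothetical helper, not a function in the script.

```python
def parse_elements(array_body):
    """Parse a comma-separated array body the way the script does.

    Each element goes through int(n, 0): base 0 reads it as a Python integer
    literal, so "0x6D" and "97" both work and surrounding spaces are ignored.
    Returns the list of ints, or None if any element is rejected.
    """
    try:
        return [int(n, 0) for n in array_body.split(',')]
    except ValueError:
        return None


# Array bodies taken from the lit test's good2 and bad2 cases.
print(parse_elements('0x00,97, 0x73, 0x6D,0x01'))      # -> [0, 97, 115, 109, 1]
print(parse_elements('0x00, 0x61, 0x73, 0x6D, 6Dx0'))  # -> None
```

Base 0 follows Python's literal syntax, which produces two edge cases. A zero-padded decimal such as `010` raises `ValueError`, so the whole array is left untouched. Conversely, `0b101` parses as binary 5, because `b` lies inside the pattern's `a-f` range.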
