
Commit a03f002

[BranchHints] Fuzz branch hints (#7704)
Add two helper passes: one to delete specific branch hints by their instrumentation ID (as added by InstrumentBranchHints), and one to remove all instrumentation. The new fuzzer then:

* Adds random branch hints
* Instruments them and runs the result to see the output
* Deletes all incorrect hints
* Removes all instrumentation, leaving a wasm with only correct hints
* Optimizes
* Adds new instrumentation and runs that to see the output

The idea is that once we have a wasm with only correct hints, the optimizer is allowed to remove some (e.g. in DCE), but it should never emit an invalid branch hint (e.g. by forgetting to flip a hint when it flips an if). We do need to avoid passes (or parts of passes) that reorder or unconditionalize code, so this is not quite that simple, but it still allows us to fuzz this.
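A minimal pure-Python model of the invariant described above. It assumes hints and observed outcomes are plain dicts keyed by instrumentation ID (a hypothetical representation; the real fuzzer works on wasm files and interpreter logs), and the function names are illustrative only. As in the fuzzer, only hints observed to be wrong are deleted: a hint whose branch never executed survives, since nothing proved it wrong, which is one reason passes that make new code execute have to be skipped.

```python
def delete_bad_hints(hints, outcomes):
    """Drop every hint whose branch executed and went the other way.

    hints:    {hint_id: predicted}, predicted being 0 or 1 (hypothetical model)
    outcomes: {hint_id: taken} for branches that actually executed
    Hints whose branch never executed are kept, as nothing proved them wrong.
    """
    return {
        hint_id: predicted
        for hint_id, predicted in hints.items()
        if hint_id not in outcomes or predicted == int(outcomes[hint_id])
    }


def no_wrong_hints(hints, outcomes):
    """Post-optimization check: dropping hints is fine, a wrong one is not."""
    return all(
        predicted == int(outcomes[hint_id])
        for hint_id, predicted in hints.items()
        if hint_id in outcomes
    )


hints = {1: 1, 2: 0, 3: 1, 4: 0}
outcomes = {1: True, 2: True, 3: False}  # hint 4's branch never ran
print(delete_bad_hints(hints, outcomes))  # {1: 1, 4: 0}
```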
1 parent 07860ff commit a03f002

18 files changed: 1550 additions and 75 deletions

scripts/fuzz_opt.py

Lines changed: 219 additions & 4 deletions
```diff
@@ -515,10 +515,10 @@ def compare_between_vms(x, y, context):
         y_line = y_lines[i]
         if x_line != y_line:
             # this is different, but maybe it's a vm difference we can ignore
-            LEI_LOGGING = '[LoggingExternalInterface logging'
-            if x_line.startswith(LEI_LOGGING) and y_line.startswith(LEI_LOGGING):
-                x_val = x_line[len(LEI_LOGGING) + 1:-1]
-                y_val = y_line[len(LEI_LOGGING) + 1:-1]
+            LOGGING_PREFIX = '[LoggingExternalInterface logging'
+            if x_line.startswith(LOGGING_PREFIX) and y_line.startswith(LOGGING_PREFIX):
+                x_val = x_line[len(LOGGING_PREFIX) + 1:-1]
+                y_val = y_line[len(LOGGING_PREFIX) + 1:-1]
                 if numbers_are_close_enough(x_val, y_val):
                     continue
             if x_line.startswith(FUZZ_EXEC_NOTE_RESULT) and y_line.startswith(FUZZ_EXEC_NOTE_RESULT):
```
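The renamed constant is used to slice the value out of a logging line: `[len(LOGGING_PREFIX) + 1:-1]` skips the prefix and the space after it, and drops the closing `]`. A standalone sketch of that slice; the helper name `logged_value` is hypothetical and not part of fuzz_opt.py. Note that the new `log-branch` lines do not match this prefix (`log-` differs from `logg`), so they are not mistaken for value logs.

```python
LOGGING_PREFIX = '[LoggingExternalInterface logging'


def logged_value(line):
    """Return the value text of a logging line, or None for any other line.

    The + 1 skips the space after the prefix; -1 drops the closing ']'.
    """
    if not line.startswith(LOGGING_PREFIX):
        return None
    return line[len(LOGGING_PREFIX) + 1:-1]


print(logged_value('[LoggingExternalInterface logging 42]'))  # 42
```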
```diff
@@ -1844,6 +1844,220 @@ def get_relevant_lines(wat):
         compare(get_relevant_lines(original), get_relevant_lines(processed), 'Preserve')
 
 
+# Test that we preserve branch hints properly. The invariant that we test here
+# is that, given correct branch hints (that is, the input wasm's branch hints
+# are always correct: a branch is taken iff the hint is that it is taken), then
+# the optimizer does not end up with incorrect branch hints. It is fine if the
+# optimizer removes some hints (it may remove entire chunks of code in DCE, for
+# example, and it may find ways to simplify code so fewer things execute), but
+# it should not emit a branch hint that is wrong - if it is not certain, it
+# should remove the branch hint.
+#
+# Note that bugs found by this fuzzer tend to require the following during
+# reducing: BINARYEN_TRUST_GIVEN_WASM=1 in the env, and --text as a parameter.
+class BranchHintPreservation(TestCaseHandler):
+    frequency = 0.1
+
+    def handle(self, wasm):
+        # Generate an instrumented wasm.
+        instrumented = wasm + '.inst.wasm'
+        run([
+            in_bin('wasm-opt'),
+            wasm,
+            '-o', instrumented,
+            # Add random branch hints (so we have something to work with).
+            '--randomize-branch-hints',
+            # Instrument them with logging.
+            '--instrument-branch-hints',
+            '-g',
+        ] + FEATURE_OPTS)
+
+        # Collect the logging.
+        out = run_bynterp(instrumented, ['--fuzz-exec-before', '-all'])
+
+        # Process the output. We look at the lines like this:
+        #
+        #   [LoggingExternalInterface log-branch 1 0 0]
+        #
+        # where the three integers are: ID, predicted, actual.
+        all_ids = set()
+        bad_ids = set()
+        LOG_BRANCH_PREFIX = '[LoggingExternalInterface log-branch'
+        for line in out.splitlines():
+            if line.startswith(LOG_BRANCH_PREFIX):
+                # (1:-1 strips away the '[', ']' at the edges)
+                _, _, id_, hint, actual = line[1:-1].split(' ')
+                all_ids.add(id_)
+                if hint != actual:
+                    # This hint was misleading.
+                    bad_ids.add(id_)
+
+        # If no good ids remain, there is nothing to test (no hints will remain
+        # later down, after we remove bad ones).
+        if bad_ids == all_ids:
+            note_ignored_vm_run('no good ids')
+            return
+
```
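The parsing loop can be pulled out into a standalone function for experimenting with captured output. This is a sketch mirroring the handler's loop; the name `classify_hints` is hypothetical. Like the handler, it compares hint and actual as strings, so a non-boolean condition value such as `5` is treated as a mismatch; at this stage that only means a hint that may have been correct gets deleted, which is conservative and harmless.

```python
LOG_BRANCH_PREFIX = '[LoggingExternalInterface log-branch'


def classify_hints(out):
    """Return (all_ids, bad_ids) from interpreter output, as the handler does.

    Relevant lines look like '[LoggingExternalInterface log-branch ID HINT ACTUAL]'.
    """
    all_ids = set()
    bad_ids = set()
    for line in out.splitlines():
        if line.startswith(LOG_BRANCH_PREFIX):
            # [1:-1] strips the brackets, leaving exactly five space-separated fields.
            _, _, id_, hint, actual = line[1:-1].split(' ')
            all_ids.add(id_)
            if hint != actual:
                # String comparison: '1' vs '5' counts as a mismatch.
                bad_ids.add(id_)
    return all_ids, bad_ids


sample = '\n'.join([
    '[fuzz-exec] calling foo',
    '[LoggingExternalInterface log-branch 1 0 0]',
    '[LoggingExternalInterface log-branch 2 1 0]',
])
print(classify_hints(sample))
```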
```diff
+        # Generate proper hints for testing: A wasm file with 100% valid branch
+        # hints, and instrumentation to verify that.
+        de_instrumented = wasm + '.de_inst.wasm'
+        args = [
+            in_bin('wasm-opt'),
+            instrumented,
+            '-o', de_instrumented,
+        ]
+        # Remove the bad ids (using the instrumentation to identify them by ID).
+        if bad_ids:
+            args += [
+                '--delete-branch-hints=' + ','.join(bad_ids),
+            ]
+        args += [
+            # Remove all prior instrumentation, so it does not confuse us later
+            # when we log our final hints, and also so it does not inhibit
+            # optimizations.
+            '--deinstrument-branch-hints',
+            '-g',
+        ] + FEATURE_OPTS
+        run(args)
+
```
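The same argument assembly can be written as a pure function, which makes the `--delete-branch-hints=` list easy to inspect. This sketch assumes a plain `wasm-opt` on the PATH in place of the script's `in_bin('wasm-opt')`, and the name `deinstrument_args` is hypothetical. One choice differs from the script: the IDs are sorted, because iteration order over a set of strings can change between Python runs (string hashing is randomized), and sorting keeps the logged command line stable for reproduction.

```python
def deinstrument_args(instrumented, de_instrumented, bad_ids, feature_opts):
    """Build the wasm-opt command that deletes bad hints and strips instrumentation."""
    args = ['wasm-opt', instrumented, '-o', de_instrumented]
    if bad_ids:
        # sorted() only makes the command line stable across runs.
        args.append('--delete-branch-hints=' + ','.join(sorted(bad_ids)))
    args += ['--deinstrument-branch-hints', '-g'] + feature_opts
    return args


print(' '.join(deinstrument_args('t.inst.wasm', 't.de_inst.wasm', {'3', '1'}, ['-all'])))
```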
```diff
+        # Add optimizations to see if things break.
+        opted = wasm + '.opted.wasm'
+        args = [
+            in_bin('wasm-opt'),
+            de_instrumented,
+            '-o', opted,
+            '-g',
+
+            # Some passes are just skipped, as they do not modify ifs or brs,
+            # but they do break the invariant of not adding bad branch hints.
+            # There are two main issues here:
+            #  * Moving code around, possibly causing it to start to execute if
+            #    it previously was not reached due to a trap (a branch hint
+            #    seems to have no effects in the optimizer, so it will do such
+            #    movements). And if it starts to execute and is a wrong hint, we
+            #    get an invalid fuzzer finding.
+            #    * LICM moves code out of loops.
+            '--skip-pass=licm',
+            #    * HeapStoreOptimization moves struct.sets closer to struct.news.
+            '--skip-pass=heap-store-optimization',
+            #    * MergeBlocks moves code out of inner blocks to outer blocks.
+            '--skip-pass=merge-blocks',
+            #    * Monomorphize can subtly reorder code:
+            #
+            #        (call $foo
+            #          (select
+            #            (i32.div_s ..which will trap..)
+            #            (if with branch hint)
+            #      =>
+            #        (call $foo_1
+            #          (if with branch hint)
+            #
+            #      where $foo_1 receives the if's result and uses it in the
+            #      ("reverse-inlined") select. Now the if executes first, when
+            #      previously the trap stopped it.
+            '--skip-pass=monomorphize',
+            '--skip-pass=monomorphize-always',
+            #      SimplifyGlobals finds globals that are "read only to be written",
+            #      and can remove the ifs that do so:
+            #
+            #        if (foo) { foo = 1 }
+            #      =>
+            #        if (0) {}
+            #
+            #      This is valid if the global's value is never read otherwise, but
+            #      it does alter the if's behavior.
+            '--skip-pass=simplify-globals',
+            '--skip-pass=simplify-globals-optimizing',
+
+            #  * Merging/folding code. When we do so, code identical in content
+            #    but differing in metadata will end up with the metadata from one
+            #    of the copies, which might be wrong (we follow LLVM here, see
+            #    details in the passes).
+            #    * CodeFolding merges code blocks inside functions.
+            '--skip-pass=code-folding',
+            #    * DuplicateFunctionElimination merges functions.
+            '--skip-pass=duplicate-function-elimination',
+
+            # Some passes break the invariant in some cases, but we do not want
+            # to skip them entirely, as they have other things we need to fuzz.
+            # We add pass-args for them:
+            #  * Do not fold inside OptimizeInstructions.
+            '--pass-arg=optimize-instructions-never-fold-or-reorder',
+            #  * Do not unconditionalize code in RemoveUnusedBrs.
+            '--pass-arg=remove-unused-brs-never-unconditionalize',
+
+        ] + get_random_opts() + FEATURE_OPTS
+        run(args)
+
```
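The first issue in the skip list (moving code so that it runs where a trap used to stop it) can be seen in a toy model. This is a hypothetical op list, not Binaryen IR: each op is either a trap or an instrumented branch that logs its hint. A hint sitting after a trap never logs during the first run, so it is never checked and never deleted; once an optimization moves it ahead of the trap, the final run logs a wrong hint the first run never saw, which is what the comment above calls an invalid fuzzer finding.

```python
def run_ops(ops):
    """Run a toy op list and return (log_lines, trapped).

    Each op is either the string 'trap' or a tuple
    ('hint', hint_id, predicted, actual) standing in for an instrumented branch.
    Execution stops at the first trap, like a wasm trap ends the call.
    """
    log = []
    for op in ops:
        if op == 'trap':
            return log, True
        _, hint_id, predicted, actual = op
        log.append(f'[LoggingExternalInterface log-branch {hint_id} {predicted} {actual}]')
    return log, False


# Hint 7 is wrong (predicts taken, branch not taken) but sits after a trap.
original = ['trap', ('hint', 7, 1, 0)]
# After code motion, the branch runs before the trap and its wrong hint is logged.
reordered = [('hint', 7, 1, 0), 'trap']
print(run_ops(original))   # ([], True)
print(run_ops(reordered))  # (['[LoggingExternalInterface log-branch 7 1 0]'], True)
```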
```diff
+        # Add instrumentation, to see if any branch hints are wrong after
+        # optimizations. We must do this in a separate invocation from the
+        # optimizations due to flags like --converge (which would instrument
+        # multiple times).
+        final = wasm + '.final.wasm'
+        args = [
+            in_bin('wasm-opt'),
+            opted,
+            '-o', final,
+            '--instrument-branch-hints',
+            '-g',
+        ] + FEATURE_OPTS
+        run(args)
+
+        # Run the final wasm.
+        out = run_bynterp(final, ['--fuzz-exec-before', '-all'])
+
+        # Preprocess the logging. We must discard all lines from functions that
+        # trap, because we are fuzzing branch hints, which are not an effect,
+        # and so they can be reordered with traps; consider this:
+        #
+        #   (i32.add
+        #     (block
+        #       (if (X) (unreachable)
+        #       (i32.const 10)
+        #     )
+        #     (block
+        #       (@metadata.code.branch_hint "\00")
+        #       (if (Y) (unreachable)
+        #       (i32.const 20)
+        #     )
+        #   )
+        #
+        # It is ok to reorder traps, so the optimizer might flip the arms of
+        # this add (imagine other code inside the arms justified that). That
+        # reordering is fine since the branch hint has no effect that the
+        # optimizer needs to care about. However, after we instrument, there
+        # *is* an effect, the visible logging, so if X is true we trap and do
+        # not log a branch hint, but if we reorder, we do log, then trap.
+        #
+        # Note that this problem is specific to traps, because the optimizer can
+        # reorder them, and does not care about identity.
+        #
+        # To handle this, gather lines for each call, and then see which groups
+        # end in traps. (Initialize the list of groups with an empty group, for
+        # any logging before the first call.)
+        line_groups = [['before calls']]
+        for line in out.splitlines():
+            if line.startswith(FUZZ_EXEC_CALL_PREFIX):
+                line_groups.append([line])
+            else:
+                line_groups[-1].append(line)
+
+        # No bad hints should pop up after optimizations.
+        for group in line_groups:
+            if not group or group[-1] == '[trap unreachable]':
+                continue
+            for line in group:
+                if line.startswith(LOG_BRANCH_PREFIX):
+                    _, _, id_, hint, actual = line[1:-1].split(' ')
+                    hint = int(hint)
+                    actual = int(actual)
+                    assert hint in (0, 1)
+                    # We do not care about the integer value of the condition,
+                    # only if it was 0 or non-zero.
+                    actual = (actual != 0)
+                    assert hint == actual, 'Bad hint after optimizations'
+
```
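The grouping and final check can also be exercised offline on captured interpreter output. This sketch returns the offending IDs instead of asserting; it assumes `FUZZ_EXEC_CALL_PREFIX` is `'[fuzz-exec] calling'` (check the constant in fuzz_opt.py), and the name `find_bad_hints` is hypothetical. Like the handler, it only skips a group whose last line is exactly `[trap unreachable]`, and it treats any non-zero condition as "taken".

```python
FUZZ_EXEC_CALL_PREFIX = '[fuzz-exec] calling'  # assumed value of the script's constant
LOG_BRANCH_PREFIX = '[LoggingExternalInterface log-branch'


def find_bad_hints(out):
    """Return IDs of hints that were wrong in calls that did not trap."""
    # One group per call; the first group collects anything logged before any call.
    groups = [['before calls']]
    for line in out.splitlines():
        if line.startswith(FUZZ_EXEC_CALL_PREFIX):
            groups.append([line])
        else:
            groups[-1].append(line)

    bad = []
    for group in groups:
        # Calls that end in a trap are ignored: reordering around traps is allowed.
        if group[-1] == '[trap unreachable]':
            continue
        for line in group:
            if line.startswith(LOG_BRANCH_PREFIX):
                _, _, id_, hint, actual = line[1:-1].split(' ')
                # Only zero vs non-zero matters for the condition value.
                if int(hint) != (int(actual) != 0):
                    bad.append(id_)
    return bad
```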
```diff
+
 # The global list of all test case handlers
 testcase_handlers = [
     FuzzExec(),
@@ -1859,6 +2073,7 @@ def get_relevant_lines(wat):
     ClusterFuzz(),
     Two(),
     PreserveImportsExports(),
+    BranchHintPreservation(),
 ]
 
 
```
scripts/fuzz_shell.js

Lines changed: 4 additions & 0 deletions
```diff
@@ -353,6 +353,10 @@ var imports = {
       // how many time units to wait).
       });
     },
+
+    'log-branch': (id, expected, actual) => {
+      console.log(`[LoggingExternalInterface log-branch ${id} ${expected} ${actual}]`);
+    },
   },
   // Emscripten support.
   'env': {
```
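The Python side of the fuzzer strips the brackets from these lines and splits on single spaces into exactly five fields, so the JS template has to produce that shape. A hypothetical pair of Python helpers mirroring the template and its parse, usable as a round-trip check:

```python
def format_log_branch(id_, expected, actual):
    """Python mirror of the JS 'log-branch' import's console.log template."""
    return f'[LoggingExternalInterface log-branch {id_} {expected} {actual}]'


def parse_log_branch(line):
    """Inverse of format_log_branch, using the same split as fuzz_opt.py."""
    _, _, id_, expected, actual = line[1:-1].split(' ')
    return int(id_), int(expected), int(actual)


print(format_log_branch(1, 0, 0))  # [LoggingExternalInterface log-branch 1 0 0]
```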

src/passes/CodeFolding.cpp

Lines changed: 4 additions & 0 deletions
```diff
@@ -249,6 +249,10 @@ struct CodeFolding
     // run the rest of the optimization mormally.
     auto maybeAddBlock = [this](Block* block, Expression*& other) -> Block* {
       // If other is a suffix of the block, wrap it in a block.
+      //
+      // Note that we do not consider metadata here. Like LLVM, we ignore
+      // metadata when trying to fold code together, preferring certain
+      // optimization over possible benefits of profiling data.
       if (block->list.empty() ||
           !ExpressionAnalyzer::equal(other, block->list.back())) {
         return nullptr;
```
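Why folding is skipped by the fuzzer can be shown with a toy model (not Binaryen's `ExpressionAnalyzer::equal`): structural equality that ignores a metadata field, plus a fold that keeps the first copy. The `Expr` class, its op names, and `fold` are all hypothetical. When two copies carry opposite branch hints, the merged code keeps only one of them, which can be wrong for the site the other copy came from.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Toy expression node; `hint` stands in for branch-hint metadata."""
    op: str
    children: tuple = ()
    # compare=False keeps the hint out of ==, so equality is purely structural.
    hint: object = field(default=None, compare=False)


def fold(a, b):
    """Merge two structurally equal expressions by keeping the first copy."""
    return a if a == b else None


likely = Expr('if', (Expr('local.get'),), hint=1)
unlikely = Expr('if', (Expr('local.get'),), hint=0)
print(fold(likely, unlikely).hint)  # 1: the second copy's hint is gone
```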
