Commit ad8bc65

Ignore unreachable code in wasm binaries (#1122)
Ignoring unreachable code in wasm binaries lets us avoid corner cases with unstructured code in wasm binaries that is a poor fit for Binaryen's structured IR.
1 parent e5e7728 commit ad8bc65

22 files changed, with 569 additions and 423 deletions

README.md

Lines changed: 4 additions & 1 deletion
@@ -36,7 +36,10 @@ The differences between Binaryen IR and WebAssembly are:

 * Binaryen IR [is an AST](https://github.com/WebAssembly/binaryen/issues/663), for convenience of optimization. This differs from the WebAssembly binary format which is a stack machine.
 * WebAssembly limits block/if/loop types to none and the concrete value types (i32, i64, f32, f64). Binaryen IR has an unreachable type, and it allows block/if/loop to take it, allowing [local transforms that don't need to know the global context](https://github.com/WebAssembly/binaryen/issues/903).
-* Binaryen IR's text format requires the names of blocks and loops to be unique. This differs from the WebAssembly s-expression format which allows duplicate names (and depends on scoping to disambiguate).
+* Binaryen IR requires the names of blocks and loops to be unique. (Reading wast files with duplicate names is supported, by disambiguating them).
+* Binaryen IR has only one node with a list: blocks. WebAssembly on the other hand allows lists in loops and ifs (Binaryen would represent those with additional blocks as necessary). The motivation here is that many passes need special code for iterating on lists, so having a single IR node with a list simplifies things.
+* Binaryen's text format allows only s-expressions. WebAssembly's official text format is a stack machine with s-expression extensions. Binaryen can't read stack machine code, but it can read a wast if it contains only s-expressions.
+* Binaryen ignores unreachable code when reading WebAssembly binaries. That means that if you read a wasm file with unreachable code, that code will be discarded as if it were optimized out (often this is what you want anyhow, and optimized programs have no unreachable code anyway, but if you write an unoptimized file and then read it, it may look different). The reason for this behavior is that unreachable code in WebAssembly has corner cases that are tricky to handle in Binaryen IR (it can be very unstructured, and Binaryen IR is more structured than WebAssembly as noted earlier). Note that Binaryen does support unreachable code in wast text files, since as we saw Binaryen only supports s-expressions there, which are structured.

 As a result, you might notice that round-trip conversions (wasm => Binaryen IR => wasm) change code a little in some corner cases.
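The README bullet about ignoring unreachable code can be illustrated with a minimal C++ sketch of a reader that keeps instructions up to the first one that never falls through and discards the rest of the enclosing scope. The `Op` enum, `neverFallsThrough`, and `readBody` are hypothetical names made up for this illustration and are not Binaryen APIs. As a simplification, the sketch treats code after a nested block as reachable even when that block's body ended in unreachable code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy opcodes; these are NOT Binaryen's BinaryConsts values.
enum class Op { Nop, Const, Drop, Return, Unreachable, Block, End };

// True if control can never continue past this instruction.
inline bool neverFallsThrough(Op op) {
  return op == Op::Return || op == Op::Unreachable;
}

// Reads one scope starting at `pos`, consuming its terminating End.
// Instructions after the first one that never falls through are skipped,
// but nested Block/End pairs are still counted so that the End that
// closes this scope is found correctly.
std::vector<Op> readBody(const std::vector<Op>& input, std::size_t& pos) {
  assert(pos <= input.size());
  std::vector<Op> kept;
  bool skipping = false;
  int skippedDepth = 0; // blocks opened inside the skipped region
  while (pos < input.size()) {
    Op op = input[pos++];
    if (skipping) {
      if (op == Op::Block) {
        skippedDepth++;
      } else if (op == Op::End) {
        if (skippedDepth == 0) return kept; // End of our own scope
        skippedDepth--;
      }
      continue; // discard everything else
    }
    if (op == Op::End) return kept;
    kept.push_back(op);
    if (op == Op::Block) {
      // Reachable nested block: read it recursively and keep it.
      std::vector<Op> inner = readBody(input, pos);
      kept.insert(kept.end(), inner.begin(), inner.end());
      kept.push_back(Op::End);
    } else if (neverFallsThrough(op)) {
      skipping = true;
    }
  }
  return kept;
}
```

Reading the stream `Const, Return, Const, Drop, End` with this sketch keeps only `Const, Return`, which mirrors how the updated test expectations in this commit lose their trailing `(unreachable)` and dead `(drop ...)` nodes.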

src/wasm-binary.h

Lines changed: 6 additions & 0 deletions
@@ -855,9 +855,15 @@ class WasmBinaryBuilder {

   std::vector<Expression*> expressionStack;

+  bool definitelyUnreachable; // set when we know code is definitely unreachable. this helps parse
+                              // stacky wasm code, which can be unsuitable for our IR when unreachable
+
   BinaryConsts::ASTNodes lastSeparator = BinaryConsts::End;

+  // process a block-type scope, until an end or else marker, or the end of the function
   void processExpressions();
+  void skipUnreachableCode();
+
   Expression* popExpression();
   Expression* popNonVoidExpression();

src/wasm/wasm-binary.cpp

Lines changed: 60 additions & 8 deletions
@@ -1807,28 +1807,81 @@ void WasmBinaryBuilder::readGlobals() {
   }
 }

-void WasmBinaryBuilder::processExpressions() { // until an end or else marker, or the end of the function
+void WasmBinaryBuilder::processExpressions() {
+  if (debug) std::cerr << "== processExpressions" << std::endl;
+  definitelyUnreachable = false;
   while (1) {
     Expression* curr;
     auto ret = readExpression(curr);
     if (!curr) {
       lastSeparator = ret;
+      if (debug) std::cerr << "== processExpressions finished" << std::endl;
+      return;
+    }
+    expressionStack.push_back(curr);
+    if (curr->type == unreachable) {
+      // once we see something unreachable, we don't want to add anything else
+      // to the stack, as it could be stacky code that is non-representable in
+      // our AST. but we do need to skip it
+      // if there is nothing else here, just stop. otherwise, go into unreachable
+      // mode. peek to see what to do
+      if (pos == endOfFunction) {
+        throw ParseException("Reached function end without seeing End opcode");
+      }
+      auto peek = input[pos];
+      if (peek == BinaryConsts::End || peek == BinaryConsts::Else) {
+        if (debug) std::cerr << "== processExpressions finished with unreachable" << std::endl;
+        lastSeparator = BinaryConsts::ASTNodes(peek);
+        pos++;
+        return;
+      } else {
+        skipUnreachableCode();
+        return;
+      }
+    }
+  }
+}
+
+void WasmBinaryBuilder::skipUnreachableCode() {
+  if (debug) std::cerr << "== skipUnreachableCode" << std::endl;
+  // preserve the stack, and restore it. it contains the instruction that made us
+  // unreachable, and we can ignore anything after it. things after it may pop,
+  // we want to undo that
+  auto savedStack = expressionStack;
+  // clear the stack. nothing should be popped from there anyhow, just stuff
+  // can be pushed and then popped. Popping past the top of the stack will
+  // result in uneachables being returned
+  expressionStack.clear();
+  while (1) {
+    // set the definitelyUnreachable flag each time, as sub-blocks may set and unset it
+    definitelyUnreachable = true;
+    Expression* curr;
+    auto ret = readExpression(curr);
+    if (!curr) {
+      if (debug) std::cerr << "== skipUnreachableCode finished" << std::endl;
+      lastSeparator = ret;
+      definitelyUnreachable = false;
+      expressionStack = savedStack;
       return;
     }
     expressionStack.push_back(curr);
   }
 }

 Expression* WasmBinaryBuilder::popExpression() {
+  if (debug) std::cerr << "== popExpression" << std::endl;
   if (expressionStack.empty()) {
-    throw ParseException("attempted pop from empty stack at " + std::to_string(pos));
+    if (definitelyUnreachable) {
+      // in unreachable code, trying to pop past the polymorphic stack
+      // area results in receiving unreachables
+      if (debug) std::cerr << "== popping unreachable from polymorphic stack" << std::endl;
+      return allocator.alloc<Unreachable>();
+    }
+    throw ParseException("attempted pop from empty stack / beyond block start boundary at " + std::to_string(pos));
   }
+  // the stack is not empty, and we would not be going out of the current block
   auto ret = expressionStack.back();
-  // to simulate the wasm polymorphic stack mode, leave a final
-  // unreachable, don't empty the stack in that case
-  if (!(expressionStack.size() == 1 && ret->type == unreachable)) {
-    expressionStack.pop_back();
-  }
+  expressionStack.pop_back();
   return ret;
 }

@@ -2222,7 +2275,6 @@ void WasmBinaryBuilder::visitBreak(Break *curr, uint8_t code) {
 void WasmBinaryBuilder::visitSwitch(Switch *curr) {
   if (debug) std::cerr << "zz node: Switch" << std::endl;
   curr->condition = popNonVoidExpression();
-
   auto numTargets = getU32LEB();
   if (debug) std::cerr << "targets: "<< numTargets<<std::endl;
   for (size_t i = 0; i < numTargets; i++) {
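The stack handling in `skipUnreachableCode` and `popExpression` can be sketched in isolation. What follows is a simplified model under stated assumptions, not Binaryen code: `Node`, `ToyBuilder`, and `skipUnreachable` are hypothetical stand-ins, a callback replaces the real loop over `readExpression`, and nested skipping is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR expression node.
struct Node {
  std::string name;
  bool unreachable;
};

class ToyBuilder {
public:
  void push(Node n) { stack.push_back(std::move(n)); }
  std::size_t size() const { return stack.size(); }

  // Pops a value. On an empty stack this is an error in reachable code,
  // but in unreachable mode it yields a fresh unreachable placeholder,
  // modelling wasm's polymorphic stack.
  Node pop() {
    if (stack.empty()) {
      if (definitelyUnreachable) {
        return Node{"unreachable", true};
      }
      throw std::runtime_error("attempted pop from empty stack");
    }
    Node ret = std::move(stack.back());
    stack.pop_back();
    return ret;
  }

  // Runs `body` in unreachable mode on an empty stack, then restores the
  // stack that was live before, so nothing `body` pushed or popped leaks out.
  template <typename F>
  void skipUnreachable(F body) {
    assert(!definitelyUnreachable); // this toy does not model nested skipping
    std::vector<Node> saved = std::move(stack);
    stack.clear();
    definitelyUnreachable = true;
    body(*this);
    definitelyUnreachable = false;
    stack = std::move(saved);
  }

private:
  std::vector<Node> stack;
  bool definitelyUnreachable = false;
};
```

In this model, code run inside `skipUnreachable` can pop as often as it likes without raising an error, and the instruction that made the scope unreachable is still on the stack afterwards. That appears to be the property the commit relies on when it saves and restores `expressionStack`.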

test/example/c-api-unused-mem.txt

Lines changed: 0 additions & 11 deletions
@@ -67,19 +67,15 @@
     )
     (block $label$2
      (br $label$1)
-     (unreachable)
     )
-    (unreachable)
    )
    (block $label$3
     (block $label$4
      (block $label$5
      )
      (block $label$6
       (br $label$4)
-      (unreachable)
      )
-     (unreachable)
     )
     (block $label$7
      (block $label$8
@@ -88,18 +84,11 @@
        (get_local $var$0)
       )
       (return)
-      (unreachable)
      )
      (unreachable)
-     (unreachable)
     )
-    (unreachable)
-    (unreachable)
    )
-   (unreachable)
-   (unreachable)
   )
-  (unreachable)
  )
  (func $__wasm_start (type $1)
   (block $label$0

test/fib-dbg.wasm.fromBinary

Lines changed: 0 additions & 5 deletions
@@ -72,9 +72,7 @@
    (return
     (get_local $var$1)
    )
-   (unreachable)
   )
-  (unreachable)
  )
  (func $stackSave (type $2) (result i32)
   (return
@@ -214,10 +212,7 @@
    (return
     (get_local $var$4)
    )
-   (unreachable)
-   (unreachable)
   )
-  (unreachable)
  )
  (func $runPostSets (type $4)
   (local $var$0 i32)

test/min.wast.fromBinary

Lines changed: 0 additions & 1 deletion
@@ -46,7 +46,6 @@
    (br $label$0
     (i32.const 2)
    )
-   (i32.const 0)
   )
  )
  (func $f1 (type $3) (param $var$0 i32) (param $var$1 i32) (param $var$2 i32) (result i32)

test/min.wast.fromBinary.noDebugInfo

Lines changed: 0 additions & 1 deletion
@@ -46,7 +46,6 @@
    (br $label$0
     (i32.const 2)
    )
-   (i32.const 0)
   )
  )
  (func $3 (type $3) (param $var$0 i32) (param $var$1 i32) (param $var$2 i32) (result i32)

test/passes/flatten-control-flow.bin.txt

Lines changed: 1 addition & 45 deletions
@@ -119,51 +119,7 @@
   (local $var$8 f64)
   (block $label$0
    (nop)
-   (block
-    (block
-     (unreachable)
-    )
-   )
-   (drop
-    (f32.neg
-     (get_local $var$1)
-    )
-   )
-   (drop
-    (f64.neg
-     (get_local $var$2)
-    )
-   )
-   (drop
-    (i32.eqz
-     (get_local $var$3)
-    )
-   )
-   (drop
-    (i32.eqz
-     (get_local $var$4)
-    )
-   )
-   (drop
-    (f32.neg
-     (get_local $var$7)
-    )
-   )
-   (drop
-    (i64.eqz
-     (get_local $var$5)
-    )
-   )
-   (drop
-    (i64.eqz
-     (get_local $var$6)
-    )
-   )
-   (drop
-    (f64.neg
-     (get_local $var$8)
-    )
-   )
+   (unreachable)
   )
  )
  (func $9 (type $9) (param $var$0 i64) (param $var$1 f32) (param $var$2 f64) (param $var$3 i32) (param $var$4 i32) (result f64)

test/polymorphic_stack.wast

Lines changed: 15 additions & 0 deletions
@@ -86,6 +86,21 @@
       )
     )
   )
+  (func $unreachable-in-block-but-code-before (param $0 i32) (result i32)
+    (if
+      (get_local $0)
+      (return
+        (i32.const 127)
+      )
+    )
+    (block $label$0 (result i32)
+      (br_if $label$0
+        (return
+          (i32.const -32)
+        )
+      )
+    )
+  )
   (func $br_table_unreachable_to_also_unreachable (result i32)
     (block $a (result i32)
       (block $b

test/polymorphic_stack.wast.from-wast

Lines changed: 15 additions & 0 deletions
@@ -90,6 +90,21 @@
    )
   )
  )
+ (func $unreachable-in-block-but-code-before (type $FUNCSIG$ii) (param $0 i32) (result i32)
+  (if
+   (get_local $0)
+   (return
+    (i32.const 127)
+   )
+  )
+  (block $label$0 (result i32)
+   (br_if $label$0
+    (return
+     (i32.const -32)
+    )
+   )
+  )
+ )
  (func $br_table_unreachable_to_also_unreachable (type $1) (result i32)
   (block $a (result i32)
    (block $b
