
Commit 4a330e5

Support names field in source maps (#25870)
This adds support for the `names` field in source maps, which contains function names. Source map mappings are updated accordingly, and emsymbolizer can now provide function name information from source maps alone. Source maps don't provide the full inlining hierarchy, but this gives the name of the original (= pre-inlining) function, which may no longer exist in the final binary if it was inlined. This is because source maps are primarily intended for user debugging. This also demangles C++ function names using `llvm-cxxfilt`, so the printed names are human-readable.

I tested with `wasm-opt.wasm` from Binaryen, built with the `if (EMSCRIPTEN)` setup here: https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 with `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings.

This also requires additional parsing of the `llvm-dwarfdump` output for `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags, which can appear at any depth (functions can be nested within namespaces or classes), so we can no longer use `--recurse-depth=0` (#9580). For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the text output by 27.5x, but with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720) the increase was only (?) 3.2x, which I think is tolerable. Because `-t` was added to `llvm-dwarfdump` only recently, `names` field generation is disabled when the option is not available.

To avoid this text size problem, we could consider a DWARF-parsing Python library like https://github.com/eliben/pyelftools, but that would add another third-party dependency, so I'm not sure it's worth it at this point.
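Because function DIEs can sit at any nesting depth, a flat, depth-limited dump no longer finds them all. As a rough illustration, here is a minimal Python sketch that scans `llvm-dwarfdump`-style text and collects `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` entries regardless of indentation. The sample text, the regular expressions, and the helper name `collect_functions` are hypothetical; real `llvm-dwarfdump` output carries many more attributes, and this is not how `wasm-sourcemap.py` is written.

```python
import re

# Hypothetical, abridged sample in the style of llvm-dwarfdump text output.
# The subprogram for foo() is nested inside a namespace, one level deeper.
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_name\t("test.cpp")

0x00000026:   DW_TAG_namespace
                DW_AT_name\t("ns")

0x0000002b:     DW_TAG_subprogram
                  DW_AT_linkage_name\t("_ZN2ns3fooEv")
                  DW_AT_name\t("foo")

0x00000040:   DW_TAG_subprogram
                DW_AT_name\t("main")

0x00000050:     DW_TAG_inlined_subroutine
                  DW_AT_call_line\t(25)
"""

WANTED_TAGS = {'DW_TAG_subprogram', 'DW_TAG_inlined_subroutine'}
# A DIE header line: offset, colon, indentation (the depth), then the tag.
TAG_RE = re.compile(r'^0x[0-9a-f]+:\s+(DW_TAG_\w+)')
# An attribute line: indentation, attribute name, then a parenthesized value.
ATTR_RE = re.compile(r'^\s+(DW_AT_\w+)\s+\((.*)\)$')


def collect_functions(text):
    """Return one dict per wanted DIE, holding its tag and attributes."""
    funcs = []
    current = None
    for line in text.splitlines():
        m = TAG_RE.match(line)
        if m:
            # Matching on the tag name alone ignores the nesting depth.
            tag = m.group(1)
            current = {'tag': tag} if tag in WANTED_TAGS else None
            if current is not None:
                funcs.append(current)
            continue
        m = ATTR_RE.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2).strip('"')
    return funcs


for f in collect_functions(SAMPLE):
    print(f)
```

The nested `foo` is found just like the top-level `main`, which is the property a depth-limited dump cannot provide.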
This also increased the running time of `wasm-sourcemap.py` by 2.3x for `wasm-opt.wasm` (6.6s -> 15.4s), but compared to the link time this was not very noticeable.

Fixes #20715 and closes #25116.
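In the source map format, a `mappings` segment can carry a fifth field: a Base64 VLQ, delta-encoded index into the `names` array. The following is a minimal encoder sketch, assuming the same starting state that the emsymbolizer decoder in this commit uses (offset 0, source 0, line 1, column 1, name index 0) and comma-separated segments. The entries, offsets, and helper names are hypothetical, and this is not the code in `wasm-sourcemap.py`.

```python
B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def encode_vlq(value):
    """Encode one signed integer as a Base64 VLQ string."""
    # The sign goes in the lowest bit; the magnitude is shifted left by one.
    vlq = ((-value) << 1) | 1 if value < 0 else value << 1
    out = ''
    while True:
        digit = vlq & 31          # low 5 bits
        vlq >>= 5
        if vlq:
            digit |= 32           # continuation bit: more digits follow
        out += B64[digit]
        if not vlq:
            return out


def encode_mappings(entries):
    """entries: (offset, source_index, line, column, name_or_None), sorted by offset.

    Returns (names, mappings). Every field is written as a delta from the
    previous segment's value for that field.
    """
    names = []
    name_index = {}
    # Assumed starting state, mirroring emsymbolizer's decoder:
    # offset 0, source 0, line 1, column 1, name index 0.
    prev = [0, 0, 1, 1, 0]
    segments = []
    for offset, source, line, column, name in entries:
        fields = [offset, source, line, column]
        if name is not None:
            # Store each distinct name once; segments refer to it by index.
            if name not in name_index:
                name_index[name] = len(names)
                names.append(name)
            fields.append(name_index[name])
        segment = ''
        for i, value in enumerate(fields):
            segment += encode_vlq(value - prev[i])
            prev[i] = value
        segments.append(segment)
    return names, ','.join(segments)


names, mappings = encode_mappings([
    (5, 0, 12, 3, 'MyClass::foo()'),
    (9, 0, 13, 3, 'MyClass::foo()'),
    (20, 0, 25, 6, 'main'),
])
print(names, mappings)  # ['MyClass::foo()', 'main'] KAWEA,IACAA,WAYGC
```

Deduplicating names is a design choice in this sketch: the second segment reuses index 0, so its name field is a zero delta (`A`) rather than another copy of the string.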
1 parent afd9bc1 commit 4a330e5


5 files changed, +319 -41 lines changed

ChangeLog.md

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ See docs/process.md for more on how version tagging works.

 4.0.22 (in development)
 -----------------------
+- Source maps now support 'names' field with function name information.
+  emsymbolizer will show function names when used with a source map. The size
+  of source maps may increase 2-3x and the link time can increase slightly due
+  to more processing on source map creation. (#25870)
 - The minimum version of python required to run emscripten was updated from 3.8
   to 3.10. (#25891)

test/core/test_dwarf.cpp

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+#include <emscripten.h>
+
+EM_JS(int, out_to_js, (int x), {})
+
+class MyClass {
+public:
+  void foo();
+  void bar();
+};
+
+void __attribute__((noinline)) MyClass::foo() {
+  out_to_js(0); // line 12
+  out_to_js(1);
+  out_to_js(2);
+}
+
+void __attribute__((always_inline)) MyClass::bar() {
+  out_to_js(3);
+  __builtin_trap(); // line 19
+}
+
+int main() {
+  MyClass mc;
+  mc.foo();
+  mc.bar();
+}

test/test_other.py

Lines changed: 63 additions & 25 deletions
@@ -9629,12 +9629,49 @@ def check_dwarf_loc_info(address, funcs, locs):
       for loc in locs:
         self.assertIn(loc, out)

-    def check_source_map_loc_info(address, loc):
+    def check_source_map_loc_info(address, func, loc):
       out = self.run_process(
           [emsymbolizer, '-s', 'sourcemap', 'test_dwarf.wasm', address],
           stdout=PIPE).stdout
+      self.assertIn(func, out)
       self.assertIn(loc, out)

+    def do_tests(src):
+      # 1. Test DWARF + source map together
+      # For DWARF, we check for the full inlined info for both function names and
+      # source locations. Source maps does not provide inlined info. So we only
+      # check for the info of the outermost function.
+      self.run_process([EMCC, test_file(src), '-g', '-gsource-map', '-O1', '-o',
+                        'test_dwarf.js'])
+      check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
+                           out_to_js_call_loc)
+      check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_func[0],
+                                out_to_js_call_loc[0])
+      check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
+      # Source map shows the original (inlined) source location with the original
+      # function name
+      check_source_map_loc_info(unreachable_addr, unreachable_func[0],
+                                unreachable_loc[0])
+
+      # 2. Test source map only
+      # The addresses, function names, and source locations are the same across
+      # the builds because they are relative offsets from the code section, so we
+      # don't need to recompute them
+      self.run_process([EMCC, test_file(src), '-gsource-map', '-O1', '-o',
+                        'test_dwarf.js'])
+      check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_func[0],
+                                out_to_js_call_loc[0])
+      check_source_map_loc_info(unreachable_addr, unreachable_func[0],
+                                unreachable_loc[0])
+
+      # 3. Test DWARF only
+      self.run_process([EMCC, test_file(src), '-g', '-O1', '-o',
+                        'test_dwarf.js'])
+      check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
+                           out_to_js_call_loc)
+      check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
+
+    # -- C program test --
     # We test two locations within test_dwarf.c:
     # out_to_js(0); // line 6
     # __builtin_trap(); // line 13
@@ -9657,31 +9694,32 @@ def check_source_map_loc_info(address, loc):
     # The first one corresponds to the innermost inlined location.
     unreachable_loc = ['test_dwarf.c:13:3', 'test_dwarf.c:18:3']

-    # 1. Test DWARF + source map together
-    # For DWARF, we check for the full inlined info for both function names and
-    # source locations. Source maps provide neither function names nor inlined
-    # info. So we only check for the source location of the outermost function.
-    check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
-                         out_to_js_call_loc)
-    check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
-    check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
-    check_source_map_loc_info(unreachable_addr, unreachable_loc[0])
-
-    # 2. Test source map only
-    # The addresses, function names, and source locations are the same across
-    # the builds because they are relative offsets from the code section, so we
-    # don't need to recompute them
-    self.run_process([EMCC, test_file('core/test_dwarf.c'),
-                      '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
-    check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
-    check_source_map_loc_info(unreachable_addr, unreachable_loc[0])
+    do_tests('core/test_dwarf.c')

-    # 3. Test DWARF only
-    self.run_process([EMCC, test_file('core/test_dwarf.c'),
-                      '-g', '-O1', '-o', 'test_dwarf.js'])
-    check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
-                         out_to_js_call_loc)
-    check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
+    # -- C++ program test --
+    # We test two locations within test_dwarf.cpp:
+    # out_to_js(0); // line 12
+    # __builtin_trap(); // line 19
+    self.run_process([EMCC, test_file('core/test_dwarf.cpp'),
+                      '-g', '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
+    # Address of out_to_js(0) within MyClass::foo(), uninlined
+    out_to_js_call_addr = self.get_instr_addr('call\t0', 'test_dwarf.wasm')
+    # Address of __builtin_trap() within MyClass::bar(), inlined into main()
+    unreachable_addr = self.get_instr_addr('unreachable', 'test_dwarf.wasm')
+
+    # Function name of out_to_js(0) within MyClass::foo(), uninlined
+    out_to_js_call_func = ['MyClass::foo()']
+    # Function names of __builtin_trap() within MyClass::bar(), inlined into
+    # main(). The first one corresponds to the innermost inlined function.
+    unreachable_func = ['MyClass::bar()', 'main']
+
+    # Source location of out_to_js(0) within MyClass::foo(), uninlined
+    out_to_js_call_loc = ['test_dwarf.cpp:12:3']
+    # Source locations of __builtin_trap() within MyClass::bar(), inlined into
+    # main(). The first one corresponds to the innermost inlined location.
+    unreachable_loc = ['test_dwarf.cpp:19:3', 'test_dwarf.cpp:25:6']
+
+    do_tests('core/test_dwarf.cpp')

   def test_emsymbolizer_functions(self):
     'Test emsymbolizer use cases that only provide function-granularity info'
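The C++ test expects demangled names such as `MyClass::foo()` from the source map, and the commit message says names are demangled with `llvm-cxxfilt`. Below is a minimal sketch of batching names through that tool in one process. The helper name `demangle` and its fall-back-to-input behavior when the tool is missing are assumptions for illustration, not the code in `wasm-sourcemap.py`.

```python
import shutil
import subprocess


def demangle(names, tool='llvm-cxxfilt'):
    """Demangle symbol names in one batch by piping them through `tool`.

    Hypothetical helper: returns the names unchanged if `tool` is not on PATH.
    """
    names = list(names)
    exe = shutil.which(tool)
    if not names or exe is None:
        return names
    # llvm-cxxfilt reads names from stdin when none are given on the command
    # line and prints one result per input line.
    proc = subprocess.run([exe], input='\n'.join(names) + '\n',
                          capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()


print(demangle(['_ZN7MyClass3fooEv', 'main']))
```

With `llvm-cxxfilt` on PATH, the Itanium-mangled `_ZN7MyClass3fooEv` comes back as `MyClass::foo()`, while a non-mangled name like `main` passes through unchanged. Piping all names at once avoids spawning one process per function.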

tools/emsymbolizer.py

Lines changed: 7 additions & 1 deletion
@@ -117,6 +117,7 @@ class Location:
   def __init__(self):
     self.version = None
     self.sources = []
+    self.funcs = []
     self.mappings = {}
     self.offsets = []

@@ -128,6 +129,7 @@ def parse(self, filename):

     self.version = source_map_json['version']
     self.sources = source_map_json['sources']
+    self.funcs = source_map_json['names']

     chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
     vlq_map = {c: i for i, c in enumerate(chars)}
@@ -155,6 +157,7 @@ def decodeVLQ(string):
     src = 0
     line = 1
     col = 1
+    func = 0
     for segment in source_map_json['mappings'].split(','):
       data = decodeVLQ(segment)
       info = []
@@ -169,7 +172,9 @@ def decodeVLQ(string):
       if len(data) >= 4:
         col += data[3]
         info.append(col)
-      # TODO: see if we need the name, which is the next field (data[4])
+      if len(data) == 5:
+        func += data[4]
+        info.append(func)

       self.mappings[offset] = WasmSourceMap.Location(*info)
       self.offsets.append(offset)
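The hunk above treats the fifth field as a running index into `names`, exactly like the other fields are running totals. Here is a standalone sketch of the same decoding outside the `WasmSourceMap` class, resolving the index straight to the name string. The function names `decode_vlq` and `decode_mappings`, the tuple layout, and the sample mappings string are illustrative assumptions; the starting state (offset 0, source 0, line 1, column 1, name index 0) follows the emsymbolizer code.

```python
B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
VLQ_MAP = {c: i for i, c in enumerate(B64)}


def decode_vlq(segment):
    """Decode a Base64 VLQ segment into a list of signed integers."""
    values = []
    value = 0
    shift = 0
    for c in segment:
        digit = VLQ_MAP[c]
        value += (digit & 31) << shift
        if digit & 32:      # continuation bit set: keep accumulating
            shift += 5
        else:
            # Lowest bit is the sign; the rest is the magnitude.
            values.append(-(value >> 1) if value & 1 else value >> 1)
            value = 0
            shift = 0
    return values


def decode_mappings(mappings, names):
    """Map each code offset to (source, line, column[, function name])."""
    offset, src, line, col, func = 0, 0, 1, 1, 0
    decoded = {}
    for segment in mappings.split(','):
        if not segment:
            continue
        data = decode_vlq(segment)
        offset += data[0]
        info = []
        if len(data) >= 2:
            src += data[1]
            info.append(src)
        if len(data) >= 3:
            line += data[2]
            info.append(line)
        if len(data) >= 4:
            col += data[3]
            info.append(col)
        if len(data) == 5:
            # Like the other fields, the name index is a running total.
            func += data[4]
            info.append(names[func])
        decoded[offset] = tuple(info)
    return decoded


print(decode_mappings('KAWEA,IACAA,WAYGC', ['MyClass::foo()', 'main']))
```

Segments with only four fields simply produce a tuple without a name, which corresponds to the `info.func is None` case handled in `lookup`.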
@@ -207,6 +212,7 @@ def lookup(self, offset, lower_bound=None):
         self.sources[info.source] if info.source is not None else None,
         info.line,
         info.column,
+        self.funcs[info.func] if info.func is not None else None,
     )


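With the function index stored per mapping, a lookup only needs one more indirection through `names`. The sketch below assumes the usual rule of picking the closest mapping at or before the queried code offset (via `bisect`); the `ResolvedLocation` type, the `lookup` signature, and the sample data are hypothetical, and the sketch ignores the `lower_bound` parameter of the real `lookup`.

```python
import bisect
from typing import NamedTuple, Optional


class ResolvedLocation(NamedTuple):
    source: Optional[str]
    line: int
    column: int
    func: Optional[str]


def lookup(offsets, mappings, sources, names, address):
    """Resolve `address` to the closest mapping at or before it.

    offsets:  sorted list of code-section offsets that have mappings
    mappings: offset -> (source_index, line, column, name_index or None)
    """
    i = bisect.bisect_right(offsets, address)
    if i == 0:
        return None  # address precedes the first mapping
    src, line, col, func = mappings[offsets[i - 1]]
    return ResolvedLocation(
        sources[src] if src is not None else None,
        line,
        col,
        # Segments without a fifth field carry no name index.
        names[func] if func is not None else None,
    )


offsets = [5, 9, 20, 30]
mappings = {5: (0, 12, 3, 0), 9: (0, 13, 3, 0),
            20: (0, 25, 6, 1), 30: (0, 26, 1, None)}
print(lookup(offsets, mappings, ['test_dwarf.cpp'],
             ['MyClass::foo()', 'main'], 10))
```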