llvm-objdump gives wrong line numbers for WebAssembly #129523

@stevenwdv

Description

This issue was previously filed as emscripten-core/emscripten#23717.

llvm-objdump --line-numbers reports the wrong source file and line numbers when disassembling a simple linked WebAssembly file.

Steps to reproduce

  • Create a simple main.cpp:
int main() { return 42; }
  • Now compile with debug symbols:
em++ -g main.cpp
Verbose output
 "/home/swdv/emsdk/upstream/bin/clang++" -target wasm64-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/home/swdv/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -g3 -DNO_USE_MYFUN -v -c main.cpp -o /tmp/emscripten_temp_pe2lfvyf/main_0.o
clang version 21.0.0git (https:/github.com/llvm/llvm-project 6dc41a639334b913e762f65410fcd14a722b137f)
Target: wasm64-unknown-emscripten
Thread model: posix
InstalledDir: /home/swdv/emsdk/upstream/bin
 (in-process)
 "/home/swdv/emsdk/upstream/bin/clang-21" -cc1 -triple wasm64-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name main.cpp -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility=hidden -debug-info-kind=constructor -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/home/swdv/Downloads/plainwasmtest -v -fcoverage-compilation-dir=/home/swdv/Downloads/plainwasmtest -resource-dir /home/swdv/emsdk/upstream/lib/clang/21 -D EMSCRIPTEN -D NO_USE_MYFUN -isysroot /home/swdv/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten/c++/v1 -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1 -internal-isystem /home/swdv/emsdk/upstream/lib/clang/21/include -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include -fdeprecated-macro -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_pe2lfvyf/main_0.o -x c++ main.cpp
clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten/c++/v1"
ignoring nonexistent directory "/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/compat
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1
 /home/swdv/emsdk/upstream/lib/clang/21/include
 /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
 /home/swdv/emsdk/upstream/bin/clang --version
 /home/swdv/emsdk/upstream/bin/wasm-ld -o hello.wasm /tmp/emscripten_temp_pe2lfvyf/main_0.o -L/home/swdv/emsdk/upstream/emscripten/cache/sysroot/lib/wasm64-emscripten -L/home/swdv/emsdk/upstream/emscripten/src/lib -lGL-getprocaddr -lal -lhtml5 -lstubs-debug -lnoexit -lc-debug -ldlmalloc-debug -lcompiler_rt -lc++-noexcept -lc++abi-debug-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -mwasm64 /tmp/tmp5u5b29eklibemscripten_js_symbols.so --export=emscripten_stack_get_end --export=emscripten_stack_get_free --export=emscripten_stack_get_base --export=emscripten_stack_get_current --export=emscripten_stack_init --export=_emscripten_stack_alloc --export=__wasm_call_ctors --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-if-defined=fflush --export-table -z stack-size=65536 --no-growable-memory --initial-heap=16777216 --no-entry --stack-first --table-base=1
 /home/swdv/emsdk/upstream/bin/llvm-objcopy hello.wasm hello.wasm --remove-section=producers
 /home/swdv/emsdk/node/20.18.0_64bit/bin/node /home/swdv/emsdk/upstream/emscripten/src/compiler.mjs /tmp/tmp3fupbzr6.json
 /home/swdv/emsdk/node/20.18.0_64bit/bin/node /home/swdv/emsdk/upstream/emscripten/tools/preprocessor.mjs /tmp/emscripten_temp_pe2lfvyf/settings.js shell.html
  • Now disassemble the main function:
~/emsdk/upstream/bin/llvm-objdump --disassemble-symbols=__original_main --line-numbers a.out.wasm
  • Observe that the reported file and line numbers are wrong: they point to fflush.c instead of our main.cpp:
a.out.wasm:	file format wasm

Disassembly of section CODE:

0000017c <__original_main>:
        .local i32, i32, i32, i32, i32, i32, i32
; __original_main():
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:17
     180: 23 80 80 80 80 00    	global.get	0
     186: 21 00        	local.set	0
     188: 41 10        	i32.const	16
     18a: 21 01        	local.set	1
     18c: 20 00        	local.get	0
     18e: 20 01        	local.get	1
     190: 6b           	i32.sub 
     191: 21 02        	local.set	2
     193: 41 00        	i32.const	0
     195: 21 03        	local.set	3
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:18
     197: 20 02        	local.get	2
     199: 20 03        	local.get	3
     19b: 36 02 0c     	i32.store	12
     19e: 41 8d 21     	i32.const	4237
     1a1: 21 04        	local.set	4
     1a3: 41 15        	i32.const	21
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:15
     1a5: 21 05        	local.set	5
     1a7: 20 04        	local.get	4
     1a9: 20 05        	local.get	5
     1ab: 36 02 00     	i32.store	0
     1ae: 41 2a        	i32.const	42
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:20
     1b0: 21 06        	local.set	6
     1b2: 20 06        	local.get	6
     1b4: 0f           	return
     1b5: 0b           	end

Version of emscripten/emsdk

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 4.0.3 (a9651ff57165f5710bb09a5fe52590fd6ddb72df)
clang version 21.0.0git (https:/github.com/llvm/llvm-project 6dc41a639334b913e762f65410fcd14a722b137f)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/swdv/emsdk/upstream/bin

More findings from emscripten-core/emscripten#23717

@kripken emscripten-core/emscripten#23717 (comment):

[...] llvm-dwarfdump gives proper output.

@dschuff emscripten-core/emscripten#23717 (comment):

So this problem has to do with the way LLVM handles symbols for linked wasm files and debug info. Specifically, symbol addresses in DWARF are always encoded as offsets in the code section, whereas for linked files, LLVM uses the offset in the file as the address for a function (this is to match how engines print code addresses in backtraces). See some changes (and llvm/llvm-project#76198) I made to implement this about a year ago in LLVM. So if you use e.g. llvm-objdump to print symbol addresses, they will match what browser backtraces show, but not match what you see if you use llvm-dwarfdump to look at the debug info, and llvm-symbolizer will not get the right answer. I think the same mechanism in LLVM that causes the latter problem is what is happening when llvm-objdump is looking up line information from the debug info during disassembly (despite the fact that it's correctly finding the right code address when you ask it to disassemble a symbol by name).
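The mismatch described above can be illustrated with a minimal sketch. It assumes the convention that DWARF code addresses in WebAssembly are relative to the start of the Code section's contents (the byte right after the section id and size field), while the addresses llvm-objdump prints for a linked file are offsets from the start of the file. The function names here are hypothetical and not part of LLVM or Emscripten; the sketch only parses the section headers of a wasm binary to find where the Code section's contents begin.

```python
import io

WASM_MAGIC = b"\x00asm"
CODE_SECTION_ID = 10  # id of the Code section in the wasm binary format


def read_uleb128(stream):
    """Decode one unsigned LEB128 integer from a binary stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise ValueError("unexpected end of file inside a LEB128 value")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def code_section_content_offset(data):
    """Return the file offset at which the Code section's contents start."""
    stream = io.BytesIO(data)
    if stream.read(4) != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary")
    stream.read(4)  # skip the 4-byte version field
    while True:
        id_byte = stream.read(1)
        if not id_byte:
            raise ValueError("no Code section found")
        size = read_uleb128(stream)
        content_start = stream.tell()
        if id_byte[0] == CODE_SECTION_ID:
            return content_start
        stream.seek(content_start + size)  # jump to the next section header


def file_offset_to_dwarf_address(data, file_offset):
    """Convert a file offset (as printed for linked files) into a
    Code-section-relative address (as assumed to be stored in DWARF)."""
    return file_offset - code_section_content_offset(data)
```

Under this assumption, looking up a file offset such as 0x180 directly in the line table, without subtracting the Code section offset, would land on an entry that belongs to some other function, which is consistent with the fflush.c lines shown in the disassembly.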

So this is an unfortunate mismatch and not everything works right, as you have seen. Emscripten has a tool emsymbolizer that knows a bunch of ways emscripten can store name/address information (e.g. DWARF, source maps, name sections) and can symbolize addresses. It papers over this problem using the --adjust-vma flag of llvm-symbolizer, but it currently only supports the use case of looking up a name or line from an address one at a time.
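As a rough sketch of the workaround mentioned above, the following builds an llvm-symbolizer command line that passes the Code section offset through --adjust-vma so that file-offset addresses can be looked up. The helper name `symbolizer_command` is hypothetical, the Code section offset is assumed to have been obtained separately (for example by parsing the wasm section headers), and the offset value in the usage example is a made-up placeholder, not the real offset of the a.out.wasm from this issue. This is not how emsymbolizer is implemented; it only shows the shape of the invocation.

```python
def symbolizer_command(wasm_path, code_section_offset, file_offsets,
                       symbolizer="llvm-symbolizer"):
    """Build an llvm-symbolizer argument list for file-offset addresses.

    code_section_offset is assumed to be the file offset at which the
    Code section's contents start; it is passed via --adjust-vma.
    """
    cmd = [
        symbolizer,
        f"--obj={wasm_path}",
        f"--adjust-vma={code_section_offset}",
    ]
    # llvm-symbolizer accepts addresses as positional arguments.
    cmd.extend(hex(offset) for offset in file_offsets)
    return cmd


# Hypothetical usage: 0x100 is a placeholder offset, not a measured value.
example = symbolizer_command("a.out.wasm", 0x100, [0x180, 0x1B4])
```

The resulting list can be handed to `subprocess.run` on a machine where llvm-symbolizer is installed; passing several addresses in one call is one way to avoid invoking the tool once per address.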

We might be able to improve this situation. Adjusting how symbols are represented in LLVM is tricky, since they are used in various places in assembly, linking, etc. Ideally we also wouldn't need a bunch of special hacks in the tools such as llvm-objdump (although I wouldn't necessarily be above some kind of special case if it wasn't too horrible). [...]
