Support cargo-driven wasm-bindgen linking #27179
15227f6 to 7017b20
// JS library symbols emitted with a top-level `export` (MODULARIZE=instance),
// forwarded so the WASM_ESM_INTEGRATION wrapper can re-export them.
const exportedLibrarySymbols = [];
I would hope we could avoid creating yet another type of symbol list in emscripten.
The caller of the JS compiler already has the EXPORTED_FUNCTIONS list, and the list of all JS symbols (librarySymbols) so maybe we don't need this new list?
Good call — dropped the separate list. The exported JS library symbols are now derived in the caller from the existing librarySymbols ∩ EXPORTED_FUNCTIONS. The one wrinkle: wasm-bindgen self-registers its exports via EXPORTED_FUNCTIONS.add(...) on the JS-compiler side, and those additions don't otherwise round-trip to Python, so jsifier now forwards the final EXPORTED_FUNCTIONS (instead of a bespoke list) and the caller intersects it with librarySymbols.
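As a rough illustration of the intersection described in this reply, here is a minimal Python sketch. The function name and the sample symbol names are hypothetical, and it assumes both lists use the same naming (no extra underscore mangling); the actual emscripten code may differ.

```python
def exported_library_symbols(library_symbols, exported_functions):
    """Return the JS library symbols that also appear in EXPORTED_FUNCTIONS.

    The result keeps the order of library_symbols so it is deterministic.
    """
    exported = set(exported_functions)
    return [sym for sym in library_symbols if sym in exported]


# Hypothetical inputs, for illustration only.
library_symbols = ['Greeter', 'stringToUTF8', 'helper']
exported_functions = ['_main', 'Greeter', '__wbindgen_malloc']

print(exported_library_symbols(library_symbols, exported_functions))
# -> ['Greeter']
```

Deriving the list this way avoids carrying a third symbol list around, at the cost of requiring the final EXPORTED_FUNCTIONS to be forwarded back to the caller.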
// wasm-bindgen's glue reaches the exports by name off a `wasmExports` object,
// so provide the aggregate via a namespace import. Emscripten's own named
// imports are unaffected and remain tree-shakable.
import * as wasmExports from './{{{ WASM_BINARY_FILE }}}';
There are other places in the codebase that also use wasmExports. Maybe we should remove the && WASM_BINDGEN so that they start working under WASM_ESM_INTEGRATION?
Actually I wonder how those codepaths work today? For example registerTLSInit(wasmExports['_emscripten_tls_init']);
Done — dropped the && WASM_BINDGEN so the namespace import applies to all WASM_ESM_INTEGRATION builds. It's safe because every wasmExports = ... assignment lives under #if !WASM_ESM_INTEGRATION (inside createWasm) or the non-instance branch, and WASM_ESM_INTEGRATION requires MODULARIZE=instance — so nothing assigns to the now read-only binding. Verified a plain C++ ESM build still builds and runs.
Re your TLS question: under ESM integration registerTLSInit already uses the named import (registerTLSInit(__emscripten_tls_init)) rather than wasmExports['_emscripten_tls_init'], which is why that path worked. This change makes the generic by-name wasmExports[...] accesses work under ESM integration too.
# section so emcc, when used as the linker (e.g. by cargo/rustc), knows to run
# wasm-bindgen as a post-link step.
with webassembly.Module(wasm_file) as module:
  return module.get_custom_section('__wasm_bindgen_emscripten_marker') is not None
Can we just have the rust compiler pass a flag rather than having magic behaviour like this? Maybe -sWASM_BINDGEN=auto (or some better name)?
I'm open to a flag. rustc users could then pass it as a -Clink-arg=-s... option. -sWASM_BINDGEN currently means a C/C++ build with Rust being linked in. We'd need to think more about whether we could fully unify with that path, since right now it's a different type of bindgen path. Maybe it can be unified; I'm not sure.
-sWASM_BINDGEN_RUST or something that makes it clear Rust is in charge of the top-level entry point bindings might make sense though.
Not sure what the best term is here either!
7017b20 to 099c7c5
When cargo/rustc drives the build with emcc as the linker, the linked wasm carries a __wasm_bindgen_emscripten_marker custom section and rustc supplies the exact -sEXPORTED_FUNCTIONS. Detect that marker in phase_post_link and run wasm-bindgen as a post-link step, the same way the -sWASM_BINDGEN staticlib flow does, without any export discovery.

This defines two clearly separated modes:
- C++-driven (-sWASM_BINDGEN set): the user owns EXPORTED_FUNCTIONS; their exports are left untouched.
- marker-driven (-sWASM_BINDGEN not passed, marker detected): rustc's EXPORTED_FUNCTIONS is the raw wasm export set the generated glue reaches by name (the method shims, the __wbindgen_* runtime, the marker, main), not a user-facing API. The user-facing API is exactly what wasm-bindgen self-registers via its library (wasm-bindgen 0.2.126), so the rustc-supplied set is dropped from every user-export layer: the ESM wrapper (WASM_ESM_INTEGRATION, via user_requested_exports) and the factory Module attachment (MODULARIZE, via EXPORTED_FUNCTIONS / should_export), and the keepalive wasm exports are not surfaced. main still runs automatically on init, matching the emscripten idiom, even though _main is not exported.

Both output modes then expose only the clean API (e.g. a `Greeter` class).

- Strip the placeholder symbols wasm-bindgen consumes (__wbindgen_describe*, __externref_*, ...) from EXPORTED_FUNCTIONS so they aren't reported as undefined exports.
- Wire imported JS: feed library_bindgen.extern-pre.js as extern-pre-js and copy the snippets/ dir next to the output so relative imports resolve.
- The WASM_ESM_INTEGRATION wrapper re-exports the JS library symbols that were exported (MODULARIZE=instance), derived from the JS compiler's librarySymbols and EXPORTED_FUNCTIONS.
- Under WASM_ESM_INTEGRATION, provide wasmExports via a namespace import of the wasm so by-name export access (e.g. wasm-bindgen's glue) works.

Add an end-to-end test (test/rust/bindgen_greeter) parameterized over the ESM and factory output modes, and install a pinned wasm-bindgen-cli alongside rust in CI so the flow is always exercised.
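To make the marker detection concrete, the following is a standalone Python sketch that scans a wasm binary for a custom section by name. It does not use emscripten's own webassembly module; the helper names read_leb128, has_custom_section, and make_custom_section are hypothetical, and the sketch only assumes the standard wasm binary layout (magic, version, then id/size-prefixed sections, with custom sections using id 0).

```python
import io

WASM_MAGIC = b'\0asm'
MARKER = '__wasm_bindgen_emscripten_marker'


def read_leb128(stream):
    """Decode one unsigned LEB128 integer from a binary stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError('truncated LEB128 value')
        result |= (byte[0] & 0x7F) << shift
        if (byte[0] & 0x80) == 0:
            return result
        shift += 7


def has_custom_section(data, name):
    """Return True if the wasm binary contains a custom section called name."""
    stream = io.BytesIO(data)
    if stream.read(4) != WASM_MAGIC:
        raise ValueError('not a wasm binary')
    stream.read(4)  # skip the 4-byte version field
    wanted = name.encode('utf-8')
    while True:
        section_id = stream.read(1)
        if not section_id:
            return False  # reached end of file without finding it
        size = read_leb128(stream)
        end = stream.tell() + size
        if section_id[0] == 0:  # id 0 is a custom section
            name_len = read_leb128(stream)
            if stream.read(name_len) == wanted:
                return True
        stream.seek(end)  # jump to the next section


def make_custom_section(name, payload=b''):
    """Build a tiny custom section for testing (body must stay under 128 bytes)."""
    encoded = name.encode('utf-8')
    body = bytes([len(encoded)]) + encoded + payload
    return bytes([0, len(body)]) + body


header = WASM_MAGIC + b'\x01\x00\x00\x00'
print(has_custom_section(header + make_custom_section(MARKER), MARKER))  # True
print(has_custom_section(header, MARKER))  # False
```

A presence check like this only inspects section names, so it never needs to decode the marker's payload.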
099c7c5 to bcade0d
This implements the cargo/rustc-driven wasm-bindgen flow, where cargo/rustc drives the build with emcc as the linker instead of building a staticlib and passing -sWASM_BINDGEN explicitly.

When rustc links via emcc, the linked wasm carries a __wasm_bindgen_emscripten_marker custom section and rustc supplies the exact -sEXPORTED_FUNCTIONS. We detect that marker in phase_post_link and run wasm-bindgen as a post-link step, the same way the -sWASM_BINDGEN staticlib flow does, with no export discovery.

The key correctness insight is that wasm exports are distinct from user exports. Everything rustc lists in EXPORTED_FUNCTIONS is a wasm export the generated glue reaches by name (the method shims, the __wbindgen_* runtime, the marker, main), not a user-facing export. With recent wasm-bindgen (0.2.126), the user-facing API is exactly what wasm-bindgen self-registers via its JS library. So we drop the rustc-supplied set from every user-export layer:
- the ESM wrapper (WASM_ESM_INTEGRATION), via user_requested_exports
- the factory Module attachment (MODULARIZE), via EXPORTED_FUNCTIONS / should_export

Both output modes then expose only the clean API. For example, a crate exporting a Greeter class yields only that class, with no __wbindgen_*, method shims, marker, or main leaking into the user-facing surface.

Other pieces of the flow:
- Strip the placeholder symbols wasm-bindgen consumes (__wbindgen_describe*, __externref_*, ...) from EXPORTED_FUNCTIONS / USER_EXPORTS so they aren't reported as undefined exports.
- Wire imported JS: feed library_bindgen.extern-pre.js as extern-pre-js and copy the snippets/ dir next to the output so relative imports resolve.
- Forward the JS library symbols emitted with a top-level export (MODULARIZE=instance) so the WASM_ESM_INTEGRATION wrapper re-exports them.
- Under WASM_ESM_INTEGRATION && WASM_BINDGEN, provide wasmExports via a namespace import of the wasm so the glue's by-name access works.

Test coverage adds an end-to-end test (test/rust/bindgen_greeter) parameterized over both the ESM and factory output modes, building the crate via cargo build (emcc as linker, marker auto-detected) and verifying in Node that the clean class works and that no raw wasm exports leak. CI installs a pinned wasm-bindgen-cli alongside rust (the two are always installed together) so the flow is always exercised, with a requires_wasm_bindgen skip for local runs.

This PR was made with AI assistance, under my review.
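The export filtering described in this PR description could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the sample symbol names, and the two-entry prefix tuple are hypothetical (the description's own list ends in "..."), and names are compared as given with no underscore mangling.

```python
# Assumed prefixes, taken from the two examples named in the description.
# The real list is longer and is not reproduced here.
PLACEHOLDER_PREFIXES = ('__wbindgen_describe', '__externref_')


def strip_placeholders(exported_functions, prefixes=PLACEHOLDER_PREFIXES):
    """Drop placeholder symbols that wasm-bindgen consumes during post-link."""
    return [name for name in exported_functions
            if not name.startswith(prefixes)]


def user_facing_exports(all_exports, rustc_exports):
    """Marker-driven mode: hide everything rustc listed as a raw wasm export."""
    raw = set(rustc_exports)
    return [name for name in all_exports if name not in raw]


# Hypothetical symbol names, for illustration only.
rustc_exports = ['main', '__wbindgen_malloc', 'greeter_greet',
                 '__wasm_bindgen_emscripten_marker']
all_exports = rustc_exports + ['Greeter']  # 'Greeter' is self-registered

print(user_facing_exports(all_exports, rustc_exports))
# -> ['Greeter']
print(strip_placeholders(['__wbindgen_describe_greet',
                          '__externref_table_alloc', 'main']))
# -> ['main']
```

Keeping the two steps separate mirrors the distinction in the description: placeholders are removed because they no longer exist after wasm-bindgen runs, while the rustc-supplied set still exists in the wasm but is hidden from the user-facing surface.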