Support cargo-driven wasm-bindgen linking #27179

Open
guybedford wants to merge 1 commit into emscripten-core:main from guybedford:wasm-bindgen-rustc-driven

@guybedford

Collaborator

This implements the cargo/rustc-driven wasm-bindgen flow, where cargo/rustc drives the build with emcc as the linker instead of building a staticlib and passing -sWASM_BINDGEN explicitly.

When rustc links via emcc, the linked wasm carries a __wasm_bindgen_emscripten_marker custom section and rustc supplies the exact -sEXPORTED_FUNCTIONS. We detect that marker in phase_post_link and run wasm-bindgen as a post-link step, the same way the -sWASM_BINDGEN staticlib flow does, with no export discovery.
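As a rough illustration of what detecting the marker involves, the sketch below scans a wasm binary for a custom section by name. It is a standalone approximation and not the PR's code, which goes through Emscripten's own webassembly module; the helper names `read_uleb128` and `has_custom_section` are made up for this example.

```python
import io

WASM_MAGIC = b'\0asm'
MARKER = '__wasm_bindgen_emscripten_marker'


def read_uleb128(stream):
  """Decode one unsigned LEB128 integer from a binary stream."""
  result = 0
  shift = 0
  while True:
    chunk = stream.read(1)
    if not chunk:
      raise ValueError('unexpected end of wasm data')
    byte = chunk[0]
    result |= (byte & 0x7f) << shift
    if not byte & 0x80:
      return result
    shift += 7


def has_custom_section(wasm_bytes, name):
  """Return True if the module contains a custom section called `name`."""
  stream = io.BytesIO(wasm_bytes)
  if stream.read(4) != WASM_MAGIC:
    raise ValueError('not a wasm binary')
  stream.read(4)  # skip the 4-byte version field
  while True:
    section_id = stream.read(1)
    if not section_id:
      return False  # reached the end without finding the section
    size = read_uleb128(stream)
    payload = stream.read(size)
    if section_id[0] == 0:  # section id 0 is a custom section
      body = io.BytesIO(payload)
      name_len = read_uleb128(body)
      if body.read(name_len).decode('utf-8') == name:
        return True
```

Calling `has_custom_section(data, MARKER)` on the linked output would then decide whether the post-link wasm-bindgen step should run.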

The key correctness insight is that wasm exports are distinct from user exports. Everything rustc lists in EXPORTED_FUNCTIONS is a wasm export the generated glue reaches by name (the method shims, the __wbindgen_* runtime, the marker, main), not a user-facing export. With recent wasm-bindgen (0.2.126), the user-facing API is exactly what wasm-bindgen self-registers via its JS library. So we drop the rustc-supplied set from every user-export layer:

  • the ESM wrapper (WASM_ESM_INTEGRATION), via user_requested_exports
  • the factory Module attachment (MODULARIZE), via EXPORTED_FUNCTIONS / should_export
  • the keepalive wasm exports are never surfaced

Both output modes then expose only the clean API. For example a crate exporting a Greeter class yields:

// ESM integration
import init, { Greeter } from './bindgen_greeter.js';
await init();
new Greeter('Hello').greet('world'); // "Hello, world!"

// factory (MODULARIZE)
import Module from './bindgen_greeter.js';
const m = await Module();
new m.Greeter('Hello').greet('world'); // "Hello, world!"

with no __wbindgen_*, method shims, marker, or main leaking into the user-facing surface.
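The separation between wasm exports and user exports can be pictured with a small sketch. This is only an illustration of the idea: `split_exports` is a hypothetical helper, the symbol names in the usage note are illustrative, and the real change works through user_requested_exports and should_export rather than a single function.

```python
def split_exports(exported_functions, rustc_supplied, bindgen_registered):
  """Separate raw wasm exports from the user-facing API.

  exported_functions: everything currently listed for export.
  rustc_supplied: the names rustc passed via -sEXPORTED_FUNCTIONS.
  bindgen_registered: the names wasm-bindgen registered itself.
  Returns (wasm_only, user_facing), both as ordered lists.
  """
  rustc_supplied = set(rustc_supplied)
  # Names rustc asked for stay alive as wasm exports for the glue,
  # but are not part of the user-facing surface.
  wasm_only = [n for n in exported_functions if n in rustc_supplied]
  user_facing = [n for n in exported_functions if n not in rustc_supplied]
  # The clean API is whatever wasm-bindgen registered on top of that.
  for name in bindgen_registered:
    if name not in user_facing:
      user_facing.append(name)
  return wasm_only, user_facing
```

With `['greeter_greet', '__wbindgen_malloc', 'main']` as both the export list and the rustc-supplied set, and `['Greeter']` registered by wasm-bindgen, the user-facing list comes out as just `['Greeter']`.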

Other pieces of the flow:

  • Strip the placeholder symbols wasm-bindgen consumes (__wbindgen_describe*, __externref_*, ...) from EXPORTED_FUNCTIONS/USER_EXPORTS so they aren't reported as undefined exports.
  • Wire imported JS: feed library_bindgen.extern-pre.js as extern-pre-js and copy the snippets/ dir next to the output so relative imports resolve.
  • Forward the JS library symbols that get a top-level export (MODULARIZE=instance) so the WASM_ESM_INTEGRATION wrapper re-exports them.
  • Under WASM_ESM_INTEGRATION && WASM_BINDGEN, provide wasmExports via a namespace import of the wasm so the glue's by-name access works.
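The placeholder-stripping step in the first bullet might look roughly like the sketch below. It assumes plain glob matching on the names as written above; the PR's actual matching rules, the full pattern list, and any underscore mangling of export names are not shown in this description, so only the two patterns it names are used. `strip_placeholders` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Only the two patterns named in the description; the full list is elided there.
PLACEHOLDER_PATTERNS = ('__wbindgen_describe*', '__externref_*')


def strip_placeholders(exports, patterns=PLACEHOLDER_PATTERNS):
  """Drop names that match any placeholder pattern, keeping the original order."""
  return [
    name for name in exports
    if not any(fnmatchcase(name, pattern) for pattern in patterns)
  ]
```

Filtering before the undefined-export check means symbols that wasm-bindgen has already consumed never reach it.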

The PR adds an end-to-end test (test/rust/bindgen_greeter) parameterized over both the ESM and factory output modes, building the crate via cargo build (emcc as linker, marker auto-detected) and verifying in Node that the clean class works and that no raw wasm exports leak. CI installs a pinned wasm-bindgen-cli alongside rust (the two are always installed together) so the flow is always exercised, with a requires_wasm_bindgen skip for local runs.

This PR was made with AI assistance, under my review.

@guybedford guybedford force-pushed the wasm-bindgen-rustc-driven branch from 15227f6 to 7017b20 Compare June 24, 2026 23:23
Comment thread src/jsifier.mjs Outdated

// JS library symbols emitted with a top-level `export` (MODULARIZE=instance),
// forwarded so the WASM_ESM_INTEGRATION wrapper can re-export them.
const exportedLibrarySymbols = [];

Collaborator

I would hope we could avoid creating yet another type of symbol list in emscripten.

The caller of the JS compiler already has the EXPORTED_FUNCTIONS list, and the list of all JS symbols (librarySymbols) so maybe we don't need this new list?

Collaborator Author

Good call, dropped the separate list. The exported JS library symbols are now derived in the caller from the intersection of the existing librarySymbols and EXPORTED_FUNCTIONS. The one wrinkle: wasm-bindgen self-registers its exports via EXPORTED_FUNCTIONS.add(...) on the JS-compiler side, and those additions don't otherwise round-trip to Python, so jsifier now forwards the final EXPORTED_FUNCTIONS (instead of a bespoke list) and the caller intersects it with librarySymbols.
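For readers following along, the caller-side derivation described here amounts to a set intersection. The sketch below is a guess at its shape, not the PR's code; the function name `exported_library_symbols` and the example symbol names are hypothetical.

```python
def exported_library_symbols(library_symbols, exported_functions):
  """Return the JS library symbols that were also exported.

  library_symbols: all JS library symbols known to the JS compiler.
  exported_functions: the final EXPORTED_FUNCTIONS forwarded back by jsifier.
  The order of library_symbols is preserved so the output is deterministic.
  """
  exported = set(exported_functions)
  return [sym for sym in library_symbols if sym in exported]
```

For example, `exported_library_symbols(['Greeter', 'helper'], {'Greeter', 'main'})` yields `['Greeter']`, which is the set the ESM wrapper would re-export.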

Comment thread src/postamble.js
// wasm-bindgen's glue reaches the exports by name off a `wasmExports` object,
// so provide the aggregate via a namespace import. Emscripten's own named
// imports are unaffected and remain tree-shakable.
import * as wasmExports from './{{{ WASM_BINARY_FILE }}}';

Collaborator

There are other places in the codebase that also use wasmExports. Maybe we should remove the && WASM_BINDGEN so that they start working under WASM_ESM_INTEGRATION?

Actually I wonder how those codepaths work today? For example registerTLSInit(wasmExports['_emscripten_tls_init']);

Collaborator Author

Done — dropped the && WASM_BINDGEN so the namespace import applies to all WASM_ESM_INTEGRATION builds. It's safe because every wasmExports = ... assignment lives under #if !WASM_ESM_INTEGRATION (inside createWasm) or the non-instance branch, and WASM_ESM_INTEGRATION requires MODULARIZE=instance — so nothing assigns to the now read-only binding. Verified a plain C++ ESM build still builds and runs.

Re your TLS question: under ESM integration registerTLSInit already uses the named import (registerTLSInit(__emscripten_tls_init)) rather than wasmExports['_emscripten_tls_init'], which is why that path worked. This change makes the generic by-name wasmExports[...] accesses work under ESM integration too.

Comment thread tools/building.py
# section so emcc, when used as the linker (e.g. by cargo/rustc), knows to run
# wasm-bindgen as a post-link step.
with webassembly.Module(wasm_file) as module:
  return module.get_custom_section('__wasm_bindgen_emscripten_marker') is not None

Collaborator

Can we just have the rust compiler pass a flag rather than having magic behaviour like this? Maybe -sWASM_BINDGEN=auto (or some better name)?

Collaborator Author

I'm open to a flag. rustc users could then pass it as -Clink-arg=-s... . -sWASM_BINDGEN currently means a C/C++ build with Rust being linked in. We'd need to think more about whether we could fully unify with that path, since right now it's a different type of bindgen path. Maybe it can be unified, I'm not sure.

-sWASM_BINDGEN_RUST or something that makes it clear Rust is in charge of the top-level entry point bindings might make sense though.

Not sure what the best term is here either!

@guybedford guybedford force-pushed the wasm-bindgen-rustc-driven branch from 7017b20 to 099c7c5 Compare June 25, 2026 00:31
When cargo/rustc drives the build with emcc as the linker, the linked wasm
carries a __wasm_bindgen_emscripten_marker custom section and rustc supplies
the exact -sEXPORTED_FUNCTIONS. Detect that marker in phase_post_link and run
wasm-bindgen as a post-link step, the same way the -sWASM_BINDGEN staticlib
flow does, without any export discovery.

This defines two clearly-separated modes:

- C++-driven (-sWASM_BINDGEN set): the user owns EXPORTED_FUNCTIONS; their
  exports are left untouched.
- marker-driven (-sWASM_BINDGEN not passed, marker detected): rustc's
  EXPORTED_FUNCTIONS is the raw wasm export set the generated glue reaches by
  name (the method shims, the __wbindgen_* runtime, the marker, main), not a
  user-facing API. The user-facing API is exactly what wasm-bindgen
  self-registers via its library (wasm-bindgen 0.2.126), so the rustc-supplied
  set is dropped from every user-export layer: the ESM wrapper
  (WASM_ESM_INTEGRATION, via user_requested_exports) and the factory Module
  attachment (MODULARIZE, via EXPORTED_FUNCTIONS / should_export). The keepalive
  wasm exports are not surfaced. main still runs automatically on init,
  matching the emscripten idiom, even though _main is not exported.

Both output modes then expose only the clean API (e.g. a `Greeter` class).
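The two modes above can be summarized as a small decision function. This is a sketch of the selection logic as the message describes it, with hypothetical names (`BindgenMode`, `select_bindgen_mode`); it assumes the explicit setting takes precedence over the marker, which is how the two bullets read.

```python
from enum import Enum


class BindgenMode(Enum):
  NONE = 'none'                    # wasm-bindgen is not involved
  CPP_DRIVEN = 'c++-driven'        # -sWASM_BINDGEN set, user owns the exports
  MARKER_DRIVEN = 'marker-driven'  # marker found, rustc's exports are raw wasm exports


def select_bindgen_mode(wasm_bindgen_setting, marker_present):
  """Pick the mode from the explicit setting and the marker check."""
  if wasm_bindgen_setting:
    # Assumed precedence: an explicit setting wins over marker detection.
    return BindgenMode.CPP_DRIVEN
  if marker_present:
    return BindgenMode.MARKER_DRIVEN
  return BindgenMode.NONE
```

Only the marker-driven result would trigger dropping the rustc-supplied names from the user-export layers.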

- Strip the placeholder symbols wasm-bindgen consumes (__wbindgen_describe*,
  __externref_*, ...) from EXPORTED_FUNCTIONS so they aren't reported as
  undefined exports.
- Wire imported JS: feed library_bindgen.extern-pre.js as extern-pre-js and
  copy the snippets/ dir next to the output so relative imports resolve.
- The WASM_ESM_INTEGRATION wrapper re-exports the JS library symbols that were
  exported (MODULARIZE=instance), derived from the JS compiler's librarySymbols
  and EXPORTED_FUNCTIONS.
- Under WASM_ESM_INTEGRATION, provide wasmExports via a namespace import of the
  wasm so by-name export access (e.g. wasm-bindgen's glue) works.
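The snippets copy from the list above could be sketched as follows. The directory layout is an assumption made for illustration (a `snippets/` folder inside wasm-bindgen's output directory), and `copy_snippets` is a hypothetical helper, not a function from the PR.

```python
import shutil
from pathlib import Path


def copy_snippets(bindgen_out_dir, output_file):
  """Copy snippets/ next to the final output so relative imports resolve.

  Returns the destination path, or None when there is nothing to copy.
  """
  src = Path(bindgen_out_dir) / 'snippets'
  if not src.is_dir():
    return None  # the crate has no imported JS snippets
  dest = Path(output_file).parent / 'snippets'
  # dirs_exist_ok lets repeated builds overwrite an earlier copy (Python 3.8+).
  shutil.copytree(src, dest, dirs_exist_ok=True)
  return dest
```

Placing the copy beside the output file keeps `./snippets/...` imports in the generated glue valid without rewriting any paths.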

Add an end-to-end test (test/rust/bindgen_greeter) parameterized over the ESM
and factory output modes, and install a pinned wasm-bindgen-cli alongside rust
in CI so the flow is always exercised.
@guybedford guybedford force-pushed the wasm-bindgen-rustc-driven branch from 099c7c5 to bcade0d Compare June 25, 2026 19:00
@guybedford guybedford changed the title Run wasm-bindgen when its marker is present (cargo/rustc-driven flow) Support cargo-driven wasm-bindgen linking Jun 25, 2026