From 8f8b3e1a70e78def6fb55d0bb72e77955b44d609 Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Tue, 4 Jun 2024 16:21:20 +0200 Subject: [PATCH 01/10] doc: add synchronous hooks proposal --- doc/design/proposal-synchronous-hooks.md | 256 +++++++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100644 doc/design/proposal-synchronous-hooks.md diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md new file mode 100644 index 0000000..a82b3d7 --- /dev/null +++ b/doc/design/proposal-synchronous-hooks.md @@ -0,0 +1,256 @@ +# Universal, synchronous and in-thread loader hooks + +For background and motivation, see https://github.com/nodejs/node/issues/52219 + +TL;DR: the top priority of this proposal is to allow sun-setting CJS loader monkey patching as non-breakingly as possible (which is why it needs to be synchronous and in-thread because that's how the CJS loader and `require()` works). Then it's API consistency with the existing `module.register()` off-thread hooks. + +Existing users of CJS loader monkey patching to look into: + +- pirates (used by Babel and nyc) +- require-in-the-middle (used by many tracing agents) +- yarn pnp +- tsx +- ts-node + +## API design + +A high-level overview of the API. + +```js +const { addHooks, removeHooks } = require('module'); + +function resolve(specifier, context, nextResolve) { + const resolved = nextResolve(specifier, context); + return { url: resolved.url.replaceAll(/foo/g, 'bar'); } +} + +function load(url, context, nextLoad) { + const loaded = nextLoad(specifier, context); + return { source: loaded.source.toString().replaceAll(/foo/g, 'bar'); } +} + +// id is a symbol +const id = addHooks({ resolve, load, ... }); +removeHooks(id); +``` + +1. The names `addHooks` and `removeHooks` take inspiration from pirates. Can be changed to other better names. +2. An alternative design to remove the hooks could be `addHooks(...).unhook()`, in this case `addHooks()` returns an object that could have other methods. + 1. This may allow third-party hooks to query itself and avoid double-registering. Though this functionality is probably out of scope of the MVP. +3. It seems useful to allow the results returned by the hooks to be partial i.e. hooks don't have to clone the result returned by the next (default) hook to override it, instead they only need to return an object with properties that they wish to override. This can save the overhead of excessive clones. +4. It seems `shortCircuit` is not really necessary if hooks can just choose to not call the next hook? + +## Hooks + +### `resolve`: from specifier to url + +```js +/** + * @typedef {{ + * parentURL?: string, + * conditions?: string[], + * importAttributes?: object[] + * }} ModuleResolveContext + */ +/** + * @typedef {{ + * url?: string, + * format?: string + * }} ModuleResolveResult + */ +/** + * @param {string} specifier + * @param {ModuleResolveContext} context + * @param {(specifier: string, context: ModuleResolveContext) => ModuleResolveResult} nextResolve + * @returns {ModuleResolveResult} + */ +function resolve(specifier, context, nextResolve) { + if (shouldOverride(specifier, context.parentURL)) { + const url = customResolve(specifier, context.parentURL); + return { url }; + } + + const resolved = nextResolve(specifier, context); + if (resolved.url.endsWith('.zip')) { + return { format: 'zip' }; + } + return {}; // no override +} +``` + +Notes: + +1. Example use case: yarn pnp ([esm hooks](https://github.com/yarnpkg/berry/blob/master/packages/yarnpkg-pnp/sources/esm-loader/hooks/resolve.ts), [cjs hooks](https://github.com/yarnpkg/berry/blob/master/packages/yarnpkg-pnp/sources/loader/applyPatch.ts)) +2. `importAttributes` are only available when the module request is initiated with `import`. +3. For the CJS loader, `Module._cache` is keyed using file names, and it's used in the wild (usually in the form of `require.cache`) so changing it to key on URL would be breaking. We'll need to figure out another way to map it with an url. + 1. It might be non-breaking to maintain an additional map on the side for mapping the same url with different searches and hashes to a different module instance, or leave `Module._cache` as an alias map for the first module instance mapped by the URL minus search/hash. If in the same realm, `Module._cache` gets directly accessed from the outside, we can emit a warning about the incompatibility. Down the road this may require making `Module._cache` a proxy to propagate delete requests to the actual URL-based map. + 2. Changing CJS modules to be mapped with URL will have performance consequences from but might also help with the hot module replacement use case. Note that for parent packages with `exports` or `imports` in their `package.json` the URL conversion is already done (and not even cached) even in the CJS loader. + +### `load`: from url to source code + +```js +/** + * @typedef {{ + * format?: string, + * conditions?: string[], + * importAttributes?: object[] + * }} ModuleLoadContext + */ +/** + * @typedef {{ +* format?: string, +* source: string | Buffer +* }} ModuleLoadResult +*/ +/** + * @param {string} url + * @param {ModuleLoadContext} context + * @param {(context: ModuleLoadContext) => {ModuleLoadResult}} nextLoad + * @returns {ModuleLoadResult} + */ +function load(url, context, nextLoad) { + const loaded = nextLoad(context); + const { source: rawSource, format } = loaded; + if (url.endsWith('.ts')) { + const transpiled = ts.transpileModule(rawSource, { + compilerOptions: { module: ts.ModuleKind.NodeNext } + }); + + return { + format: 'commonjs', + source: transpiled.outputText, + }; + } + + return {}; +} +``` + +Notes: + +1. `context.format` is only present when the format is already determined by Node.js or a previous hook +2. It seems useful for the default load to always return a buffer, or add an option to `context` for the default hook to load it as a buffer, in case the resolved file point to a binary file (e.g. a zip file, a wasm, an addon). For the ESM loader it's (almost?) always a buffer. For CJS loader some changes are needed to keep the content in a buffer. +3. Some changes may be needed in both the CJS and ESM loader to allow loading arbitrary format in a raw buffer in a way that plays well with the internal cache and format detection. +4. This may allow us to finally deprecate `Module.wrap` properly + +## `exports` (require-only): invoked after execution of the module + +This only works for `require()` including `require(esm)`. It manipulates the exports object after execution of the original module completes. If the `exports` returned is not reference equal to the original exports object, it will affect later `module.exports` access in the original module but it does not affect direct `exports.foo` accesses (since the original exports are already passed through the context during module execution). If the module loaded is ESM (`context.format` is `module`) all the direct modification to `exports` are no-ops because ESM namespaces are not mutable. Returning a new `exports` for `require(esm)` is meaningless either - the only thing users can do is to read from the namespace, or to manipulate properties of the exported properties. + +```js +/** + * @typedef {{ + * format: string + * }} ModuleExportsContext + */ +/** + * @typedef {{ +* exports: any +* }} ModuleExportsResult +*/ +/** + * @param {string} url + * @param {ModuleExportsContext} context + * @param {(context: ModuleExportsContext) => {ModuleExportsResult}} nextExports + * @returns {ModuleExportsResult} + */ +function exports(url, context, nextExports) { // Mocking require-in-the-middle + let { exports: originalExports } = nextExports(url, context); + const stats = getStats(url); + if (stats.name && modules.includes(stats.name)) { + const newExports = userCallback(originalExports, stats.name, stats.basedir); + return { exports: newExports } + } + return { + exports: originalExports + }; +} +``` + +This can only be meaningfully implemented for modules loaded by `require()`. ESM's design is completely different from CJS in that resolution of the graph and evaluation of the graph are separated, so a similar timing in ESM would be "after module evaluation". However, Node.js only gets to control the timing of the evaluation of the root module. The inner module evaluation is currently completely internal to V8. It may be possible to upstream a post-evaluation hook to V8, but calling from C++ to JS would also incur a non-trivial performance cost, especially if it needs to be done for every single module. Also, since the ESM namespace is specified to be immutable, what users can do after ESM evaluation is very limited - they cannot replace anything in the namespace or switch it to a different module. That's why the `link` hook was devised below to better work with the design of ESM (before linking completes it's possible to swap the module resolved by `import` to a different module). + +### Alternative design: `requires` that encompasses `resolve` and `load` + +An alternative design would be to span it across the `resolve` and `load` steps - take the specifier as argument, return exports in the result, and rename it to something like `requires()` (to avoid clashing with `require()`). The `nextRequires` hook would for example, invoke the default implementation of `requires` which in turn encompasses `resolve` and `load`. If `nextRequires` is not invoked then the default `resolve` and `load` will be skipped. + +```js +function requires(specifier, context, nextRequires) { // Mocking require-in-the-middle + let { exports: originalExports, url } = nextRequires(specifier, context); + const stats = getStats(url); + if (stats.name && modules.includes(stats.name)) { + const newExports = userCallback(originalExports, stats.name, stats.basedir); + return { exports: newExports } + } + return { + exports: originalExports + }; +} +``` + +It's not yet investigated whether it is possible to make it work with the CJS loader cache at all, but it looks closer to what developers generally try to monkey-patch the CJS loader for. + +## `link` (import-only): invoked before linking + +This is invoked after `load` but prior to Node.js passing the final result to V8 for linking, so it can be composed with `resolve` and `load` if necessary. This needs to work with experimental vm modules for passing module instances around. + +```js +/** + * @typedef {{ + * source: string | Buffer + * }} ModuleLinkContext + */ +/** + * @typedef {{ +* module: vm.Module +* }} ModuleLinkResult +*/ +/** + * @param {string} url + * @param {ModuleLinkContext} context + * @param {(context: ModuleLinkContext) => {ModuleLinkResult}} nextLink + * @returns {ModuleLinkResult} + */ +function link(url, context, nextLink) { // Mocking import-in-the-middle + const { module: originalModule } = nextLink(url, context); + assert.strictEqual(module.status, 'linked'); // Original module is linked at this point + let source = `import * as original from 'original';`; + source += `import { userCallback, name, basedir } from 'util'`; + source += `const exported = {}`; + for (const key of originalModule.namespace) { + source += `let $${key} = original.${key};`; + source += `export { $${key} as ${key} }`; + source += `Object.defineProperty(exported, '${key}', { get() { return $${key}; }, set (value) { $${key} = value; }});`; + } + source += `userCallback(exported, name, basedir);`; + const m = vm.SourceTextModule(source); + m.linkSync((specifier) => { // This is not yet implemented but should be trivial to implement. + if (specifier === 'original') return originalModule; + // Contains a synthetic module with userCallback, name & basedir computed from url + if (specifier === 'util') return util; + }); + return { module: m }; +} +``` + +### Alternative design: `link` that encompasses `resolve` and `load` + +An alternative design would be, instead of invoking a hook after `load`, make it wrap around `resolve` and `load` steps. The `link` hooks would take `specifier` and return `vm.Module` instances. If `nextLink()` is invoked, the next `resolve` and `load` (e.g. the default ones) will be invoked inside that. If `nextLink()` is not invoked, the default `resolve` and `load` will be skipped. + +```js +function link(specifier, context, nextImport) { // Mocking import-in-the-middle + const { module: originalModule } = nextLink(specifier, context); + assert.strictEqual(module.status, 'linked'); // Original module is linked at this point + let source = '...'; + const m = vm.SourceTextModule(source); + m.linkSync((specifier) => { + // ... + }); + return { module: m }; +} +``` + +## Other use cases that may need some thought + +1. Hot module replacement +2. Source map support (e.g. see [babel-register](https://github.com/babel/babel/blob/07bd0003cbdaa8525279c6dfa84e435471eb5797/packages/babel-register/src/hook.js#L38)) +3. Virtual file system in packaged apps (single executable applications). From 95e449c37564d2b53a41e5797afb83d71985bf05 Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Tue, 4 Jun 2024 17:36:36 +0200 Subject: [PATCH 02/10] fixup! doc: add synchronous hooks proposal --- doc/design/pirates.js | 0 doc/design/proposal-synchronous-hooks.md | 10 ++++++---- 2 files changed, 6 insertions(+), 4 deletions(-) create mode 100644 doc/design/pirates.js diff --git a/doc/design/pirates.js b/doc/design/pirates.js new file mode 100644 index 0000000..e69de29 diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index a82b3d7..8f8ead5 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -1,6 +1,6 @@ # Universal, synchronous and in-thread loader hooks -For background and motivation, see https://github.com/nodejs/node/issues/52219 +For background and motivation, see https://github.com/nodejs/node/issues/52219. Prototype implementation is in https://github.com/joyeecheung/node/tree/sync-hooks. TL;DR: the top priority of this proposal is to allow sun-setting CJS loader monkey patching as non-breakingly as possible (which is why it needs to be synchronous and in-thread because that's how the CJS loader and `require()` works). Then it's API consistency with the existing `module.register()` off-thread hooks. @@ -37,8 +37,9 @@ removeHooks(id); 1. The names `addHooks` and `removeHooks` take inspiration from pirates. Can be changed to other better names. 2. An alternative design to remove the hooks could be `addHooks(...).unhook()`, in this case `addHooks()` returns an object that could have other methods. 1. This may allow third-party hooks to query itself and avoid double-registering. Though this functionality is probably out of scope of the MVP. -3. It seems useful to allow the results returned by the hooks to be partial i.e. hooks don't have to clone the result returned by the next (default) hook to override it, instead they only need to return an object with properties that they wish to override. This can save the overhead of excessive clones. -4. It seems `shortCircuit` is not really necessary if hooks can just choose to not call the next hook? +3. Another alternative design would be like what pirates offer: `const revert = addHooks(..); revert();`. +4. It seems useful to allow the results returned by the hooks to be partial i.e. hooks don't have to clone the result returned by the next (default) hook to override it, instead they only need to return an object with properties that they wish to override. This can save the overhead of excessive clones. +5. It seems `shortCircuit` is not really necessary if hooks can just choose to not call the next hook? ## Hooks @@ -131,7 +132,8 @@ Notes: 1. `context.format` is only present when the format is already determined by Node.js or a previous hook 2. It seems useful for the default load to always return a buffer, or add an option to `context` for the default hook to load it as a buffer, in case the resolved file point to a binary file (e.g. a zip file, a wasm, an addon). For the ESM loader it's (almost?) always a buffer. For CJS loader some changes are needed to keep the content in a buffer. 3. Some changes may be needed in both the CJS and ESM loader to allow loading arbitrary format in a raw buffer in a way that plays well with the internal cache and format detection. -4. This may allow us to finally deprecate `Module.wrap` properly +4. This may allow us to finally deprecate `Module.wrap` properly. +5. It may be useful to provide the computed extension in the context. An important use case is module format override (based on extensions?). ## `exports` (require-only): invoked after execution of the module From 7f6d0ac4a9c709638f55e39e9ebe1d1db699df29 Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Wed, 5 Jun 2024 01:52:32 +0200 Subject: [PATCH 03/10] Update doc/design/proposal-synchronous-hooks.md Co-authored-by: Geoffrey Booth --- doc/design/proposal-synchronous-hooks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index 8f8ead5..12a049b 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -137,7 +137,7 @@ Notes: ## `exports` (require-only): invoked after execution of the module -This only works for `require()` including `require(esm)`. It manipulates the exports object after execution of the original module completes. If the `exports` returned is not reference equal to the original exports object, it will affect later `module.exports` access in the original module but it does not affect direct `exports.foo` accesses (since the original exports are already passed through the context during module execution). If the module loaded is ESM (`context.format` is `module`) all the direct modification to `exports` are no-ops because ESM namespaces are not mutable. Returning a new `exports` for `require(esm)` is meaningless either - the only thing users can do is to read from the namespace, or to manipulate properties of the exported properties. +This only runs for `require()` including `require(esm)`. It manipulates the exports object after execution of the original module completes. If the `exports` returned is not reference equal to the original exports object, it will affect later `module.exports` access in the original module but it does not affect direct `exports.foo` accesses (since the original exports are already passed through the context during module execution). If the module loaded is ESM (`context.format` is `module`) all the direct modification to `exports` are no-ops because ESM namespaces are not mutable. Returning a new `exports` for `require(esm)` is meaningless either - the only thing users can do is to read from the namespace, or to manipulate properties of the exported properties. ```js /** From 1a588c8cadc3e7c190e6123557cc38b2adf68f5b Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Wed, 5 Jun 2024 11:17:10 +0200 Subject: [PATCH 04/10] fixup! fixup! doc: add synchronous hooks proposal --- doc/design/pirates.js | 0 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 doc/design/pirates.js diff --git a/doc/design/pirates.js b/doc/design/pirates.js deleted file mode 100644 index e69de29..0000000 From 5259a80c8d98e10ef2534799ba291ec381ee8e89 Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Thu, 20 Jun 2024 18:04:57 +0200 Subject: [PATCH 05/10] fixup! fixup! fixup! doc: add synchronous hooks proposal --- doc/design/proposal-synchronous-hooks.md | 219 +++++++++++++++-------- 1 file changed, 146 insertions(+), 73 deletions(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index 12a049b..f946921 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -8,38 +8,35 @@ Existing users of CJS loader monkey patching to look into: - pirates (used by Babel and nyc) - require-in-the-middle (used by many tracing agents) -- yarn pnp +- Yarn PnP - tsx - ts-node +- proxyquire, quibble, mockery ## API design A high-level overview of the API. ```js -const { addHooks, removeHooks } = require('module'); +const { registerHooks } = require('module'); function resolve(specifier, context, nextResolve) { const resolved = nextResolve(specifier, context); - return { url: resolved.url.replaceAll(/foo/g, 'bar'); } + resolved.url = resolved.url.replaceAll(/foo/g, 'bar'); + return resolved; } function load(url, context, nextLoad) { const loaded = nextLoad(specifier, context); - return { source: loaded.source.toString().replaceAll(/foo/g, 'bar'); } + loaded.source = loaded.source.toString().replaceAll(/foo/g, 'bar'); + return loaded; } -// id is a symbol -const id = addHooks({ resolve, load, ... }); -removeHooks(id); +const hook = registerHooks({ resolve, load }); +hook.deregister(); // Calls hook[Symbol.dispose]() ``` -1. The names `addHooks` and `removeHooks` take inspiration from pirates. Can be changed to other better names. -2. An alternative design to remove the hooks could be `addHooks(...).unhook()`, in this case `addHooks()` returns an object that could have other methods. - 1. This may allow third-party hooks to query itself and avoid double-registering. Though this functionality is probably out of scope of the MVP. -3. Another alternative design would be like what pirates offer: `const revert = addHooks(..); revert();`. -4. It seems useful to allow the results returned by the hooks to be partial i.e. hooks don't have to clone the result returned by the next (default) hook to override it, instead they only need to return an object with properties that they wish to override. This can save the overhead of excessive clones. -5. It seems `shortCircuit` is not really necessary if hooks can just choose to not call the next hook? +* The name `registerHooks` and `deregister` can be changed. ## Hooks @@ -50,13 +47,14 @@ removeHooks(id); * @typedef {{ * parentURL?: string, * conditions?: string[], - * importAttributes?: object[] + * importAttributes?: Record, * }} ModuleResolveContext */ /** * @typedef {{ - * url?: string, - * format?: string + * url: string,, + * format?: string, + * shortCircuit?: boolean * }} ModuleResolveResult */ /** @@ -68,14 +66,15 @@ removeHooks(id); function resolve(specifier, context, nextResolve) { if (shouldOverride(specifier, context.parentURL)) { const url = customResolve(specifier, context.parentURL); - return { url }; + return { url, shortCircuit: true }; } const resolved = nextResolve(specifier, context); if (resolved.url.endsWith('.zip')) { - return { format: 'zip' }; + resolved.format = 'zip'; + return resolved; } - return {}; // no override + return resolved; // no override } ``` @@ -89,6 +88,8 @@ Notes: ### `load`: from url to source code +When `--experimental-network-import` is enabled, the default `load` step throws an error when encountering network imports. Although, users can implement blocking network imports by spawning a worker and use `Atomics.wait` to wait for the results. An example for doing blocking fetch synchronously can be found [here](https://github.com/addaleax/synchronous-worker?tab=readme-ov-file#how-can-i-avoid-using-this-package). + ```js /** * @typedef {{ @@ -109,6 +110,8 @@ Notes: * @param {(context: ModuleLoadContext) => {ModuleLoadResult}} nextLoad * @returns {ModuleLoadResult} */ + +// Mini TypeScript transpiler: function load(url, context, nextLoad) { const loaded = nextLoad(context); const { source: rawSource, format } = loaded; @@ -117,32 +120,61 @@ function load(url, context, nextLoad) { compilerOptions: { module: ts.ModuleKind.NodeNext } }); - return { - format: 'commonjs', - source: transpiled.outputText, - }; + loaded.format = 'commonjs'; + loaded.source = transpiled.outputText; + return loaded; } - return {}; + return loaded; } ``` Notes: -1. `context.format` is only present when the format is already determined by Node.js or a previous hook +1. `context.format` is only present when the format is already determined by Node.js or a previous hook. 2. It seems useful for the default load to always return a buffer, or add an option to `context` for the default hook to load it as a buffer, in case the resolved file point to a binary file (e.g. a zip file, a wasm, an addon). For the ESM loader it's (almost?) always a buffer. For CJS loader some changes are needed to keep the content in a buffer. 3. Some changes may be needed in both the CJS and ESM loader to allow loading arbitrary format in a raw buffer in a way that plays well with the internal cache and format detection. 4. This may allow us to finally deprecate `Module.wrap` properly. 5. It may be useful to provide the computed extension in the context. An important use case is module format override (based on extensions?). -## `exports` (require-only): invoked after execution of the module +Example mock of pirates: + +```js +function addHook(hook, options) { + function load(url, context, nextLoad) { + const loaded = nextLoad(url, context); + const index = url.lastIndexOf('.'); + const ext = url.slice(index); + if (!options.exts.includes(ext)) { + return loaded; + } + const filename = fileURLToPath(url); + if (!options.matcher(filename)) { + return loaded; + } + loaded.source = hook(loaded.source, filename); + return loaded; + } + + const hook = module.registerHooks({ load }); + + return function revert() { + hook.deregister(); + }; +} +``` + +## `exports` (require-only): from compiled CJS wrapper to exports object + +This only runs for `require()` including `require(esm)`. It can only be meaningfully implemented for modules loaded by `require()`, since for ESM Node.js only gets to control the timing of the evaluation of the root module. The inner module evaluation is currently completely internal to V8. It may be possible to upstream a post-evaluation hook to V8, but calling from C++ to JS would also incur a non-trivial performance cost, especially if it needs to be done for every single module. Also, since the ESM namespace is specified to be immutable, what users can do after ESM evaluation is very limited - they cannot replace anything in the namespace or switch it to a different module. That's why the `link` hook was devised below to better work with the design of ESM (before linking completes it's possible to swap the module resolved by `import` to a different module). -This only runs for `require()` including `require(esm)`. It manipulates the exports object after execution of the original module completes. If the `exports` returned is not reference equal to the original exports object, it will affect later `module.exports` access in the original module but it does not affect direct `exports.foo` accesses (since the original exports are already passed through the context during module execution). If the module loaded is ESM (`context.format` is `module`) all the direct modification to `exports` are no-ops because ESM namespaces are not mutable. Returning a new `exports` for `require(esm)` is meaningless either - the only thing users can do is to read from the namespace, or to manipulate properties of the exported properties. +Example mocking require-in-the-middle: ```js /** * @typedef {{ - * format: string + * format: string, + * source: string | Buffer, * }} ModuleExportsContext */ /** @@ -156,8 +188,8 @@ This only runs for `require()` including `require(esm)`. It manipulates the expo * @param {(context: ModuleExportsContext) => {ModuleExportsResult}} nextExports * @returns {ModuleExportsResult} */ -function exports(url, context, nextExports) { // Mocking require-in-the-middle - let { exports: originalExports } = nextExports(url, context); +function exports(url, context, nextExports) { + let exported = nextExports(url, context); const stats = getStats(url); if (stats.name && modules.includes(stats.name)) { const newExports = userCallback(originalExports, stats.name, stats.basedir); @@ -169,36 +201,26 @@ function exports(url, context, nextExports) { // Mocking require-in-the-middle } ``` -This can only be meaningfully implemented for modules loaded by `require()`. ESM's design is completely different from CJS in that resolution of the graph and evaluation of the graph are separated, so a similar timing in ESM would be "after module evaluation". However, Node.js only gets to control the timing of the evaluation of the root module. The inner module evaluation is currently completely internal to V8. It may be possible to upstream a post-evaluation hook to V8, but calling from C++ to JS would also incur a non-trivial performance cost, especially if it needs to be done for every single module. Also, since the ESM namespace is specified to be immutable, what users can do after ESM evaluation is very limited - they cannot replace anything in the namespace or switch it to a different module. That's why the `link` hook was devised below to better work with the design of ESM (before linking completes it's possible to swap the module resolved by `import` to a different module). +If the module loaded is CJS (`context.format` is `"commonjs"`), If the default step is run via `nextExports` run but the `exports` returned later is not reference equal to the original exports object, it will only affect later `module.exports` access in the original module, but does not affect contextual `exports.foo` accesses within the module. There are two options to address this: -### Alternative design: `requires` that encompasses `resolve` and `load` +1. We allow the default `exports` step to take an optional `context.exports`. If it's configured, it will be used during the wrapper invocation, which unifies the contextual access. Hooks will need to use a proxy if they want the new exports object to be based on the result from default evaluation (since by the time the new overridden exports object is created, the module hasn't been evaluated yet). +2. We expose the compiled CJS wrapper function in `context` so that users get to invoke it themselves and skip the default `exports` step. -An alternative design would be to span it across the `resolve` and `load` steps - take the specifier as argument, return exports in the result, and rename it to something like `requires()` (to avoid clashing with `require()`). The `nextRequires` hook would for example, invoke the default implementation of `requires` which in turn encompasses `resolve` and `load`. If `nextRequires` is not invoked then the default `resolve` and `load` will be skipped. +1 may provide better encapsulation, because with 2 would mean that changes to the CJS wrapper function parameters becomes breaking. -```js -function requires(specifier, context, nextRequires) { // Mocking require-in-the-middle - let { exports: originalExports, url } = nextRequires(specifier, context); - const stats = getStats(url); - if (stats.name && modules.includes(stats.name)) { - const newExports = userCallback(originalExports, stats.name, stats.basedir); - return { exports: newExports } - } - return { - exports: originalExports - }; -} -``` +If the module loaded is ESM (`context.format` is `"module"`) all the direct modification to `exports` are no-ops because ESM namespaces are not mutable. Returning a new `exports` for `require(esm)` only affects the caller of `require(esm)`, unless the hook returns a proxy to connect the changes between the new exports object and the module namespace object. -It's not yet investigated whether it is possible to make it work with the CJS loader cache at all, but it looks closer to what developers generally try to monkey-patch the CJS loader for. +## `link` (import-only): from source code to compiled module records -## `link` (import-only): invoked before linking +This needs to work with experimental vm modules for passing module instances around. -This is invoked after `load` but prior to Node.js passing the final result to V8 for linking, so it can be composed with `resolve` and `load` if necessary. This needs to work with experimental vm modules for passing module instances around. +Example mock of import-in-the-middle: ```js /** * @typedef {{ - * source: string | Buffer + * specifier: string, + * source: string | Buffer, * }} ModuleLinkContext */ /** @@ -212,47 +234,98 @@ This is invoked after `load` but prior to Node.js passing the final result to V8 * @param {(context: ModuleLinkContext) => {ModuleLinkResult}} nextLink * @returns {ModuleLinkResult} */ -function link(url, context, nextLink) { // Mocking import-in-the-middle - const { module: originalModule } = nextLink(url, context); - assert.strictEqual(module.status, 'linked'); // Original module is linked at this point +function link(url, context, nextLink) { + const linked = nextLink(url, context); + if (!shouldOverride(url, context.specifier)) { + return linked; + } + assert.strictEqual(linked.module.status, 'linked'); // Original module is linked at this point let source = `import * as original from 'original';`; source += `import { userCallback, name, basedir } from 'util'`; source += `const exported = {}`; - for (const key of originalModule.namespace) { + for (const key of linked.module.namespace) { source += `let $${key} = original.${key};`; source += `export { $${key} as ${key} }`; source += `Object.defineProperty(exported, '${key}', { get() { return $${key}; }, set (value) { $${key} = value; }});`; } source += `userCallback(exported, name, basedir);`; - const m = vm.SourceTextModule(source); - m.linkSync((specifier) => { // This is not yet implemented but should be trivial to implement. - if (specifier === 'original') return originalModule; + // This is not yet implemented but should be trivial to implement. + linked.module = new vm.SourceTextModuleSync(source, function linkSync(specifier) { + if (specifier === 'original') return linked; // Contains a synthetic module with userCallback, name & basedir computed from url if (specifier === 'util') return util; }); - return { module: m }; + return linked; } ``` -### Alternative design: `link` that encompasses `resolve` and `load` +## Invocation order of `resolve` and `load` in ESM -An alternative design would be, instead of invoking a hook after `load`, make it wrap around `resolve` and `load` steps. The `link` hooks would take `specifier` and return `vm.Module` instances. If `nextLink()` is invoked, the next `resolve` and `load` (e.g. the default ones) will be invoked inside that. If `nextLink()` is not invoked, the default `resolve` and `load` will be skipped. +With synchronous hooks, when overriding loading in a way may take a significant amount of time (for example, involving network access), the module requests are processed sequentially, losing the ability to make processing concurrent with a thread pool or event loop, which was what the design of ESM took care to make possible thanks to the `import` syntax. One trick can be used to start concurrent loading requests in `resolve` before blocking for the results in the `load` hook. ```js -function link(specifier, context, nextImport) { // Mocking import-in-the-middle - const { module: originalModule } = nextLink(specifier, context); - assert.strictEqual(module.status, 'linked'); // Original module is linked at this point - let source = '...'; - const m = vm.SourceTextModule(source); - m.linkSync((specifier) => { - // ... - }); - return { module: m }; +function resolve(specifier, context, nextResolve) { + const resolved = nextResolve(specifier, context); + if (resolved.url.startsWith('http')) { + // This adds a fetch request to a pool + const syncResponse = startFetch(resolved.url); + responseMap.set(url, syncResponse); + } + return resolved; +} + +function load(url, context, nextLoad) { + const syncResponse = responseMap.get(url); + if (syncResponse) { + const source = syncResponse.wait(); + return { source, shortCircuit: true }; + } + return nextLoad(url, context); } ``` -## Other use cases that may need some thought +However, to reuse `resolve` for concurrent loading, we need to implement it to be run in BFS order - that is, for a module like this: + +```js +import 'a'; +import 'b'; +import 'c'; +``` + +Instead of running the hooks as: + +1. `resolve(a)` -> `load(a)` -> `link(a)` +2. `resolve(b)` -> `load(b)` -> `link(a)` +3. `resolve(c)` -> `load(c)` -> `link(a)` + +We need to run the hook as: + +1. `resolve(a)` -> `resolve(b)` -> `resolve(c)` (starts the off-thread fetching requests concurrently) +2. `load(a)` -> `load(b)` -> `load(c)` (block on the concurrently fetched results) +3. `link(a)` -> `link(b)` -> `link(c)` + +Otherwise, we need to invent another hook that gets run in BFS order before both `resolve` and `load`, but in that case, the hook could only get the specifiers in its arguments, which may or may not be a problem. For network imports it's okay, because the specifiers are supposed to be URLs already, but other forms of concurrent processing (e.g. parallel transpilation with a worker pool) may not be possible without the result of `resolve`. + +On the other hand, for CJS we have to run the hooks in DFS order since that's how CJS module requests have to be resolved. + +## Child worker/process hook inheritance + +Unlike the old `module.register()` design, this new API takes an object with functions so that it's possible to migrate existing monkey-patching packages over. One reason for the `path`/`url` parameter in the `module.register()` and the `initialize` was to aid hook inheritance in child worker and processes. With the new API, we can compose it differently by adding a new API specifically for registering a script that should be preloaded by child workers or processes. Hook authors can either put the call together with the hook registration (so that the inheritance is recursive automatically for grandchildren etc.), or put the hook registration code in a different file and configure a preload of that file. + +```js +process.preloadInChildren(__filename); // Or `import.meta.filename` +module.registerHooks({ load }); +``` + +If the hook doesn't want it to be inherited in child workers or processes, it can opt out by not calling the preload API, or only call it under certain conditions. + +## Additional use cases to consider + +### Source map support + +In most cases, source maps for Node.js apps are consumed externally (e.g. in a client to the inspector protocol) instead of in the Node.js instance that loads the modules, with the notable exception being error stack trace translation, enabled via `--enable-source-maps`. In this case, Node.js loads the source maps from disk using the URLs from the magic comments (currently, some of these are extracted using regular expressions, some are parsed by V8. It was discussed to rely on V8 instead of parsing them ourselves). To allow hooks to customize this behavior, we would need a hook that runs after module parsing and ideally before evaluation (otherwise, if an error is thrown during evaluation, the hook won't be able to run in time to affect the stack traces of the error being thrown). This can be done in the `link` or `exports` hooks proposed above if they take an optional property for source maps in the hook result, and the implementation can put the overriden sourcemap into the internal cache for later source position translations. + +### Cache manipulation + +It's common for users of existing CJS hooks to wipe the `require.cache` in their tests. Some tools may also update the `require.cache` for use cases like hot module replacement. This should probably be invented as a separate API to manipulate the cache (both the CJS and the ESM one). -1. Hot module replacement -2. Source map support (e.g. see [babel-register](https://github.com/babel/babel/blob/07bd0003cbdaa8525279c6dfa84e435471eb5797/packages/babel-register/src/hook.js#L38)) -3. Virtual file system in packaged apps (single executable applications). From a7fd7006fb247bab23cbbcd81f39f1e20fcae299 Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Mon, 1 Jul 2024 22:56:20 +0200 Subject: [PATCH 06/10] fixup! doc: add synchronous hooks proposal Co-authored-by: Geoffrey Booth --- doc/design/proposal-synchronous-hooks.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index f946921..b300b2b 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -137,10 +137,11 @@ Notes: 4. This may allow us to finally deprecate `Module.wrap` properly. 5. It may be useful to provide the computed extension in the context. An important use case is module format override (based on extensions?). -Example mock of pirates: +Example migration for Pirates: ```js -function addHook(hook, options) { +// Preserve Pirates’ existing public API where users call `addHook` to register a function +export function addHook(hook, options) { function load(url, context, nextLoad) { const loaded = nextLoad(url, context); const index = url.lastIndexOf('.'); From 0cd91d9d60b1ba7c5aa73e5b1d99b973ae9ef9c8 Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Mon, 1 Jul 2024 22:58:11 +0200 Subject: [PATCH 07/10] fixup! doc: add synchronous hooks proposal Co-authored-by: Geoffrey Booth --- doc/design/proposal-synchronous-hooks.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index b300b2b..8dbfb4e 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -180,7 +180,8 @@ Example mocking require-in-the-middle: */ /** * @typedef {{ -* exports: any +* exports: any, +* shortCircuit?: boolean * }} ModuleExportsResult */ /** @@ -226,7 +227,8 @@ Example mock of import-in-the-middle: */ /** * @typedef {{ -* module: vm.Module +* module: vm.Module, +* shortCircuit?: boolean * }} ModuleLinkResult */ /** From bcfab66b12b36550703dd09a2d2c80b0fb715cbc Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Mon, 1 Jul 2024 22:59:02 +0200 Subject: [PATCH 08/10] fixup! fixup! doc: add synchronous hooks proposal --- doc/design/proposal-synchronous-hooks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index 8dbfb4e..4684246 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -100,7 +100,7 @@ When `--experimental-network-import` is enabled, the default `load` step throws */ /** * @typedef {{ -* format?: string, +* format: string, * source: string | Buffer * }} ModuleLoadResult */ From 1c8877ec58265dd28f13aaaeb0e754d1d3698f9e Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Wed, 31 Jul 2024 18:42:08 +0200 Subject: [PATCH 09/10] Apply suggestions from code review Co-authored-by: Geoffrey Booth --- doc/design/proposal-synchronous-hooks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index 4684246..8b5613e 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -100,10 +100,10 @@ When `--experimental-network-import` is enabled, the default `load` step throws */ /** * @typedef {{ -* format: string, -* source: string | Buffer -* }} ModuleLoadResult -*/ + * format: string, + * source: string | Buffer + * }} ModuleLoadResult + */ /** * @param {string} url * @param {ModuleLoadContext} context From 570dd050b4434542cb5aaeab5a2e89b4267399bf Mon Sep 17 00:00:00 2001 From: Joyee Cheung Date: Tue, 20 Aug 2024 21:55:29 +0200 Subject: [PATCH 10/10] link -> instantiate, update about specifier/URL -> functions --- doc/design/proposal-synchronous-hooks.md | 97 ++++++++++++++---------- 1 file changed, 59 insertions(+), 38 deletions(-) diff --git a/doc/design/proposal-synchronous-hooks.md b/doc/design/proposal-synchronous-hooks.md index 8b5613e..1b6340a 100644 --- a/doc/design/proposal-synchronous-hooks.md +++ b/doc/design/proposal-synchronous-hooks.md @@ -33,10 +33,10 @@ function load(url, context, nextLoad) { } const hook = registerHooks({ resolve, load }); -hook.deregister(); // Calls hook[Symbol.dispose]() +hook.deregister(); // Calls [Symbol.dispose](), or the other way around? ``` -* The name `registerHooks` and `deregister` can be changed. +* The name `registerHooks` and `deregister` are subject to change. ## Hooks @@ -82,13 +82,13 @@ Notes: 1. Example use case: yarn pnp ([esm hooks](https://github.com/yarnpkg/berry/blob/master/packages/yarnpkg-pnp/sources/esm-loader/hooks/resolve.ts), [cjs hooks](https://github.com/yarnpkg/berry/blob/master/packages/yarnpkg-pnp/sources/loader/applyPatch.ts)) 2. `importAttributes` are only available when the module request is initiated with `import`. -3. For the CJS loader, `Module._cache` is keyed using file names, and it's used in the wild (usually in the form of `require.cache`) so changing it to key on URL would be breaking. We'll need to figure out another way to map it with an url. - 1. It might be non-breaking to maintain an additional map on the side for mapping the same url with different searches and hashes to a different module instance, or leave `Module._cache` as an alias map for the first module instance mapped by the URL minus search/hash. If in the same realm, `Module._cache` gets directly accessed from the outside, we can emit a warning about the incompatibility. Down the road this may require making `Module._cache` a proxy to propagate delete requests to the actual URL-based map. - 2. Changing CJS modules to be mapped with URL will have performance consequences from but might also help with the hot module replacement use case. Note that for parent packages with `exports` or `imports` in their `package.json` the URL conversion is already done (and not even cached) even in the CJS loader. +3. For the CJS loader, `Module._cache` is keyed using filenames, and it's used in the wild (usually in the form of `require.cache`) so changing it to key on URL would be breaking. We'll attach a URL to the modules which can be converted to filenames or ids that are the same as their keys in `Module._cache` and use that in the hooks. ### `load`: from url to source code -When `--experimental-network-import` is enabled, the default `load` step throws an error when encountering network imports. Although, users can implement blocking network imports by spawning a worker and use `Atomics.wait` to wait for the results. An example for doing blocking fetch synchronously can be found [here](https://github.com/addaleax/synchronous-worker?tab=readme-ov-file#how-can-i-avoid-using-this-package). +Typically `load` is where hook authors might want to implement it asynchronously. In this case they can spawn their own worker and use `Atomics.wait()` the hooks are synchronous, which has already been done by `module.register()`'s off-thread hooks anyway, and this new API is just giving the control/choice back to the hook authors. An example for doing blocking fetch synchronously using workers can be found [here](https://github.com/addaleax/synchronous-worker?tab=readme-ov-file#how-can-i-avoid-using-this-package). + +When the hooks only access the file system and do not access the network, it is recommended to just use the synchronous FS APIs, which has lower overhead compared to the asynchronous versions. ```js /** @@ -132,15 +132,14 @@ function load(url, context, nextLoad) { Notes: 1. `context.format` is only present when the format is already determined by Node.js or a previous hook. -2. It seems useful for the default load to always return a buffer, or add an option to `context` for the default hook to load it as a buffer, in case the resolved file point to a binary file (e.g. a zip file, a wasm, an addon). For the ESM loader it's (almost?) always a buffer. For CJS loader some changes are needed to keep the content in a buffer. -3. Some changes may be needed in both the CJS and ESM loader to allow loading arbitrary format in a raw buffer in a way that plays well with the internal cache and format detection. -4. This may allow us to finally deprecate `Module.wrap` properly. -5. It may be useful to provide the computed extension in the context. An important use case is module format override (based on extensions?). +2. It seems useful for the default load to always return a buffer, or add an option to `context` for the default hook to load it as a buffer, in case the resolved file point to a binary file (e.g. a zip file, a wasm, an addon). For the ESM loader it's (almost?) always a buffer. For CJS loader some changes are needed to keep the content in a buffer. For now users need to assume that the source could be either string or a buffer and need to handle both, this is also the case for existing `module.register()` hooks. +3. This may allow us to finally deprecate `Module.wrap` properly. +4. It may be useful to provide the computed extension in the context. An important use case is module format override (based on extensions?) -Example migration for Pirates: +Example migration for the pirates package: ```js -// Preserve Pirates’ existing public API where users call `addHook` to register a function +// Preserve pirates’ existing public API where users call `addHook` to register a function export function addHook(hook, options) { function load(url, context, nextLoad) { const loaded = nextLoad(url, context); @@ -175,7 +174,7 @@ Example mocking require-in-the-middle: /** * @typedef {{ * format: string, - * source: string | Buffer, + * module: module.Module * }} ModuleExportsContext */ /** @@ -212,9 +211,11 @@ If the module loaded is CJS (`context.format` is `"commonjs"`), If the default s If the module loaded is ESM (`context.format` is `"module"`) all the direct modification to `exports` are no-ops because ESM namespaces are not mutable. Returning a new `exports` for `require(esm)` only affects the caller of `require(esm)`, unless the hook returns a proxy to connect the changes between the new exports object and the module namespace object. -## `link` (import-only): from source code to compiled module records +## `instantiate` (import-only): from compiled ES modules to instantiated ES modules (with dependency resolved) + +Previously with only the `load` hooks for overriding module behavior, hooks typically have to parse the provided source again to understand the exports, which can be both slow and prone to bugs. `instantiate` gives the hooks an opportunity to override module behavior using the export names parsed by V8 and perform customization before the the compiled code gets evaluated. For ESM, it needs to work with experimental vm modules for passing customized module instances around. -This needs to work with experimental vm modules for passing module instances around. +TODO(joyeecheung): we may consider exposing `compile` (source code to compiled modules) too, which happens before `instantiate` and can be overloaded for CJS as well. The `instantiate` hook has to be separate because merely compiling the modules is not enough for V8 to provide the export names. V8 requires the `import` dependencies to be resolved before it can return the namespace, since there might be `export * from '...'` coming from the dependencies. Example mock of import-in-the-middle: @@ -222,46 +223,52 @@ Example mock of import-in-the-middle: /** * @typedef {{ * specifier: string, - * source: string | Buffer, - * }} ModuleLinkContext + * module: vm.Module, + * }} ModuleInstantiateContext + * ModuleWrap can be too internal to be exposed. We'll need to wrap it with a more public + * type before passing it to users. However, users should not expect the wrapper to own + * the ModuleWrap/V8 module records or use this for memory management. */ /** * @typedef {{ -* module: vm.Module, +* module?: vm.Module, * shortCircuit?: boolean -* }} ModuleLinkResult +* }} ModuleInstantiateResult */ /** * @param {string} url - * @param {ModuleLinkContext} context - * @param {(context: ModuleLinkContext) => {ModuleLinkResult}} nextLink - * @returns {ModuleLinkResult} + * @param {ModuleInstantiateContext} context + * @param {(context: ModuleInstantiateContext) => {ModuleInstantiateResult}} nextInstantiate + * @returns {ModuleInstantiateResult} */ -function link(url, context, nextLink) { - const linked = nextLink(url, context); - if (!shouldOverride(url, context.specifier)) { - return linked; +function instantiate(url, context, nextInstantiate) { + const instantiated = nextInstantiate(url, context); + if (!instantiated.module || !shouldOverride(url, context.specifier)) { // Only overrides ESM. + return instantiated; } - assert.strictEqual(linked.module.status, 'linked'); // Original module is linked at this point + assert.strictEqual(instantiated.module.status, 'instantiated'); // Original module has resolved its dependencies at this point let source = `import * as original from 'original';`; source += `import { userCallback, name, basedir } from 'util'`; source += `const exported = {}`; - for (const key of linked.module.namespace) { + for (const key of instantiated.module.namespace) { // Note how instantiated.module.namespace is the real exports parsed by V8. source += `let $${key} = original.${key};`; source += `export { $${key} as ${key} }`; source += `Object.defineProperty(exported, '${key}', { get() { return $${key}; }, set (value) { $${key} = value; }});`; } source += `userCallback(exported, name, basedir);`; // This is not yet implemented but should be trivial to implement. - linked.module = new vm.SourceTextModuleSync(source, function linkSync(specifier) { - if (specifier === 'original') return linked; + instantiated.module = new vm.SourceTextModuleSync(source, function linkSync(specifier) { + if (specifier === 'original') return instantiated; // Contains a synthetic module with userCallback, name & basedir computed from url if (specifier === 'util') return util; }); - return linked; + assert.strictEqual(instantiated.module.status, 'instantiated'); + return instantiated; } ``` +Alternatively, for overriding the compiled module instance with a `SourceTextModule` that doesn't have additional dependencies itself, we can assume that if `result.module` is a string for a `SourceTextModule` that needs to be recompiled using the original configurations, and perform the recompilation without requiring users to use `vm.SourceTextModule`. + ## Invocation order of `resolve` and `load` in ESM With synchronous hooks, when overriding loading in a way may take a significant amount of time (for example, involving network access), the module requests are processed sequentially, losing the ability to make processing concurrent with a thread pool or event loop, which was what the design of ESM took care to make possible thanks to the `import` syntax. One trick can be used to start concurrent loading requests in `resolve` before blocking for the results in the `load` hook. @@ -311,16 +318,27 @@ Otherwise, we need to invent another hook that gets run in BFS order before both On the other hand, for CJS we have to run the hooks in DFS order since that's how CJS module requests have to be resolved. -## Child worker/process hook inheritance +## The API takes functions directly, delegating worker inheritance to another API -Unlike the old `module.register()` design, this new API takes an object with functions so that it's possible to migrate existing monkey-patching packages over. One reason for the `path`/`url` parameter in the `module.register()` and the `initialize` was to aid hook inheritance in child worker and processes. With the new API, we can compose it differently by adding a new API specifically for registering a script that should be preloaded by child workers or processes. Hook authors can either put the call together with the hook registration (so that the inheritance is recursive automatically for grandchildren etc.), or put the hook registration code in a different file and configure a preload of that file. +Unlike the old `module.register()` design, this new API takes an object with functions so that it's possible to migrate existing monkey-patching packages over, for several reasons: -```js -process.preloadInChildren(__filename); // Or `import.meta.filename` -module.registerHooks({ load }); -``` +1. It's unclear how an API that takes a `specifier` & `parentURL` combination should resolve the request in a universal manner - `module.register()` simply reuses the `import` resolution which is assuming too much for universal hooks e.g. `import` conditions in `package.json` would be used over other conditions, which can be surprising if the hooks are supposed to be `require()`-ed. ESM resolution is deliberately different from CJS resolution, so choosing either of them or trying to invent a new resolution rule adds unnecessary quirks to the API, it's better to just leave the choice to users. +2. The purpose of the `specifier` & `parentURL` combination in `module.register()` was to help child worker inherit the hooks. In practice, however, inheriting the module request pointed by the registration API alone is not necessarily enough for the entire customization to work. For example, in a typical `instrumentation.js` -> `@opentelemetry/instrumentation` -> `import-in-the-middle` -> `import-in-the-middle/hook.mjs` customization dependency chain, only automatically preloading `import-in-the-middle/hook.mjs` doesn't complete customization for workers if the higher-level code don't get preloaded too. In the end the high-level code would still need a way to register themselves as preloads for child workers, making the inheritance of the low-level hooks alone redundant. -If the hook doesn't want it to be inherited in child workers or processes, it can opt out by not calling the preload API, or only call it under certain conditions. +So this new API will simply take the functions directly and delegate the worker inheritance/preload registration to a separate API. Currently, existing hooks users have been using `--require`/`--import` and the `process.execArgv` inheritance in workers to deal with registration of the entire customization, and this will continue to work with the new API. In the future we can explore a few other options: + +1. A file-based configuration to specify preloads that are automatically discovered by Node.js upon startup. See [the `noderc` proposal](https://github.com/nodejs/node/issues/53787). +2. A JavaScript API to register preloads in child workers. For example: + + ```js + // As if `require(require.resolve(specifier, { paths: require.resolve.paths(base) }))` + // is used to preload the file. + process.requireInChildren(specifier, base); + // As if import(new URL(specifier, parentURL)) is used to preload the file. + process.importInChildren(specifier, parentURL); + ``` + + This API is supposed to be invoked by final customizations or end user code, or intermediate dependency that wish to spawn a worker with different preloads, because only they have access to the entire graph of the customizations that should be inherited into the workers. For example, in the case of a typical open-telemetry user, the file being preloaded should be their `instrumentation.js`. We could also add accompanying APIs for removing/querying preloads. ## Additional use cases to consider @@ -332,3 +350,6 @@ In most cases, source maps for Node.js apps are consumed externally (e.g. in a c It's common for users of existing CJS hooks to wipe the `require.cache` in their tests. Some tools may also update the `require.cache` for use cases like hot module replacement. This should probably be invented as a separate API to manipulate the cache (both the CJS and the ESM one). +### Locking the module hooks + +To prevent accidental mistakes or to aid analysis of the dependency, it may be helpful to provide an API that stops further module hook registration after its invocation.