-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
esm: implement the getFileSystem hook #41076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
console.log(`foo`); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
import './foo/index.mjs'; | ||
import hash from './foo/index-sha512.mjs'; | ||
|
||
console.log(`demo`); | ||
console.log(`demo hash:`, hash); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
import crypto from 'crypto'; | ||
import path from 'path'; | ||
|
||
const shaRegExp = /-sha512(\.mjs)$/; | ||
|
||
function getSourcePath(p) { | ||
if (p.protocol !== `file:`) | ||
return p; | ||
|
||
const pString = p.toString(); | ||
const pFixed = pString.replace(shaRegExp, `$1`); | ||
if (pFixed === pString) | ||
return p; | ||
|
||
return new URL(pFixed); | ||
} | ||
|
||
export function getFileSystem(defaultGetFileSystem) { | ||
const fileSystem = defaultGetFileSystem(); | ||
|
||
return { | ||
readFileSync(p) { | ||
const fixedP = getSourcePath(p); | ||
if (fixedP === p) | ||
return fileSystem.readFileSync(p); | ||
|
||
const content = fileSystem.readFileSync(fixedP); | ||
const hash = crypto.createHash(`sha512`).update(content).digest(`hex`); | ||
|
||
return Buffer.from(`export default ${JSON.stringify(hash)};`); | ||
}, | ||
|
||
statEntrySync(p) { | ||
const fixedP = getSourcePath(p); | ||
return fileSystem.statEntrySync(fixedP); | ||
}, | ||
|
||
realpathSync(p) { | ||
const fixedP = getSourcePath(p); | ||
if (fixedP === p) | ||
return fileSystem.realpathSync(p); | ||
|
||
const realpath = fileSystem.realpathSync(fixedP); | ||
if (path.extname(realpath) !== `.mjs`) | ||
throw new Error(`Paths must be .mjs extension to go through the sha512 loader`); | ||
|
||
return realpath.replace(/\.mjs$/, `-sha512.mjs`); | ||
}, | ||
}; | ||
} |
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,69 @@ | ||||||||||||||
'use strict'; | ||||||||||||||
|
||||||||||||||
const { | ||||||||||||||
FunctionPrototypeBind, | ||||||||||||||
ObjectCreate, | ||||||||||||||
ObjectKeys, | ||||||||||||||
SafeMap, | ||||||||||||||
} = primordials; | ||||||||||||||
|
||||||||||||||
const realpathCache = new SafeMap(); | ||||||||||||||
|
||||||||||||||
const internalFS = require('internal/fs/utils'); | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think |
||||||||||||||
const fs = require('fs'); | ||||||||||||||
const fsPromises = require('internal/fs/promises').exports; | ||||||||||||||
Comment on lines
+12
to
+13
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to use destructuring here I think, otherwise monkey-patching of |
||||||||||||||
const packageJsonReader = require('internal/modules/package_json_reader'); | ||||||||||||||
const { fileURLToPath } = require('url'); | ||||||||||||||
const { internalModuleStat } = internalBinding('fs'); | ||||||||||||||
|
||||||||||||||
const implementation = { | ||||||||||||||
|
const implementation = { | |
const defaultFileSystem = { |
But I think this might be better as a class (like Guy suggests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can/should this be used to read json files that are not a package.json? If not, I think the name should be more explicit:
async readJson(p) { | |
return packageJsonReader.read(fileURLToPath(p)); | |
}, | |
async readPackageJson(p) { | |
return packageJsonReader.read(fileURLToPath(p)); | |
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we can't just have readJson
as a well-defined abstraction on top of readFile
? Package virtualization can still work by returning serialized JSON - which is likely needed anyway with threading workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make perf analysis on internalReadJson
, but I'd intuitively agree that a JS abstraction over readFile
(that would just apply a regex to get the containsKey
value) would fast enough without requiring a dedicated native function. I can experiment with that. Or do you have something else in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory realpath can be implemented on top just an lstat
and stat
implementation, although that would be a slightly more complex refactoring of the realpath algorithm. It might well simplify the model though in just having three base-level FS primitive hooks to virtualize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried about that - it'd likely be much slower, and the semantic loss could bring unforeseen issues. I'd not feel confident dropping it in this PR 🤔
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can't/don't change, and they're also used in ESMLoader, so I think this should be moved outside defaultGetFileSystem()
and be exported to avoid needlessly re-compiling this list.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps make the implementation a class so this is just instantiation?
One benefit of that would be treating the realpathcache as an instance property so that would give it a proper lifetime.
That does mean we have to think a bit more carefully about the lifetimes of the hook though too - this function is only called once though currently right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at that, I think an object should be preferred over a class: first because we don't really use class features (in particular we can't use this
since it'd break destructuring), but more importantly because it'd prevent the default fs functions from being observable via ObjectKeys
. Given that it's currently the source of truth for the loader code to know what are the supported fs functions, using a class would require further special casing which I think makes the object form preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see pros and cons to both, but re-creating native functionality sounds slightly more on the con-side.
You could create an instance (the default instance / singleton) and get the keys from it:
class FSLoaderUtils { /* … */ }
const defaultFSLoaderUtils = new FSLoaderUtils();
const defaultFSLoaderUtilNames = ObjectKeys(defaultFSLoaderUtils);
function defaultGetFSLoaderUtils() {
return {
defaultFSLoaderUtils,
defaultFSLoaderUtilNames,
};
}
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,28 +5,28 @@ const { | |
decodeURIComponent, | ||
} = primordials; | ||
const { getOptionValue } = require('internal/options'); | ||
const esmLoader = require('internal/process/esm_loader'); | ||
|
||
// Do not eagerly grab .manifest, it may be in TDZ | ||
const policy = getOptionValue('--experimental-policy') ? | ||
require('internal/process/policy') : | ||
null; | ||
|
||
const { Buffer } = require('buffer'); | ||
|
||
const fs = require('internal/fs/promises').exports; | ||
const { URL } = require('internal/url'); | ||
const { | ||
ERR_INVALID_URL, | ||
ERR_INVALID_URL_SCHEME, | ||
} = require('internal/errors').codes; | ||
const readFileAsync = fs.readFile; | ||
|
||
const DATA_URL_PATTERN = /^[^/]+\/[^,;]+(?:[^,]*?)(;base64)?,([\s\S]*)$/; | ||
|
||
async function defaultGetSource(url, { format } = {}, defaultGetSource) { | ||
const parsed = new URL(url); | ||
let source; | ||
if (parsed.protocol === 'file:') { | ||
source = await readFileAsync(parsed); | ||
source = await esmLoader.getFileSystem().readFile(parsed); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it would be preferable to have this passed in as a param instead of grabbing off of the application context There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In which case does that happen? It could be a blocker for the If that's the case of the "loader loader" that @JakobJingleheimer mentions there, since from what I understand they aren't used at the same time, perhaps the global access would be fine? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I THINK their sequence is fully serial: the node-land esmLoaderNodeland finishes its work and disappears into the void, and then esmLoaderUserland steps in and starts its work. Aside from Bradley's concern of bloating params, I think it would be better for this to be passed (and also makes it easier to test). One of our Loader team's side/wish-list projects is to make ESMLoader pure, and this goes against that. Not necessarily a show-stopper, but something to mention: If we do decide to make it global and preclude fully pure, we should do it consciously. |
||
} else if (parsed.protocol === 'data:') { | ||
const match = RegExpPrototypeExec(DATA_URL_PATTERN, parsed.pathname); | ||
if (!match) { | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -6,17 +6,21 @@ require('internal/modules/cjs/loader'); | |||||||||||||||||||
const { | ||||||||||||||||||||
Array, | ||||||||||||||||||||
ArrayIsArray, | ||||||||||||||||||||
ArrayPrototypeFilter, | ||||||||||||||||||||
ArrayPrototypeJoin, | ||||||||||||||||||||
ArrayPrototypePush, | ||||||||||||||||||||
FunctionPrototypeBind, | ||||||||||||||||||||
FunctionPrototypeCall, | ||||||||||||||||||||
ObjectAssign, | ||||||||||||||||||||
ObjectCreate, | ||||||||||||||||||||
ObjectKeys, | ||||||||||||||||||||
ObjectPrototypeHasOwnProperty, | ||||||||||||||||||||
ObjectSetPrototypeOf, | ||||||||||||||||||||
PromiseAll, | ||||||||||||||||||||
RegExpPrototypeExec, | ||||||||||||||||||||
SafeArrayIterator, | ||||||||||||||||||||
SafeWeakMap, | ||||||||||||||||||||
StringPrototypeEndsWith, | ||||||||||||||||||||
globalThis, | ||||||||||||||||||||
} = primordials; | ||||||||||||||||||||
const { MessageChannel } = require('internal/worker/io'); | ||||||||||||||||||||
|
@@ -45,6 +49,9 @@ const { | |||||||||||||||||||
initializeImportMeta | ||||||||||||||||||||
} = require('internal/modules/esm/initialize_import_meta'); | ||||||||||||||||||||
const { defaultLoad } = require('internal/modules/esm/load'); | ||||||||||||||||||||
const { | ||||||||||||||||||||
defaultGetFileSystem | ||||||||||||||||||||
} = require('internal/modules/esm/get_file_system'); | ||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the plan to only use this in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At the moment the other hooks are only used in the esm code path so I didn't want to change that since it's not required strictly speaking. A followup PR however will be to consider |
||||||||||||||||||||
const { translators } = require( | ||||||||||||||||||||
'internal/modules/esm/translators'); | ||||||||||||||||||||
const { getOptionValue } = require('internal/options'); | ||||||||||||||||||||
|
@@ -81,6 +88,14 @@ class ESMLoader { | |||||||||||||||||||
defaultResolve, | ||||||||||||||||||||
]; | ||||||||||||||||||||
|
||||||||||||||||||||
/** | ||||||||||||||||||||
* @private | ||||||||||||||||||||
* @property {Function[]} resolvers First-in-first-out list of resolver hooks | ||||||||||||||||||||
|
* @property {Function[]} resolvers First-in-first-out list of resolver hooks | |
* @property {Function[]} fileSystemBuilders First-in-first-out list of file system utilities compositors |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the sequence of these was intended to follow their operational sequence. It's not terribly important to keep that, but just FYI there was a rhyme to this reason 🙂
If you were to persist that, I think getFileSystem
would come either before globalPreload
or between globalPreload
and resolve
.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shallow-clone this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear which guarantees are made as to whether addCustomLoaders
can/may ever be called from userland code, so I prefer to make the code handle that in a consistent way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be wrong, but I believe it is not possible to access esmLoader directly from user-land without something very naughty like --expose-internals
. But also, esmLoader#fileSystemBuilders
is private, so it's inaccessible anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is temporary til chaining is finalised, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I left a comment about that right before this snippet; I mostly wanted to show one working example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Prefixing the increment operator is not the same as postfixing it and can lead to unexpected results that are rarely intentional. It doesn't make a difference with the code as currently written, but it easily could with a number of a small changes, and the cause is very easy to miss. So the postfix is generally preferred unless the prefix is specifically needed.
for (let i = 1; i < fileSystemFactories.length; ++i) { | |
for (let i = 1; i < fileSystemFactories.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh that's the first time I hear this - preincrementation are usually "preferred" in for loops since they don't require a temporary variable, although any decent compiler like v8 will surely treat them equally in this instance.
Do you have a particular case in mind where preincrementation leads to problems in a for loop expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this a function that returns the current FS utils rather than just passing the current utils? (Ex this doesn't protect against mutating currentFileSystem
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly to match the signature of other hooks, which provide the next hook function rather than their results (granted, they accept inputs whereas this one doesn't, so it could be possible, but by consistency I feel more comfortable keeping the same pattern).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eventually this should do a copy operation rather than direct reference to avoid people purposefully doing mutation becoming breaking change worries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure to throw though if a specific loader implements an async hook but not a sync hook for the same method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I woulda chosen k
for "key" 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a key, though, but the index of a key. I looked at the existing codebase to pick the iterator name and saw many references to i,j
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merely a side comment—I wasn't suggesting a change. Sorry for the confusion. I use k
as an index in a set of keys
for (const key of keys) // key = 'foo'
vs
for (let k = 0; k < keys.length; k++) // key = keys[k] = 'foo'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not make our user expect a promise here, if/when we end up getting rid of the sync methods in this hook, loader hook authors should expect the previous hook may have passed a synchronous function:
fileSystem[asyncKey] = async (...args) => { | |
return fileSystem[syncKey](...args); | |
}; | |
fileSystem[asyncKey] = fileSystem[syncKey]; |
and if we want it to return a promise, we still need to avoid the spread operator on arrays (which relies on globally mutable Array.prototype[Symbol.iterator]
):
fileSystem[asyncKey] = async (...args) => { | |
return fileSystem[syncKey](...args); | |
}; | |
fileSystem[asyncKey] = | |
async (...args) => ReflectApply(fileSystem[syncKey], fileSystem, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo it'd be best to attempt to keep a consistent API - otherwise loader authors would be effectively forced to do:
const content = await Promise.resolve(fs.readFile(path));
The extra Promise.resolve
wrapper wouldn't look very idiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another solution is perhaps to just drop this "automatically bind sync functions into async slots", since it comes with its own drawbacks (it can be seen as a footgun, since it makes all async filesystem operations silently turn sync).
I'm not that attached to it, so dropping it would be fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise loader authors would be effectively forced to do:
I don't think that this is correct, you can await non-Promise objects:
console.log({}); // {}
console.log(await {}); // {}
console.log(await Promise.resolve({})); // {}
I don't feel strongly about this, but imho it wouldn't be too bad as an API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: being able to await
a non-promise was explicitly for this scenario (can find the citation if needed).
Uh oh!
There was an error while loading. Please reload this page.