Skip to content
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions demo/foo/index.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
console.log(`foo`);
5 changes: 5 additions & 0 deletions demo/index.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import './foo/index.mjs';
import hash from './foo/index-sha512.mjs';

console.log(`demo`);
console.log(`demo hash:`, hash);
50 changes: 50 additions & 0 deletions demo/loader-sha512.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
import crypto from 'crypto';
import path from 'path';

const shaRegExp = /-sha512(\.mjs)$/;

function getSourcePath(p) {
if (p.protocol !== `file:`)
return p;

const pString = p.toString();
const pFixed = pString.replace(shaRegExp, `$1`);
if (pFixed === pString)
return p;

return new URL(pFixed);
}

export function getFileSystem(defaultGetFileSystem) {
const fileSystem = defaultGetFileSystem();

return {
readFileSync(p) {
const fixedP = getSourcePath(p);
if (fixedP === p)
return fileSystem.readFileSync(p);

const content = fileSystem.readFileSync(fixedP);
const hash = crypto.createHash(`sha512`).update(content).digest(`hex`);

return Buffer.from(`export default ${JSON.stringify(hash)};`);
},

statEntrySync(p) {
const fixedP = getSourcePath(p);
return fileSystem.statEntrySync(fixedP);
},

realpathSync(p) {
const fixedP = getSourcePath(p);
if (fixedP === p)
return fileSystem.realpathSync(p);

const realpath = fileSystem.realpathSync(fixedP);
if (path.extname(realpath) !== `.mjs`)
throw new Error(`Paths must be .mjs extension to go through the sha512 loader`);

return realpath.replace(/\.mjs$/, `-sha512.mjs`);
},
};
}
69 changes: 69 additions & 0 deletions lib/internal/modules/esm/get_file_system.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
'use strict';

const {
FunctionPrototypeBind,
ObjectCreate,
ObjectKeys,
SafeMap,
} = primordials;

const realpathCache = new SafeMap();

const internalFS = require('internal/fs/utils');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fsUtils might be a better name for this. Otherwise it sounds like internal vs external/public versions, but I think the important distinction is that these are utilities.

const fs = require('fs');
const fsPromises = require('internal/fs/promises').exports;
Comment on lines +12 to +13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use destructuring here I think, otherwise monkey-patching of fs module would impact this module.

const packageJsonReader = require('internal/modules/package_json_reader');
const { fileURLToPath } = require('url');
const { internalModuleStat } = internalBinding('fs');

const implementation = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this whole thing is the implementation

Suggested change
const implementation = {
const defaultFileSystem = {

But I think this might be better as a class (like Guy suggests)

async readFile(p) {
return fsPromises.readFile(p);
},

async statEntry(p) {
return internalModuleStat(fileURLToPath(p));
},

async readJson(p) {
return packageJsonReader.read(fileURLToPath(p));
},
Comment on lines +27 to +29
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should this be used to read json files that are not a package.json? If not, I think the name should be more explicit:

Suggested change
async readJson(p) {
return packageJsonReader.read(fileURLToPath(p));
},
async readPackageJson(p) {
return packageJsonReader.read(fileURLToPath(p));
},


async realpath(p) {
return fsPromises.realpath(p, {
[internalFS.realpathCacheKey]: realpathCache
});
},

readFileSync(p) {
return fs.readFileSync(p);
},

statEntrySync(p) {
return internalModuleStat(fileURLToPath(p));
},

readJsonSync(p) {
return packageJsonReader.read(fileURLToPath(p));
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't just have readJson as a well-defined abstraction on top of readFile? Package virtualization can still work by returning serialized JSON - which is likely needed anyway with threading workflows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make perf analysis on internalReadJson, but I'd intuitively agree that a JS abstraction over readFile (that would just apply a regex to get the containsKey value) would fast enough without requiring a dedicated native function. I can experiment with that. Or do you have something else in mind?


realpathSync(p) {
return fs.realpathSync(p, {
[internalFS.realpathCacheKey]: realpathCache
});
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory realpath can be implemented on top just an lstat and stat implementation, although that would be a slightly more complex refactoring of the realpath algorithm. It might well simplify the model though in just having three base-level FS primitive hooks to virtualize.

Copy link
Contributor Author

@arcanis arcanis Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about that - it'd likely be much slower, and the semantic loss could bring unforeseen issues. I'd not feel confident dropping it in this PR 🤔

};

function defaultGetFileSystem(defaultGetFileSystem) {
const copy = ObjectCreate(null);

const keys = ObjectKeys(implementation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can't/don't change, and they're also used in ESMLoader, so I think this should be moved outside defaultGetFileSystem() and be exported to avoid needlessly re-compiling this list.

for (let t = 0; t < keys.length; ++t)
copy[keys[t]] = FunctionPrototypeBind(implementation[keys[t]], null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make the implementation a class so this is just instantiation?

One benefit of that would be treating the realpathcache as an instance property so that would give it a proper lifetime.

That does mean we have to think a bit more carefully about the lifetimes of the hook though too - this function is only called once though currently right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at that, I think an object should be preferred over a class: first because we don't really use class features (in particular we can't use this since it'd break destructuring), but more importantly because it'd prevent the default fs functions from being observable via ObjectKeys. Given that it's currently the source of truth for the loader code to know what are the supported fs functions, using a class would require further special casing which I think makes the object form preferable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see pros and cons to both, but re-creating native functionality sounds slightly more on the con-side.

You could create an instance (the default instance / singleton) and get the keys from it:

class FSLoaderUtils { /* … */ }

const defaultFSLoaderUtils = new FSLoaderUtils();
const defaultFSLoaderUtilNames = ObjectKeys(defaultFSLoaderUtils);

function defaultGetFSLoaderUtils() {
  return {
    defaultFSLoaderUtils,
    defaultFSLoaderUtilNames,
  };
}


return copy;
}

module.exports = {
defaultGetFileSystem,
};
6 changes: 3 additions & 3 deletions lib/internal/modules/esm/get_source.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,28 +5,28 @@ const {
decodeURIComponent,
} = primordials;
const { getOptionValue } = require('internal/options');
const esmLoader = require('internal/process/esm_loader');

// Do not eagerly grab .manifest, it may be in TDZ
const policy = getOptionValue('--experimental-policy') ?
require('internal/process/policy') :
null;

const { Buffer } = require('buffer');

const fs = require('internal/fs/promises').exports;
const { URL } = require('internal/url');
const {
ERR_INVALID_URL,
ERR_INVALID_URL_SCHEME,
} = require('internal/errors').codes;
const readFileAsync = fs.readFile;

const DATA_URL_PATTERN = /^[^/]+\/[^,;]+(?:[^,]*?)(;base64)?,([\s\S]*)$/;

async function defaultGetSource(url, { format } = {}, defaultGetSource) {
const parsed = new URL(url);
let source;
if (parsed.protocol === 'file:') {
source = await readFileAsync(parsed);
source = await esmLoader.getFileSystem().readFile(parsed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be preferable to have this passed in as a param instead of grabbing off of the application context esmLoader (there can be multiple Loaders).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case does that happen? It could be a blocker for the node:resolution accessors I mentioned earlier, as it relies on loader utils being singletons.

If that's the case of the "loader loader" that @JakobJingleheimer mentions there, since from what I understand they aren't used at the same time, perhaps the global access would be fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I THINK their sequence is fully serial: the node-land esmLoaderNodeland finishes its work and disappears into the void, and then esmLoaderUserland steps in and starts its work.

Aside from Bradley's concern of bloating params, I think it would be better for this to be passed (and also makes it easier to test). One of our Loader team's side/wish-list projects is to make ESMLoader pure, and this goes against that. Not necessarily a show-stopper, but something to mention: If we do decide to make it global and preclude fully pure, we should do it consciously.

} else if (parsed.protocol === 'data:') {
const match = RegExpPrototypeExec(DATA_URL_PATTERN, parsed.pathname);
if (!match) {
Expand Down
74 changes: 74 additions & 0 deletions lib/internal/modules/esm/loader.js
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,21 @@ require('internal/modules/cjs/loader');
const {
Array,
ArrayIsArray,
ArrayPrototypeFilter,
ArrayPrototypeJoin,
ArrayPrototypePush,
FunctionPrototypeBind,
FunctionPrototypeCall,
ObjectAssign,
ObjectCreate,
ObjectKeys,
ObjectPrototypeHasOwnProperty,
ObjectSetPrototypeOf,
PromiseAll,
RegExpPrototypeExec,
SafeArrayIterator,
SafeWeakMap,
StringPrototypeEndsWith,
globalThis,
} = primordials;
const { MessageChannel } = require('internal/worker/io');
Expand Down Expand Up @@ -45,6 +49,9 @@ const {
initializeImportMeta
} = require('internal/modules/esm/initialize_import_meta');
const { defaultLoad } = require('internal/modules/esm/load');
const {
defaultGetFileSystem
} = require('internal/modules/esm/get_file_system');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to only use this in the esm/loader.js and not in the CJS loader? That seems like it might limit the applicability of these hooks, especially when CJS interfaces with eg exports resolutions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the other hooks are only used in the esm code path so I didn't want to change that since it's not required strictly speaking. A followup PR however will be to consider getFileSystem for cjs.

const { translators } = require(
'internal/modules/esm/translators');
const { getOptionValue } = require('internal/options');
Expand Down Expand Up @@ -81,6 +88,14 @@ class ESMLoader {
defaultResolve,
];

/**
* @private
* @property {Function[]} resolvers First-in-first-out list of resolver hooks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @property {Function[]} resolvers First-in-first-out list of resolver hooks
* @property {Function[]} fileSystemBuilders First-in-first-out list of file system utilities compositors

*/
#fileSystemBuilders = [
defaultGetFileSystem,
];

#importMetaInitializer = initializeImportMeta;

/**
Expand All @@ -107,6 +122,7 @@ class ESMLoader {
globalPreload,
resolve,
load,
getFileSystem,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the sequence of these was intended to follow their operational sequence. It's not terribly important to keep that, but just FYI there was a rhyme to this reason 🙂

If you were to persist that, I think getFileSystem would come either before globalPreload or between globalPreload and resolve.

// obsolete hooks:
dynamicInstantiate,
getFormat,
Expand Down Expand Up @@ -159,10 +175,17 @@ class ESMLoader {
if (load) {
acceptedHooks.loader = FunctionPrototypeBind(load, null);
}
if (getFileSystem) {
acceptedHooks.getFileSystem = FunctionPrototypeBind(getFileSystem, null);
}

return acceptedHooks;
}

constructor() {
this.buildFileSystem();
}

/**
* Collect custom/user-defined hook(s). After all hooks have been collected,
* calls global preload hook(s).
Expand All @@ -180,6 +203,7 @@ class ESMLoader {
globalPreloader,
resolver,
loader,
getFileSystem,
} = ESMLoader.pluckHooks(exports);

if (globalPreloader) ArrayPrototypePush(
Expand All @@ -194,13 +218,63 @@ class ESMLoader {
this.#loaders,
FunctionPrototypeBind(loader, null), // [1]
);
if (getFileSystem) ArrayPrototypePush(
this.#fileSystemBuilders,
FunctionPrototypeBind(getFileSystem, null), // [1]
);
}

// [1] ensure hook function is not bound to ESMLoader instance

this.buildFileSystem();
this.preload();
}

buildFileSystem() {
// Note: makes assumptions as to how chaining will work to demonstrate
// the capability; subject to change once chaining's API is finalized.
const fileSystemFactories = [...this.#fileSystemBuilders];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shallow-clone this list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear which guarantees are made as to whether addCustomLoaders can/may ever be called from userland code, so I prefer to make the code handle that in a consistent way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong, but I believe it is not possible to access esmLoader directly from user-land without something very naughty like --expose-internals. But also, esmLoader#fileSystemBuilders is private, so it's inaccessible anyway.


const defaultFileSystemFactory = fileSystemFactories[0];
let finalFileSystem =
defaultFileSystemFactory();

const asyncKeys = ArrayPrototypeFilter(
ObjectKeys(finalFileSystem),
(name) => !StringPrototypeEndsWith(name, 'Sync'),
);
Comment on lines +240 to +247
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary til chaining is finalised, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I left a comment about that right before this snippet; I mostly wanted to show one working example


for (let i = 1; i < fileSystemFactories.length; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Prefixing the increment operator is not the same as postfixing it and can lead to unexpected results that are rarely intentional. It doesn't make a difference with the code as currently written, but it easily could with a number of a small changes, and the cause is very easy to miss. So the postfix is generally preferred unless the prefix is specifically needed.

Suggested change
for (let i = 1; i < fileSystemFactories.length; ++i) {
for (let i = 1; i < fileSystemFactories.length; i++) {

Copy link
Contributor Author

@arcanis arcanis Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh that's the first time I hear this - preincrementation are usually "preferred" in for loops since they don't require a temporary variable, although any decent compiler like v8 will surely treat them equally in this instance.

Do you have a particular case in mind where preincrementation leads to problems in a for loop expression?

const currentFileSystem = finalFileSystem;
const fileSystem = fileSystemFactories[i](() => currentFileSystem);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make this a function that returns the current FS utils rather than just passing the current utils? (Ex this doesn't protect against mutating currentFileSystem)

Copy link
Contributor Author

@arcanis arcanis Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly to match the signature of other hooks, which provide the next hook function rather than their results (granted, they accept inputs whereas this one doesn't, so it could be possible, but by consistency I feel more comfortable keeping the same pattern).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eventually this should do a copy operation rather than direct reference to avoid people purposefully doing mutation becoming breaking change worries.


// If the loader specifies a sync hook but omits the async one we
// leverage the sync version by default, so that most hook authors
// don't have to write their implementations twice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure to throw though if a specific loader implements an async hook but not a sync hook for the same method?

for (let j = 0; j < asyncKeys.length; ++j) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I woulda chosen k for "key" 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a key, though, but the index of a key. I looked at the existing codebase to pick the iterator name and saw many references to i,j

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merely a side comment—I wasn't suggesting a change. Sorry for the confusion. I use k as an index in a set of keys

for (const key of keys) // key = 'foo'

vs

for (let k = 0; k < keys.length; k++) // key = keys[k] = 'foo'

const asyncKey = asyncKeys[j];
const syncKey = `${asyncKey}Sync`;

if (
!ObjectPrototypeHasOwnProperty(fileSystem, asyncKey) &&
ObjectPrototypeHasOwnProperty(fileSystem, syncKey)
) {
fileSystem[asyncKey] = async (...args) => {
return fileSystem[syncKey](...args);
};
Comment on lines +267 to +269
Copy link
Contributor

@aduh95 aduh95 Dec 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not make our user expect a promise here, if/when we end up getting rid of the sync methods in this hook, loader hook authors should expect the previous hook may have passed a synchronous function:

Suggested change
fileSystem[asyncKey] = async (...args) => {
return fileSystem[syncKey](...args);
};
fileSystem[asyncKey] = fileSystem[syncKey];

and if we want it to return a promise, we still need to avoid the spread operator on arrays (which relies on globally mutable Array.prototype[Symbol.iterator]):

Suggested change
fileSystem[asyncKey] = async (...args) => {
return fileSystem[syncKey](...args);
};
fileSystem[asyncKey] =
async (...args) => ReflectApply(fileSystem[syncKey], fileSystem, args);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo it'd be best to attempt to keep a consistent API - otherwise loader authors would be effectively forced to do:

const content = await Promise.resolve(fs.readFile(path));

The extra Promise.resolve wrapper wouldn't look very idiomatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another solution is perhaps to just drop this "automatically bind sync functions into async slots", since it comes with its own drawbacks (it can be seen as a footgun, since it makes all async filesystem operations silently turn sync).

I'm not that attached to it, so dropping it would be fine by me.

Copy link
Contributor

@aduh95 aduh95 Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise loader authors would be effectively forced to do:

I don't think that this is correct, you can await non-Promise objects:

console.log({}); // {}
console.log(await {}); // {}
console.log(await Promise.resolve({})); // {}

I don't feel strongly about this, but imho it wouldn't be too bad as an API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: being able to await a non-promise was explicitly for this scenario (can find the citation if needed).

}
}

finalFileSystem = ObjectAssign(
ObjectCreate(null),
currentFileSystem,
fileSystem,
);
}

this.fileSystem = finalFileSystem;
}

async eval(
source,
url = pathToFileURL(`${process.cwd()}/[eval${++this.evalIndex}]`).href
Expand Down
Loading