Potential background process leak when requiring at top-level #59

@agapedimas

I ran into a potential issue where this module causes extra background processes to spawn and persist. When I require it like this:

const officeParser = require("officeparser");

at the top level of my Node.js app (e.g., in index.js), the number of background processes noticeably increases (from ~10 to 22 in my case), even before any parsing is done. Moving the require into an inner function has no lasting effect: the extra processes appear as soon as the function is first invoked and remain until the app exits. The only reliable workaround I’ve found is to isolate the parsing in a child process and kill it once parsing is complete. This is what I’m currently doing:

// convert.js
const officeParser = require("officeparser");

(async () => {
	const [filePath] = process.argv.slice(2);
	const content = await officeParser.parseOfficeAsync(filePath);
	process.send(content);
})();

// index.js
const { fork } = require("child_process");
const path = require("path");

function convert(filePath) {
	return new Promise((resolve, reject) => {
		const child = fork(path.resolve(__dirname, "convert.js"), [filePath], {
			silent: true
		});

		child.on("message", (data) => {
			resolve(data);
			child.kill();
		});

		child.on("error", reject);
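		child.on("exit", (code) => {
			// After child.kill() the exit code is null, so this branch runs,
			// but reject() is a no-op once the promise has already resolved.
			if (code !== 0) reject(new Error("Convert failed"));
		});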
	});
}

This works fine, but I wonder if there’s something in the module that could be cleaned up or made optional to avoid this behavior. Ideally, require("officeparser") alone shouldn’t spawn or hold onto long-lived resources before any function is even called.
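
In case it helps narrow this down, here is a minimal diagnostic sketch that compares the active resources Node.js reports before and after a loader callback runs. `diffActiveResources` and `countByType` are hypothetical helper names of my own, not part of officeparser. It relies on `process.getActiveResourcesInfo()`, which is only available in newer Node.js releases (v16.14+ / v17.3+) and is marked experimental. It only sees resources that Node itself tracks as keeping the event loop alive, so it may not account for OS-level threads or processes created by native code.

```javascript
// Count how many times each resource type appears in the list.
function countByType(types) {
	const counts = new Map();
	for (const type of types) counts.set(type, (counts.get(type) || 0) + 1);
	return counts;
}

// Hypothetical helper: run `load` synchronously and return an object mapping
// each resource type to the number of new active resources of that type.
function diffActiveResources(load) {
	const before = countByType(process.getActiveResourcesInfo());
	load();
	const after = countByType(process.getActiveResourcesInfo());

	const added = {};
	for (const [type, count] of after) {
		const delta = count - (before.get(type) || 0);
		if (delta > 0) added[type] = delta;
	}
	return added;
}

// Usage (assumes officeparser is installed in the project):
// console.log(diffActiveResources(() => require("officeparser")));
```

If the result is an empty object, the extra processes are probably not ordinary event-loop handles, and an OS-level tool would be needed to see what they are.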
