
Conversation


@abendi abendi commented Sep 19, 2025

The in-memory async-lock works nicely when testcontainers code runs in a single Node process, but when multiple processes are involved, reusable containers, for instance, can run into race conditions.
We use Jest for testing, and it runs test files in separate workers that don't share a module cache, so they bypass in-memory locks because they don't "see" each other. The race condition we're seeing: while one worker is still in the middle of starting a Postgres container (the containerStarted lifecycle hook, where we initialize the database, hasn't completed yet), the other workers think the pending container is already ready for use, and once queries are made against the uninitialized database, errors are propagated back.

The workaround we came up with was to implement file locking in the container itself, something along the lines of this:

class PostgresContainer extends GenericContainer {

  async start() {
    const startedContainer = await super.start();
    // Block until another process (or our own containerStarted hook below)
    // has marked the database as initialized.
    await startedContainer.exec(['/bin/sh', '-c', 'while [ ! -f /initialize.done ]; do sleep 0.1; done']);
    return startedContainer;
  }

  async containerStarted(startedContainer) {
    await initializeDatabase();
    // Signal to any other process waiting in start() that initialization is complete.
    await startedContainer.exec(['/bin/sh', '-c', 'touch /initialize.done']);
  }
}

But then I figured: why not fix it in the library itself, for everyone? So this PR proposes replacing all usages of async-lock with proper-lockfile, which was apparently already used in a few places via withFileLock. One thing that worries me a little is the 3-second maxTimeout, which can add extra delay at various call sites, depending on how unlucky one gets with timings and race conditions. Maybe I should also make this timeout a bit more aggressive?

If you'd like, I can also add a test to showcase the fix, but it could get quite messy: I'd have to use child_process.fork plus IPC to synchronize with the main process to correctly simulate a race condition in a parallel-process setup (unless there's a cleaner way I'm not aware of?).


netlify bot commented Sep 19, 2025

Deploy Preview for testcontainers-node ready!

Latest commit: 5f064bb
Latest deploy log: https://app.netlify.com/projects/testcontainers-node/deploys/68cdd9ccd37c120008a996a3
Deploy Preview: https://deploy-preview-1140--testcontainers-node.netlify.app

  log.debug(`Container reuse has been enabled with hash "${containerHash}"`);

- return reusableContainerCreationLock.acquire(containerHash, async () => {
+ return withFileLock(`testcontainers-node-reuse-${containerHash.substring(0, 12)}.lock`, async () => {

@abendi abendi Sep 23, 2025


I think it's also worth mentioning that async-lock preserved the order of the call queue within a single process, but withFileLock doesn't have that guarantee (especially with the randomized retry delay). Should I look into it, or is it a non-issue?


cristianrgreco commented Sep 24, 2025

Hi @abendi, thanks for raising this. The reason there's a mix of file locks and in-memory locks is performance. This project used to use Jest and now uses Vitest, so these mechanisms were designed with separate workers already in mind: we use file locks when we need to synchronise across workers (we want a single resource reaper container, and to reuse it if it exists), and in-memory locks when we don't (we don't mind if the work is duplicated, e.g. checking whether an image exists).

It seems your issue here is that you're starting and configuring a PG container that you want to reuse across workers, with each worker waiting until the PG container is started. Adding a file lock here is too much: firstly, 99.9% of the time the container is not shared across workers, so file locks would negatively affect performance most of the time; secondly, this can be worked around by better using the test framework. For your use case you should be using Jest's setupFiles to configure the PG container once before all the tests run; then there isn't an issue. Trying to synchronise workers on the state of a single container is more complicated.
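As an illustration of the once-per-run approach, here is a sketch (not from this thread) using Jest's globalSetup, which runs a single time before any workers spawn; it assumes the @testcontainers/postgresql package, and initializeDatabase stands in for the author's own initialization step:

```javascript
// jest.config.js -- hypothetical wiring
module.exports = {
  globalSetup: "./global-setup.js",
  globalTeardown: "./global-teardown.js",
};

// global-setup.js -- starts and initializes the PG container exactly once,
// before Jest spawns any workers, so no cross-worker locking is needed.
const { PostgreSqlContainer } = require("@testcontainers/postgresql");

module.exports = async () => {
  const container = await new PostgreSqlContainer().start();
  await initializeDatabase(container); // the author's init step (assumption)
  // Workers inherit the environment, so tests can read the connection URI.
  process.env.PG_URI = container.getConnectionUri();
  globalThis.__PG_CONTAINER__ = container; // kept for globalTeardown to stop it
};
```

By the time any worker runs a test file, the database is fully initialized, so the race described in the PR cannot occur.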


abendi commented Sep 24, 2025

Thanks for the feedback @cristianrgreco. I agree with your arguments, they make sense. I'll close the PR now.

@abendi abendi closed this Sep 24, 2025
@abendi abendi deleted the replace-in-mem-lock-with-file-lock branch September 24, 2025 11:23