Conversation

@barjin
Member

@barjin barjin commented Jan 6, 2026

BasicCrawler.stop() calls asynchronous functions without awaiting them, which can cause unexpected race conditions. This PR ensures that multiple .stop() calls only result in one AutoscaledPool.abort() call and that the .stop()-induced promises are resolved before the main BasicCrawler.run() call resolves.

Closes #3257
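
A minimal sketch of the idea described above, not the actual crawlee code: the first `stop()` call stores its promise, later calls reuse it, and `run()` awaits it before resolving. `FakePool` and `SketchCrawler` are hypothetical stand-ins introduced only for illustration; `stoppingPromise` mirrors the name visible in the diff.

```typescript
// Hypothetical stand-in for AutoscaledPool; it only counts abort() calls.
class FakePool {
    abortCalls = 0;
    private resolveRun: (() => void) | undefined;

    run(): Promise<void> {
        return new Promise<void>((resolve) => {
            this.resolveRun = () => resolve();
        });
    }

    async abort(): Promise<void> {
        this.abortCalls += 1;
        await Promise.resolve(); // simulate asynchronous cleanup
        if (this.resolveRun) this.resolveRun();
    }
}

// Hypothetical crawler reduced to the stop()/run() interplay.
class SketchCrawler {
    readonly pool = new FakePool();
    readonly errors: unknown[] = [];
    private stoppingPromise: Promise<void> | undefined;

    stop(): Promise<void> {
        // Reuse the in-flight promise so repeated calls trigger a single abort().
        if (!this.stoppingPromise) {
            this.stoppingPromise = this.pool.abort().catch((err: unknown) => {
                this.errors.push(err);
            });
        }
        return this.stoppingPromise;
    }

    async run(): Promise<void> {
        await this.pool.run();
        // Settle any stop()-induced work before run() resolves.
        await this.stoppingPromise;
    }
}
```

With this shape, calling `stop()` twice while `run()` is pending returns the same promise both times and `abortCalls` ends up at 1.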

@barjin barjin requested review from B4nan and janbuchar January 6, 2026 14:07
@barjin barjin self-assigned this Jan 6, 2026
@github-actions github-actions bot added this to the 131st sprint - Tooling team milestone Jan 6, 2026
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 6, 2026
.catch((err) => {
this.log.error('An error occurred when stopping the crawler:', err);
});
if (!this.stoppingPromise) {
Contributor

This seems much more complex than what we have in the Python version (stop() sets a boolean flag and is_finished_function picks it up and terminates the AutoscaledPool). Makes me think... what is the advantage of the approach that we use here?

Member Author

I believe both of these solutions were developed around the same time, independently. I agree the Python implementation is considerably simpler, so let's go with that (I'll edit this PR).
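
For comparison, a rough sketch of the flag-based approach discussed in this thread: `stop()` only flips a boolean, and the pool's `isFinishedFunction` picks it up. `runMiniPool` and `FlagCrawler` are hypothetical, heavily simplified stand-ins (sequential, no concurrency); only the three hook names follow the `AutoscaledPool` options.

```typescript
interface PoolHooks {
    isTaskReadyFunction: () => Promise<boolean>;
    runTaskFunction: () => Promise<void>;
    isFinishedFunction: () => Promise<boolean>;
}

// Hypothetical sequential pool loop: run tasks while any are ready,
// otherwise ask whether the work is finished.
async function runMiniPool(hooks: PoolHooks): Promise<void> {
    for (;;) {
        if (await hooks.isTaskReadyFunction()) {
            await hooks.runTaskFunction();
            continue;
        }
        if (await hooks.isFinishedFunction()) return;
    }
}

class FlagCrawler {
    readonly logs: string[] = [];
    private readonly queue: number[];
    private readonly handler: (item: number) => void;
    private stopRequested = false;

    constructor(queue: number[], handler: (item: number) => void) {
        this.queue = [...queue];
        this.handler = handler;
    }

    // Idempotent: calling stop() any number of times only sets the flag.
    stop(): void {
        this.stopRequested = true;
    }

    run(): Promise<void> {
        return runMiniPool({
            // No new tasks are handed out once a stop was requested.
            isTaskReadyFunction: async () => !this.stopRequested && this.queue.length > 0,
            runTaskFunction: async () => {
                const item = this.queue.shift();
                if (item !== undefined) this.handler(item);
            },
            isFinishedFunction: async () => {
                if (this.stopRequested) {
                    this.logs.push(
                        'The crawler has finished all the remaining ongoing requests and will shut down now.',
                    );
                    return true;
                }
                return this.queue.length === 0;
            },
        });
    }
}
```

Because nothing asynchronous happens inside `stop()`, there is no promise to track and repeated calls are harmless by construction.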

Member

@B4nan B4nan left a comment

few nits

@barjin
Member Author

barjin commented Jan 7, 2026

Thank you both for your reviews. My answer to all the comments is that I was trying to conform to the ESLint rules :)

E.g. without as any, tsc thinks that this.promise is undefined because of the assignment (line 9 below), regardless of what's happening between the assignment and the property read. It almost feels like a bug / design flaw of TS(C).

image

I'll go with the flag-solution from Python as mentioned here anyway, so all of this should be irrelevant now.
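
A small sketch of the narrowing behavior described in this comment, assuming it is the usual control-flow narrowing of a property after assignment. `NarrowingDemo` and its members are hypothetical and are not taken from the screenshot. One way to avoid `as any` is to read the property through a method, whose return type is the declared one.

```typescript
class NarrowingDemo {
    private promise: Promise<string> | undefined;

    private start(): void {
        this.promise = Promise.resolve('started');
    }

    // Reading through a method yields the declared type, not the narrowed one.
    private current(): Promise<string> | undefined {
        return this.promise;
    }

    async restart(): Promise<string> {
        // After this assignment tsc narrows `this.promise` to `undefined`...
        this.promise = undefined;
        // ...and the method call below does not reset that narrowing, so a
        // direct `this.promise.then(...)` after it would be rejected by tsc.
        this.start();
        const pending = this.current();
        return pending ? await pending : 'not started';
    }
}
```

At runtime the property is set by `start()`, so `restart()` resolves with the value of the stored promise.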

@barjin
Member Author

barjin commented Jan 7, 2026

The most recent commits clone the Python implementation from apify/crawlee-python#807 including the logged messages.

@barjin barjin requested review from B4nan and janbuchar January 7, 2026 08:07
@barjin barjin changed the title from "fix: await multiple BasicCrawler.stop() calls correctly" to "fix: handle multiple BasicCrawler.stop() calls correctly" Jan 7, 2026
this.log.info(
'The crawler has finished all the remaining ongoing requests and will shut down now.',
);
return true;
Contributor

Shouldn't we also set shouldLogShuttingDown = false here?

Member Author

Once isFinishedFunction returns true, it's not called again (and neither is the isTaskReadyFunction, since AutoscaledPool.run() will resolve).

const isFinished = await this.isFinishedFunction();
if (isFinished && this.resolve) this.resolve();

Contributor

Better safe than sorry? I wouldn't bet on the internals of AutoscaledPool here

@barjin barjin requested a review from janbuchar January 8, 2026 11:42
@barjin barjin merged commit 9c0580b into master Jan 8, 2026
9 checks passed
@barjin barjin deleted the fix/track-stop-call-promise branch January 8, 2026 12:12
Labels

t-tooling Issues with this label are in the ownership of the tooling team.

Development

Successfully merging this pull request may close these issues.

Restarted crawler is stalling indefinitely on Apify Platform at random

4 participants