-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: handle multiple BasicCrawler.stop() calls correctly
#3324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .catch((err) => { | ||
| this.log.error('An error occurred when stopping the crawler:', err); | ||
| }); | ||
| if (!this.stoppingPromise) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems much more complex than what we have in the Python version (stop() sets a boolean flag and is_finished_function picks it up and terminates the AutoscaledPool). Makes me think... what is the advantage of the approach that we use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe both these solutions were developed around the same time, independently. I agree the Python implementation is considerably simpler, let's go with that (I'll edit this PR).
B4nan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few nits
|
Thank you both for your reviews, my answer to all the comments is trying to conform eslint rules :) E.g. without
I'll go with the flag-solution from Python as mentioned here anyway, so all of this should be irrelevant now. |
|
The most recent commits clone the Python implementation from apify/crawlee-python#807 including the logged messages. |
BasicCrawler.stop() calls correctlyBasicCrawler.stop() calls correctly
| this.log.info( | ||
| 'The crawler has finished all the remaining ongoing requests and will shut down now.', | ||
| ); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also set shouldLogShuttingDown = false here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once isFinishedFunction returns true, it's not called again (and neither is the isTaskReadyFunction, since AutoscaledPool.run() will resolve).
crawlee/packages/core/src/autoscaling/autoscaled_pool.ts
Lines 677 to 678 in 2037ea9
| const isFinished = await this.isFinishedFunction(); | |
| if (isFinished && this.resolve) this.resolve(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better safe than sorry? I wouldn't bet on the internals of AutoscaledPool here

BasicCrawler.stop()calls asynchronous functions without awaiting them, which can cause unexpected race conditions. This PR ensures that multiple.stop()calls only result in oneAutoscaledPool.abort()call and that the.stop()-induced promises are resolved before the mainBasicCrawler.run()call resolves.Closes #3257