Skip to content

fix(scrapy): async-thread shutdown, duplicate error logs, and timeout setting#980

Closed
vdusek wants to merge 3 commits into
masterfrom
fix/scrapy-async-thread-lifecycle
Closed

fix(scrapy): async-thread shutdown, duplicate error logs, and timeout setting#980
vdusek wants to merge 3 commits into
masterfrom
fix/scrapy-async-thread-lifecycle

Conversation

@vdusek

@vdusek vdusek commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Fixes several smaller defects in the Scrapy integration's background event-loop thread, scheduler, and HTTP cache storage.

  • HTTP cache cleanup could leak the event-loop thread. In ApifyCacheStorage.close_spider, the expiration sweep ran outside the try, so a failure there skipped AsyncThread.close(). The sweep now runs inside try, with close() in a finally so it always runs.
  • AsyncThread.close() raised on a second call. A repeated close (e.g. ApifyScheduler.open() closing on failure, then Scrapy closing again) called into the already-closed loop and raised RuntimeError: Event loop is closed. Added an is_closed() early-return guard so a second close is a no-op.
  • AsyncThread.close() ignored its timeout argument for the task-cancellation step (it used the constructor default). It now passes the caller's timeout to that step too.
  • run_coro timeout left the coroutine running. On timeout it now cancels the future so the coroutine does not outlive the timeout.
  • Scheduler errors were reported 2–3×. run_coro logged-and-raised and each scheduler call added traceback.print_exc() (which bypasses ActorLogFormatter), on top of Scrapy's own logging. Removed the redundant traceback.print_exc() wrappers and run_coro's log-and-raise; errors now surface once, formatted, via Scrapy.
  • refactor(scrapy): make AsyncThread timeout configurable #955's configurable timeout was unreachable by users. Added the APIFY_ASYNC_THREAD_TIMEOUT_SECS Scrapy setting, wired into the scheduler (via from_crawler) and the cache storage so the AsyncThread timeout is actually configurable.

Adds unit tests for each fix (new tests/unit/scrapy/test_async_thread.py plus additions to the scheduler and HTTP cache test modules).

@vdusek vdusek added adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. labels Jun 12, 2026
@vdusek vdusek self-assigned this Jun 12, 2026
@github-actions github-actions Bot added this to the 142nd sprint - Tooling team milestone Jun 12, 2026
@github-actions github-actions Bot added the tested Temporary label used only programatically for some analytics. label Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 77.27273% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.01%. Comparing base (0daca28) to head (50aa539).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/apify/scrapy/extensions/_httpcache.py 72.22% 15 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #980      +/-   ##
==========================================
+ Coverage   89.90%   91.01%   +1.10%     
==========================================
  Files          49       49              
  Lines        3091     3116      +25     
==========================================
+ Hits         2779     2836      +57     
+ Misses        312      280      -32     
Flag Coverage Δ
e2e 35.62% <0.00%> (-0.29%) ⬇️
integration 56.41% <0.00%> (-0.46%) ⬇️
unit 79.94% <77.27%> (+1.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vdusek

vdusek commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Consolidated into #979, which now carries these lifecycle fixes on top of the startup-race fix. Closing in favor of #979.

@vdusek vdusek closed this Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants