
[pull] master from apify:master#175

Merged: pull[bot] merged 3 commits into zanachka:master from apify:master on Feb 26, 2026



pull[bot] commented Feb 26, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


barjin and others added 3 commits February 24, 2026 17:10
At the moment, lint-staged is installed but unused; it is just dangling in
the repo. The pre-commit hook runs `yarn format` on the whole codebase
instead of only the staged files. This also creates a state where multiple
files can be changed after a commit, usually followed by a `lint:fix`
commit (or similar).

After this is merged:
- lint-staged will run biome --format only on staged files and
automatically add any resulting changes to the commit
- because it no longer runs on the whole repo, execution should also be faster
…#3434)

In v3, `discoverValidSitemaps` could occasionally hang during
initialization (before crawler startup), especially on proxy-heavy
targets used by Website Content Crawler.

Root cause:
Discovery requests (`GET /robots.txt` and `HEAD` sitemap checks) used
default `got-scraping` behavior. In this path, HTTP/2 combined with
browser-header generation could become unstable and stall on some
target/proxy combinations.

What changed:
Updated `discoverValidSitemaps` internals in
`packages/utils/src/internals/sitemap.ts`.
Added dedicated discovery request options:
  - `http2: false`
  - `useHeaderGenerator: false`
  
  Applied these options consistently to:
  - robots.txt fetch
  - sitemap candidate `HEAD` checks
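
To illustrate the change described above, here is a minimal sketch of how the two options could be applied uniformly to every discovery request. Only `http2: false` and `useHeaderGenerator: false` come from this PR; the names `DiscoveryRequestOptions`, `buildDiscoveryRequest`, and `discoveryRequestsFor`, along with the example URLs, are hypothetical and are not taken from `packages/utils/src/internals/sitemap.ts`.

```typescript
// Hypothetical shape for illustration; only `http2` and `useHeaderGenerator`
// are named in the PR description.
interface DiscoveryRequestOptions {
    url: string;
    method: 'GET' | 'HEAD';
    http2: boolean;
    useHeaderGenerator: boolean;
    proxyUrl?: string;
}

// Shared defaults, so the robots.txt fetch and the sitemap HEAD checks
// cannot drift apart.
const DISCOVERY_DEFAULTS = { http2: false, useHeaderGenerator: false } as const;

function buildDiscoveryRequest(
    url: string,
    method: 'GET' | 'HEAD',
    proxyUrl?: string,
): DiscoveryRequestOptions {
    // Only include `proxyUrl` when one was actually provided.
    return { ...DISCOVERY_DEFAULTS, url, method, ...(proxyUrl ? { proxyUrl } : {}) };
}

// Builds one GET for robots.txt plus one HEAD per sitemap candidate path.
function discoveryRequestsFor(
    origin: string,
    sitemapCandidates: string[],
    proxyUrl?: string,
): DiscoveryRequestOptions[] {
    const robots = buildDiscoveryRequest(new URL('/robots.txt', origin).href, 'GET', proxyUrl);
    const heads = sitemapCandidates.map((path) =>
        buildDiscoveryRequest(new URL(path, origin).href, 'HEAD', proxyUrl),
    );
    return [robots, ...heads];
}

const requests = discoveryRequestsFor('https://example.com', ['/sitemap.xml', '/sitemap.txt']);
console.log(requests.map((r) => `${r.method} ${r.url}`));
```

Objects of this shape could then be passed to a `got-scraping` call. Keeping the defaults in one constant is a design choice of this sketch, intended to make the "applied consistently" requirement hard to break by accident.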

Note: this PR intentionally keeps got-scraping since we’re on v3; this
gives us a minimal, safer fix for the hang without replacing the HTTP
stack or introducing broader regressions.

Tested locally with a patched `@crawlee/utils`.

Closes #3412
pull[bot] locked and limited conversation to collaborators Feb 26, 2026
pull[bot] added the ⤵️ pull label Feb 26, 2026
pull[bot] merged commit 6c04f92 into zanachka:master Feb 26, 2026