Defective pages may lead to Docker exit code 10 #927

@gitreich

The URL https://www.ra-design.at leads to a corrupted/defective page.
If it is not the only seed, the remaining seeds are no longer crawled because of an internal browser crash.
Version: 1.9.2
Expected: Skip the defective seed and continue with the other seeds of the crawl.

Minimal example to reproduce the browser crash:
docker run -d --name test_seed_ra_design -v /home/antares/browsertrix/crawls/:/crawls/ webrecorder/browsertrix-crawler:1.9.2 crawl --scopeType page --depth 0 --headless --delay 0 --behaviorTimeout 60 --pageLoadTimeout 60 --waitUntil networkidle0 --saveState always --logging stats,info --sitemap --url https://www.ra-design.at

Logs:


{"timestamp":"2025-12-01T11:26:06.275Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 1.9.2 (with warcio.js 2.4.7)","details":{}}
{"timestamp":"2025-12-01T11:26:06.276Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"https://www.ra-design.at/","scopeType":"page","include":[],"exclude":[],"allowHash":false,"depth":-1,"sitemap":"<detect>","auth":null,"_authEncoded":null,"maxExtraHops":0,"maxDepth":0}]}
{"timestamp":"2025-12-01T11:26:06.276Z","logLevel":"info","context":"general","message":"Link Selectors","details":[{"selector":"a[href]","extract":"href","isAttribute":false}]}
{"timestamp":"2025-12-01T11:26:06.276Z","logLevel":"info","context":"general","message":"Behavior Options","details":{"message":"{\"autoplay\":true,\"autofetch\":true,\"autoscroll\":true,\"siteSpecific\":true,\"log\":\"__bx_log\",\"startEarly\":true,\"clickSelector\":\"a\"}"}}
{"timestamp":"2025-12-01T11:26:06.314Z","logLevel":"info","context":"sitemap","message":"Fetching sitemap","details":{"from":"<any date>","to":"<any date>"}}
{"timestamp":"2025-12-01T11:26:36.333Z","logLevel":"error","context":"sitemap","message":"Sitemap initial fetch timed out","details":{"seconds":30,"sitemap":"<detect>","seed":"https://www.ra-design.at/"}}
{"timestamp":"2025-12-01T11:26:36.736Z","logLevel":"info","context":"worker","message":"Creating 1 workers","details":{}}
{"timestamp":"2025-12-01T11:26:36.736Z","logLevel":"info","context":"worker","message":"Worker starting","details":{"workerid":0}}
{"timestamp":"2025-12-01T11:26:36.851Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.ra-design.at/"}}
{"timestamp":"2025-12-01T11:26:36.851Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-12-01T11:26:36.737Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.ra-design.at\\/\",\"added\":\"2025-12-01T11:26:06.313Z\",\"depth\":0}"]}}
{"timestamp":"2025-12-01T11:26:37.061Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.ra-design.at/","workerid":0}}
{"timestamp":"2025-12-01T11:27:09.885Z","logLevel":"error","context":"browser","message":"Browser disconnected (crashed?), interrupting crawl","details":{}}
{"timestamp":"2025-12-01T11:27:09.886Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://www.ra-design.at/","networkId":"1EE3EA76E3707327D5F0DAE0102202CE","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:79:36)\n    at CdpCDPSession.onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CdpSession.js:112:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:186:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:42:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:524:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:524:28)","page":"https://www.ra-design.at/","workerid":0}}
{"timestamp":"2025-12-01T11:27:09.886Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigating frame was detached","url":"https://www.ra-design.at/","loadState":0,"page":"https://www.ra-design.at/","workerid":0}}
{"timestamp":"2025-12-01T11:27:09.901Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2025-12-01T11:27:09.909Z","logLevel":"info","context":"general","message":"Saving crawl state to: /crawls/collections/crawl-20251201112606244/crawls/20251201112709906-5da9cce6b5d1-crawl-20251201112606244.yaml","details":{}}
{"timestamp":"2025-12-01T11:27:09.910Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-12-01T11:27:09.910Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-12-01T11:27:09.911Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
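
The crash shows up in the log above as the error-level entry with context "browser" and the message "Browser disconnected (crashed?), interrupting crawl". Below is a minimal sketch for detecting that entry in a saved or streamed log. The function name `has_browser_crash` is a hypothetical helper and not part of browsertrix-crawler; it only matches the literal message text seen in this report.

```shell
# Hypothetical helper (not part of browsertrix-crawler):
# reads JSON log lines on stdin and succeeds (exit 0) if any line
# contains the crash message seen in this report.
has_browser_crash() {
  # -F: treat the pattern as a fixed string, so "(" and "?" are literal
  # -q: print nothing, only set the exit status
  grep -F -q '"message":"Browser disconnected (crashed?)'
}

# Example usage against the container from the reproduction command:
#   docker logs test_seed_ra_design 2>&1 | has_browser_crash && echo "browser crashed"
```

Matching on the fixed message string keeps the check dependency-free. A JSON-aware filter would be more robust if the log format changes.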
