Description
The URL https://www.ra-design.at leads to a corrupted/defective page.
If it is not the only seed, the remaining seeds are no longer crawled, because loading this page causes an internal browser crash that interrupts the whole crawl.
Version 1.9.2
Expected: Skip the defective seed and continue with the other seeds of the crawl.
Minimal example to reproduce the browser crash:
docker run -d --name test_seed_ra_design -v /home/antares/browsertrix/crawls/:/crawls/ webrecorder/browsertrix-crawler:1.9.2 crawl --scopeType page --depth 0 --headless --delay 0 --behaviorTimeout 60 --pageLoadTimeout 60 --waitUntil networkidle0 --saveState always --logging stats,info --sitemap --url https://www.ra-design.at
Logs:
{"timestamp":"2025-12-01T11:26:06.275Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 1.9.2 (with warcio.js 2.4.7)","details":{}}
{"timestamp":"2025-12-01T11:26:06.276Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"https://www.ra-design.at/","scopeType":"page","include":[],"exclude":[],"allowHash":false,"depth":-1,"sitemap":"<detect>","auth":null,"_authEncoded":null,"maxExtraHops":0,"maxDepth":0}]}
{"timestamp":"2025-12-01T11:26:06.276Z","logLevel":"info","context":"general","message":"Link Selectors","details":[{"selector":"a[href]","extract":"href","isAttribute":false}]}
{"timestamp":"2025-12-01T11:26:06.276Z","logLevel":"info","context":"general","message":"Behavior Options","details":{"message":"{\"autoplay\":true,\"autofetch\":true,\"autoscroll\":true,\"siteSpecific\":true,\"log\":\"__bx_log\",\"startEarly\":true,\"clickSelector\":\"a\"}"}}
{"timestamp":"2025-12-01T11:26:06.314Z","logLevel":"info","context":"sitemap","message":"Fetching sitemap","details":{"from":"<any date>","to":"<any date>"}}
{"timestamp":"2025-12-01T11:26:36.333Z","logLevel":"error","context":"sitemap","message":"Sitemap initial fetch timed out","details":{"seconds":30,"sitemap":"<detect>","seed":"https://www.ra-design.at/"}}
{"timestamp":"2025-12-01T11:26:36.736Z","logLevel":"info","context":"worker","message":"Creating 1 workers","details":{}}
{"timestamp":"2025-12-01T11:26:36.736Z","logLevel":"info","context":"worker","message":"Worker starting","details":{"workerid":0}}
{"timestamp":"2025-12-01T11:26:36.851Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.ra-design.at/"}}
{"timestamp":"2025-12-01T11:26:36.851Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-12-01T11:26:36.737Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.ra-design.at\\/\",\"added\":\"2025-12-01T11:26:06.313Z\",\"depth\":0}"]}}
{"timestamp":"2025-12-01T11:26:37.061Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.ra-design.at/","workerid":0}}
{"timestamp":"2025-12-01T11:27:09.885Z","logLevel":"error","context":"browser","message":"Browser disconnected (crashed?), interrupting crawl","details":{}}
{"timestamp":"2025-12-01T11:27:09.886Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://www.ra-design.at/","networkId":"1EE3EA76E3707327D5F0DAE0102202CE","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:79:36)\n at CdpCDPSession.onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CdpSession.js:112:25)\n at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:186:21)\n at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:42:30)\n at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n at WebSocket.emit (node:events:524:28)\n at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n at Socket.emit (node:events:524:28)","page":"https://www.ra-design.at/","workerid":0}}
{"timestamp":"2025-12-01T11:27:09.886Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigating frame was detached","url":"https://www.ra-design.at/","loadState":0,"page":"https://www.ra-design.at/","workerid":0}}
{"timestamp":"2025-12-01T11:27:09.901Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2025-12-01T11:27:09.909Z","logLevel":"info","context":"general","message":"Saving crawl state to: /crawls/collections/crawl-20251201112606244/crawls/20251201112709906-5da9cce6b5d1-crawl-20251201112606244.yaml","details":{}}
{"timestamp":"2025-12-01T11:27:09.910Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-12-01T11:27:09.910Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-12-01T11:27:09.911Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}