Open
Labels
bug (Something isn't working)
Description
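🔍 Issue checklist
- I have carefully read the project's summary of frequently asked questions
- I have searched and reviewed the closed issues
- I confirm this problem is not caused by common issues such as slider CAPTCHAs, expired cookies, cookie extraction errors, or platform risk control
🐛 Problem description
Crawling Xiaohongshu fails with an error.
📝 Steps to reproduce
💻 Environment
- Operating system:
- Python version:
- Using an IP proxy:
- Using VPN software:
- Target platform (Douyin/Xiaohongshu/Weibo, etc.):
📋 Error log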
PS C:\Users\Guo\Downloads\MediaCrawler-main> uv run main.py --platform xhs --lt qrcode --type search
2025-11-21 21:47:18 MediaCrawler INFO (core.py:74) - [XiaoHongShuCrawler] 使用CDP模式启动浏览器
2025-11-21 21:47:18 MediaCrawler INFO (cdp_browser.py:142) - [CDPBrowserManager] 检测到浏览器: Microsoft Edge (正在现有浏览器会话中打开。)
2025-11-21 21:47:18 MediaCrawler INFO (cdp_browser.py:145) - [CDPBrowserManager] 浏览器路径: C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
2025-11-21 21:47:18 MediaCrawler INFO (cdp_browser.py:185) - [CDPBrowserManager] 用户数据目录: C:\Users\Guo\Downloads\MediaCrawler-main\browser_data\cdp_xhs_user_data_dir
2025-11-21 21:47:18 MediaCrawler INFO (browser_launcher.py:163) - [BrowserLauncher] 启动浏览器: C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
2025-11-21 21:47:18 MediaCrawler INFO (browser_launcher.py:164) - [BrowserLauncher] 调试端口: 9222
2025-11-21 21:47:18 MediaCrawler INFO (browser_launcher.py:165) - [BrowserLauncher] 无头模式: False
2025-11-21 21:47:18 MediaCrawler INFO (browser_launcher.py:195) - [BrowserLauncher] 等待浏览器在端口 9222 上准备就绪...
2025-11-21 21:47:18 MediaCrawler INFO (browser_launcher.py:204) - [BrowserLauncher] 浏览器已在端口 9222 上准备就绪
2025-11-21 21:47:19 MediaCrawler INFO (cdp_browser.py:159) - [CDPBrowserManager] CDP端口 9222 可访问
2025-11-21 21:47:19 MediaCrawler INFO (cdp_browser.py:78) - [CDPBrowserManager] 清理处理器已注册
2025-11-21 21:47:19 MediaCrawler INFO (cdp_browser.py:223) - [CDPBrowserManager] 获取到浏览器WebSocket URL: ws://localhost:9222/devtools/browser/126647b9-7d25-40e1-8c63-799c1a8f7243
2025-11-21 21:47:19 MediaCrawler INFO (cdp_browser.py:242) - [CDPBrowserManager] 正在通过CDP连接到浏览器: ws://localhost:9222/devtools/browser/126647b9-7d25-40e1-8c63-799c1a8f7243
2025-11-21 21:47:20 MediaCrawler INFO (cdp_browser.py:248) - [CDPBrowserManager] 成功连接到浏览器
2025-11-21 21:47:20 MediaCrawler INFO (cdp_browser.py:249) - [CDPBrowserManager] 浏览器上下文数量: 1
2025-11-21 21:47:20 MediaCrawler INFO (cdp_browser.py:274) - [CDPBrowserManager] 使用现有的浏览器上下文
2025-11-21 21:47:20 MediaCrawler INFO (core.py:437) - [XiaoHongShuCrawler] CDP浏览器信息: {'version': '142.0.3595.90', 'contexts_count': 1, 'debug_port': 9222, 'is_connected': True}
2025-11-21 21:47:25 MediaCrawler INFO (core.py:359) - [XiaoHongShuCrawler.create_xhs_client] Begin create xiaohongshu API client ...
2025-11-21 21:47:25 MediaCrawler INFO (client.py:230) - [XiaoHongShuClient.pong] Begin to pong xhs...
2025-11-21 21:47:29 MediaCrawler ERROR (client.py:237) - [XiaoHongShuClient.pong] Ping xhs failed: RetryError[<Future at 0x1e4d9c52190 state=finished raised JSONDecodeError>], and try to login again...
2025-11-21 21:47:29 MediaCrawler INFO (login.py:71) - [XiaoHongShuLogin.begin] Begin login xiaohongshu ...
2025-11-21 21:47:29 MediaCrawler INFO (login.py:151) - [XiaoHongShuLogin.login_by_qrcode] Begin login xiaohongshu by qrcode ...
2025-11-21 21:47:29 MediaCrawler INFO (login.py:184) - [XiaoHongShuLogin.login_by_qrcode] waiting for scan code login, remaining time is 120s
2025-11-21 21:48:01 MediaCrawler INFO (login.py:192) - [XiaoHongShuLogin.login_by_qrcode] Login successful then wait for 5 seconds redirect ...
2025-11-21 21:48:06 MediaCrawler INFO (core.py:127) - [XiaoHongShuCrawler.search] Begin search xiaohongshu keywords
2025-11-21 21:48:06 MediaCrawler INFO (core.py:134) - [XiaoHongShuCrawler.search] Current search keyword: 编程副业
2025-11-21 21:48:06 MediaCrawler INFO (core.py:144) - [XiaoHongShuCrawler.search] search xhs keyword: 编程副业, page: 1
2025-11-21 21:48:10 MediaCrawler INFO (cdp_browser.py:374) - [CDPBrowserManager] 浏览器连接已断开
2025-11-21 21:48:10 MediaCrawler INFO (browser_launcher.py:255) - [BrowserLauncher] 正在关闭浏览器进程...
2025-11-21 21:48:10 MediaCrawler INFO (browser_launcher.py:285) - [BrowserLauncher] 浏览器进程已关闭
Traceback (most recent call last):
  File "C:\Users\Guo\Downloads\MediaCrawler-main\.venv\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\media_platform\xhs\client.py", line 150, in request
    data: Dict = response.json()
                 ^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\.venv\Lib\site-packages\httpx\_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\AppData\Roaming\uv\python\cpython-3.11.14-windows-x86_64-none\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\AppData\Roaming\uv\python\cpython-3.11.14-windows-x86_64-none\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\AppData\Roaming\uv\python\cpython-3.11.14-windows-x86_64-none\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Guo\Downloads\MediaCrawler-main\main.py", line 154, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "C:\Users\Guo\AppData\Roaming\uv\python\cpython-3.11.14-windows-x86_64-none\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\main.py", line 85, in main
    await crawler.start()
  File "C:\Users\Guo\Downloads\MediaCrawler-main\media_platform\xhs\core.py", line 113, in start
    await self.search()
  File "C:\Users\Guo\Downloads\MediaCrawler-main\media_platform\xhs\core.py", line 147, in search
    notes_res = await self.xhs_client.get_note_by_keyword(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\media_platform\xhs\client.py", line 286, in get_note_by_keyword
    return await self.post(uri, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\media_platform\xhs\client.py", line 195, in post
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\.venv\Lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\.venv\Lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Guo\Downloads\MediaCrawler-main\.venv\Lib\site-packages\tenacity\__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x1e4d9d49990 state=finished raised JSONDecodeError>]
PS C:\Users\Guo\Downloads\MediaCrawler-main>
📷 Error screenshot
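A note on reading the error log: the innermost failure is `response.json()` raising `JSONDecodeError: Expecting value: line 1 column 1 (char 0)`. That means the response body did not begin with a JSON value, for example because it was empty or contained HTML. The `RetryError` is only the wrapper raised once the retries ran out, and its `__cause__` is the original `JSONDecodeError`. The log does not show what the server actually returned. The sketch below is one possible way to surface that information, not the project's implementation; `parse_json_body` and `NonJsonResponseError` are hypothetical names introduced for illustration, and only the standard library is used.

```python
import json
from typing import Any


class NonJsonResponseError(Exception):
    """Hypothetical error type that records what the server sent back."""


def parse_json_body(status_code: int, text: str) -> Any:
    """Parse a response body as JSON, or raise an error describing the body.

    Hypothetical helper: status_code and text stand in for the status and
    decoded body of an HTTP response.
    """
    if not text.strip():
        raise NonJsonResponseError(f"empty response body (HTTP {status_code})")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Keep a short preview so the log shows what arrived instead of JSON.
        preview = text[:200]
        raise NonJsonResponseError(
            f"non-JSON response body (HTTP {status_code}): {preview!r}"
        ) from exc


if __name__ == "__main__":
    # An empty body reproduces the message seen in the traceback.
    try:
        json.loads("")
    except json.JSONDecodeError as exc:
        print(exc)

    # The guarded helper reports the status and a preview of the body instead.
    for status, body in [(200, ""), (200, "<html>verify</html>")]:
        try:
            parse_json_body(status, body)
        except NonJsonResponseError as exc:
            print(exc)
```

Logging the status code and a body preview at the point of failure would show whether the search request returned an empty body, a verification page, or something else.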