Open
Labels
bug: Something isn't working
Description
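🔍 Issue checklist
- [ ✅] I have carefully read the project's summary of frequently asked questions
- [ ✅] I have searched and reviewed the closed issues
- [ ✅] I confirm this is not caused by common issues such as slider CAPTCHAs, expired cookies, incorrectly extracted cookies, or platform risk control
🐛 Problem description
Is the Douyin page-skipping logic broken? Crawling from the first page returns data as expected, but when crawling starts from the second page the search returns an empty list. The reasonable behavior would presumably be to fetch every page and then save only the data from the second page onward.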
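The behavior proposed above can be sketched as follows. This is a minimal illustration, not the project's actual code: `fetch_page`, `crawl_from_page`, `PAGE_SIZE`, and the response shape are all hypothetical names chosen for the example. It assumes the server may need continuity from earlier pages (for instance a `search_id` returned by the first response, which is empty in the logged request), so every page is requested in order and only pages at or beyond `start_page` are kept.

```python
from typing import Callable, List

# Assumption: 10 items per page, inferred from offset=10 on page 2 in the log.
PAGE_SIZE = 10


def crawl_from_page(
    fetch_page: Callable[[int, str], dict],
    start_page: int,
    max_pages: int,
) -> List[dict]:
    """Request every page from page 1, but keep only pages >= start_page.

    fetch_page(offset, search_id) is a hypothetical stand-in for the real
    search request; it returns {"items": [...], "search_id": "..."}.
    """
    saved: List[dict] = []
    search_id = ""  # empty on the first request, as in the logged URL
    for page in range(1, max_pages + 1):
        offset = (page - 1) * PAGE_SIZE
        response = fetch_page(offset, search_id)
        items = response.get("items", [])
        if not items:
            break  # no more results
        # Carry forward whatever pagination token the server returned.
        search_id = response.get("search_id", search_id)
        if page >= start_page:
            saved.extend(items)
    return saved


def fake_fetch(offset: int, search_id: str) -> dict:
    """Toy backend: later pages come back empty unless a search_id is supplied."""
    if offset > 0 and not search_id:
        return {"items": [], "search_id": ""}
    if offset >= 30:
        return {"items": [], "search_id": search_id}
    items = [{"id": offset + i} for i in range(PAGE_SIZE)]
    return {"items": items, "search_id": "token"}


results = crawl_from_page(fake_fetch, start_page=2, max_pages=5)
print(len(results), results[0]["id"])
```

With the toy backend, requesting offset 10 directly with an empty `search_id` yields an empty list, which mirrors the symptom in this report, while walking through page 1 first yields the items for pages 2 and 3. Whether the real endpoint behaves this way is an assumption that would need to be checked against the actual responses.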
📝 Steps to reproduce
#### The following changes were made to the base configuration
- PLATFORM = "dy"
- CRAWLER_TYPE = "search"
- START_PAGE = 2
💻 Runtime environment
- Operating system: macOS 14
- Python version: 3.11
- Using an IP proxy: No
- Using VPN software: No
- Target platform (Douyin/Xiaohongshu/Weibo, etc.): Douyin
📋 Error log
shujieli@shujiedeMacBook-Pro MediaCrawler % uv run main.py --platform dy --lt qrcode --type search
2025-10-27 20:43:23 MediaCrawler INFO (core.py:60) - [DouYinCrawler] Launching browser in CDP mode
2025-10-27 20:43:23 MediaCrawler INFO (cdp_browser.py:94) - [CDPBrowserManager] Detected browser: Google Chrome (Google Chrome 141.0.7390.123)
2025-10-27 20:43:23 MediaCrawler INFO (cdp_browser.py:97) - [CDPBrowserManager] Browser path: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
2025-10-27 20:43:23 MediaCrawler INFO (cdp_browser.py:137) - [CDPBrowserManager] User data directory: /Users/shujieli/shujuxiangmu/data_get/MediaCrawler/browser_data/cdp_dy_user_data_dir
2025-10-27 20:43:23 MediaCrawler INFO (browser_launcher.py:154) - [BrowserLauncher] Launching browser: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
2025-10-27 20:43:23 MediaCrawler INFO (browser_launcher.py:155) - [BrowserLauncher] Debug port: 9222
2025-10-27 20:43:23 MediaCrawler INFO (browser_launcher.py:156) - [BrowserLauncher] Headless mode: False
2025-10-27 20:43:23 MediaCrawler INFO (browser_launcher.py:186) - [BrowserLauncher] Waiting for the browser to become ready on port 9222...
2025-10-27 20:43:23 MediaCrawler INFO (browser_launcher.py:195) - [BrowserLauncher] Browser is ready on port 9222
2025-10-27 20:43:24 MediaCrawler INFO (cdp_browser.py:111) - [CDPBrowserManager] CDP port 9222 is accessible
2025-10-27 20:43:24 httpx INFO (_client.py:1740) - HTTP Request: GET http://localhost:9222/json/version "HTTP/1.1 200 OK"
2025-10-27 20:43:24 MediaCrawler INFO (cdp_browser.py:175) - [CDPBrowserManager] Obtained browser WebSocket URL: ws://localhost:9222/devtools/browser/9dc72c7b-7391-44be-b41e-3bd36da27223
2025-10-27 20:43:24 MediaCrawler INFO (cdp_browser.py:194) - [CDPBrowserManager] Connecting to the browser via CDP: ws://localhost:9222/devtools/browser/9dc72c7b-7391-44be-b41e-3bd36da27223
2025-10-27 20:43:25 MediaCrawler INFO (cdp_browser.py:200) - [CDPBrowserManager] Successfully connected to the browser
2025-10-27 20:43:25 MediaCrawler INFO (cdp_browser.py:201) - [CDPBrowserManager] Number of browser contexts: 1
2025-10-27 20:43:25 MediaCrawler INFO (cdp_browser.py:226) - [CDPBrowserManager] Using the existing browser context
2025-10-27 20:43:25 MediaCrawler INFO (cdp_browser.py:258) - [CDPBrowserManager] Added anti-detection script: libs/stealth.min.js
2025-10-27 20:43:25 MediaCrawler INFO (core.py:357) - [DouYinCrawler] CDP browser info: {'version': '141.0.7390.123', 'contexts_count': 1, 'debug_port': 9222, 'is_connected': True}
2025-10-27 20:43:29 MediaCrawler INFO (core.py:108) - [DouYinCrawler.search] Begin search douyin keywords
2025-10-27 20:43:29 MediaCrawler INFO (core.py:115) - [DouYinCrawler.search] Current keyword: 男士护肤
2025-10-27 20:43:29 MediaCrawler INFO (core.py:121) - [DouYinCrawler.search] Skip 0
2025-10-27 20:43:29 MediaCrawler INFO (core.py:121) - [DouYinCrawler.search] Skip 1
2025-10-27 20:43:29 MediaCrawler INFO (core.py:125) - [DouYinCrawler.search] search douyin keyword: 男士护肤, page: 2
2025-10-27 20:43:30 httpx INFO (_client.py:1740) - HTTP Request: GET https://www.douyin.com/aweme/v1/web/general/search/single/?search_channel=aweme_general&enable_history=1&keyword=%E7%94%B7%E5%A3%AB%E6%8A%A4%E8%82%A4&search_source=tab_search&query_correct_type=1&is_filter_search=0&from_group_id=7378810571505847586&offset=10&count=15&need_filter_settings=1&list_type=multi&search_id=&device_platform=webapp&aid=6383&channel=channel_pc_web&version_code=190600&version_name=19.6.0&update_version_code=170400&pc_client_type=1&cookie_enabled=true&browser_language=zh-CN&browser_platform=MacIntel&browser_name=Chrome&browser_version=125.0.0.0&browser_online=true&engine_name=Blink&os_name=Mac+OS&os_version=10.15.7&cpu_core_num=8&device_memory=8&engine_version=109.0&platform=PC&screen_width=2560&screen_height=1440&effective_type=4g&round_trip_time=50&webid=1093618313152544765&msToken=j-1H-nwdn2tg2_rrNc8YHLhNaMz6RJEWjPsuqgyt412foACUcwDXoHhZfRH_V8xp6XuFg_SVu54K8pNvfp0bZ9rm8cgWPsU06ymnKTHR-vnqx7x7mzX-uoITQE2UZBA2fmB_q-t0LS1P9-b_vRoxLwsj0moeCNzqPAZoXVLjXLKT_TKDjF3jZA%3D%3D&a_bogus=dJRMB5uXdk6BfDSk552LfY3q6VP3YpMd0trEMD2fFV3Yky39HMOS9exouKTvryfjiT%2FQIeYjy4hbT3ohrQ2y8qwf9W0L%2F25gsDSkKl12so0j53inCLf%2FE0iE5hsAtFH8svr4iKi8owICSYyhldAJ5kIlO62-zo0%2F9Xj%3D "HTTP/1.1 200 OK"
2025-10-27 20:43:30 MediaCrawler INFO (core.py:133) - [DouYinCrawler.search] search douyin keyword: 男士护肤, page: 2 is empty,[]`
2025-10-27 20:43:30 MediaCrawler INFO (core.py:156) - [DouYinCrawler.search] keyword:男士护肤, aweme_list:[]
2025-10-27 20:43:30 MediaCrawler INFO (core.py:105) - [DouYinCrawler.start] Douyin Crawler finished ...
📷 Error screenshots
