feat: add h_comic spider and GUI entry #144
Conversation
Reviewer's Guide

Adds full support for the h-comic website, including a new spider, website utilities, data model, GUI selector entry, and locale/config wiring so users can search and download from h-comic via the app.

Sequence diagram for the h_comic search flow from GUI to spider and utils:

```mermaid
sequenceDiagram
actor User
participant GUI_MainWindow
participant Variables as Variables_module
participant SpiderScheduler
participant HComicSpider
participant HComicUtils
participant HComicSite as h_comic_website
User ->> GUI_MainWindow: Select index 8 and enter keyword
GUI_MainWindow ->> Variables: Resolve site index 8 to h_comic
Variables -->> GUI_MainWindow: Site key h_comic
GUI_MainWindow ->> SpiderScheduler: Request search for h_comic with keyword
SpiderScheduler ->> HComicSpider: Instantiate spider and start search
HComicSpider ->> HComicUtils: Access ua headers
HComicSpider ->> HComicSite: GET search_url_head + keyword
HComicSite -->> HComicSpider: HTML with embedded payload
HComicSpider ->> HComicUtils: parse_search(response.text)
HComicUtils ->> HComicUtils: _extract_payload_data(resp_text)
HComicUtils ->> HComicUtils: parse_search_item(target) for each comic
HComicUtils ->> HComicBookInfo: Construct book model per result
HComicBookInfo -->> HComicUtils: Book instances
HComicUtils -->> HComicSpider: List of HComicBookInfo
HComicSpider ->> GUI_MainWindow: Formatted search results for display
```
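Reading the diagram as code, the search path boils down to roughly the following. This is a minimal sketch, not the PR's implementation: the payload regex, the `comics` key, the search URL path, and the use of `requests` are all assumptions for illustration.

```python
import json
import re
import requests

# Assumed pattern for the JS payload embedded in the search page HTML.
PAYLOAD_RE = re.compile(r"window\.__payload__\s*=\s*(\{.*?\});", re.S)

def extract_payload_data(resp_text: str) -> dict:
    """Pull the embedded payload object out of the page source."""
    match = PAYLOAD_RE.search(resp_text)
    if not match:
        raise ValueError("payload block not found in response")
    # The real utils convert a JS object literal; plain JSON is assumed here.
    return json.loads(match.group(1))

def parse_search(resp_text: str) -> list[dict]:
    """Turn one search page into lightweight book records."""
    payload = extract_payload_data(resp_text)
    return [
        {"title": comic.get("title"), "media_id": comic.get("id")}
        for comic in payload.get("comics", [])  # "comics" key is a guess
    ]

if __name__ == "__main__":
    # Hypothetical search URL; the real search_url_head lives on the spider.
    html = requests.get("https://h-comic.com/search?q=keyword", timeout=10).text
    for book in parse_search(html):
        print(book)
```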
Sequence diagram for h_comic book download page resolution:

```mermaid
sequenceDiagram
actor User
participant GUI_MainWindow
participant SpiderScheduler
participant HComicSpider
participant HComicUtils
participant HComicSite as h_comic_website
User ->> GUI_MainWindow: Confirm download for selected h_comic book
GUI_MainWindow ->> SpiderScheduler: Start section task with book_id_url
SpiderScheduler ->> HComicSpider: Request frame_section(response)
HComicSpider ->> HComicSite: GET book page URL
HComicSite -->> HComicSpider: HTML with comic payload
HComicSpider ->> HComicUtils: parse_book(response.text)
HComicUtils ->> HComicUtils: _extract_payload_data(resp_text)
HComicUtils ->> HComicUtils: parse_search_item(comic)
HComicUtils ->> HComicBookInfo: Build book with media_id and comic_source
HComicBookInfo -->> HComicUtils: Book instance
HComicUtils -->> HComicSpider: Parsed HComicBookInfo
HComicSpider ->> HComicUtils: _get_image_prefix(comic_source)
HComicUtils -->> HComicSpider: image_prefix
HComicSpider ->> HComicSpider: Build page URL map image_prefix/media_id/pages/page
HComicSpider -->> SpiderScheduler: Frame results for download tasks
```
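The page-URL construction at the end of this flow can be sketched as below. It leans entirely on the diagram label (`image_prefix/media_id/pages/page`); the function name, page numbering, and return shape are assumptions, not the PR's actual `frame_section` code.

```python
def build_page_urls(image_prefix: str, media_id: str, page_count: int) -> dict[int, str]:
    """Map page number -> image URL, following the
    image_prefix/media_id/pages/page shape shown in the diagram."""
    return {
        page: f"{image_prefix}/{media_id}/pages/{page}"
        for page in range(1, page_count + 1)
    }

# Hypothetical usage:
# build_page_urls("https://img.example.net", "abc123", 3)
# -> {1: '.../abc123/pages/1', 2: '.../abc123/pages/2', 3: '.../abc123/pages/3'}
```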
Class diagram for the new h_comic spider, utils, and book model:

```mermaid
classDiagram
class Ero
class EroUtils
class Req
class BaseComicSpider2
class HComicBookInfo {
+source: str = h_comic
}
class HComicUtils {
+name: str = h_comic
+index: str
+image_server: str
+headers: dict
+book_hea: dict
+uuid_regex
+book_url_regex: str
+payload_regex
+object_key_regex
+__init__(_conf)
+test_index()
+build_search_url(key)
+_format_public_date(unix_ts)
+_jsobj_to_dict(js_obj_text)
+_extract_payload_data(resp_text)
+_get_image_prefix(comic_source)
+_build_cover_url(comic)
+_build_book_urls(comic)
+parse_search_item(target)
+parse_search(resp_text)
+parse_book(resp_text)
}
class HComicSpider {
+name: str = h_comic
+domain: str = h-comic.com
+num_of_row: int = 4
+search_url_head: str
+turn_page_info: tuple
+book_id_url: str
+mappings: dict
+ua
+frame_book(response)
+frame_section(response)
}
Ero <|-- HComicBookInfo
EroUtils <|-- HComicUtils
Req <|-- HComicUtils
BaseComicSpider2 <|-- HComicSpider
HComicUtils --> HComicBookInfo : builds
HComicSpider --> HComicUtils : uses
```
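For orientation, the model side of this diagram corresponds to something like the skeleton below. Only `source = "h_comic"` is taken from the diagram; the base class shape and the other field names are illustrative stand-ins, not the PR's code.

```python
from dataclasses import dataclass

@dataclass
class Ero:  # stand-in for the real base class in utils/website/info.py
    title: str = ""
    url: str = ""

@dataclass
class HComicBookInfo(Ero):
    source: str = "h_comic"   # fixed value, per the class diagram above
    media_id: str = ""        # later combined with image_prefix to build page URLs
    comic_source: str = ""    # fed to HComicUtils._get_image_prefix
```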
Hey - I've left some high level feedback:
- In `HComicUtils.parse_search` and `parse_search_item`, the broad `except Exception` blocks silently drop errors; consider narrowing the exceptions you catch, or at least logging unexpected payload issues, so real parsing problems are visible during debugging (see the sketch after this list).
- `HComicUtils.uuid_regex` and `book_url_regex` are defined but never used; remove these until they're needed, to keep the utils class focused and avoid confusion about which URL shapes are actually supported.
- The `payload_regex` and `_jsobj_to_dict` logic tightly couples the parser to the current frontend JS structure; consider adding a small validation step (e.g., checking for required keys and raising a clear error) or a fallback parsing path so minor frontend changes don't silently break search/book parsing.
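One way to address the first and third points, sketched against the method names above; the logger setup, the `comics` and required key names, and the exception shape are illustrative (the later commits do mention an explicit `HComicParseError`):

```python
import logging

logger = logging.getLogger("h_comic")

class HComicParseError(ValueError):
    """Raised when an embedded payload entry does not look as expected."""

REQUIRED_KEYS = ("id", "title")  # illustrative; use whatever the model really needs

def parse_search_item(target: dict) -> dict:
    missing = [key for key in REQUIRED_KEYS if key not in target]
    if missing:
        raise HComicParseError(f"search entry missing keys: {missing}")
    return {"media_id": target["id"], "title": target["title"]}

def parse_search(payload: dict) -> list[dict]:
    books = []
    for comic in payload.get("comics", []):  # "comics" key is a guess
        try:
            books.append(parse_search_item(comic))
        except (TypeError, HComicParseError) as exc:
            # Narrow catch plus a log line instead of a silent `except Exception`.
            logger.warning("skipping malformed search entry: %s", exc)
    return books
```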
Pull request overview
This PR adds support for the h-comic website, a frontend-rendered adult content site. The implementation follows established patterns in the codebase for adult content spiders, providing search functionality, book parsing, and image downloading capabilities.
Changes:
- Adds new spider implementation with search and book parsing methods
- Implements HComicUtils class with JavaScript object parsing and URL building (see the sketch after this list)
- Adds HComicBookInfo data model for h-comic specific metadata
- Updates configuration dictionaries and GUI dropdown to register the new site
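The "JavaScript object parsing" step is the interesting part of `HComicUtils`: the embedded payload is a JS object literal (unquoted keys, single quotes), not strict JSON. Below is a rough, assumption-heavy way to handle that with regex preprocessing plus `json.loads`; the real `_jsobj_to_dict` may work differently.

```python
import json
import re

def jsobj_to_dict(js_obj_text: str) -> dict:
    """Best-effort conversion of a simple JS object literal to a dict.
    Handles unquoted identifier keys, single-quoted strings, and trailing
    commas; anything fancier (functions, regex literals) is out of scope."""
    text = re.sub(r"([{,]\s*)([A-Za-z_]\w*)\s*:", r'\1"\2":', js_obj_text)  # quote keys
    text = text.replace("'", '"')                                           # normalize quotes
    text = re.sub(r",\s*([}\]])", r"\1", text)                              # drop trailing commas
    return json.loads(text)

# Example:
# jsobj_to_dict("{id: 'abc', pages: 12, tags: ['a', 'b',]}")
# -> {'id': 'abc', 'pages': 12, 'tags': ['a', 'b']}
```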
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| ComicSpider/spiders/h_comic.py | New spider implementation for h-comic with frame_book and frame_section methods |
| utils/website/ins.py | HComicUtils class with payload extraction, JSON parsing, and book/search parsing |
| utils/website/info.py | HComicBookInfo data model extending Ero base class |
| variables/__init__.py | Configuration updates for SPIDERS, SPECIAL_WEBSITES, STATUS_TIP, and other settings |
| GUI/mainwindow.py | GUI dropdown entry for h-comic site selection |
| assets/res/locale/zh_CN.yml | Chinese localization for h-comic site description |
| assets/res/locale/en_US.yml | English localization for h-comic site description |
variables/__init__.py (outdated)

```diff
     5: f"mangabz: {res.GUI.SearchInputStatusTip.mangabz}",
-    6: f"hitomi: {res.GUI.SearchInputStatusTip.hitomi}"
+    6: f"hitomi: {res.GUI.SearchInputStatusTip.hitomi}",
+    8: "h_comic: 直接输入关键词搜索,例如 NTR / 中文 / 作者名"
```
The STATUS_TIP entry for h_comic at line 51 hardcodes a Chinese description directly, which is inconsistent with the other sites, which use the internationalization pattern (`res.GUI.SearchInputStatusTip.<site>`). This breaks the i18n system and prevents proper localization. The value should be changed to `f"h_comic: {res.GUI.SearchInputStatusTip.h_comic}"` to match the pattern used by other sites and enable proper localization support.
variables/__init__.py (outdated)

```diff
 }
-SPECIAL_WEBSITES_IDXES = [2, 3, 4, 6]
+SPECIAL_WEBSITES_IDXES = [2, 3, 4, 6, 8]
 CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6]
```
The h_comic site is marked as an international site (🌎), similar to ehentai and hitomi, and likely requires proxy access for users in China. Consider adding index 8 to the `CN_PREVIEW_NEED_PROXIES_IDXES` list so Chinese users can properly preview h-comic content through the browser window. Currently `CN_PREVIEW_NEED_PROXIES_IDXES` includes [3, 4, 6] for wnacg, ehentai, and hitomi.
Suggested change:

```diff
-CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6]
+CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6, 8]
```
Thanks for the contribution. The following issues have been found so far:
- enable ComicDlProxyMiddleware and UAMiddleware for HComicSpider
- add proxy_domains for h-comic domains
- improve upload_date parsing for second/ms timestamps
- raise explicit HComicParseError for invalid payload/search entries

- switch to ComicDlAllProxyMiddleware for all requests
- remove unused proxy_domains setting
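The second/ms timestamp handling mentioned in the commits could look roughly like this; the helper name matches `_format_public_date` from the class diagram, but the threshold heuristic and output format are assumptions:

```python
from datetime import datetime, timezone

def _format_public_date(unix_ts) -> str:
    """Normalize an upload timestamp that may be in seconds or milliseconds.

    Assumption: values above 1e12 (e.g. 1700000000000) are milliseconds.
    """
    ts = float(unix_ts)
    if ts > 1e12:
        ts /= 1000.0
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

# _format_public_date(1700000000)     -> '2023-11-14'
# _format_public_date(1700000000000)  -> '2023-11-14'
```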
Thanks for the detailed review and suggestions. I've addressed them along the lines you described and pushed the changes.
🎉 This feature will ship with the next stable release.
Description
Add h_comic site support, covering the spider, site parsing utilities, the GUI entry, and copy/config wiring, so users can select and search h-comic content directly from the dropdown.

Main changes:
- `ComicSpider/spiders/h_comic.py`
- `HComicBookInfo` in `utils/website/info.py`
- `HComicUtils` in `utils/website/ins.py`, registered in `spider_utils_map`
- `assets/res/locale/zh_CN.yml`, `assets/res/locale/en_US.yml`
- `variables/__init__.py`
Related to #<issue_id>
Checklist:
Summary by Sourcery
Add support for the h-comic website across the crawler core, site utilities, and GUI selection.
New Features:
Enhancements: