feat: add h_comic spider and GUI entry by xulingran · Pull Request #144 · jasoneri/ComicGUISpider

xulingran · 2026-02-10T15:35:14Z

Description

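Adds support for the h_comic site, covering the spider, site parsing utilities, the GUI entry, and UI text configuration. Users can select h-comic directly from the dropdown and search its content.

Main changes:

New spider: ComicSpider/spiders/h_comic.py
New site data model: HComicBookInfo in utils/website/info.py
New site parsing and registration: HComicUtils in utils/website/ins.py, added to spider_utils_map
Updated multilingual UI text: assets/res/locale/zh_CN.yml, assets/res/locale/en_US.yml
Updated site index configuration: variables/__init__.py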

Related Issues

Related to #<issue_id>

Checklist:

Have you checked to ensure there aren't other open Pull Requests for the same update/change?
Have you linted your code locally prior to submission?
Have you successfully run the app with your changes locally?

Summary by Sourcery

Add support for the h-comic website across the crawler core, site utilities, and GUI selection.

New Features:

Introduce HComicSpider to crawl content from h-comic.com, including search and chapter image retrieval.
Add HComicUtils and HComicBookInfo models to parse h-comic search and book pages into internal data structures.
Expose h-comic as a selectable source in the GUI and site registry with dedicated status tips and search guidance.

Enhancements:

Update global website indices, special-website flags, and default completer/status tip mappings to integrate h-comic into existing configuration.
Extend locale resources with English and Chinese descriptions for the new h-comic source.

sourcery-ai · 2026-02-10T15:35:26Z

Reviewer's Guide

Adds full support for the h-comic website, including a new spider, website utilities, data model, GUI selector entry, and locale/config wiring so users can search and download from h-comic via the app.

Sequence diagram for h_comic search flow from GUI to spider and utils

sequenceDiagram
actor User
participant GUI_MainWindow
participant Variables as Variables_module
participant SpiderScheduler
participant HComicSpider
participant HComicUtils
participant HComicSite as h_comic_website

User ->> GUI_MainWindow: Select index 8 and enter keyword
GUI_MainWindow ->> Variables: Resolve site index 8 to h_comic
Variables -->> GUI_MainWindow: Site key h_comic
GUI_MainWindow ->> SpiderScheduler: Request search for h_comic with keyword
SpiderScheduler ->> HComicSpider: Instantiate spider and start search
HComicSpider ->> HComicUtils: Access ua headers
HComicSpider ->> HComicSite: GET search_url_head + keyword
HComicSite -->> HComicSpider: HTML with embedded payload
HComicSpider ->> HComicUtils: parse_search(response.text)
HComicUtils ->> HComicUtils: _extract_payload_data(resp_text)
HComicUtils ->> HComicUtils: parse_search_item(target) for each comic
HComicUtils ->> HComicBookInfo: Construct book model per result
HComicBookInfo -->> HComicUtils: Book instances
HComicUtils -->> HComicSpider: List of HComicBookInfo
HComicSpider ->> GUI_MainWindow: Formatted search results for display
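The search step above (GET search_url_head + keyword) only works if the keyword is URL-encoded, since the status tip suggests Chinese terms and author names as queries. A minimal sketch, assuming a hypothetical `https://h-comic.com/search?q=` prefix (the real search_url_head is not shown on this page); make_search_url is a hypothetical helper, not the PR's build_search_url:

```python
from urllib.parse import quote

# Hypothetical prefix: the real search_url_head lives on HComicSpider and is not shown here.
SEARCH_URL_HEAD = "https://h-comic.com/search?q="


def make_search_url(keyword: str, head: str = SEARCH_URL_HEAD) -> str:
    """Append a percent-encoded keyword to the search URL prefix."""
    # safe="" also encodes "/" and spaces; non-ASCII text becomes UTF-8 percent escapes.
    return head + quote(keyword.strip(), safe="")


print(make_search_url("中文"))  # → https://h-comic.com/search?q=%E4%B8%AD%E6%96%87
```

Encoding at the single point where the URL is built keeps callers free to pass raw user input from the GUI search box.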

Sequence diagram for h_comic book download page resolution

sequenceDiagram
actor User
participant GUI_MainWindow
participant SpiderScheduler
participant HComicSpider
participant HComicUtils
participant HComicSite as h_comic_website

User ->> GUI_MainWindow: Confirm download for selected h_comic book
GUI_MainWindow ->> SpiderScheduler: Start section task with book_id_url
SpiderScheduler ->> HComicSpider: Request frame_section(response)
HComicSpider ->> HComicSite: GET book page URL
HComicSite -->> HComicSpider: HTML with comic payload
HComicSpider ->> HComicUtils: parse_book(response.text)
HComicUtils ->> HComicUtils: _extract_payload_data(resp_text)
HComicUtils ->> HComicUtils: parse_search_item(comic)
HComicUtils ->> HComicBookInfo: Build book with media_id and comic_source
HComicBookInfo -->> HComicUtils: Book instance
HComicUtils -->> HComicSpider: Parsed HComicBookInfo
HComicSpider ->> HComicUtils: _get_image_prefix(comic_source)
HComicUtils -->> HComicSpider: image_prefix
HComicSpider ->> HComicSpider: Build page URL map image_prefix/media_id/pages/page
HComicSpider -->> SpiderScheduler: Frame results for download tasks
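The last steps, building the page URL map from image_prefix, media_id and the page number, might look roughly like the sketch below. It assumes a `{image_prefix}/{media_id}/pages/{page}` layout read off the diagram, 1-based page numbers and a known page count. The real spider may append file extensions or number pages differently, and build_page_urls is a hypothetical name:

```python
from typing import Dict


def build_page_urls(image_prefix: str, media_id: str, num_pages: int) -> Dict[int, str]:
    """Map each 1-based page number to its image URL (hypothetical URL layout)."""
    prefix = image_prefix.rstrip("/")  # avoid a double slash when the prefix ends with "/"
    return {page: f"{prefix}/{media_id}/pages/{page}" for page in range(1, num_pages + 1)}


urls = build_page_urls("https://img.example.com/", "12345", 2)
print(urls[1])  # → https://img.example.com/12345/pages/1
```

Keying by page number keeps download ordering explicit even if tasks complete out of order.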

Class diagram for new h_comic spider, utils, and book model

classDiagram
class Ero
class EroUtils
class Req
class BaseComicSpider2

class HComicBookInfo {
  +source: str = h_comic
}

class HComicUtils {
  +name: str = h_comic
  +index: str
  +image_server: str
  +headers: dict
  +book_hea: dict
  +uuid_regex
  +book_url_regex: str
  +payload_regex
  +object_key_regex
  +__init__(_conf)
  +test_index()
  +build_search_url(key)
  +_format_public_date(unix_ts)
  +_jsobj_to_dict(js_obj_text)
  +_extract_payload_data(resp_text)
  +_get_image_prefix(comic_source)
  +_build_cover_url(comic)
  +_build_book_urls(comic)
  +parse_search_item(target)
  +parse_search(resp_text)
  +parse_book(resp_text)
}

class HComicSpider {
  +name: str = h_comic
  +domain: str = h-comic.com
  +num_of_row: int = 4
  +search_url_head: str
  +turn_page_info: tuple
  +book_id_url: str
  +mappings: dict
  +ua
  +frame_book(response)
  +frame_section(response)
}

Ero <|-- HComicBookInfo
EroUtils <|-- HComicUtils
Req <|-- HComicUtils
BaseComicSpider2 <|-- HComicSpider

HComicUtils --> HComicBookInfo : builds
HComicSpider --> HComicUtils : uses

File-Level Changes

Change	Details	Files
Implement h-comic site parsing utilities and register them in the website utils registry.	Add HComicUtils class with HTTP client setup, index availability check, and custom headers/image server config for h-comic. Implement helpers to extract and normalize JSON-like payloads from h-comic HTML responses, including timestamp formatting and image prefix resolution. Provide parsing methods for search results and single-book pages that construct HComicBookInfo instances and integrate with existing Ero/EroUtils flow. Register HComicUtils in registry.spider_utils_map under both numeric (8) and string ('h_comic') keys.	`utils/website/ins.py` `utils/website/info.py`
Wire h-comic into global configuration, GUI site selection, and status text/locale entries.	Add h_comic to global website index mappings, special-website lists, default completer map, and status tips with a custom search usage string. Extend the GUI dropdown to include an "8、h-comic🔞" option matching the new index. Add Chinese and English locale description strings describing h-comic search behavior.	`variables/__init__.py` `GUI/mainwindow.py` `assets/res/locale/en_US.yml` `assets/res/locale/zh_CN.yml`
Add a Scrapy spider for h-comic that uses the new utilities to search and build download tasks.	Create HComicSpider with site-specific URLs, pagination pattern, and referer middleware configuration. Use HComicUtils headers as the spider user-agent and parse search responses into indexed book results. Implement frame_section to compute per-page image URLs from media_id/comic_source and enqueue the download task with user-facing messages.	`ComicSpider/spiders/h_comic.py`
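The registration described in the first row (HComicUtils reachable under both the numeric GUI index 8 and the string key 'h_comic') can be sketched as follows. SiteUtils and register are hypothetical stand-ins for the project's EroUtils/Req base classes and registry code. The collision check is an illustrative addition, not something the PR states:

```python
from typing import Dict, Type, Union


class SiteUtils:
    """Hypothetical minimal base class standing in for EroUtils/Req."""
    name: str = ""


class HComicUtils(SiteUtils):
    name = "h_comic"


SpiderUtilsMap = Dict[Union[int, str], Type[SiteUtils]]


def register(registry: SpiderUtilsMap, idx: int, utils_cls: Type[SiteUtils]) -> None:
    """Register utils under both the GUI index and the site name."""
    keys = (idx, utils_cls.name)
    # Check every key before writing so a collision leaves the registry untouched.
    for key in keys:
        existing = registry.get(key)
        if existing is not None and existing is not utils_cls:
            raise ValueError(f"registry key {key!r} already taken by {existing.__name__}")
    for key in keys:
        registry[key] = utils_cls


spider_utils_map: SpiderUtilsMap = {}
register(spider_utils_map, 8, HComicUtils)
print(spider_utils_map[8] is spider_utils_map["h_comic"])  # → True
```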


sourcery-ai

Hey - I've left some high level feedback:

In HComicUtils.parse_search and parse_search_item, the broad except Exception blocks silently drop errors; consider narrowing the exceptions you catch or at least logging unexpected payload issues so real parsing problems are visible during debugging.
HComicUtils.uuid_regex and book_url_regex are defined but never used; remove these until they’re needed to keep the utils class focused and avoid confusion about which URL shapes are actually supported.
The payload_regex and _jsobj_to_dict logic tightly couples the parser to the current frontend JS structure; consider adding a small validation step (e.g., checking for required keys and raising a clear error) or a fallback parsing path so minor frontend changes don’t silently break search/book parsing.
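One way to act on the validation suggestion is to fail loudly with a dedicated exception when the payload is missing, malformed or lacks required keys, instead of returning an empty list. The sketch below makes several hypothetical assumptions: the page embeds a `window.__DATA__ = {...}` assignment whose object is valid JSON, and each comic entry needs id, title and media_id. The real payload_regex, _jsobj_to_dict and field names are not shown on this page. HComicParseError is the name used in a later commit, but this definition is illustrative:

```python
import json
import re


class HComicParseError(Exception):
    """Raised when the h-comic page payload is missing or malformed."""


# Hypothetical pattern; anchoring on </script> lets the lazy match span nested braces.
PAYLOAD_REGEX = re.compile(r"window\.__DATA__\s*=\s*(\{.*?\})\s*;?\s*</script>", re.S)
REQUIRED_COMIC_KEYS = ("id", "title", "media_id")  # hypothetical field names


def extract_payload(resp_text: str) -> dict:
    match = PAYLOAD_REGEX.search(resp_text)
    if match is None:
        raise HComicParseError("payload not found; the frontend structure may have changed")
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        raise HComicParseError(f"payload is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise HComicParseError(f"payload is {type(data).__name__}, expected an object")
    return data


def validate_comic(comic: dict) -> dict:
    missing = [key for key in REQUIRED_COMIC_KEYS if key not in comic]
    if missing:
        raise HComicParseError(f"comic entry missing keys: {missing}")
    return comic
```

Raising instead of swallowing lets the application's global error reporting surface the real cause, which matches the direction the maintainers take later in the thread.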



Copilot

Pull request overview

This PR adds support for the h-comic website, a frontend-rendered adult content site. The implementation follows established patterns in the codebase for adult content spiders, providing search functionality, book parsing, and image downloading capabilities.

Changes:

Adds new spider implementation with search and book parsing methods
Implements HComicUtils class with JavaScript object parsing and URL building
Adds HComicBookInfo data model for h-comic specific metadata
Updates configuration dictionaries and GUI dropdown to register the new site

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


File	Description
ComicSpider/spiders/h_comic.py	New spider implementation for h-comic with frame_book and frame_section methods
utils/website/ins.py	HComicUtils class with payload extraction, JSON parsing, and book/search parsing
utils/website/info.py	HComicBookInfo data model extending Ero base class
variables/__init__.py	Configuration updates for SPIDERS, SPECIAL_WEBSITES, STATUS_TIP, and other settings
GUI/mainwindow.py	GUI dropdown entry for h-comic site selection
assets/res/locale/zh_CN.yml	Chinese localization for h-comic site description
assets/res/locale/en_US.yml	English localization for h-comic site description


utils/website/ins.py

Copilot · 2026-02-10T15:42:11Z

variables/__init__.py

    5: f"mangabz: {res.GUI.SearchInputStatusTip.mangabz}",
-    6: f"hitomi: {res.GUI.SearchInputStatusTip.hitomi}"
+    6: f"hitomi: {res.GUI.SearchInputStatusTip.hitomi}",
+    8: "h_comic: 直接输入关键词搜索，例如 NTR / 中文 / 作者名"


The STATUS_TIP entry for h_comic at line 51 hardcodes a Chinese description. This is inconsistent with the other sites, which use the internationalization pattern (res.GUI.SearchInputStatusTip.<site>). It bypasses the i18n system and prevents proper localization. The value should be changed to f"h_comic: {res.GUI.SearchInputStatusTip.h_comic}" to match the pattern used by the other sites.

Copilot · 2026-02-10T15:42:12Z

variables/__init__.py

 }
-SPECIAL_WEBSITES_IDXES = [2, 3, 4, 6]
+SPECIAL_WEBSITES_IDXES = [2, 3, 4, 6, 8]
 CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6]


The h_comic site is marked as an international site (🌎) similar to ehentai and hitomi, and likely requires proxy access for users in China. Consider adding index 8 to CN_PREVIEW_NEED_PROXIES_IDXES list to ensure Chinese users can properly preview h-comic content through the browser window. Currently CN_PREVIEW_NEED_PROXIES_IDXES includes [3, 4, 6] for wnacg, ehentai, and hitomi.

Suggested change

CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6]

CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6, 8]

jsonmaki · 2026-02-10T17:31:35Z

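Thanks for the contribution.

I've found the following issues so far:

Exception handling in HComicUtils.parse_search: CGS already has a fallback error-reporting and global logging mechanism. An empty-list return value interrupts the flow anyway, so catching exceptions here with try/except hides the real cause from the user. It should raise a custom error with a clear message instead.
HComicSpider seems to be missing ComicDlProxyMiddleware or ComicDlAllProxyMiddleware: in testing, the site cannot be reached without a proxy.
Also, can the _format_public_date issue raised by Copilot be fixed?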
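On the proxy point, Scrapy's built-in HttpProxyMiddleware routes a request through whatever URL is stored in request.meta["proxy"]. An "all requests via proxy" middleware can therefore be very small. The sketch below is a hypothetical illustration of that idea, not the project's ComicDlAllProxyMiddleware. The class name and proxy URL are made up, and a plain object with a meta dict stands in for scrapy.Request so the snippet runs without Scrapy installed:

```python
from types import SimpleNamespace
from typing import Optional


class AllRequestsProxyMiddleware:
    """Hypothetical downloader middleware: send every request through one proxy."""

    def __init__(self, proxy_url: Optional[str]):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Respect a proxy already set on the request; otherwise apply the global one.
        if self.proxy_url and not request.meta.get("proxy"):
            request.meta["proxy"] = self.proxy_url
        return None  # None tells Scrapy to keep processing this request normally


# Usage with a stand-in request object (scrapy.Request also exposes a .meta dict).
req = SimpleNamespace(meta={})
AllRequestsProxyMiddleware("http://127.0.0.1:7890").process_request(req, spider=None)
print(req.meta["proxy"])  # → http://127.0.0.1:7890
```

For the meta key to be picked up, such a middleware would need an order value below HttpProxyMiddleware's default of 750 in DOWNLOADER_MIDDLEWARES.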


xulingran · 2026-02-11T01:46:54Z

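Thanks for the detailed review and suggestions. I've addressed them along the lines you described and pushed the changes:

HComicUtils.parse_search now raises a custom exception, leaving CGS's global error-reporting and logging pipeline to handle it uniformly.
The proxy middleware for HComicSpider has been added and, because the whole site requires a proxy, switched to ComicDlAllProxyMiddleware.
_format_public_date has been fixed: it now handles both second and millisecond timestamps and adds boundary handling for OSError/OverflowError.
Corresponding commits: 1fcf035, ebb1ae6.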
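The second/millisecond handling described above can be sketched as follows. This is not the PR's _format_public_date. It assumes a YYYY-MM-DD output format and formats in UTC (the real code may use local time). It also uses a magnitude threshold of 1e11 to distinguish milliseconds from seconds: 1e11 seconds is roughly the year 5138, so no realistic second-based timestamp reaches it:

```python
from datetime import datetime, timezone
from typing import Union


def format_public_date(unix_ts: Union[int, float, str, None]) -> str:
    """Format a second- or millisecond-based Unix timestamp as YYYY-MM-DD (UTC)."""
    if unix_ts in (None, ""):
        return ""
    try:
        ts = float(unix_ts)
    except (TypeError, ValueError):
        return ""
    # Heuristic: anything at or above 1e11 is treated as milliseconds.
    if abs(ts) >= 1e11:
        ts /= 1000
    try:
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    except (OSError, OverflowError, ValueError):
        # Out-of-range or NaN values: return an empty date rather than crash parsing.
        return ""


print(format_public_date(1700000000))     # → 2023-11-14
print(format_public_date(1700000000000))  # → 2023-11-14
```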

jasoneri · 2026-02-11T05:32:08Z

🎉 This feature will ship with the next stable release.

feat: add h_comic spider and GUI entry

9c1a682

Copilot AI review requested due to automatic review settings February 10, 2026 15:35

Copilot started reviewing on behalf of xulingran February 10, 2026 15:35

sourcery-ai bot reviewed Feb 10, 2026


fix: harden h_comic payload parsing and error handling

f813047

Copilot AI reviewed Feb 10, 2026


fix: address review comments for h_comic integration

804bd3e

Zhong added 2 commits February 11, 2026 09:37

fix(h_comic): add proxy UA middleware and harden parsing

1fcf035

- enable ComicDlProxyMiddleware and UAMiddleware for HComicSpider
- add proxy_domains for h-comic domains
- improve upload_date parsing for second/ms timestamps
- raise explicit HComicParseError for invalid payload/search entries

fix(h_comic): use all-proxy middleware for full-site access

ebb1ae6

- switch to ComicDlAllProxyMiddleware for all requests
- remove unused proxy_domains setting

jasoneri approved these changes Feb 11, 2026


jasoneri merged commit 95db5f3 into jasoneri:2.8-dev Feb 11, 2026
1 check passed

jasoneri added the dev spider add spider label Feb 12, 2026

jasoneri merged 5 commits into jasoneri:2.8-dev from xulingran:codex/feat-h-comic

3 participants
