
feat: add h_comic spider and GUI entry #144

Merged
jasoneri merged 5 commits into jasoneri:2.8-dev from xulingran:codex/feat-h-comic on Feb 11, 2026


xulingran commented Feb 10, 2026

Description

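Adds support for the h_comic site, covering the spider, site parsing utilities, GUI entry and locale strings, so users can pick h-comic directly from the dropdown and search its content.

Main changes: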

  • New spider: ComicSpider/spiders/h_comic.py
  • New site data model: HComicBookInfo in utils/website/info.py
  • New site parsing and registration: HComicUtils in utils/website/ins.py, added to spider_utils_map
  • Locale strings updated: assets/res/locale/zh_CN.yml, assets/res/locale/en_US.yml
  • Site index configuration updated: variables/__init__.py

Related Issues

Related to #<issue_id>

Checklist:

  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you linted your code locally prior to submission?
  • Have you successfully run the app with your changes locally?

Summary by Sourcery

Add support for the h-comic website across the crawler core, site utilities, and GUI selection.

New Features:

  • Introduce HComicSpider to crawl content from h-comic.com, including search and chapter image retrieval.
  • Add HComicUtils and HComicBookInfo models to parse h-comic search and book pages into internal data structures.
  • Expose h-comic as a selectable source in the GUI and site registry with dedicated status tips and search guidance.

Enhancements:

  • Update global website indices, special-website flags, and default completer/status tip mappings to integrate h-comic into existing configuration.
  • Extend locale resources with English and Chinese descriptions for the new h-comic source.

Copilot AI review requested due to automatic review settings February 10, 2026 15:35
sourcery-ai bot commented Feb 10, 2026

Reviewer's Guide

Adds full support for the h-comic website, including a new spider, website utilities, data model, GUI selector entry, and locale/config wiring so users can search and download from h-comic via the app.

Sequence diagram for h_comic search flow from GUI to spider and utils

sequenceDiagram
actor User
participant GUI_MainWindow
participant Variables as Variables_module
participant SpiderScheduler
participant HComicSpider
participant HComicUtils
participant HComicSite as h_comic_website

User ->> GUI_MainWindow: Select index 8 and enter keyword
GUI_MainWindow ->> Variables: Resolve site index 8 to h_comic
Variables -->> GUI_MainWindow: Site key h_comic
GUI_MainWindow ->> SpiderScheduler: Request search for h_comic with keyword
SpiderScheduler ->> HComicSpider: Instantiate spider and start search
HComicSpider ->> HComicUtils: Access ua headers
HComicSpider ->> HComicSite: GET search_url_head + keyword
HComicSite -->> HComicSpider: HTML with embedded payload
HComicSpider ->> HComicUtils: parse_search(response.text)
HComicUtils ->> HComicUtils: _extract_payload_data(resp_text)
HComicUtils ->> HComicUtils: parse_search_item(target) for each comic
HComicUtils ->> HComicBookInfo: Construct book model per result
HComicBookInfo -->> HComicUtils: Book instances
HComicUtils -->> HComicSpider: List of HComicBookInfo
HComicSpider ->> GUI_MainWindow: Formatted search results for display
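In the search flow above, HComicSpider requests search_url_head + keyword. Keywords such as 中文 or author names with spaces are not URL-safe as typed, so one option is to percent-encode them before concatenating. A minimal sketch, assuming a hypothetical search prefix (the real search_url_head is not shown in the PR) and a hypothetical build_search_url helper, not HComicUtils.build_search_url itself:

```python
from urllib.parse import quote

# Hypothetical prefix; the real HComicSpider.search_url_head is not shown in the PR.
SEARCH_URL_HEAD = "https://h-comic.com/search?q="


def build_search_url(keyword: str) -> str:
    """Append a percent-encoded keyword to the search prefix."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("search keyword must not be empty")
    # safe="" also encodes "/", so "NTR / 中文" stays a single query value
    return SEARCH_URL_HEAD + quote(keyword, safe="")


print(build_search_url("中文"))  # https://h-comic.com/search?q=%E4%B8%AD%E6%96%87
```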

Sequence diagram for h_comic book download page resolution

sequenceDiagram
actor User
participant GUI_MainWindow
participant SpiderScheduler
participant HComicSpider
participant HComicUtils
participant HComicSite as h_comic_website

User ->> GUI_MainWindow: Confirm download for selected h_comic book
GUI_MainWindow ->> SpiderScheduler: Start section task with book_id_url
SpiderScheduler ->> HComicSpider: Request frame_section(response)
HComicSpider ->> HComicSite: GET book page URL
HComicSite -->> HComicSpider: HTML with comic payload
HComicSpider ->> HComicUtils: parse_book(response.text)
HComicUtils ->> HComicUtils: _extract_payload_data(resp_text)
HComicUtils ->> HComicUtils: parse_search_item(comic)
HComicUtils ->> HComicBookInfo: Build book with media_id and comic_source
HComicBookInfo -->> HComicUtils: Book instance
HComicUtils -->> HComicSpider: Parsed HComicBookInfo
HComicSpider ->> HComicUtils: _get_image_prefix(comic_source)
HComicUtils -->> HComicSpider: image_prefix
HComicSpider ->> HComicSpider: Build page URL map image_prefix/media_id/pages/page
HComicSpider -->> SpiderScheduler: Frame results for download tasks
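The last step of the download flow builds one URL per page from image_prefix, media_id and the page number. A sketch of that mapping, assuming pages are numbered from 1, the page count is already known from the parsed book, and the path carries no file extension (the diagram only gives the image_prefix/media_id/pages/page shape; build_page_urls and the example host are hypothetical):

```python
def build_page_urls(image_prefix: str, media_id: str, num_pages: int) -> dict[int, str]:
    """Map 1-based page numbers to image_prefix/media_id/pages/page URLs."""
    if num_pages < 1:
        raise ValueError("a book needs at least one page")
    prefix = image_prefix.rstrip("/")  # avoid "//" when the prefix already ends with a slash
    return {page: f"{prefix}/{media_id}/pages/{page}" for page in range(1, num_pages + 1)}


urls = build_page_urls("https://img.example.com/", "12345", 3)
print(urls[1])  # https://img.example.com/12345/pages/1
```

Keying the result by page number keeps the intended page order explicit even if individual download tasks finish out of order.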

Class diagram for new h_comic spider, utils, and book model

classDiagram
class Ero
class EroUtils
class Req
class BaseComicSpider2

class HComicBookInfo {
  +source: str = h_comic
}

class HComicUtils {
  +name: str = h_comic
  +index: str
  +image_server: str
  +headers: dict
  +book_hea: dict
  +uuid_regex
  +book_url_regex: str
  +payload_regex
  +object_key_regex
  +__init__(_conf)
  +test_index()
  +build_search_url(key)
  +_format_public_date(unix_ts)
  +_jsobj_to_dict(js_obj_text)
  +_extract_payload_data(resp_text)
  +_get_image_prefix(comic_source)
  +_build_cover_url(comic)
  +_build_book_urls(comic)
  +parse_search_item(target)
  +parse_search(resp_text)
  +parse_book(resp_text)
}

class HComicSpider {
  +name: str = h_comic
  +domain: str = h-comic.com
  +num_of_row: int = 4
  +search_url_head: str
  +turn_page_info: tuple
  +book_id_url: str
  +mappings: dict
  +ua
  +frame_book(response)
  +frame_section(response)
}

Ero <|-- HComicBookInfo
EroUtils <|-- HComicUtils
Req <|-- HComicUtils
BaseComicSpider2 <|-- HComicSpider

HComicUtils --> HComicBookInfo : builds
HComicSpider --> HComicUtils : uses
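The class diagram lists _extract_payload_data and _jsobj_to_dict, which pull a JavaScript object literal out of the page HTML and turn it into a Python dict. The PR does not show payload_regex or the real page markup, so the sketch below assumes a hypothetical window.__PAYLOAD__ = {...}; assignment and only handles bare keys with double-quoted strings; it is not a general JavaScript parser. It raises ValueError for brevity, whereas the PR later switched to a dedicated HComicParseError.

```python
import json
import re

# Hypothetical marker: assumes the page embeds `window.__PAYLOAD__ = {...};`.
PAYLOAD_REGEX = re.compile(r"window\.__PAYLOAD__\s*=\s*(\{.*?\});", re.S)
# Quote bare object keys: {id:1,title:"x"} -> {"id":1,"title":"x"}
BARE_KEY_REGEX = re.compile(r'([{,]\s*)([A-Za-z_$][\w$]*)\s*:')


def jsobj_to_dict(js_obj_text: str) -> dict:
    """Convert a simple JS object literal (bare keys, double-quoted strings) to a dict.

    Commas, colons or "};" inside string values can confuse these regexes.
    """
    json_text = BARE_KEY_REGEX.sub(r'\1"\2":', js_obj_text)
    return json.loads(json_text)


def extract_payload_data(resp_text: str) -> dict:
    match = PAYLOAD_REGEX.search(resp_text)
    if match is None:
        raise ValueError("embedded payload not found; the frontend markup may have changed")
    return jsobj_to_dict(match.group(1))
```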

File-Level Changes

Change: Implement h-comic site parsing utilities and register them in the website utils registry.
  • Add HComicUtils class with HTTP client setup, index availability check, and custom headers/image server config for h-comic.
  • Implement helpers to extract and normalize JSON-like payloads from h-comic HTML responses, including timestamp formatting and image prefix resolution.
  • Provide parsing methods for search results and single-book pages that construct HComicBookInfo instances and integrate with existing Ero/EroUtils flow.
  • Register HComicUtils in registry.spider_utils_map under both numeric (8) and string ('h_comic') keys.
Files: utils/website/ins.py, utils/website/info.py

Change: Wire h-comic into global configuration, GUI site selection, and status text/locale entries.
  • Add h_comic to global website index mappings, special-website lists, default completer map, and status tips with a custom search usage string.
  • Extend the GUI dropdown to include an "8、h-comic🔞" option matching the new index.
  • Add Chinese and English locale description strings describing h-comic search behavior.
Files: variables/__init__.py, GUI/mainwindow.py, assets/res/locale/en_US.yml, assets/res/locale/zh_CN.yml

Change: Add a Scrapy spider for h-comic that uses the new utilities to search and build download tasks.
  • Create HComicSpider with site-specific URLs, pagination pattern, and referer middleware configuration.
  • Use HComicUtils headers as the spider user-agent and parse search responses into indexed book results.
  • Implement frame_section to compute per-page image URLs from media_id/comic_source and enqueue the download task with user-facing messages.
Files: ComicSpider/spiders/h_comic.py
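The registry entry keys HComicUtils under both 8 and 'h_comic', so lookups by GUI index and by site name resolve to the same class. How ins.py actually writes this is not shown; the sketch below uses a hypothetical register() helper and a stand-in HComicUtils, and refuses to silently overwrite a key that another site already owns.

```python
spider_utils_map: dict[object, type] = {}


class HComicUtils:
    """Stand-in for utils.website.ins.HComicUtils; only the name attribute matters here."""
    name = "h_comic"


def register(utils_cls: type, site_index: int) -> None:
    """Register a utils class under its numeric GUI index and its string name."""
    keys = (site_index, utils_cls.name)
    # Check every key first so a collision never leaves a half-registered site
    for key in keys:
        existing = spider_utils_map.get(key)
        if existing is not None and existing is not utils_cls:
            raise KeyError(f"site key {key!r} already belongs to {existing.__name__}")
    for key in keys:
        spider_utils_map[key] = utils_cls


register(HComicUtils, 8)
print(spider_utils_map[8] is spider_utils_map["h_comic"])  # True
```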


sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • In HComicUtils.parse_search and parse_search_item, the broad except Exception blocks silently drop errors; consider narrowing the exceptions you catch or at least logging unexpected payload issues so real parsing problems are visible during debugging.
  • HComicUtils.uuid_regex and book_url_regex are defined but never used; remove these until they’re needed to keep the utils class focused and avoid confusion about which URL shapes are actually supported.
  • The payload_regex and _jsobj_to_dict logic tightly couples the parser to the current frontend JS structure; consider adding a small validation step (e.g., checking for required keys and raising a clear error) or a fallback parsing path so minor frontend changes don’t silently break search/book parsing.
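For the first and third points, one option is to validate the payload shape up front and raise a dedicated error instead of catching Exception and returning an empty list. A sketch, assuming a top-level comics list and hypothetical required keys (only media_id appears in the PR's diagrams); HComicParseError mirrors the exception name the author later added, but its real definition is not shown:

```python
class HComicParseError(Exception):
    """Raised when the h-comic payload no longer has the expected shape."""


# Hypothetical field names; the real payload keys are not shown in the PR.
REQUIRED_COMIC_KEYS = ("id", "title", "media_id")


def validate_comic(comic: object) -> dict:
    if not isinstance(comic, dict):
        raise HComicParseError(f"expected a comic object, got {type(comic).__name__}")
    missing = [key for key in REQUIRED_COMIC_KEYS if key not in comic]
    if missing:
        raise HComicParseError(f"comic entry is missing keys: {', '.join(missing)}")
    return comic


def parse_search(payload: object) -> list[dict]:
    comics = payload.get("comics") if isinstance(payload, dict) else None
    if not isinstance(comics, list):
        raise HComicParseError("payload has no 'comics' list; the frontend may have changed")
    # No blanket `except Exception`: a bad entry propagates with a clear message
    return [validate_comic(comic) for comic in comics]
```

An empty comics list still returns [], which keeps "no results" distinct from "page format changed".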


Copilot AI left a comment

Pull request overview

This PR adds support for the h-comic website, a frontend-rendered adult content site. The implementation follows established patterns in the codebase for adult content spiders, providing search functionality, book parsing, and image downloading capabilities.

Changes:

  • Adds new spider implementation with search and book parsing methods
  • Implements HComicUtils class with JavaScript object parsing and URL building
  • Adds HComicBookInfo data model for h-comic specific metadata
  • Updates configuration dictionaries and GUI dropdown to register the new site

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • ComicSpider/spiders/h_comic.py: New spider implementation for h-comic with frame_book and frame_section methods
  • utils/website/ins.py: HComicUtils class with payload extraction, JSON parsing, and book/search parsing
  • utils/website/info.py: HComicBookInfo data model extending Ero base class
  • variables/__init__.py: Configuration updates for SPIDERS, SPECIAL_WEBSITES, STATUS_TIP, and other settings
  • GUI/mainwindow.py: GUI dropdown entry for h-comic site selection
  • assets/res/locale/zh_CN.yml: Chinese localization for h-comic site description
  • assets/res/locale/en_US.yml: English localization for h-comic site description


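variables/__init__.py (STATUS_TIP)
    5: f"mangabz: {res.GUI.SearchInputStatusTip.mangabz}",
-   6: f"hitomi: {res.GUI.SearchInputStatusTip.hitomi}"
+   6: f"hitomi: {res.GUI.SearchInputStatusTip.hitomi}",
+   8: "h_comic: 直接输入关键词搜索,例如 NTR / 中文 / 作者名"  # "Enter keywords directly to search, e.g. NTR / Chinese / author name"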
Copilot AI commented Feb 10, 2026

The STATUS_TIP entry for h_comic at line 51 hardcodes a Chinese description directly, which is inconsistent with the other sites, which use the internationalization pattern res.GUI.SearchInputStatusTip.<site>. This bypasses the i18n system and prevents proper localization. The value should be changed to f"h_comic: {res.GUI.SearchInputStatusTip.h_comic}" to match the pattern used by the other sites and enable localization support.
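The practical difference is that the f-string presumably picks up whichever locale res was loaded with, while the hardcoded literal always shows Chinese. A self-contained sketch, with a hypothetical load_res() standing in for the app's resource loader and the two strings assumed to live under GUI.SearchInputStatusTip.h_comic in zh_CN.yml and en_US.yml:

```python
from types import SimpleNamespace

# Hypothetical per-locale strings; in the project these would come from the locale YAML files.
LOCALES = {
    "en_US": {"h_comic": "Enter keywords directly to search, e.g. NTR / Chinese / author name"},
    "zh_CN": {"h_comic": "直接输入关键词搜索,例如 NTR / 中文 / 作者名"},
}


def load_res(locale: str) -> SimpleNamespace:
    """Stand-in loader exposing res.GUI.SearchInputStatusTip.<site>."""
    tips = SimpleNamespace(**LOCALES[locale])
    return SimpleNamespace(GUI=SimpleNamespace(SearchInputStatusTip=tips))


def build_status_tip(res: SimpleNamespace) -> dict[int, str]:
    # Same shape as the other entries: fixed site label, text looked up from res
    return {8: f"h_comic: {res.GUI.SearchInputStatusTip.h_comic}"}


print(build_status_tip(load_res("en_US"))[8])
```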

  }
- SPECIAL_WEBSITES_IDXES = [2, 3, 4, 6]
+ SPECIAL_WEBSITES_IDXES = [2, 3, 4, 6, 8]
  CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6]
Copilot AI commented Feb 10, 2026

The h_comic site is marked as an international site (🌎) similar to ehentai and hitomi, and likely requires proxy access for users in China. Consider adding index 8 to CN_PREVIEW_NEED_PROXIES_IDXES list to ensure Chinese users can properly preview h-comic content through the browser window. Currently CN_PREVIEW_NEED_PROXIES_IDXES includes [3, 4, 6] for wnacg, ehentai, and hitomi.

Suggested change
- CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6]
+ CN_PREVIEW_NEED_PROXIES_IDXES = [3, 4, 6, 8]

jsonmaki (Collaborator) commented Feb 10, 2026

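Thanks for the contribution.

I've found the following issues so far:

  1. Exception handling in HComicUtils.parse_search: CGS already has a fallback error-reporting and global logging mechanism, and when the return value is an empty list the flow stops anyway. Catching the exception here with try/except makes it harder for users to tell what went wrong, so it should raise a custom error message directly.

  2. HComicSpider seems to be missing ComicDlProxyMiddleware or ComicDlAllProxyMiddleware; in my testing the site could not be reached without a proxy.

  3. Also, can the _format_public_date issue raised by Copilot be fixed?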

Zhong added 2 commits February 11, 2026 09:37
- enable ComicDlProxyMiddleware and UAMiddleware for HComicSpider

- add proxy_domains for h-comic domains

- improve upload_date parsing for second/ms timestamps

- raise explicit HComicParseError for invalid payload/search entries
- switch to ComicDlAllProxyMiddleware for all requests

- remove unused proxy_domains setting
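These commits first enabled ComicDlProxyMiddleware and UAMiddleware, then switched to ComicDlAllProxyMiddleware so every request goes through the proxy. In Scrapy, per-spider middleware is typically declared in the spider's custom_settings under DOWNLOADER_MIDDLEWARES, a dict mapping import paths to order values. The module paths and order numbers below are placeholders rather than the project's real values, and the plain class only stands in for the BaseComicSpider2 subclass:

```python
class HComicSpiderSettingsSketch:
    """Stand-in for HComicSpider; only the custom_settings shape is illustrated."""

    name = "h_comic"
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            # Hypothetical import paths and order values
            "ComicSpider.middlewares.UAMiddleware": 5,
            # Proxy every request; the reviewer could not reach the site without one
            "ComicSpider.middlewares.ComicDlAllProxyMiddleware": 6,
        },
    }
```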
xulingran (Author) commented

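Thanks for the detailed review and suggestions. I've addressed them along the lines you described and pushed the changes:

HComicUtils.parse_search now raises custom exceptions and leaves them to CGS's global error-reporting and logging pipeline.
HComicSpider: the missing proxy middleware has been added and, since the whole site requires a proxy, switched to ComicDlAllProxyMiddleware.
_format_public_date has been fixed to accept both second and millisecond timestamps, with OSError/OverflowError edge-case handling added.
Corresponding commits: 1fcf035, ebb1ae6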
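The timestamp fix can be sketched as below. The cutoff between seconds and milliseconds, the UTC timezone, the date format and returning an empty string on failure are all assumptions; the PR only states that second and millisecond inputs are accepted and that OSError/OverflowError are handled. format_public_date here is a stand-in, not the merged _format_public_date.

```python
from datetime import datetime, timezone

# Assumed cutoff: second-based timestamps stay below 1e11 until roughly the
# year 5138, while millisecond timestamps exceed it for any date after early 1973.
MS_THRESHOLD = 10**11


def format_public_date(unix_ts: object, fmt: str = "%Y-%m-%d") -> str:
    """Format a Unix timestamp given in seconds or milliseconds; return "" if unusable."""
    try:
        ts = float(unix_ts)
    except (TypeError, ValueError):
        return ""
    if ts > MS_THRESHOLD:
        ts /= 1000  # treat as milliseconds
    try:
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
    except (OSError, OverflowError, ValueError):
        # Out of range for the platform's time functions (or NaN/inf)
        return ""


print(format_public_date(1700000000), format_public_date(1700000000000))  # 2023-11-14 2023-11-14
```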

jasoneri merged commit 95db5f3 into jasoneri:2.8-dev on Feb 11, 2026
1 check passed
jasoneri (Owner) commented Feb 11, 2026

🎉 This feature will ship with the next stable release.

jasoneri added the dev spider add spider label Feb 12, 2026

Labels

dev spider add spider
