@heroheman heroheman commented Dec 17, 2025

Description

Adds first-class support for "supplement" pages from the SCP Wiki, enabling automated crawling and post-processing of supplementary content.

Changes

  • New Item Type: ScpSupplement class in items.py
  • New Spider: ScpSupplementSpider to crawl pages tagged with supplement from https://scp-wiki.wikidot.com/system:page-tags/tag/supplement
  • Post-Processing: run_postproc_supplement() command that:
    • Extracts parent SCP references (e.g., "SCP-6881" from a link)
    • Extracts parent tale series (e.g., "planetfall" from planetfall-7)
    • Processes history, images, and hub references
    • Outputs to data/processed/supplement/ with index.json and content_supplement.json
  • Makefile Integration: Supplements now included in scp_crawl and scp_postprocess targets
  • Documentation: Updated README with supplement spider usage and output structure
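The parent-extraction steps listed under Post-Processing could look roughly like the sketch below. It is not the code from this PR: the function names `extract_parent_scp` and `extract_parent_tale`, both regular expressions, and the rule "strip a trailing `-<number>` from the slug" are assumptions made for illustration, based only on the two examples given above.

```python
import re
from typing import Optional

# Hypothetical patterns; the actual rules in run_postproc_supplement() may differ.
# Matches a path segment such as "/scp-6881" at the end of a link or before /, # or ?.
SCP_LINK_RE = re.compile(r"/(scp-\d+)(?:[/#?]|$)", re.IGNORECASE)
# Matches a slug that ends in "-<number>", capturing everything before it.
SERIES_SLUG_RE = re.compile(r"^(?P<series>.+?)-\d+$")


def extract_parent_scp(href: str) -> Optional[str]:
    """Return e.g. 'SCP-6881' if the link points at an SCP page, else None."""
    match = SCP_LINK_RE.search(href)
    return match.group(1).upper() if match else None


def extract_parent_tale(slug: str) -> Optional[str]:
    """Return the series part of a numbered slug, e.g. 'planetfall' from 'planetfall-7'."""
    if slug.lower().startswith("scp-"):
        # Assumption: SCP slugs are handled by extract_parent_scp, not treated as a series.
        return None
    match = SERIES_SLUG_RE.match(slug)
    return match.group("series") if match else None
```

Keeping the two lookups separate means a supplement can end up with a parent SCP, a parent tale series, both, or neither, which matches the nullable `parent_scp` and `parent_tale` fields in the output.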

Output Structure

{
  "planetfall-7": {
    "title": "Planetfall: 7",
    "parent_scp": null,
    "parent_tale": "planetfall",
    "created_at": "2023-05-15T14:30:00",
    ...
  }
}
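As a usage sketch, a consumer might group supplements by their parent tale series. This assumes the structure shown above is what `data/processed/supplement/index.json` contains (the PR does not say which of the two output files it illustrates); the helper names `load_index` and `group_by_parent_tale` are hypothetical, and only fields visible in the sample are read.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_index(path: str) -> dict:
    """Load a processed supplement file (assumed: a JSON object keyed by page slug)."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def group_by_parent_tale(index: dict) -> Dict[str, List[str]]:
    """Map each parent tale series to the sorted slugs of its supplements."""
    groups = defaultdict(list)
    for slug, entry in index.items():
        parent = entry.get("parent_tale")
        if parent is not None:  # skip supplements with no parent tale
            groups[parent].append(slug)
    return {series: sorted(slugs) for series, slugs in groups.items()}


if __name__ == "__main__":
    # Path taken from the PR description; run the post-processing step first.
    index = load_index("data/processed/supplement/index.json")
    for series, slugs in group_by_parent_tale(index).items():
        print(series, slugs)
```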

Testing

make supplement              # Crawl + post-process
scrapy crawl scp_supplement  # Crawl only

- Introduce ScpSupplement class for item representation
- Implement ScpSupplementSpider to crawl supplement pages
- Update makefile to include supplement in data targets
- Implement run_postproc_supplement to process SCP supplement data
- Create necessary directories and handle data extraction
- Store processed supplements in JSON format for further use
- Add instructions for crawling pages tagged as 'supplement'
- Update content structure to include multiple content types
- Clarify post-processing details for supplements
@tedivm tedivm force-pushed the feature/add-supplements branch from d3ae165 to e9f6ba0 Compare December 29, 2025 22:05