
feat: Add image download and processing support #8

Merged
konard merged 5 commits into main from issue-7-ece319e73fdb on Dec 25, 2025


@konard (Member) commented Dec 25, 2025

Summary

This PR implements comprehensive image handling for pull requests, fixing the "Could not process image" errors that occur when downloaded PRs are used with Claude Code CLI. It adds support for downloading and validating images embedded in PR descriptions, comments, and reviews.

Key Features

  • Image Download: Downloads all embedded images from PR body, comments, reviews, and review comments
  • Multiple Formats: Supports both Markdown (![alt](url)) and HTML (<img src="url">) image references
  • Image Validation: Validates downloaded files by checking magic bytes, not just file extensions. Detects:
    • PNG, JPG, GIF, WebP, BMP, ICO, SVG images
    • HTML error pages (e.g., GitHub 404 pages saved in place of an image)
  • Redirect Handling: Properly follows GitHub's S3 signed URLs and redirects (up to 5 hops)
  • Local Path Updates: Automatically updates markdown to reference local image paths
  • JSON Output: New --format json option for programmatic use
  • New CLI Options:
    • --download-images (default: true) - Download embedded images
    • --include-reviews (default: true) - Include PR reviews
    • --format markdown|json - Output format
    • --verbose / -v - Enable verbose logging
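To illustrate what "checking magic bytes" means, here is a minimal Python sketch that classifies downloaded bytes by their content rather than by file extension. It is an assumption for illustration, not the tool's actual code; the name detect_image_type is hypothetical. It covers the formats listed above and flags HTML pages.

```python
from typing import Optional

# Leading-byte signatures for the binary formats named above.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"\x00\x00\x01\x00", "ico"),
)


def detect_image_type(data: bytes) -> Optional[str]:
    """Classify downloaded bytes by content; returns an extension, "html", or None."""
    for magic, ext in _SIGNATURES:
        if data.startswith(magic):
            return ext
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Text formats: skip a UTF-8 BOM and leading whitespace, compare case-insensitively.
    head = data[:512]
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"  # e.g. an error page saved in place of the image
    # SVG is XML text, so look for an <svg element near the start.
    if b"<svg" in head:
        return "svg"
    return None
```

A caller could reject downloads classified as "html" or None and use any other result as the saved file's extension.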

Output Structure

When saving to a directory:

pr-123.md               # PR content with local image references
pr-123-images/          # Directory with downloaded images
  image-1.png
  image-2.jpg
pr-123.json             # Optional JSON export (with --format json)
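Because pr-123.md and pr-123-images/ sit side by side, the rewritten references can be plain relative paths. The Python sketch below shows one way to assign local names following the layout above and rewrite the text. It assumes unique URLs and a simple string replacement; plan_local_images and rewrite_markdown are hypothetical names, not the tool's API.

```python
from typing import Dict, List


def plan_local_images(number: int, urls: List[str], exts: List[str]) -> Dict[str, str]:
    """Map each (unique) remote URL to a relative path like pr-123-images/image-1.png."""
    images_dir = f"pr-{number}-images"
    return {
        url: f"{images_dir}/image-{i}.{ext}"
        for i, (url, ext) in enumerate(zip(urls, exts), start=1)
    }


def rewrite_markdown(text: str, mapping: Dict[str, str]) -> str:
    """Replace each remote URL in the text with its local relative path."""
    # Longest URLs first, so a URL that is a prefix of another is not replaced too early.
    for url in sorted(mapping, key=len, reverse=True):
        text = text.replace(url, mapping[url])
    return text
```

Plain string replacement also rewrites the URL wherever else it appears in the text; a stricter version would substitute only inside the matched image references.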

CLI Options

gh-download-pull-request <pr-url> [options]

Options:
  -t, --token           GitHub personal access token
  -o, --output          Output directory (default: current directory)
  --download-images     Download embedded images (default: true)
  --include-reviews     Include PR reviews (default: true)
  --format              Output format: markdown, json (default: markdown)
  -v, --verbose         Enable verbose logging
  -h, --help            Show help
  --version             Show version number

Usage Examples

# Download PR with images to current directory
gh-download-pull-request owner/repo#123 -o ./

# Output as JSON
gh-download-pull-request owner/repo#123 -o ./ --format json

# Skip image download
gh-download-pull-request owner/repo#123 --no-download-images

# Verbose mode for debugging
gh-download-pull-request owner/repo#123 -o ./ -v
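The synopsis names the argument `<pr-url>`, while the examples use the `owner/repo#123` shorthand. Below is a minimal Python sketch of a parser that accepts both forms. Which forms the tool really accepts is an assumption here, and PullRef and parse_pull_ref are hypothetical names.

```python
import re
from typing import NamedTuple


class PullRef(NamedTuple):
    owner: str
    repo: str
    number: int


# owner/repo#123
_SHORT = re.compile(r"^([\w.-]+)/([\w.-]+)#(\d+)$")
# https://github.com/owner/repo/pull/123 (optionally followed by /files, ?query, #anchor)
_URL = re.compile(r"^https?://github\.com/([\w.-]+)/([\w.-]+)/pull/(\d+)(?:[/?#].*)?$")


def parse_pull_ref(ref: str) -> PullRef:
    """Parse either supported reference form into owner, repo, and PR number."""
    for pattern in (_SHORT, _URL):
        m = pattern.match(ref.strip())
        if m:
            return PullRef(m.group(1), m.group(2), int(m.group(3)))
    raise ValueError(f"unrecognised pull request reference: {ref!r}")
```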

Technical Implementation

  1. Image Extraction: Uses regex to find both Markdown and HTML image references
  2. Download with Auth: Uses GitHub token for authenticated image downloads
  3. Magic Byte Validation: Checks file headers to identify actual format vs extension
  4. Error Handling: Graceful handling of failed downloads, logs warnings but continues
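Step 1 (finding both Markdown and HTML image references with regular expressions) might look like the following Python sketch. The patterns are illustrative assumptions, not the tool's own, and extract_image_urls is a hypothetical name. URLs are returned in order of first appearance, with duplicates removed.

```python
import re
from typing import List

# ![alt](url) or ![alt](url "title"); the URL is the first run of non-space, non-")" chars.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+[\"'][^\"']*[\"'])?\s*\)")
# <img ... src="url" ...> with single or double quotes; requiring whitespace before
# "src" avoids matching attributes such as data-src.
HTML_IMAGE = re.compile(r"<img\b[^>]*?\ssrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)


def extract_image_urls(text: str) -> List[str]:
    """Return image URLs in order of first appearance, without duplicates."""
    found = []
    for pattern in (MD_IMAGE, HTML_IMAGE):
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(1)))
    seen = set()
    urls = []
    for _, url in sorted(found):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```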

Testing

  • ✅ Tested with PRs containing images (badges, screenshots)
  • ✅ All existing CLI tests pass
  • ✅ Lint and format checks pass

Issue Reference

Fixes #7


🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #7
konard self-assigned this on Dec 25, 2025
This commit implements comprehensive image handling for pull requests:

## Features Added:
- Download embedded images from PR body, comments, and reviews
- Extract images from both Markdown (![alt](url)) and HTML (<img src="url">)
- Validate downloaded files by checking magic bytes (not just extension)
- Handle GitHub S3 signed URLs and redirects
- Update markdown to reference local image paths
- Add JSON output format for programmatic use
- Add new CLI options:
  - --download-images (default: true)
  - --include-reviews (default: true)
  - --format (markdown/json)
  - --verbose (-v)

## Technical Implementation:
- Image validation using magic bytes for PNG, JPG, GIF, WebP, BMP, ICO, SVG
- Detection of HTML error pages (404 pages mistakenly downloaded as images)
- Redirect handling for up to 5 hops
- Graceful error handling for failed image downloads
- Proper authentication for GitHub-hosted images
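The redirect and authentication rules above could be combined as in the Python sketch below. It is an assumption, not the tool's code. The transport is passed in as a function (returning a Response and never following redirects itself), so the logic can be exercised offline. Response and download_following_redirects are hypothetical names. Dropping the token once a redirect leaves the original host is a common precaution, since pre-signed S3 URLs carry their own credentials; whether the tool does this is not stated.

```python
from typing import Callable, Dict, NamedTuple, Optional
from urllib.parse import urljoin, urlsplit


class Response(NamedTuple):
    status: int
    headers: Dict[str, str]  # this sketch expects the exact key "Location"
    body: bytes


# Transport: (url, request headers) -> Response, with no automatic redirects.
Fetch = Callable[[str, Dict[str, str]], Response]


def download_following_redirects(url: str, fetch: Fetch, token: Optional[str] = None,
                                 max_hops: int = 5) -> bytes:
    """Fetch url, following at most max_hops redirects."""
    origin = urlsplit(url).netloc
    for _ in range(max_hops + 1):  # initial request plus up to max_hops redirects
        headers: Dict[str, str] = {}
        # Send the token only to the original host.
        if token and urlsplit(url).netloc == origin:
            headers["Authorization"] = f"token {token}"
        resp = fetch(url, headers)
        if resp.status in (301, 302, 303, 307, 308):
            location = resp.headers.get("Location")
            if not location:
                raise RuntimeError(f"redirect without Location from {url}")
            url = urljoin(url, location)  # Location may be relative
            continue
        if resp.status != 200:
            raise RuntimeError(f"HTTP {resp.status} for {url}")
        return resp.body
    raise RuntimeError(f"too many redirects (more than {max_hops})")
```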

## Output Structure:
When saving to directory:
- pr-{number}.md (or .json) - PR content with local image references
- pr-{number}-images/ - Downloaded images

Fixes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
konard changed the title from "[WIP] Implement gh-download-pull-request tool" to "feat: Add image download and processing support" on Dec 25, 2025
konard marked this pull request as ready for review on December 25, 2025 at 15:08
@konard (Member, Author) commented Dec 25, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $7.043484 USD
  • Calculated by Anthropic: $4.674269 USD
  • Difference: $-2.369214 (-33.64%)
    📎 Log file uploaded as GitHub Gist (855KB)
    🔗 View complete solution draft log

The working session has ended; feel free to review the solution draft and add any feedback.

konard merged commit 0e4b7e4 into main on Dec 25, 2025 (14 checks passed)
Development

Linked issue: Implement gh-download-pull-request tool

1 participant