@bug-ops bug-ops commented Dec 18, 2025

Summary

This PR addresses critical API compatibility gaps identified in the feedparser-rs gap analysis:

  • feed.published / feed.published_parsed field - RSS <pubDate> and Atom <published> at feed level now accessible via Python and Node.js bindings
  • xml:base URL resolution - Relative URLs are now automatically resolved to absolute URLs using xml:base attributes (Atom) or channel link (RSS)
  • SSRF protection - All resolved URLs are validated against dangerous targets (localhost, private IPs, cloud metadata endpoints)

Changes

Core Library

  • Add published: Option<DateTime<Utc>> to FeedMeta struct
  • Add extract_xml_base() helper for xml:base attribute extraction
  • Add resolve_safe() method to BaseUrlContext with SSRF validation
  • Integrate URL resolution in Atom parser (feed + entry level xml:base)
  • Integrate URL resolution in RSS parser (channel link as base)
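
To illustrate what resolving a relative URL against a base means here, the following is a minimal, std-only sketch. It is not the library's implementation: `resolve_sketch` and `has_scheme` are hypothetical names, and the logic deliberately ignores dot segments (`../`), queries, and fragments, which a full RFC 3986 resolver handles. The base would come from an xml:base attribute (Atom) or the channel link (RSS).

```rust
/// Hypothetical helper: true when `s` begins with a URL scheme such as "https:".
fn has_scheme(s: &str) -> bool {
    match s.find(':') {
        Some(i) if i > 0 => s[..i]
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.')),
        _ => false,
    }
}

/// Hypothetical, simplified resolver (assumption: `base` looks like
/// "scheme://host/path"). Not RFC 3986 complete.
pub fn resolve_sketch(base: &str, href: &str) -> String {
    // An href that already carries a scheme is absolute: return it unchanged.
    if has_scheme(href) {
        return href.to_string();
    }
    // Without a usable base there is nothing to resolve against.
    let Some(scheme_end) = base.find("://") else {
        return href.to_string();
    };
    let after_scheme = scheme_end + 3;
    // The origin ends at the first '/' after the host, or at the end of the base.
    let path_start = base[after_scheme..]
        .find('/')
        .map(|i| i + after_scheme)
        .unwrap_or(base.len());
    let origin = &base[..path_start];

    // Root-relative href: keep only the origin of the base.
    if href.starts_with('/') {
        return format!("{}{}", origin, href);
    }
    // Path-relative href: replace everything after the last '/' of the base path.
    let base_path = &base[path_start..];
    let dir = match base_path.rfind('/') {
        Some(i) => &base_path[..=i],
        None => "/",
    };
    format!("{}{}{}", origin, dir, href)
}
```

With a base of `http://example.com/blog/feed.xml`, a root-relative href such as `/img/a.png` keeps only the origin of the base, while a path-relative href replaces the final path segment.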

Python Bindings

  • Add published getter returning RFC3339 string
  • Add published_parsed getter returning time.struct_time

Node.js Bindings

  • Add published field returning milliseconds since epoch
  • Easy conversion to JavaScript Date: new Date(feed.feed.published)
  • TypeScript definitions updated

Security

  • Block localhost (127.0.0.1, ::1, localhost domain)
  • Block private IPs (10.x.x.x, 192.168.x.x, 172.16-31.x.x)
  • Block cloud metadata (169.254.169.254, metadata.google.internal)
  • Block dangerous schemes (file:, data:, javascript:)
  • Block absolute malicious URLs in href attributes
  • Case-insensitive scheme validation (prevents bypass via FILE://, JAVASCRIPT:)
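
The host checks in this list could be sketched with the standard library alone. This is an illustration under assumptions, not the code added in this PR: `is_blocked_host` is a hypothetical name, and the sketch blocks the whole loopback (127.0.0.0/8) and link-local (169.254.0.0/16) ranges, which is broader than the individual addresses listed above. It performs no DNS lookup, so a hostname that resolves to a private address would not be caught.

```rust
use std::net::IpAddr;

/// Hypothetical helper: true when `host` is one of the target classes the
/// PR description lists as blocked.
pub fn is_blocked_host(host: &str) -> bool {
    // Strip the brackets of an IPv6 literal such as "[::1]".
    let host = host.trim_start_matches('[').trim_end_matches(']');
    let lower = host.to_ascii_lowercase();

    // Hostnames named explicitly in the PR description.
    if lower == "localhost" || lower == "metadata.google.internal" {
        return true;
    }

    match lower.parse::<IpAddr>() {
        Ok(IpAddr::V4(v4)) => {
            v4.is_loopback()          // 127.0.0.0/8
                || v4.is_private()    // 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
                || v4.is_link_local() // 169.254.0.0/16, includes 169.254.169.254
        }
        Ok(IpAddr::V6(v6)) => v6.is_loopback(), // ::1
        Err(_) => false, // an ordinary hostname; this sketch does no DNS resolution
    }
}
```

Relying on `Ipv4Addr::is_private` keeps the 172.16-31.x.x boundary correct without hand-written range arithmetic.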

Test Plan

  • 10 URL resolution integration tests
  • 23 SSRF security tests (including case-insensitive scheme bypass)
  • 13 Python integration tests
  • 48 Node.js tests pass
  • All 474 Rust tests pass
  • cargo clippy --all-targets clean
  • cargo fmt applied

API Compatibility

Before: 85% Python feedparser parity
After: ~95% parity for modern feeds (RSS 2.0, Atom 1.0, JSON Feed)

No breaking changes to existing API.

@github-actions github-actions bot added the labels type: tooling, component: core, component: python, component: node, component: tests, area: parser, area: rss, area: atom, lang: rust, lang: python, and size: XL on Dec 18, 2025
- Add published field to FeedMeta for RSS pubDate and Atom published
- Add published_parsed getter to Python bindings returning time.struct_time
- Implement xml:base URL resolution for Atom (feed and entry level)
- Implement implicit base URL from RSS channel link
- Add resolve_safe() method with SSRF protection
- Block dangerous URLs: localhost, private IPs, cloud metadata endpoints
- Add comprehensive test coverage (29 new tests)

Closes Python API parity gap for date fields and relative URL handling.
@bug-ops bug-ops force-pushed the feat/phase1-api-parity branch from 2811bff to 52543b4 on December 18, 2025 00:14

codecov-commenter commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   90.75%   90.85%   +0.10%     
==========================================
  Files          32       32              
  Lines        6175     6245      +70     
==========================================
+ Hits         5604     5674      +70     
  Misses        571      571              
| Flag | Coverage Δ |
| --- | --- |
| rust-core | 90.85% <100.00%> (+0.10%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| crates/feedparser-rs-core/src/parser/atom.rs | 89.16% <100.00%> (+0.34%) ⬆️ |
| crates/feedparser-rs-core/src/parser/common.rs | 75.36% <100.00%> (+1.92%) ⬆️ |
| crates/feedparser-rs-core/src/parser/rss.rs | 94.43% <100.00%> (+0.05%) ⬆️ |
| crates/feedparser-rs-core/src/types/feed.rs | 96.36% <ø> (ø) |
| crates/feedparser-rs-core/src/util/base_url.rs | 99.61% <100.00%> (+0.03%) ⬆️ |

Copilot AI left a comment

Pull request overview

This PR adds critical Python API compatibility features to feedparser-rs: support for feed.published_parsed field and xml:base URL resolution with SSRF protection. The changes enable RSS channel <pubDate> and Atom <published> dates to be exposed at the feed level (previously only available for entries), and implement automatic resolution of relative URLs against xml:base attributes or channel links while blocking SSRF attack vectors.

Key Changes

  • Feed-level publication dates: RSS <pubDate> and Atom <published> now populate FeedMeta.published field, exposed in Python bindings as both RFC3339 string and time.struct_time
  • URL resolution: Relative URLs in feeds are automatically resolved to absolute URLs using xml:base (Atom) or channel link (RSS) as the base
  • SSRF protection: All resolved URLs are validated against localhost, private IPs, cloud metadata endpoints, and dangerous schemes (file://, data://, etc.)

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/feedparser-rs-core/src/types/feed.rs | Added published: Option<DateTime<Utc>> field to FeedMeta struct |
| crates/feedparser-rs-core/src/util/base_url.rs | Added resolve_safe() method to BaseUrlContext with SSRF validation logic |
| crates/feedparser-rs-core/src/parser/common.rs | Added extract_xml_base() helper to extract xml:base attributes from XML elements |
| crates/feedparser-rs-core/src/parser/atom.rs | Integrated xml:base extraction and URL resolution for feed and entry elements |
| crates/feedparser-rs-core/src/parser/rss.rs | Changed RSS <pubDate> to populate feed.published instead of feed.updated; integrated channel link as base URL for item link and enclosure resolution |
| crates/feedparser-rs-py/src/types/feed_meta.rs | Added Python getters for published (string) and published_parsed (time.struct_time) |
| crates/feedparser-rs-py/tests/test_phase1_integration.py | 13 comprehensive Python integration tests covering date parsing and URL resolution |
| crates/feedparser-rs-core/tests/test_url_resolution.rs | 10 integration tests for xml:base URL resolution behavior |
| crates/feedparser-rs-core/tests/test_url_security.rs | 19 SSRF protection tests validating blocking of malicious URLs |

Copilot review identified that absolute malicious URLs like
http://localhost/admin in href attributes bypassed SSRF protection.
Now resolve_safe() returns empty string for unsafe absolute hrefs.

- Use is_some_and() instead of map().unwrap_or(false) per clippy
- Add test for absolute malicious URL bypass scenario
- Add test for private IP in href blocking
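
The clippy suggestion mentioned in this commit replaces a `map().unwrap_or(false)` chain on an `Option` with `Option::is_some_and`. A small sketch of the two equivalent forms follows; the predicate and the function names are hypothetical and chosen only for illustration, not taken from the PR's diff.

```rust
/// Hypothetical "before" form: map the Option to a bool, then default to false.
pub fn is_localhost_verbose(host: Option<&str>) -> bool {
    host.map(|h| h.eq_ignore_ascii_case("localhost"))
        .unwrap_or(false)
}

/// Hypothetical "after" form: the same logic via Option::is_some_and
/// (stable since Rust 1.70).
pub fn is_localhost(host: Option<&str>) -> bool {
    host.is_some_and(|h| h.eq_ignore_ascii_case("localhost"))
}
```

Both functions return `false` for `None`, so the change is purely a readability improvement.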
@github-actions github-actions bot added the label size: XXL and removed the label size: XL on Dec 18, 2025
@bug-ops bug-ops requested a review from Copilot December 18, 2025 00:35
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Add published field to FeedMeta for API parity with Python bindings.
Returns milliseconds since epoch for easy JavaScript Date conversion.

- Add published: Option<i64> to FeedMeta struct
- Update From<CoreFeedMeta> to map DateTime to timestamp_millis
- Add TypeScript type definition
- Add test for feed-level published parsing
@github-actions github-actions bot added the label lang: javascript on Dec 18, 2025
Address Copilot review feedback:
- Add case-insensitive scheme comparison per RFC 3986
- Prevents bypass via uppercase schemes (FILE://, JAVASCRIPT:)
- Improve code readability with intermediate variables
- Add assertion to verify empty string for blocked URLs
- Add tests for case-insensitive scheme bypass
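
A case-insensitive scheme check of the kind described in this commit might look like the sketch below. The function name `has_dangerous_scheme` and the exact blocklist are assumptions made for illustration; the actual check lives in resolve_safe() and may be structured differently.

```rust
/// Hypothetical helper: true when `url` starts with one of the schemes the
/// PR description lists as dangerous, compared without regard to ASCII case.
pub fn has_dangerous_scheme(url: &str) -> bool {
    const BLOCKED: [&str; 3] = ["file", "data", "javascript"];

    // The scheme is everything before the first ':'; no ':' means a
    // relative reference, which has no scheme to block.
    let Some((scheme, _rest)) = url.trim_start().split_once(':') else {
        return false;
    };

    // RFC 3986 defines schemes as case-insensitive, so "FILE" equals "file".
    BLOCKED.iter().any(|&b| scheme.eq_ignore_ascii_case(b))
}
```

Comparing with `eq_ignore_ascii_case` avoids allocating a lowercased copy of the input and closes the `FILE://` and `JAVASCRIPT:` bypasses named above.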
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated no new comments.

@bug-ops bug-ops merged commit 066ab85 into main Dec 18, 2025
45 checks passed
@bug-ops bug-ops deleted the feat/phase1-api-parity branch December 18, 2025 01:19
Labels

  • area: atom (Atom 1.0 support)
  • area: parser (Feed parsing logic)
  • area: rss (RSS 0.9x, 1.0, 2.0 support)
  • component: core (feedparser-rs-core Rust library)
  • component: node (Node.js bindings, napi-rs)
  • component: python (Python bindings, PyO3)
  • component: tests (Test suite or test infrastructure)
  • lang: javascript (JavaScript/TypeScript code)
  • lang: python (Python code)
  • lang: rust (Rust code)
  • size: XXL (Huge PR, 1000+ lines changed)
  • type: tooling (Development tools, CI/CD, or infrastructure)

3 participants