
@bug-ops (Owner) commented Dec 18, 2025

Summary

Phase 2 of the API parity work. Adds xml:lang tracking to the RSS parser and license field extraction for both RSS and Atom feeds.

Changes

Core Library

  • Add extract_xml_lang() helper function in common.rs
  • RSS parser: Track xml:lang at channel and item levels with inheritance
  • RSS parser: Extract license from <creativeCommons:license> and <license> tags
  • Atom parser: Extract license from <link rel="license"> elements
  • Add license: Option<String> field to FeedMeta and Entry structs
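A bounded xml:lang lookup of the kind described for extract_xml_lang() could look like the minimal sketch below. It assumes attributes have already been decoded into (name, value) string pairs. The signature, the `max_attr_length` parameter, and the choice to drop (rather than truncate) empty or over-long values are all assumptions made for illustration. This is not the PR's actual code in common.rs.

```rust
/// Hypothetical stand-in for an extract_xml_lang()-style helper.
/// Attributes are modeled as already-decoded (name, value) pairs.
fn extract_xml_lang(attrs: &[(&str, &str)], max_attr_length: usize) -> Option<String> {
    attrs
        .iter()
        // Only the namespaced `xml:lang` attribute carries language info.
        .find(|(name, _)| *name == "xml:lang")
        .map(|(_, value)| value.trim())
        // Assumption: an empty value is treated as "not set".
        .filter(|value| !value.is_empty())
        // DoS guard: ignore values longer than the configured limit.
        .filter(|value| value.len() <= max_attr_length)
        .map(str::to_owned)
}
```

Skipping an over-long value instead of truncating it avoids storing a partial language tag that could be misread as a different language.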

Python Bindings

  • Add license getter to FeedMeta and Entry classes
  • xml:lang is accessible via title_detail.language, summary_detail.language, etc.

Node.js Bindings

  • Add license field to FeedMeta and Entry structs
  • Update TypeScript definitions with license?: string

Test Plan

  • 481 tests pass
  • RSS xml:lang tests (channel, item, inheritance, empty attribute)
  • RSS license tests (creativeCommons namespace, plain tag)
  • Atom license tests (link rel="license")
  • cargo clippy --all-targets clean
  • cargo make ci-lint-stable passes

Reviews Completed

| Review | Status | Key Findings |
|---|---|---|
| Performance | ✅ APPROVED | +0.4% overhead, optimal implementation |
| Security | ✅ APPROVED | DoS protection via max_attr_length |
| Testing | ✅ APPROVED | TypeScript definitions fixed |
| Code Review | ✅ APPROVED | Ready for merge |

API Compatibility

Matches Python feedparser behavior:

  • feed.title_detail.language returns xml:lang value
  • feed.license returns license URL
  • entry.summary_detail.language returns xml:lang value
  • entry.license returns license URL
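The data behind those accessors can be pictured with the simplified Rust mirror below. The type names echo the PR (FeedMeta, Entry, TextConstruct), but the field sets are cut down and hypothetical. The real types in feedparser-rs-core carry many more fields. The two helper functions are illustrative only and show how Python's `feed.title_detail.language` maps onto nested `Option`s.

```rust
/// Simplified, hypothetical mirror of a text construct with language info.
#[derive(Debug, Clone, Default, PartialEq)]
struct TextConstruct {
    value: String,
    /// Populated from xml:lang (after inheritance) when present.
    language: Option<String>,
}

/// Cut-down feed metadata: only the fields discussed in this PR.
#[derive(Debug, Default)]
struct FeedMeta {
    title_detail: Option<TextConstruct>,
    /// License URL, if the feed declared one.
    license: Option<String>,
}

/// Cut-down entry: only the fields discussed in this PR.
#[derive(Debug, Default)]
struct Entry {
    summary_detail: Option<TextConstruct>,
    license: Option<String>,
}

/// Rough equivalent of `feed.title_detail.language`; None if any level is missing.
fn title_language(feed: &FeedMeta) -> Option<&str> {
    feed.title_detail.as_ref()?.language.as_deref()
}

/// Rough equivalent of `entry.summary_detail.language`.
fn summary_language(entry: &Entry) -> Option<&str> {
    entry.summary_detail.as_ref()?.language.as_deref()
}
```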

No breaking changes.

Add comprehensive xml:lang tracking to the RSS parser and license field
support for RSS and Atom feeds to complete Phase 2 of the *_detail
infrastructure.

Changes:
- Add extract_xml_lang() helper function to parser/common.rs
- Track xml:lang at RSS channel and item levels with inheritance
- Populate TextConstruct.language for title and description fields
- Add license field to FeedMeta and Entry types
- Parse creativeCommons:license and plain license tags in RSS
- Extract license from link rel="license" in Atom feeds
- Update Python and Node.js bindings with license getters
- Add comprehensive integration tests for xml:lang and license

The RSS parser now properly tracks xml:lang attributes on channel
and item elements, propagating language information to TextConstruct
fields. Item-level xml:lang overrides channel-level language, matching
standard XML behavior.
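The inheritance described above can be implemented with a scope stack: push on each element start and pop on each element end. The sketch below is hypothetical. The `LangScope` type is invented for illustration, and the PR describes tracking only the channel and item levels, which a stack generalizes.

```rust
/// Hypothetical tracker for xml:lang scoping while walking nested elements.
#[derive(Default)]
struct LangScope {
    /// One entry per open element: the language in effect inside it.
    stack: Vec<Option<String>>,
}

impl LangScope {
    /// Enter an element; `declared` is its own xml:lang value, if any.
    fn enter(&mut self, declared: Option<&str>) {
        let inherited = self.current().map(str::to_owned);
        // A declared value overrides the parent's; otherwise inherit it.
        self.stack.push(declared.map(str::to_owned).or(inherited));
    }

    /// Leave the most recently entered element.
    fn exit(&mut self) {
        self.stack.pop();
    }

    /// Language in effect for the element currently being parsed.
    fn current(&self) -> Option<&str> {
        self.stack.last().and_then(|lang| lang.as_deref())
    }
}
```

The test sequence follows a channel with two items: one inherits the channel language, and the other overrides it.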

License URLs are extracted from both RSS (creativeCommons:license,
license tags) and Atom (link rel="license") feeds, stored in the
license field for both feed and entry metadata.
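The mapping from license sources to the license field can be sketched as two small functions. Both are hypothetical. Attributes are modeled as (name, value) string pairs, and the `creativeCommons:` prefix is matched literally, whereas a namespace-aware parser would resolve the prefix to the Creative Commons namespace URI.

```rust
/// Look up an attribute value by name (hypothetical helper).
fn get_attr<'a>(attrs: &[(&str, &'a str)], key: &str) -> Option<&'a str> {
    attrs.iter().find(|(name, _)| *name == key).map(|(_, value)| *value)
}

/// RSS: <creativeCommons:license> or plain <license> element text.
fn rss_license(tag_name: &str, text: &str) -> Option<String> {
    if !matches!(tag_name, "creativeCommons:license" | "license") {
        return None;
    }
    let url = text.trim();
    (!url.is_empty()).then(|| url.to_owned())
}

/// Atom: the href of <link rel="license">.
fn atom_license(link_attrs: &[(&str, &str)]) -> Option<String> {
    // In Atom a missing rel means "alternate", so only an explicit rel="license" counts.
    if get_attr(link_attrs, "rel")? != "license" {
        return None;
    }
    get_attr(link_attrs, "href")
        .map(str::trim)
        .filter(|href| !href.is_empty())
        .map(str::to_owned)
}
```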

Tests: 476 passed, clippy clean, all CI checks passing

Add missing license field to Entry and FeedMeta interfaces in TypeScript
definitions. This field was already implemented in the Rust code but was
not exposed in the type definitions.

Also add test for empty xml:lang attribute handling to verify edge case
behavior.
@github-actions bot added labels Dec 18, 2025:
  • type: tooling (Development tools, CI/CD, or infrastructure)
  • component: core (feedparser-rs-core Rust library)
  • component: python (Python bindings, PyO3)
  • component: node (Node.js bindings, napi-rs)
  • area: parser (Feed parsing logic)
  • area: rss (RSS 0.9x, 1.0, 2.0 support)
  • area: atom (Atom 1.0 support)
  • lang: rust (Rust code)
  • lang: javascript (JavaScript/TypeScript code)
  • size: L (Large PR, <500 lines changed)
@codecov-commenter

Codecov Report

❌ Patch coverage is 98.15668% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/feedparser-rs-core/src/parser/rss.rs | 97.59% | 4 Missing ⚠️ |


```diff
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   90.85%   91.08%   +0.22%     
==========================================
  Files          32       32              
  Lines        6245     6447     +202     
==========================================
+ Hits         5674     5872     +198     
- Misses        571      575       +4     
```
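The headline percentages follow from coverage = hits ÷ lines, using only the counts in the diff. The displayed values look truncated rather than rounded (0.908567 appears as 90.85%). This is consistent with Codecov's default round-down setting, although this repository's codecov.yml is not shown. The patch line count (217) is not reported directly. It is inferred from the 4 missing lines and the 98.15668% patch figure.

$$
\begin{aligned}
\text{main:}\quad & \tfrac{5674}{6245} \approx 0.908567 \;\to\; 90.85\% \\
\text{PR \#27:}\quad & \tfrac{5872}{6447} \approx 0.910811 \;\to\; 91.08\% \\
\Delta:\quad & 0.910811 - 0.908567 = 0.002244 \;\to\; +0.22\% \\
\text{patch:}\quad & 1 - 0.9815668 = 0.0184332, \quad \tfrac{4}{0.0184332} \approx 217 \text{ lines}, \quad \tfrac{213}{217} \approx 0.9815668
\end{aligned}
$$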
| Flag | Coverage Δ |
|---|---|
| rust-core | 91.08% <98.15%> (+0.22%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| crates/feedparser-rs-core/src/parser/atom.rs | 89.71% <100.00%> (+0.54%) ⬆️ |
| crates/feedparser-rs-core/src/parser/common.rs | 77.02% <100.00%> (+1.66%) ⬆️ |
| crates/feedparser-rs-core/src/types/entry.rs | 87.69% <ø> (ø) |
| crates/feedparser-rs-core/src/types/feed.rs | 96.36% <ø> (ø) |
| crates/feedparser-rs-core/src/parser/rss.rs | 94.73% <97.59%> (+0.30%) ⬆️ |


Copilot AI left a comment


Pull request overview

This PR implements Phase 2 of API parity with Python feedparser by adding xml:lang attribute tracking to RSS feeds and license field extraction for both RSS and Atom formats. The implementation adds a new extract_xml_lang() helper function in the common parser utilities and extends both RSS and Atom parsers to extract license information from various tag formats. These changes enhance compatibility with Python feedparser's behavior while maintaining the project's performance and security standards.

Key Changes:

  • Added xml:lang tracking with inheritance (channel-level → item-level) in RSS parser, storing language info in TextConstruct objects
  • Added license field extraction from <creativeCommons:license>, <license> (RSS), and <link rel="license"> (Atom)
  • Extended Python and Node.js bindings to expose the new license field

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| crates/feedparser-rs-core/src/types/feed.rs | Added `license: Option<String>` field to FeedMeta struct |
| crates/feedparser-rs-core/src/types/entry.rs | Added `license: Option<String>` field to Entry struct |
| crates/feedparser-rs-core/src/parser/common.rs | Added extract_xml_lang() helper with DoS protection via max_attribute_length |
| crates/feedparser-rs-core/src/parser/rss.rs | Implemented xml:lang tracking with inheritance and license extraction for RSS; added 144 lines of comprehensive tests |
| crates/feedparser-rs-core/src/parser/atom.rs | Added license extraction from `<link rel="license">` elements; added 44 lines of tests |
| crates/feedparser-rs-py/src/types/feed_meta.rs | Added license getter for Python FeedMeta class |
| crates/feedparser-rs-py/src/types/entry.rs | Added license getter for Python Entry class |
| crates/feedparser-rs-node/src/lib.rs | Added license field to Node.js FeedMeta and Entry structs with type conversions |
| crates/feedparser-rs-node/index.d.ts | Added TypeScript definitions for `license?: string` fields |

@bug-ops self-assigned this Dec 18, 2025
@bug-ops merged commit b8dc6ad into main Dec 18, 2025 (37 checks passed)
@bug-ops deleted the feat/phase2-detail-structures branch December 18, 2025 02:01
