feat(parser): add xml:lang tracking and license field support #27
Add comprehensive xml:lang tracking to the RSS parser and license field support across all feed formats to complete Phase 2 of the *_detail infrastructure.

Changes:
- Add extract_xml_lang() helper function to parser/common.rs
- Track xml:lang at RSS channel and item levels with inheritance
- Populate TextConstruct.language for title and description fields
- Add license field to FeedMeta and Entry types
- Parse creativeCommons:license and plain license tags in RSS
- Extract license from link rel="license" in Atom feeds
- Update Python and Node.js bindings with license getters
- Add comprehensive integration tests for xml:lang and license

The RSS parser now properly tracks xml:lang attributes on channel and item elements, propagating language information to TextConstruct fields. Item-level xml:lang overrides channel-level language, matching standard XML behavior.

License URLs are extracted from both RSS (creativeCommons:license, license tags) and Atom (link rel="license") feeds, and stored in the license field for both feed and entry metadata.

Tests: 476 passed, clippy clean, all CI checks passing
Add missing license field to Entry and FeedMeta interfaces in TypeScript definitions. This field was already implemented in the Rust code but was not exposed in the type definitions. Also add test for empty xml:lang attribute handling to verify edge case behavior.
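To illustrate the Atom side of the license extraction described in the commit message, here is a minimal sketch. It assumes links have already been parsed into a simple struct; the `Link` type and the `license_from_links` function are hypothetical names for illustration and are not taken from the crate. Picking the first matching link is also an assumption.

```rust
/// Hypothetical, simplified stand-in for a parsed Atom <link> element.
struct Link {
    rel: Option<String>,
    href: String,
}

/// Return the href of the first link whose rel is "license", if any.
/// Assumption: the first match wins; the actual parser may behave differently.
fn license_from_links(links: &[Link]) -> Option<String> {
    links
        .iter()
        .find(|link| link.rel.as_deref() == Some("license"))
        .map(|link| link.href.clone())
}
```

The result maps naturally onto a `license: Option<String>` field: feeds without a license link simply yield `None`.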
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #27 +/- ##
==========================================
+ Coverage 90.85% 91.08% +0.22%
==========================================
Files 32 32
Lines 6245 6447 +202
==========================================
+ Hits 5674 5872 +198
- Misses 571 575 +4
Pull request overview
This PR implements Phase 2 of API parity with Python feedparser by adding xml:lang attribute tracking to RSS feeds and license field extraction for both RSS and Atom formats. The implementation adds a new extract_xml_lang() helper function in the common parser utilities and extends both RSS and Atom parsers to extract license information from various tag formats. These changes enhance compatibility with Python feedparser's behavior while maintaining the project's performance and security standards.
Key Changes:
- Added xml:lang tracking with inheritance (channel-level → item-level) in RSS parser, storing language info in TextConstruct objects
- Added license field extraction from <creativeCommons:license>, <license> (RSS), and <link rel="license"> (Atom)
- Extended Python and Node.js bindings to expose the new license field
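The channel-level to item-level inheritance can be sketched as follows. This is only an illustration under stated assumptions: the `TextConstruct` struct here is a reduced stand-in with just two fields, and `text_with_lang` is a hypothetical helper, not a function from the crate.

```rust
/// Reduced stand-in for the crate's TextConstruct (assumed shape, two fields only).
#[derive(Debug, PartialEq)]
struct TextConstruct {
    value: String,
    language: Option<String>,
}

/// Hypothetical helper: build a text field whose language is the item-level
/// xml:lang when present, otherwise the inherited channel-level xml:lang.
fn text_with_lang(
    value: &str,
    channel_lang: Option<&str>,
    item_lang: Option<&str>,
) -> TextConstruct {
    TextConstruct {
        value: value.to_string(),
        // Item-level language overrides the channel-level one.
        language: item_lang.or(channel_lang).map(String::from),
    }
}
```

With this shape, an item without its own xml:lang picks up the channel's value, and a feed with no xml:lang anywhere leaves `language` as `None`.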
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `crates/feedparser-rs-core/src/types/feed.rs` | Added `license: Option<String>` field to `FeedMeta` struct |
| `crates/feedparser-rs-core/src/types/entry.rs` | Added `license: Option<String>` field to `Entry` struct |
| `crates/feedparser-rs-core/src/parser/common.rs` | Added `extract_xml_lang()` helper with DoS protection via `max_attribute_length` |
| `crates/feedparser-rs-core/src/parser/rss.rs` | Implemented xml:lang tracking with inheritance and license extraction for RSS; added 144 lines of comprehensive tests |
| `crates/feedparser-rs-core/src/parser/atom.rs` | Added license extraction from `<link rel="license">` elements; added 44 lines of tests |
| `crates/feedparser-rs-py/src/types/feed_meta.rs` | Added `license` getter for Python `FeedMeta` class |
| `crates/feedparser-rs-py/src/types/entry.rs` | Added `license` getter for Python `Entry` class |
| `crates/feedparser-rs-node/src/lib.rs` | Added `license` field to Node.js `FeedMeta` and `Entry` structs with type conversions |
| `crates/feedparser-rs-node/index.d.ts` | Added TypeScript definitions for `license?: string` fields |
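As a rough sketch of what a length-capped `extract_xml_lang()` helper could look like: the real signature is not shown on this page, so the attribute representation (a slice of name/value pairs) and the decision to drop over-long values rather than truncate them are assumptions made for illustration only.

```rust
/// Hypothetical sketch of an xml:lang extractor with a length cap.
/// Assumptions: attributes arrive as (name, value) string pairs, and values
/// longer than `max_attribute_length` bytes are ignored entirely.
fn extract_xml_lang(
    attributes: &[(&str, &str)],
    max_attribute_length: usize,
) -> Option<String> {
    attributes
        .iter()
        // Look for the xml:lang attribute by its qualified name.
        .find(|(name, _)| *name == "xml:lang")
        .map(|(_, value)| *value)
        // Reject oversized values so a hostile feed cannot force large allocations.
        .filter(|value| value.len() <= max_attribute_length)
        .map(String::from)
}
```

Checking the length before copying the value keeps the cost of a malicious attribute bounded by the cap rather than by the size of the input.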
Summary
Phase 2 of API parity implementation. Adds xml:lang tracking to RSS parser and license field extraction for both RSS and Atom formats.
Changes
Core Library
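- Added `extract_xml_lang()` helper function in `common.rs`
- License extraction from `<creativeCommons:license>` and `<license>` tags (RSS)
- License extraction from `<link rel="license">` elements (Atom)
- Added `license: Option<String>` field to `FeedMeta` and `Entry` structs

Python Bindings
- Added `license` getter to `FeedMeta` and `Entry` classes
- xml:lang value exposed through `title_detail.language`, `summary_detail.language`, etc.

Node.js Bindings
- Added `license` field to `FeedMeta` and `Entry` structs
- TypeScript definitions: `license?: string`

Test Plan
- `cargo clippy --all-targets` clean
- `cargo make ci-lint-stable` passes

Reviews Completed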
API Compatibility
Matches Python feedparser behavior:
- `feed.title_detail.language` returns xml:lang value
- `feed.license` returns license URL
- `entry.summary_detail.language` returns xml:lang value
- `entry.license` returns license URL

No breaking changes.