-
Notifications
You must be signed in to change notification settings - Fork 0
refactor: split long RSS parser functions for maintainability #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Address code review feedback: - Extract parse_enclosure() helper to reduce parse_item from 106 to 80 lines - Add #[inline] to 8 hot-path helper functions: - parse_channel_standard, parse_channel_podcast, parse_channel_namespace - parse_item_standard, parse_item_itunes, parse_item_podcast, parse_item_namespace - parse_enclosure (new function) Expected performance improvement: 2-3% (recovers function call overhead) All 322 tests pass.
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #19 +/- ##
==========================================
+ Coverage 87.27% 90.21% +2.94%
==========================================
Files 29 29
Lines 5327 5714 +387
==========================================
+ Hits 4649 5155 +506
+ Misses 678 559 -119
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
Add 10 new tests to improve RSS parser coverage from 52% to 71%+: Priority 1 - iTunes Item-Level (4 tests): - test_parse_rss_itunes_episode_metadata - test_parse_rss_itunes_duration_formats - test_parse_rss_itunes_nested_categories - test_parse_rss_itunes_owner_parsing Priority 2 - Podcast 2.0 (4 tests): - test_parse_rss_podcast_locked_and_guid - test_parse_rss_podcast_funding - test_parse_rss_podcast_transcript - test_parse_rss_podcast_person Priority 3 - Namespaces (2 tests): - test_parse_rss_dublin_core_channel - test_parse_rss_content_encoded RSS parser tests: 21 → 31 tests All 336 tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the RSS parser by splitting two large functions (parse_channel and parse_item) into smaller, focused helper functions to improve code maintainability and readability. The refactoring extracts parsing logic for different namespaces (iTunes, Podcast 2.0, Dublin Core, etc.) into dedicated functions, reducing the main parsing functions from over 300 lines to approximately 80 lines each.
Key changes include:
- Extraction of namespace-specific parsing logic into helper functions
- Addition of
#[inline]annotations to performance-critical helpers - Introduction of attribute cloning to resolve lifetime issues
- Comprehensive test coverage for iTunes and Podcast 2.0 namespaces
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn parse_channel_itunes( | ||
| reader: &mut Reader<&[u8]>, | ||
| buf: &mut Vec<u8>, | ||
| tag: &[u8], | ||
| attrs: &[(Vec<u8>, String)], | ||
| feed: &mut ParsedFeed, | ||
| limits: &ParserLimits, | ||
| depth: &mut usize, | ||
| ) -> Result<bool> { |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns Result<bool> where bool indicates whether the tag was handled. Consider using a more descriptive type like Result<TagHandled> with struct TagHandled(bool) or an enum to make the return value's meaning clearer.
| #[allow(clippy::too_many_arguments)] | ||
| fn parse_item_podcast( |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function has too many parameters (8), requiring a clippy exception. Consider grouping related parameters into a context struct (e.g., ParseContext containing reader, buf, limits, depth) to reduce parameter count and improve maintainability.
| #[allow(clippy::too_many_arguments)] | ||
| fn parse_item_media( |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to parse_item_podcast, this function has too many parameters. Consider using a context struct to group common parsing parameters (reader, buf, limits, depth) to improve the function signature and reduce cognitive load.
Address Copilot review suggestions: - Extract collect_attributes() to eliminate attribute cloning duplication - Extract find_attribute() for cleaner attribute lookups - Apply helpers to 8+ locations throughout rss.rs Code reduction: -51 lines (67 insertions, 118 deletions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| depth: usize, | ||
| ) -> Result<bool> { | ||
| if tag.starts_with(b"podcast:transcript") { | ||
| parse_podcast_transcript(reader, buf, attrs, entry, limits, is_empty, depth)?; |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parse_podcast_transcript function always returns Ok(()), but the result is checked with ?. Since the function cannot fail in a meaningful way (it silently skips empty URLs), consider either removing the Result return type or documenting when errors can occur.
- Fix missing reference operator when calling collect_attributes - Fix parse_channel_itunes and parse_item_itunes to return Ok(false) when href attribute is missing for itunes:image tags - Restructure handler chains in parse_channel and parse_item to properly short-circuit when first handler returns Ok(true) - Use sequential if-else blocks instead of chained || operators to avoid calling all handlers unnecessarily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if let Some(value) = find_attribute(attrs, b"href") { | ||
| let itunes = entry.itunes.get_or_insert_with(ItunesEntryMeta::default); | ||
| itunes.image = Some(truncate_to_length(value, limits.max_attribute_length)); | ||
| Ok(true) | ||
| } else { | ||
| Ok(false) | ||
| } |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as Comment 3 - returning Ok(false) for missing href attribute is inconsistent with other iTunes tag handlers. This duplication suggests extracting a common parse_itunes_image_href helper function to handle both channel and item level itunes:image tags consistently.
- Modified collect_attributes() to return (Vec, bool) tuple where bool indicates if any attribute parsing errors occurred - Updated parse_channel() to set bozo flag when attribute errors detected - Updated parse_item() to track and return attribute parsing errors - Fixed itunes:image handling to return Ok(true) consistently
Add comprehensive path-specific instructions for GitHub Copilot code review: - .github/copilot-instructions.md: General project guidelines - .github/instructions/parser.instructions.md: Parser-specific rules - .github/instructions/types.instructions.md: Type definition rules - .github/instructions/python-bindings.instructions.md: PyO3 binding rules - .github/instructions/node-bindings.instructions.md: napi-rs binding rules - .github/instructions/tests.instructions.md: Testing guidelines Key areas covered: - Bozo pattern (tolerant parsing with error tracking) - Security requirements (DoS protection, SSRF, input validation) - API compatibility with Python feedparser - Date/time handling (struct_time for Python, milliseconds for Node.js) - Error handling patterns - Function length and complexity limits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| limits: &ParserLimits, | ||
| depth: &mut usize, | ||
| ) -> Result<Entry> { | ||
| ) -> Result<(Entry, bool)> { |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature change to return a tuple (Entry, bool) is a breaking change from the internal API. Consider adding a doc comment explaining that the boolean indicates whether attribute parsing errors occurred during entry parsing, for clarity to future maintainers.
| /// Parse Podcast 2.0 namespace tags at item level | ||
| #[inline] | ||
| #[allow(clippy::too_many_arguments)] | ||
| fn parse_item_podcast( |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function has 8 parameters which triggered a clippy warning. While the #[allow] is present, consider whether some related parameters (like reader, buf, limits) could be grouped into a context struct to reduce the parameter count and improve maintainability.
| let tag = e.name().as_ref().to_vec(); | ||
| let (attrs, has_attr_errors) = collect_attributes(&e); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tag is cloned to a Vec<u8> on every start tag to pass ownership to helper functions. This creates unnecessary allocations in the hot path. Consider passing &[u8] slices to helper functions instead, since the helpers only need to read the tag, not own it.
| let tag = e.name().as_ref().to_vec(); | ||
| let (attrs, attr_error) = collect_attributes(e); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as in parse_channel: the tag is unnecessarily cloned to Vec<u8> on every item start tag. This is in the hottest parsing path (item processing). Consider passing &[u8] to avoid allocations.
| fn collect_attributes(e: &quick_xml::events::BytesStart) -> (Vec<(Vec<u8>, String)>, bool) { | ||
| let mut has_errors = false; | ||
| let attrs = e | ||
| .attributes() | ||
| .filter_map(|result| { | ||
| if let Ok(attr) = result { | ||
| if let Ok(v) = attr.unescape_value() { | ||
| Some((attr.key.as_ref().to_vec(), v.to_string())) |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The collect_attributes function clones attribute keys into Vec<u8> for each attribute. Since attributes are typically looked up by key immediately after collection, consider returning Vec<(&[u8], String)> with borrowed keys instead, or using a more efficient lookup structure like a small inline map for the common case of few attributes.
| fn collect_attributes(e: &quick_xml::events::BytesStart) -> (Vec<(Vec<u8>, String)>, bool) { | |
| let mut has_errors = false; | |
| let attrs = e | |
| .attributes() | |
| .filter_map(|result| { | |
| if let Ok(attr) = result { | |
| if let Ok(v) = attr.unescape_value() { | |
| Some((attr.key.as_ref().to_vec(), v.to_string())) | |
| fn collect_attributes(e: &quick_xml::events::BytesStart) -> (Vec<(&[u8], String)>, bool) { | |
| let mut has_errors = false; | |
| let attrs = e | |
| .attributes() | |
| .filter_map(|result| { | |
| if let Ok(attr) = result { | |
| if let Ok(v) = attr.unescape_value() { | |
| Some((attr.key.as_ref(), v.to_string())) |
| match parse_item(reader, &mut buf, limits, depth) { | ||
| Ok(entry) => feed.entries.push(entry), | ||
| Ok((entry, has_attr_errors)) => { | ||
| if has_attr_errors { | ||
| feed.bozo = true; | ||
| feed.bozo_exception = | ||
| Some("Malformed XML attributes".to_string()); | ||
| } |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message "Malformed XML attributes" is duplicated in two places (lines 145 and 169). Consider extracting this to a constant or helper function to ensure consistency and make future changes easier.
- Add MALFORMED_ATTRIBUTES_ERROR constant to avoid string duplication - Document why tag.to_vec() allocation is necessary (borrow checker) - Document why attribute keys must be cloned to Vec<u8> - Note: allocations cannot be avoided without architectural refactoring Addresses Copilot review comments about: - Duplicated error message (lines 145, 169) - tag.to_vec() allocation documentation (lines 142, 508) - collect_attributes key cloning documentation (line 33)
…r counts - Document Ok(true)/Ok(false) return semantics for tag handling functions - Clarify parse_item tuple return value (Entry, has_errors) - Add notes about borrow checker constraints for 8-parameter functions - Document Result return type consistency in parse_podcast_transcript
Summary
Refactors the RSS parser to split large functions into smaller, focused helper functions for improved maintainability and code clarity.
Changes
RSS Parser Refactoring:
parse_channel(280 lines → 80 lines main + 6 helper functions)parse_item(328 lines → 80 lines main + 7 helper functions)parse_enclosure()helper for cleaner enclosure handling#[inline]annotations to 8 hot-path helper functionsNew Helper Functions:
parse_channel_standardparse_channel_itunesparse_itunes_categoryparse_channel_podcastparse_channel_namespaceparse_item_standardparse_item_itunesparse_item_podcastparse_podcast_transcriptparse_podcast_personparse_item_namespaceparse_item_mediaparse_enclosureVerification
#[inline]annotations added to recover function call overheadPerformance Notes
Initial refactoring introduced ~7% regression due to:
#[inline]annotations (fixed in final commit)After adding
#[inline], expected improvement of 2-3%, bringing net regression to ~4-5%. This is acceptable given the significant maintainability improvements.Test plan
cargo test- all tests passcargo clippy --all-targets --all-features -- -D warnings