Commit 79da235

feat(py): Python feedparser compatibility improvements (P0-P1) (#39)
* feat(py): add FeedParserDict field mapping for backward compatibility

  Implement Phase 1 of Python feedparser compatibility improvements:

  - Add deprecated field aliases (description→subtitle, tagline→subtitle, modified→updated, copyright→rights, date→updated/published, url→link)
  - Add entry aliases (guid→id, description→summary, issued→published, modified→updated, date→updated/published)
  - Add container aliases (channel→feed, items→entries)
  - Use once_cell::Lazy<HashMap> for O(1) alias lookups
  - Add __getattr__ methods to PyFeedMeta, PyEntry, PyParsedFeed
  - Add comprehensive Python tests (19 test cases)

  This allows users migrating from Python feedparser to access data using familiar deprecated field names while the modern field names remain the primary API.

* feat(py): add dict-style access for Python feedparser compatibility

  Implement Phase 2 of Python feedparser compatibility improvements:

  - Add __getitem__ method to PyParsedFeed for top-level dict access
  - Add __getitem__ method to PyFeedMeta for feed['field'] access
  - Add __getitem__ method to PyEntry for entry['field'] access
  - Support all modern and deprecated field names via dict syntax
  - Raise KeyError for unknown keys (correct dict behavior)
  - Add 10 comprehensive test cases for dict-style access

  Users can now access data using both patterns:

      feed['feed']['title']  # dict-style
      feed.feed.title        # attribute-style

  Deprecated names also work:

      feed['channel']['description']

* feat(py): add auto-URL detection to parse() for feedparser compatibility

  The parse() function now automatically detects URLs (http://, https://) and fetches them when the http feature is enabled. This matches Python feedparser's behavior where parse() accepts both URLs and content.

  Changes:

  - Add optional etag, modified, user_agent params to parse()
  - Add optional HTTP params to parse_with_limits()
  - Create internal parse_internal() for shared URL/content logic
  - URL detection based on http:// and https:// prefix
  - When http feature disabled, return NotImplementedError for URLs
  - Update existing tests to use keyword args for limits param

* docs: update documentation for Python feedparser compatibility

  - Add feedparser compatibility features to CHANGELOG [Unreleased]
  - Update Python README with dict-style access, field aliases, auto-URL
  - Update main README Python section with compatibility examples
  - Document supported field aliases table in Python README
  - Update API reference with new function signatures

* chore: release v0.4.0

  Python feedparser compatibility improvements:

  - Field alias mappings for deprecated field names
  - Dict-style access on feed objects
  - Container aliases (channel → feed, items → entries)
  - Auto-URL detection in parse() function
  - Optional HTTP parameters for parse() and parse_with_limits()
1 parent 6168185 commit 79da235
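
The alias lookup described in the commit message can be sketched in pure Python. This is only an illustration of the idea, not the binding's actual code (the commit implements it in Rust with `once_cell::Lazy<HashMap>`). The class name `AliasedRecord`, the `FEED_ALIASES` table layout, and the choice to raise `AttributeError` for unknown attributes are assumptions made for this example; only the alias pairs themselves come from the commit message.

```python
from typing import Any

# A few feed-level aliases named in the commit message. Each deprecated
# name maps to candidate modern names, tried in order (layout is assumed).
FEED_ALIASES: dict[str, tuple[str, ...]] = {
    "description": ("subtitle",),
    "tagline": ("subtitle",),
    "modified": ("updated",),
    "copyright": ("rights",),
    "date": ("updated", "published"),
    "url": ("link",),
}


class AliasedRecord:
    """Hypothetical stand-in for a feed metadata object."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    def _resolve(self, name: str) -> Any:
        # Modern names win; deprecated names fall back through the table.
        if name in self._fields:
            return self._fields[name]
        for target in FEED_ALIASES.get(name, ()):
            if target in self._fields:
                return self._fields[target]
        raise KeyError(name)

    def __getattr__(self, name: str) -> Any:
        # Only invoked when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._resolve(name)
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key: str) -> Any:
        # Dict-style access raises KeyError for unknown keys.
        return self._resolve(key)


feed = AliasedRecord(title="Example", subtitle="A demo feed", published="2025-12-28")
print(feed.description)   # resolved through the alias table to subtitle
print(feed["tagline"])    # dict-style access uses the same table
```

Keeping one table behind both `__getattr__` and `__getitem__` means attribute-style and dict-style access cannot drift apart.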

16 files changed: +2025 -58 lines changed

CHANGELOG.md

Lines changed: 15 additions & 1 deletion
@@ -7,6 +7,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.4.0] - 2025-12-28
+
+### Added
+- **Python feedparser compatibility improvements**:
+  - Field alias mappings for deprecated field names (`description` → `subtitle`, `guid` → `id`, etc.)
+  - Dict-style access on feed objects (`d['feed']['title']`, `d['entries'][0]['link']`)
+  - Container aliases (`channel` → `feed`, `items` → `entries`)
+  - Auto-URL detection in `parse()` function (URLs are automatically fetched when http feature enabled)
+  - Optional HTTP parameters (`etag`, `modified`, `user_agent`) for `parse()` and `parse_with_limits()`
+
+### Changed
+- `parse_with_limits()` now uses keyword-only `limits` parameter for consistency
+
 ## [0.3.0] - 2025-12-18

 ### Added
@@ -147,7 +160,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Comprehensive test coverage
 - Documentation with examples

-[Unreleased]: https://github.com/bug-ops/feedparser-rs/compare/v0.3.0...HEAD
+[Unreleased]: https://github.com/bug-ops/feedparser-rs/compare/v0.4.0...HEAD
+[0.4.0]: https://github.com/bug-ops/feedparser-rs/compare/v0.3.0...v0.4.0
 [0.3.0]: https://github.com/bug-ops/feedparser-rs/compare/v0.2.1...v0.3.0
 [0.2.1]: https://github.com/bug-ops/feedparser-rs/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/bug-ops/feedparser-rs/compare/v0.1.8...v0.2.0
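
The "Changed" entry above makes `limits` keyword-only, which is why existing positional calls had to be updated. A minimal sketch of what that means at a Python call site follows; `parse_with_limits_sketch` is a hypothetical stand-in with a simplified signature, not the real binding, and whether the real function rejects positional `limits` with exactly this error is an assumption.

```python
def parse_with_limits_sketch(source, *, limits=None):
    """Hypothetical stand-in: parameters after the bare `*` are keyword-only."""
    return (source, limits)


# Keyword form is accepted.
print(parse_with_limits_sketch("<rss/>", limits={"max_entries": 100}))

# Positional form is rejected by Python itself, before the body runs.
try:
    parse_with_limits_sketch("<rss/>", {"max_entries": 100})
except TypeError as exc:
    print(f"rejected: {exc}")
```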

Cargo.lock

Lines changed: 4 additions & 3 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@ members = [
 resolver = "2"

 [workspace.package]
-version = "0.3.0"
+version = "0.4.0"
 edition = "2024"
 rust-version = "1.88.0"
 authors = ["bug-ops"]
@@ -29,6 +29,7 @@ memchr = "2.7"
 mockito = "1.6"
 napi = "3.7"
 napi-derive = "3.4"
+once_cell = "1.20"
 pyo3 = "0.27"
 quick-xml = "0.38"
 regex = "1.11"

README.md

Lines changed: 16 additions & 6 deletions
@@ -18,7 +18,7 @@ High-performance RSS/Atom/JSON Feed parser written in Rust, with Python and Node
 - **Conditional GET** — ETag/Last-Modified support for bandwidth-efficient polling
 - **Podcast support** — iTunes and Podcast 2.0 namespace extensions
 - **Multi-language bindings** — Native Python (PyO3) and Node.js (napi-rs) bindings
-- **Familiar API** — Inspired by Python's feedparser, easy to migrate existing code
+- **feedparser drop-in** — Dict-style access, field aliases, same API patterns as Python feedparser

 ## Supported Formats

@@ -146,18 +146,28 @@ See [Node.js API documentation](crates/feedparser-rs-node/README.md) for complet
 ### Python

 ```python
-import feedparser_rs
+import feedparser_rs as feedparser # Drop-in replacement

-# Parse from bytes or string
-d = feedparser_rs.parse(b'<rss>...</rss>')
+# Parse from bytes, string, or URL (auto-detected)
+d = feedparser.parse(b'<rss>...</rss>')
+d = feedparser.parse('https://example.com/feed.xml') # URL auto-detected
+
+# Attribute-style access
 print(d.version) # 'rss20'
 print(d.feed.title)
 print(d.bozo) # True if parsing had issues
-print(d.entries[0].published_parsed) # time.struct_time
+
+# Dict-style access (feedparser-compatible)
+print(d['feed']['title'])
+print(d['entries'][0]['link'])
+
+# Deprecated field aliases work
+print(d.feed.description) # → d.feed.subtitle
+print(d.channel.title) # → d.feed.title
 ```

 > [!NOTE]
-> Python bindings provide `time.struct_time` for date fields, matching feedparser's API for easy migration.
+> Python bindings provide full feedparser compatibility: dict-style access, field aliases, and `time.struct_time` for date fields.

 ## Cargo Features
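
The commit message states that URL auto-detection in `parse()` is based on the `http://` and `https://` prefixes. A minimal sketch of such a check is below. The helper name `looks_like_url` is hypothetical, and its handling of bytes input, letter case, and leading whitespace are assumptions for illustration; the actual Rust implementation may differ on all three.

```python
def looks_like_url(source) -> bool:
    """Return True when `source` is text that starts with http:// or https://."""
    if isinstance(source, (bytes, bytearray)):
        # Assumption: raw bytes are always treated as feed content.
        return False
    # str.startswith accepts a tuple of prefixes.
    return source.startswith(("http://", "https://"))


for candidate in ("https://example.com/feed.xml", "<rss>...</rss>", b"<rss>...</rss>"):
    kind = "url" if looks_like_url(candidate) else "content"
    print(f"{candidate!r}: {kind}")
```

A prefix check like this is cheap and never touches the network, but it also means a string that merely begins with `http://` is never parsed as feed content.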

crates/feedparser-rs-node/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "feedparser-rs",
-  "version": "0.3.0",
+  "version": "0.4.0",
   "description": "High-performance RSS/Atom/JSON Feed parser for Node.js",
   "main": "index.js",
   "types": "index.d.ts",

crates/feedparser-rs-py/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ crate-type = ["cdylib"]
 feedparser-rs = { path = "../feedparser-rs-core" }
 pyo3 = { workspace = true, features = ["extension-module", "chrono"] }
 chrono = { workspace = true, features = ["clock"] }
+once_cell = { workspace = true }

 [features]
 default = ["http"]

crates/feedparser-rs-py/README.md

Lines changed: 59 additions & 23 deletions
@@ -14,7 +14,7 @@ High-performance RSS/Atom/JSON Feed parser for Python with feedparser-compatible
 - **Tolerant parsing**: Bozo flag for graceful handling of malformed feeds
 - **Multi-format**: RSS 0.9x/1.0/2.0, Atom 0.3/1.0, JSON Feed 1.0/1.1
 - **Podcast support**: iTunes and Podcast 2.0 namespace extensions
-- **Familiar API**: Inspired by feedparser, easy migration path
+- **feedparser-compatible**: Dict-style access, field aliases, same API patterns
 - **DoS protection**: Built-in resource limits

 ## Installation
@@ -33,15 +33,20 @@ pip install feedparser-rs
 ```python
 import feedparser_rs

-# Parse from string or bytes
+# Parse from string, bytes, or URL (auto-detected)
 d = feedparser_rs.parse('<rss>...</rss>')
 d = feedparser_rs.parse(b'<rss>...</rss>')
+d = feedparser_rs.parse('https://example.com/feed.xml') # URL auto-detected

-# Access data
+# Attribute-style access (feedparser-compatible)
 print(d.feed.title)
 print(d.version) # "rss20", "atom10", etc.
 print(d.bozo) # True if parsing errors occurred

+# Dict-style access (feedparser-compatible)
+print(d['feed']['title'])
+print(d['entries'][0]['link'])
+
 for entry in d.entries:
     print(entry.title)
     print(entry.published_parsed) # time.struct_time
@@ -55,35 +60,63 @@ for entry in d.entries:
 ```python
 import feedparser_rs

-# Fetch and parse in one call
+# Option 1: Auto-detection (recommended)
+d = feedparser_rs.parse('https://example.com/feed.xml')
+
+# Option 2: Explicit URL function
 d = feedparser_rs.parse_url('https://example.com/feed.xml')

-print(d.feed.title)
-print(f"Fetched {len(d.entries)} entries")
+# With conditional GET for efficient polling
+d = feedparser_rs.parse(
+    'https://example.com/feed.xml',
+    etag=cached_etag,
+    modified=cached_modified
+)
+if d.status == 304:
+    print("Feed not modified")

 # With custom limits
 limits = feedparser_rs.ParserLimits(max_entries=100)
-d = feedparser_rs.parse_url_with_limits('https://example.com/feed.xml', limits)
+d = feedparser_rs.parse_with_limits('https://example.com/feed.xml', limits=limits)
 ```

 > [!TIP]
-> `parse_url` supports automatic compression (gzip, deflate, brotli) and follows redirects.
+> URL fetching supports automatic compression (gzip, deflate, brotli) and follows redirects.

 ## Migration from feedparser

+feedparser-rs is designed as a drop-in replacement for Python feedparser:
+
 ```python
-# Option 1: alias import
+# Drop-in replacement
 import feedparser_rs as feedparser
-d = feedparser.parse(feed_content)

-# Option 2: direct import
-import feedparser_rs
-d = feedparser_rs.parse(feed_content)
+# Same API patterns work
+d = feedparser.parse('https://example.com/feed.xml')
+print(d.feed.title)
+print(d['feed']['title']) # Dict-style access works too
+print(d.entries[0].link)

-# Option 3: URL fetching (new!)
-d = feedparser_rs.parse_url('https://example.com/feed.xml')
+# Deprecated field names supported
+print(d.feed.description) # → d.feed.subtitle
+print(d.channel.title) # → d.feed.title
+print(d.items[0].guid) # → d.entries[0].id
 ```

+### Supported Field Aliases
+
+| Old Name | Maps To |
+|----------|---------|
+| `feed.description` | `feed.subtitle` or `feed.summary` |
+| `feed.tagline` | `feed.subtitle` |
+| `feed.copyright` | `feed.rights` |
+| `feed.modified` | `feed.updated` |
+| `channel` | `feed` |
+| `items` | `entries` |
+| `entry.guid` | `entry.id` |
+| `entry.description` | `entry.summary` |
+| `entry.issued` | `entry.published` |
+
 ## Advanced Usage

 ### Custom Resource Limits
@@ -98,7 +131,7 @@ limits = feedparser_rs.ParserLimits(
     max_links_per_entry=50,
 )

-d = feedparser_rs.parse_with_limits(feed_data, limits)
+d = feedparser_rs.parse_with_limits(feed_data, limits=limits)
 ```

 ### Format Detection
@@ -132,20 +165,23 @@ for entry in d.entries:

 ### Functions

-- `parse(source)` — Parse feed from bytes or str
-- `parse_url(url)` — Fetch and parse feed from URL
-- `parse_with_limits(source, limits)` — Parse with custom resource limits
-- `parse_url_with_limits(url, limits)` — Fetch and parse with custom limits
+- `parse(source, etag=None, modified=None, user_agent=None)` — Parse feed from bytes, str, or URL (auto-detected)
+- `parse_url(url, etag=None, modified=None, user_agent=None)` — Fetch and parse feed from URL
+- `parse_with_limits(source, etag=None, modified=None, user_agent=None, limits=None)` — Parse with custom resource limits
+- `parse_url_with_limits(url, etag=None, modified=None, user_agent=None, limits=None)` — Fetch and parse with custom limits
 - `detect_format(source)` — Detect feed format without full parsing

 ### Classes

-- `FeedParserDict` — Parsed feed result
-  - `.feed` — Feed metadata
-  - `.entries` — List of entries
+- `FeedParserDict` — Parsed feed result (supports both attribute and dict-style access)
+  - `.feed` / `['feed']` — Feed metadata
+  - `.entries` / `['entries']` — List of entries
   - `.bozo` — True if parsing errors occurred
   - `.version` — Feed version string
   - `.encoding` — Character encoding
+  - `.status` — HTTP status code (for URL fetches)
+  - `.etag` — ETag header (for conditional GET)
+  - `.modified` — Last-Modified header (for conditional GET)

 - `ParserLimits` — Resource limits configuration
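
The `.status`, `.etag`, and `.modified` fields documented above are what a polling loop needs for conditional GET. The sketch below shows one way such a loop might cache and resend the validators. To stay runnable without network access it takes the parse function as an argument and is exercised with `fake_parse`, a hypothetical stub; `poll_feed` and the cache layout are likewise assumptions, and passing `feedparser_rs.parse` in its place is expected to work only insofar as the signature listed above holds.

```python
from types import SimpleNamespace


def poll_feed(parse, url, cache):
    """Fetch `url`, sending cached validators; return entries, or None on 304."""
    result = parse(url, etag=cache.get("etag"), modified=cache.get("modified"))
    if getattr(result, "status", None) == 304:
        # Server reported no change: keep the cached validators as they are.
        return None
    # Remember the validators for the next poll.
    cache["etag"] = getattr(result, "etag", None)
    cache["modified"] = getattr(result, "modified", None)
    return result.entries


def fake_parse(url, etag=None, modified=None):
    """Hypothetical stub server: answers 304 once the client sends the current ETag."""
    if etag == '"v1"':
        return SimpleNamespace(status=304, entries=[], etag=etag, modified=modified)
    return SimpleNamespace(
        status=200,
        entries=["entry-1"],
        etag='"v1"',
        modified="Sun, 28 Dec 2025 00:00:00 GMT",
    )


cache = {}
first = poll_feed(fake_parse, "https://example.com/feed.xml", cache)
second = poll_feed(fake_parse, "https://example.com/feed.xml", cache)
print(first, second)
```

Injecting the parse function keeps the caching logic testable on its own and independent of how the HTTP layer behaves.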

crates/feedparser-rs-py/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "maturin"

 [project]
 name = "feedparser-rs"
-version = "0.3.0"
+version = "0.4.0"
 description = "High-performance RSS/Atom/JSON Feed parser with feedparser-compatible API"
 readme = "README.md"
 license = { text = "MIT OR Apache-2.0" }
