diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..346c195
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,217 @@
+# feedparser-rs GitHub Copilot Instructions
+
+## Project Mission
+
+High-performance RSS/Atom/JSON Feed parser in Rust with Python (PyO3) and Node.js (napi-rs) bindings. This is a drop-in replacement for Python's `feedparser` library with 10-100x performance improvement.
+
+**CRITICAL**: API compatibility with Python feedparser is the #1 priority. Field names, types, and behavior must match exactly.
+
+**MSRV:** Rust 1.88.0 | **Edition:** 2024 | **License:** MIT/Apache-2.0
+
+## Architecture Overview
+
+### Workspace Structure
+- **`crates/feedparser-rs-core`** — Pure Rust parser. All parsing logic lives here. NO dependencies on other workspace crates.
+- **`crates/feedparser-rs-py`** — Python bindings via PyO3/maturin. Depends on core.
+- **`crates/feedparser-rs-node`** — Node.js bindings via napi-rs. Depends on core.
+
+### Parser Pipeline
+1. **Format Detection** (`parser/detect.rs`) — Identifies RSS 0.9x/1.0/2.0, Atom 0.3/1.0, or JSON Feed 1.0/1.1
+2. **Parsing** — Routes to `parser/rss.rs`, `parser/atom.rs`, or `parser/json.rs`
+3. **Namespace Extraction** — Handlers in `namespace/` process iTunes, Dublin Core, Media RSS, Podcast 2.0
+4. **Tolerant Error Handling** — Returns `ParsedFeed` with `bozo` flag set on errors, continues parsing
+
+## Idiomatic Rust & Performance
+
+### Type Safety First
+- Prefer strong types over primitives: `FeedVersion` enum, not `&str`
+- Use `Option` and `Result` — never sentinel values
+- Leverage generics and trait bounds for reusable code:
+```rust
+fn collect_limited<T, I: Iterator<Item = T>>(iter: I, limit: usize) -> Vec<T> {
+    iter.take(limit).collect()
+}
+```
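+
+A minimal sketch of the strong-typing rule. It assumes a hypothetical `FeedVersion` enum whose variants map onto Python feedparser's version strings. The real enum in `feedparser-rs-core` may name or cover its variants differently:
+
+```rust
+/// Feed format as a closed set of variants instead of a free-form string.
+/// Hypothetical subset for illustration only.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum FeedVersion {
+    Rss090,
+    Rss10,
+    Rss20,
+    Atom03,
+    Atom10,
+    Unknown,
+}
+
+impl FeedVersion {
+    /// The string exposed as `d.version` in the Python API.
+    pub fn as_str(self) -> &'static str {
+        match self {
+            FeedVersion::Rss090 => "rss090",
+            FeedVersion::Rss10 => "rss10",
+            FeedVersion::Rss20 => "rss20",
+            FeedVersion::Atom03 => "atom03",
+            FeedVersion::Atom10 => "atom10",
+            // Python feedparser reports an empty string for unrecognized formats
+            FeedVersion::Unknown => "",
+        }
+    }
+}
+```
+
+Because `match` must be exhaustive, adding a variant forces every mapping like `as_str()` to be updated. A `&str` field can never guarantee that.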
+
+### Zero-Cost Abstractions
+- Use `&str` over `String` in function parameters
+- Prefer iterators over index-based loops: `.iter().filter().map()`
+- Use `Cow<'_, str>` when ownership is conditionally needed
+- Avoid allocations in hot paths — reuse buffers where possible
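+
+Here is a sketch of the `Cow` guideline. The `collapse_whitespace` helper is hypothetical and not part of the crate. In the common case it returns the input slice unchanged, and it allocates only when it actually rewrites the text:
+
+```rust
+use std::borrow::Cow;
+
+/// Trims and collapses internal whitespace runs to single spaces.
+/// Hypothetical helper illustrating conditional ownership.
+fn collapse_whitespace(input: &str) -> Cow<'_, str> {
+    let trimmed = input.trim();
+    // Only double spaces or non-space whitespace (tabs, newlines, NBSP, ...) need rewriting
+    let needs_rewrite =
+        trimmed.contains("  ") || trimmed.contains(|c: char| c.is_whitespace() && c != ' ');
+    if needs_rewrite {
+        // Slow path: build one new String
+        Cow::Owned(trimmed.split_whitespace().collect::<Vec<_>>().join(" "))
+    } else {
+        // Fast path: borrow a subslice of the caller's text, no allocation
+        Cow::Borrowed(trimmed)
+    }
+}
+```
+
+Callers that only read the result never pay for a copy. A caller that needs a `String` can call `.into_owned()`.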
+
+### Edition 2024 Features
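+- `gen` is a reserved keyword in Edition 2024, but `gen` blocks are still unstable (nightly-only). On the stable MSRV, write custom iterators with `impl Iterator` or `std::iter::from_fn`
+- Async closures (`async || { ... }`, stable since Rust 1.85) can simplify callback-style code in the HTTP module
+- Return-position `impl Trait` now captures all in-scope lifetimes by default. Use precise-capturing `use<..>` bounds when a signature should capture fewer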
+
+### Safety Guidelines
+- `#![warn(unsafe_code)]` is enabled — avoid `unsafe` unless absolutely necessary
+- All public APIs must have doc comments (`#![warn(missing_docs)]`)
+- Use `thiserror` for error types with proper `#[error]` attributes
+
+## Critical Conventions
+
+### Error Handling: Bozo Pattern (MANDATORY)
+**Never panic or return errors for malformed input.** Set `bozo = true` and continue:
+```rust
+match parse_date(&text) {
+ Some(dt) => entry.published = Some(dt),
+ None => {
+ feed.bozo = true;
+ feed.bozo_exception = Some(format!("Invalid date: {text}"));
+ // Continue parsing!
+ }
+}
+```
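+
+A self-contained sketch of the same pattern. It uses a hypothetical stand-in struct rather than the crate's real `ParsedFeed`. Keeping the first `bozo_exception` instead of overwriting it is a design choice made here; this document does not prescribe it:
+
+```rust
+/// Minimal stand-in for the parse result; field names follow Python feedparser.
+#[derive(Debug, Default)]
+struct ParsedFeed {
+    bozo: bool,
+    bozo_exception: Option<String>,
+    titles: Vec<String>,
+}
+
+impl ParsedFeed {
+    /// Records a recoverable problem without aborting the parse.
+    fn mark_bozo(&mut self, message: impl Into<String>) {
+        self.bozo = true;
+        // Keep the first message so a later error does not hide the root cause
+        self.bozo_exception.get_or_insert_with(|| message.into());
+    }
+}
+
+/// Keeps non-empty titles; empty ones set bozo and are skipped, and parsing continues.
+fn collect_titles(raw: &[&str]) -> ParsedFeed {
+    let mut feed = ParsedFeed::default();
+    for (i, title) in raw.iter().enumerate() {
+        if title.trim().is_empty() {
+            feed.mark_bozo(format!("Empty title in entry {i}"));
+            continue; // one bad entry must not discard the rest
+        }
+        feed.titles.push(title.trim().to_string());
+    }
+    feed
+}
+```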
+
+### API Compatibility with Python feedparser
+Field names must match `feedparser` exactly: `feed.title`, `entries[0].summary`, `version` returns `"rss20"`, `"atom10"`
+
+### XML Parsing with quick-xml
+Use tolerant mode — no strict validation:
+```rust
+let mut reader = Reader::from_reader(data);
+reader.config_mut().trim_text(true);
+// Do NOT enable check_end_names — tolerance over strictness
+```
+
+## Development Commands
+
+All automation via `cargo-make`:
+
+| Command | Purpose |
+|---------|---------|
+| `cargo make fmt` | Format with nightly rustfmt |
+| `cargo make clippy` | Lint (excludes py bindings) |
+| `cargo make test-rust` | Rust tests (nextest) |
+| `cargo make pre-commit` | fmt + clippy + test-rust |
+| `cargo make bench` | Criterion benchmarks |
+| `cargo make msrv-check` | Verify MSRV 1.88.0 compatibility |
+
+### Bindings
+```bash
+# Python
+cd crates/feedparser-rs-py && maturin develop && pytest tests/ -v
+
+# Node.js
+cd crates/feedparser-rs-node && pnpm install && pnpm build && pnpm test
+```
+
+## Testing Patterns
+
+Use `include_str!()` for fixtures in `tests/fixtures/`:
+```rust
+#[test]
+fn test_rss20_basic() {
+ let xml = include_str!("../../tests/fixtures/rss/example.xml");
+ let feed = parse(xml.as_bytes()).unwrap();
+ assert!(!feed.bozo);
+}
+```
+
+Always verify malformed feeds set bozo but still parse:
+```rust
+#[test]
+fn test_malformed_sets_bozo() {
+    let xml = b"<rss version=\"2.0\"><channel><title>Broken</title>"; // unclosed <channel> and <rss>
+    let feed = parse(xml).unwrap();
+    assert!(feed.bozo);
+    assert_eq!(feed.feed.title.as_deref(), Some("Broken")); // Still parsed!
+}
+```
+
+## Security Requirements
+
+### SSRF Protection (CRITICAL for HTTP Module)
+Block these URL patterns before fetching:
+- Localhost/loopback: `127.0.0.1`, `[::1]`, `localhost`
+- Private networks: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
+- Link-local: `169.254.0.0/16` (AWS/GCP metadata endpoints), `fe80::/10`
+- Special addresses: `0.0.0.0/8`, `255.255.255.255`, `::/128`
+
+Always validate URLs through `is_safe_url()` before HTTP requests.
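+
+The IP-range part of that check can be sketched with `std::net` alone. `is_blocked_ip` and `is_blocked_host` are hypothetical names. A complete `is_safe_url()` must also do the following:
+
+- parse the URL
+- resolve the host
+- test every resolved address
+- repeat the check on each redirect target
+
+```rust
+use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};
+
+/// True if `ip` falls in one of the blocked ranges listed above.
+fn is_blocked_ip(ip: IpAddr) -> bool {
+    match ip {
+        IpAddr::V4(v4) => is_blocked_v4(v4),
+        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
+            // ::ffff:a.b.c.d reaches the IPv4 host, so judge it as IPv4
+            Some(v4) => is_blocked_v4(v4),
+            None => is_blocked_v6(v6),
+        },
+    }
+}
+
+fn is_blocked_v4(ip: Ipv4Addr) -> bool {
+    ip.is_loopback()           // 127.0.0.0/8
+        || ip.is_private()     // 10/8, 172.16/12, 192.168/16
+        || ip.is_link_local()  // 169.254/16, incl. cloud metadata endpoints
+        || ip.is_broadcast()   // 255.255.255.255
+        || ip.octets()[0] == 0 // 0.0.0.0/8
+}
+
+fn is_blocked_v6(ip: Ipv6Addr) -> bool {
+    ip.is_loopback()                             // ::1
+        || ip.is_unspecified()                   // ::
+        || (ip.segments()[0] & 0xffc0) == 0xfe80 // fe80::/10 link-local
+}
+
+/// Name-based check for a host that resolves to loopback without DNS.
+fn is_blocked_host(host: &str) -> bool {
+    host.eq_ignore_ascii_case("localhost")
+}
+```
+
+IPv4-mapped addresses are normalised first. Otherwise `::ffff:127.0.0.1` would slip past an IPv6-only loopback check.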
+
+### XSS Protection (HTML Sanitization)
+Use `ammonia` for HTML content from feeds:
+- Allowed tags: `a, abbr, b, blockquote, br, code, div, em, h1-h6, hr, i, img, li, ol, p, pre, span, strong, ul`
+- Enforce `rel="nofollow noopener"` on links
+- Allow only `http`, `https`, `mailto` URL schemes
+- Never pass raw HTML to Python/Node.js bindings without sanitization
+
+### DoS Protection
+Apply limits via `ParserLimits`:
+- `max_feed_size`: Default 50MB
+- `max_nesting_depth`: Default 100 levels
+- `max_entries`: Default 10,000 items
+- `max_text_length`: Default 1MB per text field
+- `max_attribute_length`: Default 10KB per attribute
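+
+A sketch of a limits struct carrying these defaults. The field names come from the list above. The following are assumptions:
+
+- the struct shape
+- the `usize` types
+- binary units (1 MB = 1024 * 1024 bytes)
+
+```rust
+/// Parser limits with the documented defaults (shape and units assumed).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct ParserLimits {
+    pub max_feed_size: usize,
+    pub max_nesting_depth: usize,
+    pub max_entries: usize,
+    pub max_text_length: usize,
+    pub max_attribute_length: usize,
+}
+
+impl Default for ParserLimits {
+    fn default() -> Self {
+        Self {
+            max_feed_size: 50 * 1024 * 1024, // 50 MB
+            max_nesting_depth: 100,
+            max_entries: 10_000,
+            max_text_length: 1024 * 1024,    // 1 MB per text field
+            max_attribute_length: 10 * 1024, // 10 KB per attribute
+        }
+    }
+}
+
+impl ParserLimits {
+    /// Rejects oversized input up front, before any parsing work is done.
+    pub fn check_feed_size(&self, len: usize) -> Result<(), String> {
+        if len > self.max_feed_size {
+            return Err(format!(
+                "Feed size ({len} bytes) exceeds maximum allowed ({} bytes)",
+                self.max_feed_size
+            ));
+        }
+        Ok(())
+    }
+}
+```
+
+Struct-update syntax keeps overrides short, for example `ParserLimits { max_entries: 100, ..ParserLimits::default() }`.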
+
+## Code Quality Standards
+
+### Function Length Guidelines
+- **Target**: Functions should be <50 lines
+- **Maximum**: NEVER exceed 100 lines
+- **If >50 lines**: Extract inline logic to helper functions
+
+Example refactoring pattern:
+```rust
+// Before: 200+ line function
+fn parse_channel(...) {
+ match tag {
+ b"itunes:category" => { /* 80 lines inline */ }
+ // ...
+ }
+}
+
+// After: Delegate to helpers
+fn parse_channel(...) {
+ match tag {
+ tag if is_itunes_tag_any(tag) => parse_channel_itunes(tag, ...)?,
+ // ...
+ }
+}
+```
+
+### Documentation Requirements
+All public APIs must have doc comments:
+```rust
+/// Parses an RSS/Atom feed from bytes.
+///
+/// # Arguments
+/// * `data` - Raw feed content as bytes
+///
+/// # Returns
+/// Returns `ParsedFeed` with extracted metadata. If parsing encounters errors,
+/// `bozo` flag is set to `true` and `bozo_exception` contains the error description.
+///
+/// # Examples
+/// ```
+/// let xml = b"...";
+/// let feed = parse(xml)?;
+/// assert_eq!(feed.version, FeedVersion::Rss20);
+/// ```
+pub fn parse(data: &[u8]) -> Result<ParsedFeed> { ... }
+```
+
+### Inline Comments
+Minimize inline comments. Use comments ONLY for:
+1. **Why** decisions (not **what** the code does)
+2. Non-obvious constraints or workarounds
+3. References to specifications (RFC 4287 section 4.1.2, etc.)
+
+## Commit & Branch Conventions
+- Branch: `feat/`, `fix/`, `docs/`, `refactor/`, `test/`
+- Commits: [Conventional Commits](https://conventionalcommits.org/)
+- Never mention "Claude" or "co-authored" in commit messages
+
+## What NOT to Do
+- ❌ Don't use `.unwrap()` or `.expect()` in parser code — use bozo pattern
+- ❌ Don't add dependencies without workspace-level declaration in root `Cargo.toml`
+- ❌ Don't skip `--exclude feedparser-rs-py` in workspace-wide Rust commands (PyO3 needs special handling)
+- ❌ Don't break API compatibility with Python feedparser field names
+- ❌ Don't panic on malformed feeds — set `bozo = true` and continue parsing
+- ❌ Don't fetch URLs without SSRF validation (`is_safe_url()`)
+- ❌ Don't pass raw HTML to bindings without sanitization (`sanitize_html()`)
+- ❌ Don't create functions >100 lines — extract helpers
+- ❌ Don't use generic names like `utils`, `helpers`, `common` for modules
+- ❌ Don't add emojis to code or comments
diff --git a/.github/instructions/node-bindings.instructions.md b/.github/instructions/node-bindings.instructions.md
new file mode 100644
index 0000000..357a591
--- /dev/null
+++ b/.github/instructions/node-bindings.instructions.md
@@ -0,0 +1,397 @@
+# Node.js Bindings Code Review Instructions
+
+This file contains specific code review rules for the Node.js bindings in `crates/feedparser-rs-node/`.
+
+## Scope
+
+These instructions apply to:
+- `crates/feedparser-rs-node/src/lib.rs`
+- `crates/feedparser-rs-node/build.rs`
+- Any future files in `crates/feedparser-rs-node/src/`
+
+## Overview
+
+The Node.js bindings use **napi-rs** to expose the Rust core parser to JavaScript/TypeScript. The bindings must provide an ergonomic JavaScript API while maintaining security and performance.
+
+## Critical Rules
+
+### 1. Input Validation (CWE-770)
+
+**ALWAYS validate input size BEFORE processing to prevent DoS attacks.**
+
+```rust
+// CORRECT: Validate size before processing
+#[napi]
+pub fn parse(source: Either<Buffer, String>) -> Result<ParsedFeed> {
+ let input_len = match &source {
+ Either::A(buf) => buf.len(),
+ Either::B(s) => s.len(),
+ };
+
+ if input_len > MAX_FEED_SIZE {
+ return Err(Error::from_reason(format!(
+ "Feed size ({} bytes) exceeds maximum allowed ({} bytes)",
+ input_len, MAX_FEED_SIZE
+ )));
+ }
+
+ // Now safe to process
+ let bytes: &[u8] = match &source {
+ Either::A(buf) => buf.as_ref(),
+ Either::B(s) => s.as_bytes(),
+ };
+ // ...
+}
+```
+
+```rust
+// WRONG: No size validation
+#[napi]
+pub fn parse(source: Either<Buffer, String>) -> Result<ParsedFeed> {
+ let bytes: &[u8] = match &source {
+ Either::A(buf) => buf.as_ref(),
+ Either::B(s) => s.as_bytes(),
+ };
+ // ... immediate processing without size check
+}
+```
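+
+A std-only sketch of the validate-then-borrow order. It uses a hypothetical `Source` enum in place of napi-rs `Either<Buffer, String>`, so it runs without napi. napi-rs materialises a JS string as an owned Rust `String` before the function body runs. For string input, this check therefore bounds the parsing work rather than the initial copy:
+
+```rust
+/// Stand-in for `Either<Buffer, String>` (hypothetical).
+enum Source {
+    Buffer(Vec<u8>),
+    Text(String),
+}
+
+impl Source {
+    /// Byte length without copying; for `Text` this is the UTF-8 length.
+    fn byte_len(&self) -> usize {
+        match self {
+            Source::Buffer(buf) => buf.len(),
+            Source::Text(s) => s.len(),
+        }
+    }
+
+    fn as_bytes(&self) -> &[u8] {
+        match self {
+            Source::Buffer(buf) => buf.as_slice(),
+            Source::Text(s) => s.as_bytes(),
+        }
+    }
+}
+
+/// Checks the size first and only then hands out the bytes to parse.
+fn checked_bytes(source: &Source, max_size: usize) -> Result<&[u8], String> {
+    let len = source.byte_len();
+    if len > max_size {
+        return Err(format!(
+            "Feed size ({len} bytes) exceeds maximum allowed ({max_size} bytes)"
+        ));
+    }
+    Ok(source.as_bytes())
+}
+```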
+
+### 2. Error Handling
+
+**Use `Error::from_reason()` for user-facing errors with clear messages.**
+
+```rust
+// CORRECT: Clear error message with context
+.map_err(|e| Error::from_reason(format!("Parse error: {}", e)))?;
+
+// WRONG: Generic error
+.map_err(|e| Error::from_reason(e.to_string()))?;
+```
+
+**Never expose internal error details that could aid attackers:**
+
+```rust
+// CORRECT: Safe error message
+return Err(Error::from_reason("Feed size exceeds maximum allowed"));
+
+// WRONG: Exposes internal details
+return Err(Error::from_reason(format!("Internal buffer at {:p}", ptr)));
+```
+
+### 3. NAPI Struct Definitions
+
+**Use `#[napi(object)]` for plain data objects:**
+
+```rust
+// CORRECT: Plain data object
+#[napi(object)]
+pub struct ParsedFeed {
+ pub feed: FeedMeta,
+    pub entries: Vec<Entry>,
+ pub bozo: bool,
+ // ...
+}
+```
+
+**Use `#[napi(js_name = "...")]` for JavaScript naming conventions:**
+
+```rust
+// CORRECT: Use js_name for JavaScript conventions
+#[napi(object)]
+pub struct Link {
+ pub href: String,
+    #[napi(js_name = "type")] // `type` is a Rust keyword, so rename the field and expose it to JS as "type"
+    pub link_type: Option<String>,
+}
+```
+
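+**Field names that need `js_name`:**
+- `type` is a reserved keyword in Rust. It is not reserved in JavaScript, where reserved words are valid property names. Name the field `link_type`, `content_type`, `enclosure_type`, etc., and restore `type` with `js_name`
+- Any other Rust keyword used as a JS property name (for example `match`, `ref`, `mod`) needs the same treatment if it ever comes up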
+
+### 4. Date/Time Handling
+
+**Convert DateTime to milliseconds since epoch for JavaScript compatibility:**
+
+```rust
+// CORRECT: Milliseconds for JavaScript Date compatibility
+pub updated: Option<i64>,
+
+impl From<CoreFeedMeta> for FeedMeta {
+ fn from(core: CoreFeedMeta) -> Self {
+ Self {
+ updated: core.updated.map(|dt| dt.timestamp_millis()),
+ // ...
+ }
+ }
+}
+```
+
+```rust
+// WRONG: Seconds (JavaScript Date uses milliseconds)
+pub updated: Option<i64>,
+updated: core.updated.map(|dt| dt.timestamp()),
+```
+
+### 5. Type Conversions
+
+**Handle potential overflow in u64 to i64 conversions:**
+
+```rust
+// CORRECT: Safe conversion with fallback
+pub length: Option<i64>,
+length: core.length.map(|l| i64::try_from(l).unwrap_or(i64::MAX)),
+
+// WRONG: Unchecked cast
+length: core.length.map(|l| l as i64),
+```
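+
+A std-only sketch of the conversion rule. The unchecked cast is wrong because it wraps; for example, `u64::MAX as i64` is `-1`. There is a second limit to keep in mind. This part is an assumption about how the field reaches JavaScript: if `i64` crosses the boundary as a JS `number` (an IEEE-754 double), values beyond 2^53 - 1 lose precision even after saturating. `to_js_i64` and `is_js_safe` are hypothetical names:
+
+```rust
+/// Saturating u64 -> i64 conversion, as in the rule above.
+fn to_js_i64(value: u64) -> i64 {
+    i64::try_from(value).unwrap_or(i64::MAX)
+}
+
+/// Largest integer a JavaScript `number` represents exactly (Number.MAX_SAFE_INTEGER).
+const MAX_SAFE_INTEGER: i64 = (1 << 53) - 1;
+
+/// True if the value survives conversion to a JS `number` without rounding.
+fn is_js_safe(value: i64) -> bool {
+    (-MAX_SAFE_INTEGER..=MAX_SAFE_INTEGER).contains(&value)
+}
+```
+
+Enclosure lengths and similar byte counts stay far below 2^53 in practice. The saturating conversion is mainly a guard against garbage values in malformed feeds.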
+
+### 6. Input Type Handling
+
+**Accept both Buffer and String using `Either`:**
+
+```rust
+// CORRECT: Accept both types
+#[napi]
+pub fn parse(source: Either<Buffer, String>) -> Result<ParsedFeed> {
+ let bytes: &[u8] = match &source {
+ Either::A(buf) => buf.as_ref(),
+ Either::B(s) => s.as_bytes(),
+ };
+ // ...
+}
+```
+
+### 7. Feature Flags
+
+**Use conditional compilation for optional features:**
+
+```rust
+// CORRECT: Feature-gated HTTP functionality
+#[cfg(feature = "http")]
+#[napi]
+pub fn parse_url(url: String, ...) -> Result<ParsedFeed> {
+ // ...
+}
+
+// CORRECT: Feature-gated fields
+#[napi(object)]
+pub struct ParsedFeed {
+ #[cfg(feature = "http")]
+    pub headers: Option<HashMap<String, String>>,
+}
+```
+
+### 8. Vector Pre-allocation
+
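+**Make collection conversions allocate once.** For `into_iter().map(...)` the iterator reports an exact `size_hint()`. `collect()` uses it to allocate the `Vec` up front, so both forms below allocate once. Explicit `Vec::with_capacity` pays off when you know a better capacity than the iterator reports, for example after `filter`, whose lower bound is 0.
+
+```rust
+// CORRECT: Explicit pre-allocation
+impl From<CoreEntry> for Entry {
+    fn from(core: CoreEntry) -> Self {
+        let links_cap = core.links.len();
+        let content_cap = core.content.len();
+
+        Self {
+            links: {
+                let mut v = Vec::with_capacity(links_cap);
+                v.extend(core.links.into_iter().map(Link::from));
+                v
+            },
+            content: {
+                let mut v = Vec::with_capacity(content_cap);
+                v.extend(core.content.into_iter().map(Content::from));
+                v
+            },
+            // ...
+        }
+    }
+}
+```
+
+```rust
+// ALSO CORRECT: collect() sizes the Vec from the exact size_hint, one allocation
+links: core.links.into_iter().map(Link::from).collect(),
+```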
+
+### 9. Documentation
+
+**All public functions must have JSDoc-compatible documentation:**
+
+```rust
+// CORRECT: Comprehensive documentation
+/// Parse an RSS/Atom/JSON Feed from bytes or string
+///
+/// # Arguments
+///
+/// * `source` - Feed content as Buffer, string, or Uint8Array
+///
+/// # Returns
+///
+/// Parsed feed result with metadata and entries
+///
+/// # Errors
+///
+/// Returns error if input exceeds size limit or parsing fails catastrophically
+#[napi]
+pub fn parse(source: Either<Buffer, String>) -> Result<ParsedFeed> {
+```
+
+**Include JavaScript examples in documentation:**
+
+```rust
+/// # Examples
+///
+/// ```javascript
+/// const feedparser = require('feedparser-rs');
+///
+/// const feed = await feedparser.parseUrl("https://example.com/feed.xml");
+/// console.log(feed.feed.title);
+/// ```
+```
+
+### 10. Struct Field Documentation
+
+**Document all fields for TypeScript type generation:**
+
+```rust
+#[napi(object)]
+pub struct Entry {
+ /// Unique entry identifier
+    pub id: Option<String>,
+    /// Entry title
+    pub title: Option<String>,
+    /// Publication date (milliseconds since epoch)
+    pub published: Option<i64>,
+ // ...
+}
+```
+
+## API Conventions
+
+### Function Naming
+
+Use camelCase for JavaScript function names (napi-rs does this automatically):
+- Rust: `parse_with_options` -> JS: `parseWithOptions`
+- Rust: `detect_format` -> JS: `detectFormat`
+- Rust: `parse_url` -> JS: `parseUrl`
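+
+For reviewing expected JavaScript names, here is a hypothetical helper that approximates this renaming. napi-rs's own conversion may treat edge cases such as leading underscores, digits, and acronyms differently:
+
+```rust
+/// Approximate snake_case -> camelCase conversion for exported function names.
+fn to_camel_case(snake: &str) -> String {
+    let mut out = String::with_capacity(snake.len());
+    let mut upper_next = false;
+    for c in snake.chars() {
+        if c == '_' {
+            upper_next = true; // drop the underscore, capitalise what follows
+        } else if upper_next {
+            out.extend(c.to_uppercase());
+            upper_next = false;
+        } else {
+            out.push(c);
+        }
+    }
+    out
+}
+```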
+
+### Optional Parameters
+
+Use `Option` for optional parameters:
+
+```rust
+#[napi]
+pub fn parse_url(
+ url: String,
+    etag: Option<String>,
+    modified: Option<String>,
+    user_agent: Option<String>,
+) -> Result<ParsedFeed>
+```
+
+### Return Types
+
+- Return `Result` for operations that can fail
+- Return `T` directly for infallible operations
+- Never panic in public functions
+
+## Security Requirements
+
+### 1. URL Validation
+
+For HTTP functions, validate URL schemes:
+
+```rust
+// The core crate handles SSRF protection, but document it
+/// # Security
+///
+/// - Only HTTP and HTTPS URLs are accepted
+/// - Private IP addresses and localhost are blocked
+/// - Redirects follow the same security rules
+```
+
+### 2. Size Limits
+
+Always enforce size limits:
+
+```rust
+const DEFAULT_MAX_FEED_SIZE: usize = 100 * 1024 * 1024; // 100 MB
+
+#[napi]
+pub fn parse_with_options(
+    source: Either<Buffer, String>,
+    max_size: Option<u32>, // Allow user to customize
+) -> Result<ParsedFeed>
+```
+
+### 3. No Unsafe Code
+
+The Node.js bindings should NOT contain any `unsafe` code. All unsafe operations should be in the core crate if necessary.
+
+## Testing Requirements
+
+### 1. Test Both Input Types
+
+```javascript
+// Test with Buffer
+const buf = Buffer.from('...');
+const feed1 = feedparser.parse(buf);
+
+// Test with String
+const str = '...';
+const feed2 = feedparser.parse(str);
+```
+
+### 2. Test Size Limits
+
+```javascript
+const largeFeed = 'x'.repeat(200 * 1024 * 1024);
+expect(() => feedparser.parse(largeFeed)).toThrow(/exceeds maximum/);
+```
+
+### 3. Test Feature Flags
+
+Verify HTTP functions are only available when the `http` feature is enabled.
+
+## Common Review Issues
+
+### Issue: Missing size validation
+**Fix:** Add input length check before processing
+
+### Issue: Using `unwrap()` in public functions
+**Fix:** Use `map_err()` with `Error::from_reason()`
+
+### Issue: Date as seconds instead of milliseconds
+**Fix:** Use `timestamp_millis()` not `timestamp()`
+
+### Issue: Unsafe i64 cast
+**Fix:** Use `i64::try_from(l).unwrap_or(i64::MAX)`
+
+### Issue: Missing documentation
+**Fix:** Add `///` documentation with examples
+
+### Issue: Missing `js_name` for reserved keywords
+**Fix:** Add `#[napi(js_name = "type")]` for fields named `type`
+
+## Integration with Core Crate
+
+The Node.js bindings are a thin wrapper around `feedparser-rs-core`. Rules:
+
+1. **No parsing logic** - All parsing is in the core crate
+2. **Type conversions only** - Convert core types to napi-compatible types
+3. **Error mapping** - Map `FeedError` to napi `Error`
+4. **Feature parity** - Keep in sync with Python bindings features
+
+## Checklist for PRs
+
+- [ ] Input size validated before processing
+- [ ] All public functions have documentation
+- [ ] No `unwrap()` or `expect()` in public functions
+- [ ] Dates converted to milliseconds
+- [ ] Large number conversions are safe (u64 -> i64)
+- [ ] Reserved keywords use `js_name`
+- [ ] Feature flags properly applied
+- [ ] No unsafe code
+- [ ] Tests cover both Buffer and String inputs
diff --git a/.github/instructions/parser.instructions.md b/.github/instructions/parser.instructions.md
new file mode 100644
index 0000000..10bd128
--- /dev/null
+++ b/.github/instructions/parser.instructions.md
@@ -0,0 +1,382 @@
+# Parser Module Instructions
+
+**Applies to:** `crates/feedparser-rs-core/src/parser/**`
+
+## Core Principles
+
+### Tolerant Parsing is MANDATORY
+
+**NEVER panic or return errors for malformed feeds.** The bozo pattern is the foundation of this project:
+
+```rust
+// ✅ CORRECT - Set bozo flag and continue
+match reader.read_event_into(&mut buf) {
+ Ok(Event::Start(e)) => { /* process */ }
+ Err(e) => {
+ feed.bozo = true;
+ feed.bozo_exception = Some(e.to_string());
+ // CONTINUE PARSING - don't return error
+ }
+ _ => {}
+}
+
+// ❌ WRONG - Never panic or abort parsing
+match reader.read_event_into(&mut buf) {
+ Ok(Event::Start(e)) => { /* process */ }
+ Err(e) => return Err(e.into()), // NO! This breaks tolerance
+ _ => {}
+}
+```
+
+**Why?** Real-world feeds are often broken:
+- Missing closing tags
+- Invalid dates
+- Malformed XML
+- Wrong encoding declarations
+- Mixed namespaces
+
+Python feedparser handles all these gracefully. We must too.
+
+## Function Length Rules
+
+### CRITICAL: No function >100 lines
+
+Current technical debt in `parser/rss.rs`:
+- `parse_channel` - 280 lines (needs refactoring)
+- `parse_item` - 328 lines (needs refactoring)
+
+**When writing new parser code:**
+1. Keep functions <50 lines (target)
+2. Never exceed 100 lines (hard limit)
+3. Extract inline parsing to separate functions
+
+### Refactoring Pattern
+
+```rust
+// ✅ GOOD - Delegate to specialized functions
+fn parse_channel(reader: &mut Reader<&[u8]>, feed: &mut ParsedFeed, limits: &ParserLimits, depth: &mut usize) -> Result<()> {
+    let mut buf = Vec::with_capacity(EVENT_BUFFER_CAPACITY); // reused for every event
+    loop {
+ match reader.read_event_into(&mut buf) {
+ Ok(Event::Start(e) | Event::Empty(e)) => {
+ match e.name().as_ref() {
+ tag if is_standard_rss_tag(tag) =>
+ parse_channel_standard(tag, reader, &mut buf, feed, limits)?,
+ tag if is_itunes_tag_any(tag) =>
+ parse_channel_itunes(tag, &e, reader, &mut buf, feed, limits, depth)?,
+ tag if is_podcast_tag(tag) =>
+ parse_channel_podcast(tag, &e, reader, &mut buf, feed, limits)?,
+ _ => skip_element(reader, &mut buf, limits, *depth)?
+ }
+ }
+            Ok(Event::End(e)) if e.local_name().as_ref() == b"channel" => break,
+            Ok(Event::Eof) => break, // unclosed <channel>: stop instead of looping forever
+ Err(e) => {
+ feed.bozo = true;
+ feed.bozo_exception = Some(e.to_string());
+ }
+ _ => {}
+ }
+ buf.clear();
+ }
+ Ok(())
+}
+
+// Helper functions (<50 lines each)
+fn parse_channel_standard(...) -> Result<()> { ... }
+fn parse_channel_itunes(...) -> Result<()> { ... }
+fn parse_channel_podcast(...) -> Result<()> { ... }
+```
+
+## quick-xml Usage Patterns
+
+### Reader Configuration
+
+```rust
+let mut reader = Reader::from_reader(data);
+reader.config_mut().trim_text(true);
+// DO NOT enable check_end_names - we need tolerance for mismatched tags
+```
+
+### Event Loop Pattern
+
+```rust
+let mut buf = Vec::with_capacity(EVENT_BUFFER_CAPACITY); // Reuse buffer
+let mut depth: usize = 1;
+
+loop {
+ match reader.read_event_into(&mut buf) {
+ Ok(Event::Start(e)) => {
+ depth += 1;
+ check_depth(depth, limits.max_nesting_depth)?;
+ // Process start tag
+ }
+ Ok(Event::End(e)) => {
+ depth = depth.saturating_sub(1);
+ // Check for terminating tag
+ if e.local_name().as_ref() == b"channel" {
+ break;
+ }
+ }
+ Ok(Event::Text(e)) => {
+ // Extract text content
+ let text = e.unescape().unwrap_or_default();
+ }
+ Ok(Event::Empty(e)) => {
+            // Self-closing tag (e.g., <enclosure url="..." />)
+ }
+ Ok(Event::Eof) => break,
+ Err(e) => {
+ feed.bozo = true;
+ feed.bozo_exception = Some(format!("XML error: {e}"));
+ // Continue parsing if possible
+ }
+ _ => {} // Ignore other events (comments, PI, etc.)
+ }
+ buf.clear(); // Reuse buffer allocation
+}
+```
+
+## Format-Specific Rules
+
+### RSS Parsers (`rss.rs`, `rss10.rs`)
+
+1. **Version Detection**: RSS 0.9x, 1.0, 2.0 have different structures
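+   - RSS 2.0: `<rss version="2.0"><channel>...<item>...</item></channel></rss>` (items inside channel)
+   - RSS 1.0: `<rdf:RDF><channel>...</channel><item>...</item></rdf:RDF>` (items outside channel)
+   - RSS 0.90 is also RDF-based and has no `version` attribute. RSS 0.91 to 0.94 use `<rss version="0.9x">` with the RSS 2.0 layout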
+
+2. **Date Formats**: Use RFC 2822 parser first, fallback to others
+ ```rust
+ let dt = DateTime::parse_from_rfc2822(date_str)
+ .ok()
+ .or_else(|| DateTime::parse_from_rfc3339(date_str).ok())
+ .map(|d| d.with_timezone(&Utc));
+ ```
+
+3. **Namespace Handling**: Extract iTunes, Dublin Core, Media RSS attributes
+ - Use `is_itunes_tag()`, `is_dc_tag()`, etc. helpers
+ - Delegate to namespace-specific parsers
+
+### Atom Parser (`atom.rs`)
+
+1. **Text Constructs**: Atom has three content types
+ ```rust
+   fn parse_text_construct(element: &BytesStart, reader: &mut Reader<&[u8]>, buf: &mut Vec<u8>) -> TextConstruct {
+       let content_type = element.attributes()
+           .flatten() // attributes() yields Result<Attribute>; skip malformed ones
+           .find(|a| a.key.as_ref() == b"type")
+           .and_then(|a| a.unescape_value().ok())
+ .map(|v| match v.as_ref() {
+ "html" => TextType::Html,
+ "xhtml" => TextType::Xhtml,
+ _ => TextType::Text,
+ })
+ .unwrap_or(TextType::Text);
+ // ...
+ }
+ ```
+
+2. **Links**: Atom supports multiple link relations
+ - `rel="alternate"` → main link
+ - `rel="enclosure"` → media attachments
+ - `rel="self"` → feed URL
+
+3. **Dates**: Use ISO 8601/RFC 3339 parser first
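+
+A sketch of the link-routing rule above, as a hypothetical `classify_link` function. One detail is worth encoding: RFC 4287 (section 4.2.7.2) says a `<link>` without a `rel` attribute must be treated as `rel="alternate"`:
+
+```rust
+/// Where a parsed Atom `<link>` should be stored (hypothetical enum).
+#[derive(Debug, PartialEq, Eq)]
+enum LinkRole {
+    Main,      // rel="alternate"
+    Enclosure, // rel="enclosure"
+    SelfUrl,   // rel="self"
+    Other,     // kept in the links list but not promoted
+}
+
+/// Maps the `rel` attribute to a role; a missing `rel` means "alternate".
+fn classify_link(rel: Option<&str>) -> LinkRole {
+    match rel.map(str::trim).unwrap_or("alternate") {
+        "alternate" => LinkRole::Main,
+        "enclosure" => LinkRole::Enclosure,
+        "self" => LinkRole::SelfUrl,
+        _ => LinkRole::Other,
+    }
+}
+```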
+
+### JSON Feed Parser (`json.rs`)
+
+1. **Use serde_json**: Already structured, no XML complexity
+2. **Version Detection**: Check `version` field ("https://jsonfeed.org/version/1", "https://jsonfeed.org/version/1.1")
+3. **Bozo Pattern**: Set bozo for missing required fields (title, items)
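+
+Version detection for JSON Feed reduces to matching the `version` URL. Below is a sketch with a hypothetical `JsonFeedVersion` enum. Tolerating a trailing slash is an assumption made here in the spirit of tolerant parsing; the spec does not require it:
+
+```rust
+/// JSON Feed versions recognized from the top-level `version` field (hypothetical enum).
+#[derive(Debug, PartialEq, Eq)]
+enum JsonFeedVersion {
+    V1,
+    V1_1,
+}
+
+/// `None` means "unknown version": set bozo and keep parsing rather than aborting.
+fn detect_json_feed_version(version: &str) -> Option<JsonFeedVersion> {
+    match version.trim_end_matches('/') {
+        "https://jsonfeed.org/version/1" => Some(JsonFeedVersion::V1),
+        "https://jsonfeed.org/version/1.1" => Some(JsonFeedVersion::V1_1),
+        _ => None,
+    }
+}
+```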
+
+## Depth Checking (DoS Protection)
+
+Always check nesting depth to prevent stack overflow:
+
+```rust
+fn check_depth(current: usize, max: usize) -> Result<()> {
+ if current > max {
+ return Err(FeedError::InvalidFormat(format!(
+ "XML nesting depth {current} exceeds maximum {max}"
+ )));
+ }
+ Ok(())
+}
+```
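+
+The depth counter that feeds `check_depth()` has to stay balanced across start and end tags. Here is a hypothetical `DepthTracker`. It pairs the limit check with a saturating decrement, so a stray end tag in a broken feed cannot underflow:
+
+```rust
+/// Tracks element nesting depth against a maximum (hypothetical helper).
+struct DepthTracker {
+    depth: usize,
+    max: usize,
+}
+
+impl DepthTracker {
+    fn new(max: usize) -> Self {
+        Self { depth: 0, max }
+    }
+
+    /// Call on every start tag.
+    fn enter(&mut self) -> Result<(), String> {
+        self.depth += 1;
+        if self.depth > self.max {
+            return Err(format!(
+                "XML nesting depth {} exceeds maximum {}",
+                self.depth, self.max
+            ));
+        }
+        Ok(())
+    }
+
+    /// Call on every end tag; saturates so unmatched end tags cannot underflow.
+    fn exit(&mut self) {
+        self.depth = self.depth.saturating_sub(1);
+    }
+}
+```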
+
+## Namespace Detection
+
+Use helpers from `parser/namespace_detection.rs`:
+
+```rust
+if let Some(itunes_element) = is_itunes_tag(tag) {
+ // tag is b"itunes:author" or similar
+ let itunes = feed.feed.itunes.get_or_insert_with(ItunesFeedMeta::default);
+ // Process iTunes-specific field
+}
+
+if let Some(dc_element) = is_dc_tag(tag) {
+ // Dublin Core namespace
+ dublin_core::handle_feed_element(&dc_element, &text, &mut feed.feed);
+}
+```
+
+## Text Extraction Pattern
+
+```rust
+fn read_text(reader: &mut Reader<&[u8]>, buf: &mut Vec<u8>, limits: &ParserLimits) -> Result<String> {
+ let mut text = String::with_capacity(TEXT_BUFFER_CAPACITY);
+
+ loop {
+ match reader.read_event_into(buf) {
+ Ok(Event::Text(e)) => {
+ append_bytes(&mut text, e.as_ref(), limits.max_text_length)?;
+ }
+ Ok(Event::CData(e)) => {
+ append_bytes(&mut text, e.as_ref(), limits.max_text_length)?;
+ }
+ Ok(Event::End(_) | Event::Eof) => break,
+ Err(e) => return Err(e.into()),
+ _ => {}
+ }
+ buf.clear();
+ }
+
+ Ok(text)
+}
+```
+
+## Date Parsing Delegation
+
+Never inline date parsing. Use `util/date.rs`:
+
+```rust
+use crate::util::date::parse_date;
+
+// ✅ CORRECT
+match parse_date(&text) {
+ Some(dt) => entry.published = Some(dt),
+ None if !text.is_empty() => {
+ feed.bozo = true;
+ feed.bozo_exception = Some("Invalid date format".to_string());
+ }
+ None => {} // Empty text, no error
+}
+
+// ❌ WRONG - Inline date parsing duplicates logic
+let dt = DateTime::parse_from_rfc3339(&text).ok(); // Misses other formats
+```
+
+## Testing Requirements
+
+Every parser function must have:
+1. **Basic test**: Well-formed feed
+2. **Malformed test**: Broken feed sets bozo but still parses
+3. **Edge case tests**: Empty fields, missing required fields, excessive nesting
+
+```rust
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_rss20_valid() {
+ let xml = include_str!("../../../tests/fixtures/rss/basic.xml");
+ let feed = parse_rss20(xml.as_bytes()).unwrap();
+ assert!(!feed.bozo);
+ assert_eq!(feed.version, FeedVersion::Rss20);
+ }
+
+ #[test]
+ fn test_rss20_malformed_sets_bozo() {
+        let xml = b"<rss version=\"2.0\"><channel><title>Test</title></channel>"; // Missing </rss>
+ let feed = parse_rss20(xml).unwrap();
+ assert!(feed.bozo);
+ assert!(feed.bozo_exception.is_some());
+ assert_eq!(feed.feed.title.as_deref(), Some("Test")); // Still extracted!
+ }
+
+ #[test]
+ fn test_rss20_excessive_nesting() {
+        let xml = format!("<rss version=\"2.0\"><channel>{}", "<item>".repeat(200)); // 200 levels, above the default limit of 100
+        let result = parse_rss20(xml.as_bytes());
+ assert!(result.is_err() || result.unwrap().bozo);
+ }
+}
+```
+
+## Performance Considerations
+
+1. **Reuse buffers**: `Vec::with_capacity()` + `clear()`, not repeated allocations
+2. **Avoid clones in hot paths**: Use references where possible
+3. **Bounded collections**: Apply limits via `try_push_limited()` helper
+4. **Early termination**: Stop parsing after `max_entries` reached (set bozo flag)
+
+```rust
+// ✅ GOOD - Reuse buffer
+let mut buf = Vec::with_capacity(EVENT_BUFFER_CAPACITY);
+loop {
+ reader.read_event_into(&mut buf)?;
+ // Process
+ buf.clear(); // Reuse allocation
+}
+
+// ❌ BAD - Allocate every iteration
+loop {
+ let mut buf = Vec::new(); // New heap allocation each time
+ reader.read_event_into(&mut buf)?;
+}
+```
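+
+The same buffer-reuse idea can be demonstrated with the standard library alone. `BufRead::read_until` appends into a caller-owned `Vec<u8>`, much like quick-xml's `read_event_into`, so it needs the same `clear()` on each iteration. The function below is an illustrative analogue, not parser code:
+
+```rust
+use std::io::{BufRead, Cursor};
+
+/// Counts non-blank lines using one reusable buffer.
+fn count_non_empty_lines<R: BufRead>(mut reader: R) -> std::io::Result<usize> {
+    let mut buf = Vec::with_capacity(256);
+    let mut count = 0;
+    loop {
+        // read_until appends, so buf must be empty at the top of each iteration
+        let n = reader.read_until(b'\n', &mut buf)?;
+        if n == 0 {
+            break; // EOF
+        }
+        if buf.iter().any(|b| !b.is_ascii_whitespace()) {
+            count += 1;
+        }
+        buf.clear(); // length back to 0, capacity retained
+    }
+    Ok(count)
+}
+```
+
+For example, `count_non_empty_lines(Cursor::new("a\n\nb\n  \nc"))` counts three lines while allocating the buffer only once.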
+
+## Common Pitfalls
+
+### Don't Skip Elements Without Checking Depth
+
+```rust
+// ✅ CORRECT
+skip_element(reader, buf, limits, depth)?;
+
+// ❌ WRONG - Depth not checked, could overflow stack
+loop {
+ match reader.read_event_into(buf) {
+ Ok(Event::End(_)) => break,
+ _ => {}
+ }
+}
+```
+
+### Don't Use Panic-Happy Methods
+
+```rust
+// ❌ WRONG
+let value = attributes.flatten().find(|a| a.key.as_ref() == b"href").unwrap();
+
+// ✅ CORRECT
+if let Some(attr) = attributes.flatten().find(|a| a.key.as_ref() == b"href") {
+ if let Ok(value) = attr.unescape_value() {
+ // Use value
+ }
+}
+```
+
+### Don't Ignore Limits
+
+```rust
+// ❌ WRONG - Unbounded growth
+feed.entries.push(entry);
+
+// ✅ CORRECT - Bounded with error handling
+if feed.entries.is_at_limit(limits.max_entries) {
+ feed.bozo = true;
+ feed.bozo_exception = Some(format!("Entry limit exceeded: {}", limits.max_entries));
+ skip_element(reader, buf, limits, depth)?;
+} else {
+ feed.entries.push(entry);
+}
+```
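+
+A minimal sketch of a bounded push with bozo reporting. `push_limited` and `collect_entries` are hypothetical stand-ins in the spirit of the `try_push_limited()` helper mentioned under Performance Considerations. The real helper's name and signature may differ:
+
+```rust
+/// Pushes `item` only while `vec` holds fewer than `limit` items.
+/// Returns false when the item was dropped so the caller can set bozo.
+fn push_limited<T>(vec: &mut Vec<T>, item: T, limit: usize) -> bool {
+    if vec.len() >= limit {
+        return false;
+    }
+    vec.push(item);
+    true
+}
+
+/// Collects at most `max_entries` items, reporting the overflow bozo-style.
+fn collect_entries(items: Vec<String>, max_entries: usize) -> (Vec<String>, Option<String>) {
+    let mut entries = Vec::new();
+    let mut bozo_exception = None;
+    for item in items {
+        if !push_limited(&mut entries, item, max_entries) {
+            bozo_exception = Some(format!("Entry limit exceeded: {max_entries}"));
+            break; // stop early instead of processing entries that would be dropped
+        }
+    }
+    (entries, bozo_exception)
+}
+```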
+
+## References
+
+- RSS 2.0 Spec: https://www.rssboard.org/rss-specification
+- RSS 1.0 Spec: https://web.resource.org/rss/1.0/spec
+- Atom 1.0 (RFC 4287): https://www.rfc-editor.org/rfc/rfc4287
+- JSON Feed: https://www.jsonfeed.org/version/1.1/
+- quick-xml docs: https://docs.rs/quick-xml/latest/quick_xml/
diff --git a/.github/instructions/python-bindings.instructions.md b/.github/instructions/python-bindings.instructions.md
new file mode 100644
index 0000000..754eb01
--- /dev/null
+++ b/.github/instructions/python-bindings.instructions.md
@@ -0,0 +1,581 @@
+# Python Bindings Instructions
+
+**Applies to:** `crates/feedparser-rs-py/**`
+
+## Mission-Critical: API Compatibility
+
+These bindings MUST be a drop-in replacement for Python's `feedparser` library. Every function, class, attribute, and return type must match exactly.
+
+**Target API:**
+```python
+import feedparser
+
+# Parse from various sources
+d = feedparser.parse('https://example.com/feed.xml')
+d = feedparser.parse(open('feed.xml').read())
+d = feedparser.parse(b'...')
+
+# Access fields (these names are MANDATORY)
+d.version # 'rss20', 'atom10', etc.
+d.bozo # True/False
+d.bozo_exception # String or None
+d.encoding # 'utf-8', etc.
+d.feed.title # Feed title
+d.feed.link # Feed link
+d.entries[0].title # Entry title
+d.entries[0].published_parsed # time.struct_time (NOT DateTime!)
+```
+
+## PyO3 Fundamentals
+
+### Module Setup
+
+**Located in:** `src/lib.rs`
+
+```rust
+use pyo3::prelude::*;
+
+#[pymodule]
+fn feedparser_rs(m: &Bound<'_, PyModule>) -> PyResult<()> {
+ m.add_function(wrap_pyfunction!(parse, m)?)?;
+ m.add_function(wrap_pyfunction!(parse_url, m)?)?;
+    m.add_class::<PyParsedFeed>()?;
+    m.add_class::<PyFeedMeta>()?;
+    m.add_class::<PyEntry>()?;
+ // ... other classes
+ Ok(())
+}
+```
+
+**Module name MUST be `feedparser_rs`** (matches PyPI package name)
+
+### Main Parse Function
+
+```rust
+use pyo3::exceptions::{PyNotImplementedError, PyTypeError, PyValueError};
+
+/// Parse an RSS/Atom feed from bytes, string, or URL
+#[pyfunction]
+#[pyo3(signature = (source, /))]
+pub fn parse(py: Python<'_>, source: &Bound<'_, PyAny>) -> PyResult<PyParsedFeed> {
+    let data: Vec<u8> = if let Ok(s) = source.extract::<String>() {
+ // String - could be URL or XML content
+ if s.starts_with("http://") || s.starts_with("https://") {
+ // HTTP fetching (if feature enabled)
+ #[cfg(feature = "http")]
+ {
+ return parse_url_impl(py, &s);
+ }
+ #[cfg(not(feature = "http"))]
+ {
+ return Err(PyNotImplementedError::new_err(
+ "URL fetching not enabled. Install with 'pip install feedparser-rs[http]'"
+ ));
+ }
+ }
+ s.into_bytes()
+    } else if let Ok(b) = source.extract::<Vec<u8>>() {
+ b
+ } else {
+ return Err(PyTypeError::new_err("source must be str or bytes"));
+ };
+
+ let result = feedparser_rs_core::parse(&data)
+ .map_err(|e| PyValueError::new_err(e.to_string()))?;
+
+ Ok(PyParsedFeed::from(result))
+}
+```
+
+**Rules:**
+1. Accept `str` (URL or XML) and `bytes`
+2. Return `PyResult` (never panic)
+3. Use `PyValueError` for parsing errors (not `RuntimeError`)
+
+## Python Class Mapping
+
+### ParsedFeed (FeedParserDict)
+
+**Located in:** `src/types/parsed_feed.rs`
+
+```rust
+/// Main parsing result (equivalent to feedparser.FeedParserDict)
+#[pyclass(name = "FeedParserDict")]
+#[derive(Clone)]
+pub struct PyParsedFeed {
+    inner: Arc<ParsedFeed>, // Use Arc for cheap clones
+}
+
+#[pymethods]
+impl PyParsedFeed {
+ #[getter]
+ fn feed(&self) -> PyFeedMeta {
+ PyFeedMeta {
+ inner: Arc::clone(&self.inner.feed),
+ }
+ }
+
+ #[getter]
+    fn entries(&self) -> Vec<PyEntry> {
+ self.inner
+ .entries
+ .iter()
+ .map(|e| PyEntry {
+ inner: Arc::new(e.clone()),
+ })
+ .collect()
+ }
+
+ #[getter]
+ fn bozo(&self) -> bool {
+ self.inner.bozo
+ }
+
+ #[getter]
+    fn bozo_exception(&self) -> Option<String> {
+ self.inner.bozo_exception.clone()
+ }
+
+ #[getter]
+ fn encoding(&self) -> &str {
+ &self.inner.encoding
+ }
+
+ #[getter]
+ fn version(&self) -> &str {
+ self.inner.version.as_str() // Returns "rss20", "atom10", etc.
+ }
+
+ #[getter]
+    fn namespaces(&self) -> HashMap<String, String> {
+ self.inner.namespaces.clone()
+ }
+
+ // Python repr for debugging
+ fn __repr__(&self) -> String {
+ format!(
+ "FeedParserDict(version={:?}, bozo={}, entries={})",
+ self.version(),
+ self.bozo(),
+            self.inner.entries.len() // count directly; self.entries() would clone every entry
+ )
+ }
+}
+```
+
+**CRITICAL**: Class name MUST be `"FeedParserDict"` (matches Python feedparser)
+
+### FeedMeta
+
+**Located in:** `src/types/feed_meta.rs`
+
+```rust
+#[pyclass]
+#[derive(Clone)]
+pub struct PyFeedMeta {
+    inner: Arc<FeedMeta>,
+}
+
+#[pymethods]
+impl PyFeedMeta {
+ #[getter]
+ fn title(&self) -> Option<&str> {
+ self.inner.title.as_deref()
+ }
+
+ #[getter]
+ fn link(&self) -> Option<&str> {
+ self.inner.link.as_deref()
+ }
+
+ #[getter]
+ fn subtitle(&self) -> Option<&str> {
+ self.inner.subtitle.as_deref()
+ }
+
+ #[getter]
+ fn language(&self) -> Option<&str> {
+ self.inner.language.as_deref()
+ }
+
+ // ... other getters
+}
+```
+
+**Rules:**
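+1. Getters return borrowed data where possible (`Option<&str>` rather than `Option<String>`). PyO3 converts either to `str | None`, so borrowing just avoids an extra Rust-side clone
+2. Use `as_deref()` for `Option<String>` → `Option<&str>` conversion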
+3. Clone only when necessary (prefer references)
+
+## Date Conversion (CRITICAL)
+
+### time.struct_time Requirement
+
+Python feedparser returns `time.struct_time` for `*_parsed` fields. This is MANDATORY for compatibility.
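+
+A std-only sketch of the field math behind that conversion. It assumes the parsed date has already been normalised to a UTC Unix timestamp, since Python feedparser reports `*_parsed` values in UTC. `StructTime` and `to_struct_time` are hypothetical. In the bindings, chrono's `Datelike::ordinal()` and `weekday().num_days_from_monday()` provide the same values, and the nine fields would then be handed to `time.struct_time`:
+
+```rust
+/// Fields of Python's `time.struct_time` for a UTC timestamp (hypothetical type).
+#[derive(Debug, PartialEq, Eq)]
+struct StructTime {
+    tm_year: i64,
+    tm_mon: u32,   // 1..=12
+    tm_mday: u32,  // 1..=31
+    tm_hour: u32,
+    tm_min: u32,
+    tm_sec: u32,
+    tm_wday: u32,  // 0 = Monday, as in Python
+    tm_yday: u32,  // 1..=366
+    tm_isdst: i32, // 0: the value is UTC
+}
+
+/// Days before the first of each month in a non-leap year.
+const DAYS_BEFORE_MONTH: [u32; 12] = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334];
+
+/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar,
+/// following Howard Hinnant's `civil_from_days` algorithm.
+fn civil_from_days(days: i64) -> (i64, u32, u32) {
+    let z = days + 719_468; // shift the epoch to 0000-03-01
+    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
+    let doe = (z - era * 146_097) as u64; // day of era: 0..=146_096
+    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
+    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
+    let mp = (5 * doy + 2) / 153; // March-based month: 0..=11
+    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
+    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
+    let year = yoe as i64 + era * 400 + i64::from(month <= 2); // Jan/Feb belong to the next civil year
+    (year, month, day)
+}
+
+fn to_struct_time(unix_secs: i64) -> StructTime {
+    let days = unix_secs.div_euclid(86_400);
+    let secs = unix_secs.rem_euclid(86_400) as u32; // seconds into the day, also before 1970
+    let (year, month, day) = civil_from_days(days);
+    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
+    StructTime {
+        tm_year: year,
+        tm_mon: month,
+        tm_mday: day,
+        tm_hour: secs / 3_600,
+        tm_min: secs % 3_600 / 60,
+        tm_sec: secs % 60,
+        tm_wday: (days + 3).rem_euclid(7) as u32, // 1970-01-01 was a Thursday
+        tm_yday: DAYS_BEFORE_MONTH[(month - 1) as usize] + day + u32::from(leap && month > 2),
+        tm_isdst: 0,
+    }
+}
+```
+
+These fields follow Python's conventions, which differ from C's `struct tm`:
+
+- months are 1-based
+- the year is not offset from 1900
+- `tm_wday` counts from Monday
+- `tm_yday` starts at 1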
+
+```rust
+use pyo3::types::PyTuple;
+
+#[pymethods]
+impl PyEntry {
+ #[getter]
+    fn published(&self) -> Option<String> {
+ self.inner.published.map(|dt| dt.to_rfc3339())
+ }
+
+ #[getter]
+ fn published_parsed(&self, py: Python<'_>) -> PyResult
+
+ ]]>
+
+
+ "#;
+
+ let feed = parse_rss20(xml).unwrap();
+ let entry = &feed.entries[0];
+
+ // Summary should be plain description
+ assert_eq!(entry.summary.as_deref(), Some("Plain text summary"));
+
+ // Content should contain the HTML
+ assert_eq!(entry.content.len(), 1);
+ assert!(
+ entry.content[0]
+ .value
+ .contains("HTML content")
+ );
+ assert!(entry.content[0].value.contains("