| 1 | +# SimpleRSS |
| 2 | + |
| 3 | +[Gem Version](https://badge.fury.io/rb/simple-rss) |
| 4 | +[CI](https://github.com/cardmagic/simple-rss/actions/workflows/ci.yml) |
| 5 | +[License: LGPL-3.0](https://opensource.org/licenses/LGPL-3.0) |
| 6 | + |
| 7 | +A simple, flexible, extensible, and liberal RSS and Atom reader for Ruby. Designed to be backwards compatible with Ruby's standard RSS parser while handling malformed feeds gracefully. |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +- Parses both RSS and Atom feeds |
| 12 | +- Tolerant of malformed XML (regex-based parsing) |
| 13 | +- Built-in URL fetching with conditional GET support (ETags, Last-Modified) |
| 14 | +- JSON and XML serialization |
| 15 | +- Extensible tag definitions |
| 16 | +- Zero runtime dependencies |
| 17 | + |
| 18 | +## What's New in 2.0 |
| 19 | + |
| 20 | +Version 2.0 is a major update that adds the following capabilities: |
| 21 | + |
| 22 | +- **URL Fetching** - One-liner feed fetching with `SimpleRSS.fetch(url)`. Supports timeouts, custom headers, and automatic redirect following. |
| 23 | + |
| 24 | +- **Conditional GET** - Bandwidth-efficient polling with ETag and Last-Modified support. Returns `nil` when feeds haven't changed (304 Not Modified). |
| 25 | + |
| 26 | +- **JSON Serialization** - Export feeds with `to_json`, `to_hash`, and Rails-compatible `as_json`. Time objects serialize to ISO 8601. |
| 27 | + |
| 28 | +- **XML Serialization** - Convert any parsed feed to clean RSS 2.0 or Atom XML with `to_xml(format: :rss2)` or `to_xml(format: :atom)`. |
| 29 | + |
| 30 | +- **Array Tags** - Collect all occurrences of a tag (like multiple categories) with the `array_tags:` option. |
| 31 | + |
| 32 | +- **Attribute Parsing** - Extract attributes from feed, item, and media tags using the `tag#attr` syntax. |
| 33 | + |
| 34 | +- **UTF-8 Normalization** - All parsed content is automatically normalized to UTF-8 encoding. |
| 35 | + |
| 36 | +- **Modern Ruby** - Full compatibility with Ruby 3.1 through 4.0, with RBS type annotations and Steep type checking. |
| 37 | + |
| 38 | +## Installation |
| 39 | + |
| 40 | +Add to your Gemfile: |
| 41 | + |
| 42 | +```ruby |
| 43 | +gem "simple-rss" |
| 44 | +``` |
| 45 | + |
| 46 | +Or install directly: |
| 47 | + |
| 48 | +```bash |
| 49 | +gem install simple-rss |
| 50 | +``` |
| 51 | + |
| 52 | +## Quick Start |
| 53 | + |
| 54 | +```ruby |
| 55 | +require "simple-rss" |
| 56 | +require "uri" |
| 57 | +require "net/http" |
| 58 | + |
| 59 | +# Parse from a string or IO object |
| 60 | +xml = Net::HTTP.get(URI("https://example.com/feed.xml")) |
| 61 | +rss = SimpleRSS.parse(xml) |
| 62 | + |
| 63 | +rss.channel.title # => "Example Feed" |
| 64 | +rss.items.first.title # => "First Post" |
| 65 | +rss.items.first.pubDate # => 2024-01-15 12:00:00 -0500 (Time object) |
| 66 | +``` |
| 67 | + |
| 68 | +## Usage |
| 69 | + |
| 70 | +### Fetching Feeds |
| 71 | + |
| 72 | +SimpleRSS includes a built-in fetcher with conditional GET support for efficient polling: |
| 73 | + |
| 74 | +```ruby |
| 75 | +# Simple fetch |
| 76 | +feed = SimpleRSS.fetch("https://example.com/feed.xml") |
| 77 | + |
| 78 | +# With timeout |
| 79 | +feed = SimpleRSS.fetch("https://example.com/feed.xml", timeout: 10) |
| 80 | + |
| 81 | +# Conditional GET - only download if modified |
| 82 | +feed = SimpleRSS.fetch("https://example.com/feed.xml") |
| 83 | +# Store these for next request |
| 84 | +etag = feed.etag |
| 85 | +last_modified = feed.last_modified |
| 86 | + |
| 87 | +# On subsequent requests, pass the stored values |
| 88 | +feed = SimpleRSS.fetch( |
| 89 | + "https://example.com/feed.xml", |
| 90 | + etag:, |
| 91 | + last_modified: |
| 92 | +) |
| 93 | +# Returns nil if feed hasn't changed (304 Not Modified) |
| 94 | +``` |
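
For periodic polling, the stored validators can be threaded through each call. The following is a minimal sketch and not part of SimpleRSS: `FeedState`, `poll_feed`, and `StubFeed` are hypothetical names, and the fetcher is passed in as a callable so the example runs without network access. In real use, `SimpleRSS.method(:fetch)` could probably be passed as the fetcher, assuming it accepts the `etag:` and `last_modified:` options shown above.

```ruby
# Hypothetical container for the validators from the last successful fetch.
FeedState = Struct.new(:etag, :last_modified)

# Hypothetical helper: fetches the feed only if it changed since the last poll.
# `fetcher` is any callable taking (url, **options). Returns the feed, or nil
# when the fetcher reports that nothing changed.
def poll_feed(url, state, fetcher:)
  # Drop nil validators so the very first request is unconditional.
  options = { etag: state.etag, last_modified: state.last_modified }.compact
  feed = fetcher.call(url, **options)
  return nil if feed.nil?

  # Remember the new validators for the next poll.
  state.etag = feed.etag
  state.last_modified = feed.last_modified
  feed
end

# Stand-in for a real HTTP fetch (an assumption made for illustration only):
# it answers "not modified" (nil) once the caller sends the current ETag.
StubFeed = Struct.new(:etag, :last_modified)
stub_fetcher = lambda do |_url, **options|
  options[:etag] == "v1" ? nil : StubFeed.new("v1", "Mon, 15 Jan 2024 17:00:00 GMT")
end

state = FeedState.new
first = poll_feed("https://example.com/feed.xml", state, fetcher: stub_fetcher)
second = poll_feed("https://example.com/feed.xml", state, fetcher: stub_fetcher)
```

Here `first` is a feed object and `second` is `nil`, because the second call sends the ETag stored by the first. Persisting `state` between runs (for example in a database row per feed) is left to the caller.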
| 95 | + |
| 96 | +### Accessing Feed Data |
| 97 | + |
| 98 | +SimpleRSS provides both RSS and Atom style accessors: |
| 99 | + |
| 100 | +```ruby |
| 101 | +feed = SimpleRSS.parse(xml) |
| 102 | + |
| 103 | +# RSS style |
| 104 | +feed.channel.title |
| 105 | +feed.channel.link |
| 106 | +feed.channel.description |
| 107 | +feed.items |
| 108 | + |
| 109 | +# Atom style (aliases) |
| 110 | +feed.feed.title |
| 111 | +feed.entries |
| 112 | +``` |
| 113 | + |
| 114 | +### Item Attributes |
| 115 | + |
| 116 | +Items support both hash and method access: |
| 117 | + |
| 118 | +```ruby |
| 119 | +item = feed.items.first |
| 120 | + |
| 121 | +# Hash access |
| 122 | +item[:title] |
| 123 | +item[:link] |
| 124 | +item[:pubDate] |
| 125 | + |
| 126 | +# Method access |
| 127 | +item.title |
| 128 | +item.link |
| 129 | +item.pubDate |
| 130 | +``` |
| 131 | + |
| 132 | +Date fields are automatically parsed into `Time` objects: |
| 133 | + |
| 134 | +```ruby |
| 135 | +item.pubDate.class # => Time |
| 136 | +item.pubDate.year # => 2024 |
| 137 | +``` |
| 138 | + |
| 139 | +### JSON Serialization |
| 140 | + |
| 141 | +```ruby |
| 142 | +feed = SimpleRSS.parse(xml) |
| 143 | + |
| 144 | +# Get as hash |
| 145 | +feed.to_hash |
| 146 | +# => { title: "Feed Title", link: "...", items: [...] } |
| 147 | + |
| 148 | +# Get as JSON string |
| 149 | +feed.to_json |
| 150 | +# => '{"title":"Feed Title","link":"...","items":[...]}' |
| 151 | + |
| 152 | +# Works with Rails/ActiveSupport |
| 153 | +feed.as_json |
| 154 | +``` |
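
The `Time`-to-ISO 8601 conversion can be illustrated with the standard library alone. This is a sketch of the general technique, not SimpleRSS's actual implementation; `jsonable` is a hypothetical helper name and the sample hash is made up for the example.

```ruby
require "json"
require "time" # adds Time#iso8601

# Hypothetical helper: recursively replaces Time values with ISO 8601 strings
# so the structure can be passed to JSON.generate.
def jsonable(value)
  case value
  when Time  then value.iso8601
  when Hash  then value.transform_values { |v| jsonable(v) }
  when Array then value.map { |v| jsonable(v) }
  else value
  end
end

item = { title: "First Post", pubDate: Time.utc(2024, 1, 15, 17, 0, 0) }
json = JSON.generate(jsonable(item))
# json => '{"title":"First Post","pubDate":"2024-01-15T17:00:00Z"}'
```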
| 155 | + |
| 156 | +### XML Serialization |
| 157 | + |
| 158 | +Convert parsed feeds to standard RSS 2.0 or Atom format: |
| 159 | + |
| 160 | +```ruby |
| 161 | +feed = SimpleRSS.parse(xml) |
| 162 | + |
| 163 | +# Convert to RSS 2.0 |
| 164 | +feed.to_xml(format: :rss2) |
| 165 | + |
| 166 | +# Convert to Atom |
| 167 | +feed.to_xml(format: :atom) |
| 168 | +``` |
| 169 | + |
| 170 | +### Extending Tag Support |
| 171 | + |
| 172 | +Add support for custom or non-standard tags: |
| 173 | + |
| 174 | +```ruby |
| 175 | +# Add a new feed-level tag |
| 176 | +SimpleRSS.feed_tags << :custom_tag |
| 177 | + |
| 178 | +# Add item-level tags |
| 179 | +SimpleRSS.item_tags << :custom_item_tag |
| 180 | + |
| 181 | +# Parse tags with specific rel attributes (common in Atom) |
| 182 | +SimpleRSS.item_tags << :"link+enclosure" |
| 183 | +# Accessible as: item.link_enclosure |
| 184 | + |
| 185 | +# Parse tag attributes |
| 186 | +SimpleRSS.item_tags << :"media:content#url" |
| 187 | +# Accessible as: item.media_content_url |
| 188 | + |
| 189 | +# Parse item/entry attributes |
| 190 | +SimpleRSS.item_tags << :"entry#xml:lang" |
| 191 | +# Accessible as: item.entry_xml_lang |
| 192 | +``` |
| 193 | + |
| 194 | +#### Tag Syntax Reference |
| 195 | + |
| 196 | +| Syntax | Example | Accessor | Description | |
| 197 | +|--------|---------|----------|-------------| |
| 198 | +| `tag` | `:title` | `.title` | Simple element content | |
| 199 | +| `tag#attr` | `:"media:content#url"` | `.media_content_url` | Attribute value | |
| 200 | +| `tag+rel` | `:"link+alternate"` | `.link_alternate` | Element with specific `rel` attribute | |
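
The accessor names in the table appear to follow a single rule: every character that is not valid in a Ruby method name becomes an underscore. The sketch below encodes that inferred rule. `accessor_name` is a hypothetical helper, not a SimpleRSS method, and the library's real mapping may differ in edge cases.

```ruby
# Hypothetical helper: derives the accessor name for a tag definition by
# replacing every character outside [A-Za-z0-9_] with an underscore.
def accessor_name(tag)
  tag.to_s.gsub(/[^A-Za-z0-9_]/, "_")
end

accessor_name(:title)                # => "title"
accessor_name(:"media:content#url")  # => "media_content_url"
accessor_name(:"link+alternate")     # => "link_alternate"
```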
| 201 | + |
| 202 | +### Collecting Multiple Values |
| 203 | + |
| 204 | +By default, SimpleRSS returns only the first occurrence of each tag. To collect all values: |
| 205 | + |
| 206 | +```ruby |
| 207 | +# Collect all categories for each item |
| 208 | +feed = SimpleRSS.parse(xml, array_tags: [:category]) |
| 209 | + |
| 210 | +feed.items.first.category # => ["tech", "programming", "ruby"] |
| 211 | +``` |
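
The difference between the default behaviour and `array_tags:` can be pictured as "first regex match" versus "all regex matches". The sketch below uses only the standard library and is purely illustrative: the sample XML and the `CATEGORY` pattern are assumptions, not SimpleRSS's internal code.

```ruby
# Sample item with a repeated tag (made up for this example).
ITEM_XML = <<~XML
  <item>
    <title>Example</title>
    <category>tech</category>
    <category>programming</category>
    <category>ruby</category>
  </item>
XML

# Matches <category> with optional attributes and captures its text content.
CATEGORY = %r{<category(?:\s[^>]*)?>(.*?)</category>}m

# Default-style behaviour: only the first occurrence.
first_only = ITEM_XML[CATEGORY, 1]

# array_tags-style behaviour: every occurrence, in document order.
all_values = ITEM_XML.scan(CATEGORY).flatten
```

With this input, `first_only` is `"tech"` and `all_values` is `["tech", "programming", "ruby"]`.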
| 212 | + |
| 213 | +## API Reference |
| 214 | + |
| 215 | +### `SimpleRSS.parse(source, options = {})` |
| 216 | + |
| 217 | +Parse RSS/Atom content from a string or IO object. |
| 218 | + |
| 219 | +**Parameters:** |
| 220 | +- `source` - String or IO object containing feed XML |
| 221 | +- `options` - Hash of options |
| 222 | + - `:array_tags` - Array of tag symbols to collect as arrays |
| 223 | + |
| 224 | +**Returns:** `SimpleRSS` instance |
| 225 | + |
| 226 | +### `SimpleRSS.fetch(url, options = {})` |
| 227 | + |
| 228 | +Fetch and parse a feed from a URL. |
| 229 | + |
| 230 | +**Parameters:** |
| 231 | +- `url` - Feed URL string |
| 232 | +- `options` - Hash of options |
| 233 | + - `:timeout` - Request timeout in seconds |
| 234 | + - `:etag` - ETag from previous request (for conditional GET) |
| 235 | + - `:last_modified` - Last-Modified header from previous request |
| 236 | + - `:follow_redirects` - Follow redirects (default: true) |
| 237 | + - `:headers` - Hash of additional HTTP headers |
| 238 | + |
| 239 | +**Returns:** `SimpleRSS` instance, or `nil` if 304 Not Modified |
| 240 | + |
| 241 | +### Instance Methods |
| 242 | + |
| 243 | +| Method | Description | |
| 244 | +|--------|-------------| |
| 245 | +| `#channel` / `#feed` | Returns self (for RSS/Atom style access) | |
| 246 | +| `#items` / `#entries` | Array of parsed items | |
| 247 | +| `#to_json` | JSON string representation | |
| 248 | +| `#to_hash` / `#as_json` | Hash representation | |
| 249 | +| `#to_xml(format:)` | XML string (`:rss2` or `:atom`) | |
| 250 | +| `#etag` | ETag header from fetch (if applicable) | |
| 251 | +| `#last_modified` | Last-Modified header from fetch (if applicable) | |
| 252 | +| `#source` | Original source XML string | |
| 253 | + |
| 254 | +## Compatibility |
| 255 | + |
| 256 | +- Ruby 3.1+ |
| 257 | +- No runtime dependencies |
| 258 | + |
| 259 | +## Development |
| 260 | + |
| 261 | +```bash |
| 262 | +# Run tests |
| 263 | +bundle exec rake test |
| 264 | + |
| 265 | +# Run linter |
| 266 | +bundle exec rubocop |
| 267 | + |
| 268 | +# Type checking |
| 269 | +bundle exec steep check |
| 270 | + |
| 271 | +# Interactive console |
| 272 | +bundle exec rake console |
| 273 | +``` |
| 274 | + |
| 275 | +## Contributing |
| 276 | + |
| 277 | +1. Fork the repository |
| 278 | +2. Create a feature branch (`git checkout -b feature/my-feature`) |
| 279 | +3. Make your changes with tests |
| 280 | +4. Ensure tests pass (`bundle exec rake test`) |
| 281 | +5. Submit a pull request |
| 282 | + |
| 283 | +## Authors |
| 284 | + |
| 285 | +- [Lucas Carlson](mailto:lucas@rufy.com) |
| 286 | +- [Herval Freire](mailto:hervalfreire@gmail.com) |
| 287 | + |
| 288 | +Inspired by [Blagg](http://www.raelity.org/lang/perl/blagg) by Rael Dornfest. |
| 289 | + |
| 290 | +## License |
| 291 | + |
| 292 | +This library is released under the terms of the [GNU LGPL](LICENSE). |