Commit 19a2b79

Merge pull request #51 from cardmagic/feature/url-fetching
feat: add URL fetching with conditional GET support
2 parents cd6a8dd + 5ef7763 commit 19a2b79

7 files changed, 497 additions and 51 deletions

.rubocop.yml

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ Metrics/PerceivedComplexity:

 # Class length for SimpleRSS (single-file library)
 Metrics/ClassLength:
-  Max: 310
+  Max: 360

 # Frozen string literal is optional for this gem
 Style/FrozenStringLiteralComment:

README.markdown

Lines changed: 0 additions & 47 deletions
This file was deleted.

README.md

Lines changed: 292 additions & 0 deletions
@@ -0,0 +1,292 @@
# SimpleRSS

[![Gem Version](https://badge.fury.io/rb/simple-rss.svg)](https://badge.fury.io/rb/simple-rss)
[![CI](https://github.com/cardmagic/simple-rss/actions/workflows/ci.yml/badge.svg)](https://github.com/cardmagic/simple-rss/actions/workflows/ci.yml)
[![License: LGPL](https://img.shields.io/badge/License-LGPL-blue.svg)](https://opensource.org/licenses/LGPL-3.0)

A simple, flexible, extensible, and liberal RSS and Atom reader for Ruby. Designed to be backwards compatible with Ruby's standard RSS parser while handling malformed feeds gracefully.

## Features

- Parses both RSS and Atom feeds
- Tolerant of malformed XML (regex-based parsing)
- Built-in URL fetching with conditional GET support (ETags, Last-Modified)
- JSON and XML serialization
- Extensible tag definitions
- Zero runtime dependencies

## What's New in 2.0

Version 2.0 is a major update with powerful new capabilities:

- **URL Fetching** - One-liner feed fetching with `SimpleRSS.fetch(url)`. Supports timeouts, custom headers, and automatic redirect following.

- **Conditional GET** - Bandwidth-efficient polling with ETag and Last-Modified support. Returns `nil` when feeds haven't changed (304 Not Modified).

- **JSON Serialization** - Export feeds with `to_json`, `to_hash`, and Rails-compatible `as_json`. Time objects serialize to ISO 8601.

- **XML Serialization** - Convert any parsed feed to clean RSS 2.0 or Atom XML with `to_xml(format: :rss2)` or `to_xml(format: :atom)`.

- **Array Tags** - Collect all occurrences of a tag (like multiple categories) with the `array_tags:` option.

- **Attribute Parsing** - Extract attributes from feed, item, and media tags using the `tag#attr` syntax.

- **UTF-8 Normalization** - All parsed content is automatically normalized to UTF-8 encoding.

- **Modern Ruby** - Full compatibility with Ruby 3.1 through 4.0, with RBS type annotations and Steep type checking.

## Installation

Add to your Gemfile:

```ruby
gem "simple-rss"
```

Or install directly:

```bash
gem install simple-rss
```

## Quick Start

```ruby
require "simple-rss"
require "uri"
require "net/http"

# Parse from a string or IO object
xml = Net::HTTP.get(URI("https://example.com/feed.xml"))
rss = SimpleRSS.parse(xml)

rss.channel.title        # => "Example Feed"
rss.items.first.title    # => "First Post"
rss.items.first.pubDate  # => 2024-01-15 12:00:00 -0500 (Time object)
```

## Usage

### Fetching Feeds

SimpleRSS includes a built-in fetcher with conditional GET support for efficient polling:

```ruby
# Simple fetch
feed = SimpleRSS.fetch("https://example.com/feed.xml")

# With timeout
feed = SimpleRSS.fetch("https://example.com/feed.xml", timeout: 10)

# Conditional GET - only download if modified
feed = SimpleRSS.fetch("https://example.com/feed.xml")
# Store these for next request
etag = feed.etag
last_modified = feed.last_modified

# On subsequent requests, pass the stored values
feed = SimpleRSS.fetch(
  "https://example.com/feed.xml",
  etag:,
  last_modified:
)
# Returns nil if feed hasn't changed (304 Not Modified)
```
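
For readers unfamiliar with conditional GET, the sketch below shows what the stored validators look like at the HTTP level, using only Ruby's standard `net/http`. It is an illustration of the protocol, not SimpleRSS's own implementation: the helper name `build_conditional_get` and the sample header values are assumptions made for this example, and no request is actually sent.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of SimpleRSS): builds a GET request that
# carries the validators saved from a previous response.
def build_conditional_get(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(url))
  # The server compares these headers with its current copy of the resource
  # and may answer 304 Not Modified with an empty body.
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

# Sample validator values, invented for illustration.
request = build_conditional_get(
  "https://example.com/feed.xml",
  etag: '"abc123"',
  last_modified: "Mon, 15 Jan 2024 17:00:00 GMT"
)

request["If-None-Match"]     # the stored ETag, sent back unchanged
request["If-Modified-Since"] # the stored Last-Modified value
```

A server that honors either header can skip the response body entirely, which is where the bandwidth saving comes from.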

### Accessing Feed Data

SimpleRSS provides both RSS and Atom style accessors:

```ruby
feed = SimpleRSS.parse(xml)

# RSS style
feed.channel.title
feed.channel.link
feed.channel.description
feed.items

# Atom style (aliases)
feed.feed.title
feed.entries
```

### Item Attributes

Items support both hash and method access:

```ruby
item = feed.items.first

# Hash access
item[:title]
item[:link]
item[:pubDate]

# Method access
item.title
item.link
item.pubDate
```

Date fields are automatically parsed into `Time` objects:

```ruby
item.pubDate.class  # => Time
item.pubDate.year   # => 2024
```
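
As a rough illustration of what date parsing involves, the sketch below uses Ruby's standard `time` library on the two date styles commonly seen in feeds: RFC 822 style in RSS and ISO 8601 in Atom. The sample strings are invented, and the diff does not show which parser or formats SimpleRSS uses internally, so treat this as background rather than a description of the library.

```ruby
require "time"

# RSS 2.0 feeds usually carry RFC 822 style dates in <pubDate>.
rss_date = Time.rfc2822("Mon, 15 Jan 2024 12:00:00 -0500")

# Atom feeds usually carry ISO 8601 timestamps in <updated> / <published>.
atom_date = Time.iso8601("2024-01-15T12:00:00-05:00")

same_instant = (rss_date == atom_date) # both strings name the same moment
year = rss_date.year
```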

### JSON Serialization

```ruby
feed = SimpleRSS.parse(xml)

# Get as hash
feed.to_hash
# => { title: "Feed Title", link: "...", items: [...] }

# Get as JSON string
feed.to_json
# => '{"title":"Feed Title","link":"...","items":[...]}'

# Works with Rails/ActiveSupport
feed.as_json
```
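
The README states that `Time` objects serialize to ISO 8601. A minimal standard-library sketch of that idea follows; the `item` hash is a made-up stand-in for one entry of `items`, and the conversion shown is an assumption about the general approach, not the gem's actual code.

```ruby
require "json"
require "time"

# Hypothetical item hash, shaped like one element of `items` above.
item = { title: "First Post", pubDate: Time.utc(2024, 1, 15, 17, 0, 0) }

# Convert Time values to ISO 8601 strings, leave everything else untouched.
serializable = item.transform_values { |value| value.is_a?(Time) ? value.iso8601 : value }

json = JSON.generate(serializable)

# The timestamp survives a round trip through JSON.
restored = Time.iso8601(JSON.parse(json)["pubDate"])
```

Serializing to a string first keeps the output independent of how a given JSON encoder chooses to format `Time` values.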

### XML Serialization

Convert parsed feeds to standard RSS 2.0 or Atom format:

```ruby
feed = SimpleRSS.parse(xml)

# Convert to RSS 2.0
feed.to_xml(format: :rss2)

# Convert to Atom
feed.to_xml(format: :atom)
```

### Extending Tag Support

Add support for custom or non-standard tags:

```ruby
# Add a new feed-level tag
SimpleRSS.feed_tags << :custom_tag

# Add item-level tags
SimpleRSS.item_tags << :custom_item_tag

# Parse tags with specific rel attributes (common in Atom)
SimpleRSS.item_tags << :"link+enclosure"
# Accessible as: item.link_enclosure

# Parse tag attributes
SimpleRSS.item_tags << :"media:content#url"
# Accessible as: item.media_content_url

# Parse item/entry attributes
SimpleRSS.item_tags << :"entry#xml:lang"
# Accessible as: item.entry_xml_lang
```

#### Tag Syntax Reference

| Syntax | Example | Accessor | Description |
|--------|---------|----------|-------------|
| `tag` | `:title` | `.title` | Simple element content |
| `tag#attr` | `:"media:content#url"` | `.media_content_url` | Attribute value |
| `tag+rel` | `:"link+alternate"` | `.link_alternate` | Element with specific `rel` attribute |
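
Judging only from the examples in the table, the accessor name appears to be the tag symbol with `:`, `#`, and `+` replaced by underscores. The helper below, `accessor_name`, is hypothetical and exists only to make that apparent pattern explicit; it is inferred from the README examples and is not a SimpleRSS method.

```ruby
# Hypothetical helper (not a SimpleRSS method): derives the accessor name
# by replacing the separators `:`, `#`, and `+` with underscores.
def accessor_name(tag)
  tag.to_s.tr(":#+", "_").to_sym
end

accessor_name(:title)               # => :title
accessor_name(:"media:content#url") # => :media_content_url
accessor_name(:"link+alternate")    # => :link_alternate
accessor_name(:"entry#xml:lang")    # => :entry_xml_lang
```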

### Collecting Multiple Values

By default, SimpleRSS returns only the first occurrence of each tag. To collect all values:

```ruby
# Collect all categories for each item
feed = SimpleRSS.parse(xml, array_tags: [:category])

item = feed.items.first
item.category # => ["tech", "programming", "ruby"]
```
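
To make the difference between "first occurrence" and "all occurrences" concrete, here is a small standard-library sketch in the spirit of regex-based parsing. The sample XML and the regular expression are assumptions chosen for illustration; the expression SimpleRSS actually uses is not shown in this diff.

```ruby
# Sample item with a repeated tag, invented for illustration.
xml = <<~XML
  <item>
    <title>Post</title>
    <category>tech</category>
    <category>programming</category>
    <category>ruby</category>
  </item>
XML

# Illustrative pattern only; tolerant of attributes on the opening tag.
category_pattern = %r{<category[^>]*>(.*?)</category>}m

# First occurrence only, like the default behaviour described above.
first = xml[category_pattern, 1]

# Every occurrence, in document order, like an array tag.
all = xml.scan(category_pattern).flatten
```

Here `first` holds `"tech"` and `all` holds the three category strings in order.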

## API Reference

### `SimpleRSS.parse(source, options = {})`

Parse RSS/Atom content from a string or IO object.

**Parameters:**
- `source` - String or IO object containing feed XML
- `options` - Hash of options
  - `:array_tags` - Array of tag symbols to collect as arrays

**Returns:** `SimpleRSS` instance

### `SimpleRSS.fetch(url, options = {})`

Fetch and parse a feed from a URL.

**Parameters:**
- `url` - Feed URL string
- `options` - Hash of options
  - `:timeout` - Request timeout in seconds
  - `:etag` - ETag from previous request (for conditional GET)
  - `:last_modified` - Last-Modified header from previous request
  - `:follow_redirects` - Follow redirects (default: true)
  - `:headers` - Hash of additional HTTP headers

**Returns:** `SimpleRSS` instance, or `nil` if 304 Not Modified
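
Because `fetch` can return `nil`, polling code has to keep the previous feed around. One possible way to structure that is sketched below. `FeedCache`, `FakeFeed`, and `fake_fetch` are hypothetical names introduced for this example; the fetcher is injected so the sketch runs without network access. In real use one might pass `SimpleRSS.method(:fetch)` instead, assuming it accepts `nil` for `etag:` and `last_modified:` on the first call, which this diff does not confirm.

```ruby
# Hypothetical cache wrapper (not part of SimpleRSS). `fetcher` is any callable
# shaped like SimpleRSS.fetch(url, etag:, last_modified:).
class FeedCache
  attr_reader :feed

  def initialize(url, fetcher:)
    @url = url
    @fetcher = fetcher
    @feed = nil
  end

  # Returns true when a new copy was downloaded, false when the fetcher
  # returned nil (304 Not Modified) and the cached feed was kept.
  def refresh
    fresh = @fetcher.call(@url, etag: @feed&.etag, last_modified: @feed&.last_modified)
    return false if fresh.nil?

    @feed = fresh
    true
  end
end

# Stand-in fetcher for demonstration: answers nil when the ETag matches.
FakeFeed = Struct.new(:etag, :last_modified)
fake_fetch = lambda do |_url, etag: nil, last_modified: nil|
  etag == '"v1"' ? nil : FakeFeed.new('"v1"', "Mon, 15 Jan 2024 17:00:00 GMT")
end

cache = FeedCache.new("https://example.com/feed.xml", fetcher: fake_fetch)
first = cache.refresh  # no validators yet, so the feed is downloaded
second = cache.refresh # validators match, so the cached feed is kept
```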

### Instance Methods

| Method | Description |
|--------|-------------|
| `#channel` / `#feed` | Returns self (for RSS/Atom style access) |
| `#items` / `#entries` | Array of parsed items |
| `#to_json` | JSON string representation |
| `#to_hash` / `#as_json` | Hash representation |
| `#to_xml(format:)` | XML string (`:rss2` or `:atom`) |
| `#etag` | ETag header from fetch (if applicable) |
| `#last_modified` | Last-Modified header from fetch (if applicable) |
| `#source` | Original source XML string |

## Compatibility

- Ruby 3.1+
- No runtime dependencies

## Development

```bash
# Run tests
bundle exec rake test

# Run linter
bundle exec rubocop

# Type checking
bundle exec steep check

# Interactive console
bundle exec rake console
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-feature`)
3. Make your changes with tests
4. Ensure tests pass (`bundle exec rake test`)
5. Submit a pull request

## Authors

- [Lucas Carlson](mailto:lucas@rufy.com)
- [Herval Freire](mailto:hervalfreire@gmail.com)

Inspired by [Blagg](http://www.raelity.org/lang/perl/blagg) by Rael Dornfest.

## License

This library is released under the terms of the [GNU LGPL](LICENSE).

Steepfile

Lines changed: 2 additions & 0 deletions
@@ -7,4 +7,6 @@ target :lib do
   library "time"
   library "cgi"
   library "json"
+  library "uri"
+  library "net-http"
 end
