Commit 8630ffb

committed
.
Signed-off-by: Gil Desmarais <[email protected]>
1 parent 1341ab6 commit 8630ffb

8 files changed: +166 -172 lines changed

configs-README.md

Lines changed: 0 additions & 87 deletions
This file was deleted.

html2rss-configs/how-to/index.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

html2rss-configs/how-to/usage-with-web.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

html2rss-configs/index.md

Lines changed: 166 additions & 3 deletions
@@ -1,10 +1,173 @@
11
---
22
layout: default
33
title: html2rss-configs
4-
has_children: true
4+
has_children: false
55
nav_order: 5
66
---
77

8-
# html2rss-configs Documentation
8+
# Creating Feed Configurations
99

10-
This section provides comprehensive documentation for the `html2rss-configs` project, which contains feed configurations for various websites. You can find a list of all available configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).
10+
Welcome to the guide for `html2rss-configs`. This document explains how to create your own configuration files to convert any website into an RSS feed.
11+
12+
You can find a list of all community-contributed configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).
13+
14+
---
15+
16+
## Core Concepts
17+
18+
An `html2rss` config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: `channel` and `selectors`.
19+
20+
### The `channel` Block
21+
22+
The `channel` block contains metadata about the RSS feed itself, such as its title and the source URL.
23+
24+
**Example:**
25+
26+
```yaml
27+
channel:
28+
  url: https://example.com/blog
29+
  title: My Awesome Blog
30+
```
31+
32+
For a complete list of all available channel options, please see the [Channel Reference]({{ '/ruby-gem/reference/channel/' | relative_url }}).
33+
34+
### The `selectors` Block
35+
36+
The `selectors` block is the core of the configuration, defining the rules for extracting content. It always contains an `items` selector to identify the list of articles and individual selectors for the data points within each item (e.g., `title`, `link`).
37+
38+
**Example:**
39+
40+
```yaml
41+
selectors:
42+
  items:
43+
    selector: "article.post"
44+
  title:
45+
    selector: "h2 a"
46+
  link:
47+
    selector: "h2 a"
48+
```
49+
50+
For a comprehensive guide on all available selectors, extractors, and post-processors, please see the [Selectors Reference]({{ '/ruby-gem/reference/selectors/' | relative_url }}).
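
To make the two building blocks concrete, here is a minimal sketch that parses a config with Ruby's standard `yaml` library and checks that the `channel` and `selectors` parts described above are present. The `config_problems` helper and the `CONFIG_YAML` constant are hypothetical names made up for this illustration; they are not part of `html2rss`, which performs its own validation.

```ruby
require 'yaml'

# Hypothetical config text, combining the two example blocks from this guide.
CONFIG_YAML = <<~YAML
  channel:
    url: https://example.com/blog
    title: My Awesome Blog
  selectors:
    items:
      selector: "article.post"
    title:
      selector: "h2 a"
    link:
      selector: "h2 a"
YAML

# Returns a list of problems found in the parsed config (empty when none).
# Illustrative check only; this is not how html2rss validates configs.
def config_problems(config)
  problems = []
  problems << 'missing channel.url' unless config.dig('channel', 'url')
  problems << 'missing selectors.items.selector' unless config.dig('selectors', 'items', 'selector')
  problems
end

config = YAML.safe_load(CONFIG_YAML)
problems = config_problems(config)
puts problems.empty? ? 'config looks complete' : problems.join(', ')
```

Note that the nesting carries meaning in YAML: `url` and `title` must be indented under `channel`, and each `selector` under its named entry.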
51+
52+
---
53+
54+
## Tutorial: Your First Config
55+
56+
This tutorial walks you through creating a basic configuration file from scratch.
57+
58+
### Step 1: Identify the Target Content
59+
60+
First, identify the HTML structure of the website you want to create a feed for. For this example, we'll use a simple blog structure:
61+
62+
```html
63+
<div class="posts">
64+
  <article class="post">
65+
    <h2><a href="/post/1">First Post</a></h2>
66+
    <p>This is the summary of the first post.</p>
67+
  </article>
68+
  <article class="post">
69+
    <h2><a href="/post/2">Second Post</a></h2>
70+
    <p>This is the summary of the second post.</p>
71+
  </article>
72+
</div>
73+
```
74+
75+
### Step 2: Create the Config File and Define the Channel
76+
77+
Create a new YAML file (e.g., `my-blog.yml`) and define the `channel`:
78+
79+
```yaml
80+
# my-blog.yml
81+
channel:
82+
  url: https://example.com/blog
83+
  title: My Awesome Blog
84+
  description: The latest news from my awesome blog.
85+
```
86+
87+
### Step 3: Define the Selectors
88+
89+
Next, add the `selectors` block to extract the content for each post.
90+
91+
```yaml
92+
# my-blog.yml
93+
selectors:
94+
  items:
95+
    selector: "article.post"
96+
  title:
97+
    selector: "h2 a"
98+
  link:
99+
    selector: "h2 a"
100+
  description:
101+
    selector: "p"
102+
```
103+
104+
- `items`: This CSS selector matches the container element of each article; every match becomes one feed item.
105+
- `title`, `link`, `description`: These selectors target the specific data points within each item. For a `link` selector, `html2rss` defaults to extracting the `href` attribute from the matched `<a>` tag.
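
To see roughly what these selectors pick out of the sample markup from Step 1, the following sketch walks the same structure with Ruby's bundled REXML library. It is an approximation under stated assumptions: REXML is an XML parser and only works here because the sample happens to be well-formed, the XPath expressions are stand-ins for the CSS selectors, and `extract_items`, `SAMPLE_HTML`, and `BASE_URL` are hypothetical names. It does not reflect how `html2rss` is implemented.

```ruby
require 'rexml/document'
require 'uri'

# The sample markup from Step 1 (well-formed, so an XML parser can read it).
SAMPLE_HTML = <<~HTML
  <div class="posts">
    <article class="post">
      <h2><a href="/post/1">First Post</a></h2>
      <p>This is the summary of the first post.</p>
    </article>
    <article class="post">
      <h2><a href="/post/2">Second Post</a></h2>
      <p>This is the summary of the second post.</p>
    </article>
  </div>
HTML

BASE_URL = 'https://example.com/blog'

# Returns one hash per <article class="post">, mirroring the config's
# items / title / link / description selectors with XPath stand-ins.
def extract_items(html, base_url)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//article[@class='post']").map do |article|
    anchor = article.elements['h2/a']
    {
      title: anchor.text,
      # Resolve the relative href against the channel URL.
      link: URI.join(base_url, anchor.attributes['href']).to_s,
      description: article.elements['p'].text
    }
  end
end

extract_items(SAMPLE_HTML, BASE_URL).each do |item|
  puts "#{item[:title]} -> #{item[:link]}"
end
```

Each matched `article.post` yields one item, and the relative `href` values are resolved against the channel URL to produce absolute links.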
106+
107+
---
108+
109+
## Advanced Techniques
110+
111+
### Handling Pagination
112+
113+
To aggregate content from multiple pages, use the `pagination` option within the `items` selector.
114+
115+
```yaml
116+
selectors:
117+
  items:
118+
    selector: ".post-listing .post"
119+
    pagination:
120+
      selector: ".pagination .next-page"
121+
      limit: 5 # Optional: sets the maximum number of pages to follow
122+
```
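
The idea behind following a "next page" link up to a limit can be sketched as a simple loop. This example uses a hypothetical in-memory `PAGES` table instead of real HTTP requests, and it assumes the limit counts the first page as well; `html2rss` may count pages differently, so treat this as an illustration of the concept rather than the library's behavior.

```ruby
# Hypothetical in-memory "site": each page lists its posts and the URL its
# ".next-page" link points to (nil on the last page).
PAGES = {
  '/blog' => { posts: ['Post 1', 'Post 2'], next_page: '/blog?page=2' },
  '/blog?page=2' => { posts: ['Post 3', 'Post 4'], next_page: '/blog?page=3' },
  '/blog?page=3' => { posts: ['Post 5'], next_page: nil }
}.freeze

# Collects posts starting at start_url, visiting at most `limit` pages.
# Assumption: the limit includes the first page.
def collect_posts(pages, start_url, limit:)
  posts = []
  url = start_url
  visited = 0
  while url && visited < limit
    page = pages.fetch(url)
    posts.concat(page[:posts])
    url = page[:next_page]
    visited += 1
  end
  posts
end

puts collect_posts(PAGES, '/blog', limit: 2).inspect
```

The loop stops when either the limit is reached or a page has no next link, which is why a limit also guards against sites whose pagination never ends.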
123+
124+
### Dynamic Feeds with Parameters
125+
126+
Use the `parameters` block to create flexible configs. This is useful for feeds based on search terms, categories, or regions.
127+
128+
```yaml
129+
# news-search.yml
130+
parameters:
131+
  query:
132+
    type: string
133+
    default: "technology"
134+
135+
channel:
136+
  url: "https://news.example.com/search?q={query}"
137+
  title: "News results for '{query}'"
138+
```
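
Conceptually, a parameterized config is a template whose `{query}` placeholders are filled from the supplied values, falling back to the declared default. The sketch below illustrates that substitution in plain Ruby. `fill_template` and `DEFAULTS` are hypothetical names, and the URL-escaping step is an assumption of this example; how `html2rss` resolves parameters internally is not shown here.

```ruby
require 'uri'

# Replaces each {name} placeholder in `template` with the matching value,
# falling back to the declared default. Hypothetical helper, not html2rss code.
def fill_template(template, defaults, overrides = {}, escape: false)
  values = defaults.merge(overrides)
  template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1)
    value = values.fetch(name) { raise KeyError, "no value for parameter #{name}" }
    # Escape values that end up in a URL query string (assumption of this sketch).
    escape ? URI.encode_www_form_component(value) : value.to_s
  end
end

# Mirrors the `default` declared for `query` in news-search.yml.
DEFAULTS = { 'query' => 'technology' }.freeze

puts fill_template('https://news.example.com/search?q={query}', DEFAULTS, {}, escape: true)
puts fill_template("News results for '{query}'", DEFAULTS, { 'query' => 'ruby on rails' })
```

A placeholder with neither a supplied value nor a default raises an error in this sketch, which is usually preferable to silently producing a broken URL.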
139+
140+
---
141+
142+
## Contributing Your Config
143+
144+
Have you created a config that others might find useful? We strongly encourage you to contribute it to the project! By sharing your config, you make it available to all users of the public `html2rss-web` service and the Feed Directory.
145+
146+
To contribute, please [create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) to the `html2rss-configs` repository.
147+
148+
---
149+
150+
## Usage and Integration
151+
152+
### With `html2rss-web`
153+
154+
Once your pull request is reviewed and merged, your config will become available on the public [`html2rss-web`]({{ '/web-application/' | relative_url }}) instance. You can then access it at the path `/<domainname.tld/path>.rss`.
155+
156+
### Programmatic Usage in Ruby
157+
158+
You can also use `html2rss-configs` programmatically in your Ruby applications.
159+
160+
Add this to your Gemfile:
161+
162+
```ruby
163+
gem 'html2rss-configs', git: 'https://github.com/html2rss/html2rss-configs.git'
164+
```
165+
166+
And use it in your code:
167+
168+
```ruby
169+
require 'html2rss/configs'
170+
171+
config = Html2rss::Configs.find_by_name('domainname.tld/whatever')
172+
rss = Html2rss.feed(config)
173+
```

html2rss-configs/reference/ci-building.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

html2rss-configs/reference/index.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

html2rss-configs/reference/programmatic-usage.md

Lines changed: 0 additions & 27 deletions
This file was deleted.

html2rss-configs/tutorials/index.md

Lines changed: 0 additions & 11 deletions
This file was deleted.
