Commit e3d76ed

Refactor auto source configuration and improve request handling
- Simplified the auto source configuration in CONFIGURATION.md.
- Removed obsolete allowed origin checks from feeds.rb.
- Updated session storage usage in frontend tests and hooks.
- Enhanced error handling for unsupported strategies in API endpoints.
- Improved rate limiting responses in rack_attack.rb.
1 parent fe4e78d commit e3d76ed

13 files changed: 296 additions and 247 deletions

CONFIGURATION.md

Lines changed: 11 additions & 40 deletions
@@ -4,13 +4,9 @@

 ### Auto Source Configuration

-| Variable | Description | Default | Example |
-| ----------------------------- | -------------------------------------- | ----------------- | ----------------------------------------------------- |
-| `AUTO_SOURCE_ENABLED` | Enable auto source feature | `false` | `true` |
-| `AUTO_SOURCE_USERNAME` | Basic auth username | Required | `admin` |
-| `AUTO_SOURCE_PASSWORD` | Basic auth password | Required | `changeme` |
-| `AUTO_SOURCE_ALLOWED_ORIGINS` | Allowed request origins | Required | `localhost:3000,example.com` |
-| `AUTO_SOURCE_ALLOWED_URLS` | **URL whitelist for public instances** | `""` (allows all) | `https://github.com/*,https://news.ycombinator.com/*` |
+| Variable | Description | Default | Example |
+| --------------------- | -------------------------- | ------- | ------- |
+| `AUTO_SOURCE_ENABLED` | Enable auto source feature | `false` | `true` |

 ### Health Check Configuration

@@ -27,41 +23,16 @@ Health check authentication relies on the `health-check` account defined in `con
 | `RUBY_PATH` | Path to Ruby executable | `ruby` | `/usr/bin/ruby` |
 | `APP_ROOT` | Application root directory | `.` | `/app` |

-## URL Restriction Patterns
-
-The `AUTO_SOURCE_ALLOWED_URLS` variable supports:
-
-- **Exact URLs**: `https://example.com/news`
-- **Wildcard patterns**: `https://example.com/*` (matches any path)
-- **Domain patterns**: `https://*.example.com` (matches subdomains)
-- **Multiple patterns**: Comma-separated list
-
-### Examples
-
-```bash
-# Allow only specific sites
-AUTO_SOURCE_ALLOWED_URLS=https://github.com/*,https://news.ycombinator.com/*,https://example.com/news
-
-# Allow all subdomains of a domain
-AUTO_SOURCE_ALLOWED_URLS=https://*.example.com/*
-
-# Allow everything (for private instances)
-AUTO_SOURCE_ALLOWED_URLS=
-
-# Block everything (disable auto source)
-AUTO_SOURCE_ENABLED=false
-```
-
 ## Security Considerations

 ### Public Instances
-- **Always set** `AUTO_SOURCE_ALLOWED_URLS` to restrict URLs
+- Define per-account `allowed_urls` in `config/feeds.yml`
 - Use strong authentication credentials
 - Monitor usage and set up rate limiting
 - Consider IP whitelisting for additional security

 ### Private Instances
-- Leave `AUTO_SOURCE_ALLOWED_URLS` empty to allow all URLs
+- Use `allowed_urls: ['*']` to allow all URLs for trusted accounts
 - Still use authentication to prevent unauthorized access
 - Consider network-level restrictions

@@ -70,20 +41,20 @@ AUTO_SOURCE_ENABLED=false
 ### Public Demo Instance
 ```bash
 AUTO_SOURCE_ENABLED=true
-AUTO_SOURCE_USERNAME=demo
-AUTO_SOURCE_PASSWORD=secure_password
-AUTO_SOURCE_ALLOWED_URLS=https://github.com/*,https://news.ycombinator.com/*,https://example.com/*
 ```

 ### Private Instance
 ```bash
 AUTO_SOURCE_ENABLED=true
-AUTO_SOURCE_USERNAME=admin
-AUTO_SOURCE_PASSWORD=very_secure_password
-AUTO_SOURCE_ALLOWED_URLS=
 ```

 ### Disabled Auto Source
 ```bash
 AUTO_SOURCE_ENABLED=false
 ```
+
+## Managing Accounts
+
+Authentication for auto source is configured in `config/feeds.yml`. Define accounts with unique tokens and optional
+`allowed_urls` patterns to control which sites each token may access. Tokens are stored client-side in session storage,
+so treat them like sensitive credentials and rotate when necessary.
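
The per-account `allowed_urls` rule that replaces `AUTO_SOURCE_ALLOWED_URLS` could work roughly as in the sketch below. This is only an illustration: the account hash layout, the helper names `pattern_to_regexp` and `url_allowed?`, and the wildcard semantics (`*` matches any run of characters) are assumptions. The diff shows neither the schema of `config/feeds.yml` nor the matcher the project actually uses.

```ruby
# Hypothetical account entry, shaped like what config/feeds.yml might hold.
# The keys are assumptions, not the project's confirmed schema.
ACCOUNT = {
  username: 'demo',
  token: 'example-token',
  allowed_urls: ['https://github.com/*', 'https://news.ycombinator.com/*']
}.freeze

# Convert a wildcard pattern into an anchored regular expression.
# The whole pattern is escaped first, then each escaped '*' becomes '.*'.
def pattern_to_regexp(pattern)
  Regexp.new("\\A#{Regexp.escape(pattern).gsub('\*', '.*')}\\z")
end

# True when the URL matches at least one of the account's patterns.
# An account without patterns is denied in this sketch (an assumption).
def url_allowed?(account, url)
  Array(account[:allowed_urls]).any? { |pattern| pattern_to_regexp(pattern).match?(url) }
end
```

Under these assumptions, `allowed_urls: ['*']` matches every URL, which corresponds to the "Private Instances" advice, while a public account lists only the sites it should reach.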

app/api/v1/feeds.rb

Lines changed: 45 additions & 16 deletions
@@ -27,7 +27,6 @@ def show(request, token)

 def create(request)
   raise ForbiddenError, 'Auto source feature is disabled' unless AutoSource.enabled?
-  raise ForbiddenError, 'Request origin not allowed' unless AutoSource.allowed_origin?(request)

   account = authenticate_request(request)
   params = extract_create_params(request)
@@ -41,7 +40,6 @@ def create(request)

 def handle_token_based_feed(request, token)
   raise ForbiddenError, 'Auto source feature is disabled' unless AutoSource.enabled?
-  raise ForbiddenError, 'Request origin not allowed' unless AutoSource.allowed_origin?(request)

   feed_token = validate_feed_token(token)
   account = get_account_for_token(feed_token)
@@ -76,7 +74,7 @@ def validate_account_access(account, url)
 end

 def generate_feed_response(request, url)
-  strategy = request.params['strategy'] || 'ssrf_filter'
+  strategy = select_strategy(request.params['strategy'])
   rss_content = AutoSource.generate_feed_content(url, strategy)

   request.response['Content-Type'] = 'application/xml'
@@ -96,10 +94,11 @@ def authenticate_request(request)

 def extract_create_params(request)
   url = request.params['url']
+  strategy = select_strategy(request.params['strategy'])
   {
     url: url,
     name: request.params['name'] || extract_site_title(url),
-    strategy: request.params['strategy'] || 'ssrf_filter'
+    strategy: strategy
   }
 end

@@ -111,19 +110,49 @@ def validate_create_params(params, account)

 def build_create_response(request, feed_data)
   request.response['Content-Type'] = 'application/json'
-  { success: true, data: { feed: {
-    id: feed_data[:id],
-    name: feed_data[:name],
-    url: feed_data[:url],
-    strategy: feed_data[:strategy],
-    public_url: feed_data[:public_url],
-    created_at: Time.now.iso8601,
-    updated_at: Time.now.iso8601
-  } }, meta: { created: true } }
-end
-module_function :extract_create_params, :validate_create_params, :build_create_response, :authenticate_request
+  request.response.status = 201
+  feed_response_payload(feed_data)
+end
+
+def select_strategy(raw_strategy)
+  strategy = raw_strategy.to_s.strip
+  strategy = default_strategy if strategy.empty?
+
+  raise BadRequestError, 'Unsupported strategy' unless supported_strategies.include?(strategy)
+
+  strategy
+end
+
+def supported_strategies
+  Html2rss::RequestService.strategy_names.map(&:to_s)
+end
+
+def default_strategy
+  Html2rss::RequestService.default_strategy_name.to_s
+end
+
+def feed_response_payload(feed_data)
+  {
+    success: true,
+    data: { feed: {
+      id: feed_data[:id],
+      name: feed_data[:name],
+      url: feed_data[:url],
+      strategy: feed_data[:strategy],
+      public_url: feed_data[:public_url],
+      created_at: Time.now.iso8601,
+      updated_at: Time.now.iso8601
+    } },
+    meta: { created: true }
+  }
+end
+
+module_function :extract_create_params, :validate_create_params, :build_create_response,
+                :authenticate_request, :select_strategy, :supported_strategies, :default_strategy,
+                :feed_response_payload
 private_class_method :extract_create_params, :validate_create_params, :build_create_response,
-                     :authenticate_request
+                     :authenticate_request, :select_strategy, :supported_strategies, :default_strategy,
+                     :feed_response_payload
 end
 end
 end
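
The behavioural change in `select_strategy` is easiest to see in isolation. Before this commit a missing `strategy` parameter fell back to `'ssrf_filter'` and any other value was passed through unchecked. Now a blank value falls back to the library default and an unknown value raises `BadRequestError`. The sketch below mirrors that logic as a standalone script. The constants `SUPPORTED_STRATEGIES` and `DEFAULT_STRATEGY` and their values are hypothetical stand-ins for what `Html2rss::RequestService` reports at runtime, and `BadRequestError` is assumed to be a `StandardError` subclass.

```ruby
# Stand-in for the BadRequestError raised in the diff (assumed superclass).
class BadRequestError < StandardError; end

# Hypothetical values; the real names come from Html2rss::RequestService.
SUPPORTED_STRATEGIES = %w[faraday browserless].freeze
DEFAULT_STRATEGY = 'faraday'

# Normalise the raw parameter, fall back to the default when it is blank,
# and reject anything outside the supported list.
def select_strategy(raw_strategy)
  strategy = raw_strategy.to_s.strip
  strategy = DEFAULT_STRATEGY if strategy.empty?

  raise BadRequestError, 'Unsupported strategy' unless SUPPORTED_STRATEGIES.include?(strategy)

  strategy
end
```

Calling `to_s.strip` first means `nil`, `''`, and whitespace-only input all take the default path, so only a non-blank unknown name produces the error.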

app/auto_source.rb

Lines changed: 0 additions & 20 deletions
@@ -1,6 +1,5 @@
 # frozen_string_literal: true

-require 'uri'
 require_relative 'auth'
 require_relative 'feed_generator'

@@ -28,25 +27,6 @@ def authenticate_with_token(request)
   Auth.authenticate(request)
 end

-# @param request [Roda::Request]
-# @return [Boolean]
-def allowed_origin?(request)
-  origin = request.env['HTTP_HOST'] || request.env['HTTP_X_FORWARDED_HOST']
-  origins = allowed_origins
-  origins.empty? || origins.include?(origin)
-end
-
-# @return [Array<String>]
-def allowed_origins
-  if development?
-    default_origins = 'localhost:3000,localhost:3001,127.0.0.1:3000,127.0.0.1:3001'
-    origins = ENV.fetch('AUTO_SOURCE_ALLOWED_ORIGINS', default_origins)
-  else
-    origins = ENV.fetch('AUTO_SOURCE_ALLOWED_ORIGINS', '')
-  end
-  origins.split(',').map(&:strip).reject(&:empty?)
-end
-
 # @param token_data [Hash]
 # @param url [String]
 # @return [Boolean]

config/rack_attack.rb

Lines changed: 63 additions & 54 deletions
@@ -1,75 +1,84 @@
 # frozen_string_literal: true

+require 'json'
 require 'rack/attack'
 require_relative '../app/security_logger'

 # In-memory store (resets on restart)
 # Note: In production, consider using Redis for persistent rate limiting
 Rack::Attack.cache.store = {}

-# Whitelist health checks and internal IPs
-Rack::Attack.safelist('health-check') do |req|
-  req.path.start_with?('/health', '/status')
-end
+STANDARD_WINDOW = 60
+STANDARD_LIMIT = 100
+TOKEN_LIMIT = 60

-# Whitelist localhost in development
-Rack::Attack.safelist('localhost') do |req|
-  %w[127.0.0.1 ::1].include?(req.ip) if ENV['RACK_ENV'] == 'development'
-end
+Rack::Attack.throttle('requests per ip', limit: STANDARD_LIMIT, period: STANDARD_WINDOW, &:ip)

-# Rate limiting by IP
-Rack::Attack.throttle('requests per IP', limit: 100, period: 60) do |req|
-  Html2rss::Web::SecurityLogger.log_rate_limit_exceeded(req.ip, req.path, 100) if req.env['rack.attack.throttle_data']
-  req.ip
-end
+token_from_header = lambda do |req|
+  header = req.get_header('HTTP_AUTHORIZATION')
+  next unless header&.start_with?('Bearer ')

-# Rate limiting for API endpoints
-Rack::Attack.throttle('api requests per IP', limit: 200, period: 60) do |req|
-  if req.path.start_with?('/api/')
-    Html2rss::Web::SecurityLogger.log_rate_limit_exceeded(req.ip, req.path, 200) if req.env['rack.attack.throttle_data']
-    req.ip
-  end
+  token = header.split(' ', 2)[1]&.strip
+  token unless token.nil? || token.empty?
 end

-# Rate limiting for API feed generation (more restrictive)
-Rack::Attack.throttle('api feed generation per IP', limit: 10, period: 60) do |req|
-  if req.path.include?('/api/v1/feeds/') && req.params['token']
-    Html2rss::Web::SecurityLogger.log_rate_limit_exceeded(req.ip, req.path, 10) if req.env['rack.attack.throttle_data']
-    req.ip
-  end
+token_from_path = lambda do |req|
+  match = req.path.match(%r{^/api/v1/feeds/([^/]+)})
+  match && match[1]
 end

-# Block suspicious patterns
-Rack::Attack.blocklist('block bad user agents') do |req|
-  if req.user_agent&.match?(/bot|crawler|spider/i) && !req.user_agent&.match?(/googlebot|bingbot/i)
-    Html2rss::Web::SecurityLogger.log_blocked_request(req.ip, 'suspicious_user_agent', req.path)
-    true
-  end
+Rack::Attack.throttle('requests per token', limit: TOKEN_LIMIT, period: STANDARD_WINDOW) do |req|
+  token_from_header.call(req) || token_from_path.call(req)
 end

-# Custom responses with proper headers
-Rack::Attack.throttled_response = lambda do |_env|
-  retry_after = 60
-  [
-    429,
-    {
-      'Content-Type' => 'application/xml',
-      'Retry-After' => retry_after.to_s,
-      'X-RateLimit-Limit' => '100',
-      'X-RateLimit-Remaining' => '0',
-      'X-RateLimit-Reset' => (Time.now + retry_after).to_i.to_s
-    },
-    ['<rss><channel><title>Rate Limited</title><description>Too many requests. ' \
-     'Please try again later.</description></channel></rss>']
-  ]
+Rack::Attack.throttled_response = lambda do |env|
+  Html2rss::Web::RackAttackResponse.call(env)
 end

-# Track blocked requests for monitoring
-Rack::Attack.blocklisted_response = lambda do |_env|
-  [
-    403,
-    { 'Content-Type' => 'application/xml' },
-    ['<rss><channel><title>Access Denied</title><description>Request blocked by ' \
-     'security policy.</description></channel></rss>']
-  ]
+module Html2rss
+  module Web
+    module RackAttackResponse
+      module_function
+
+      def call(env)
+        request = Rack::Request.new(env)
+        match_data = env['rack.attack.match_data'] || {}
+        limit = match_data[:limit] || STANDARD_LIMIT
+
+        Html2rss::Web::SecurityLogger.log_rate_limit_exceeded(request.ip, request.path, limit)
+
+        retry_after = STANDARD_WINDOW
+        return api_response(retry_after) if request.path.start_with?('/api/')
+
+        text_response(retry_after)
+      end
+
+      def api_response(retry_after)
+        body = {
+          success: false,
+          error: { code: 'TOO_MANY_REQUESTS', message: 'Too many requests. Please try again later.' }
+        }.to_json
+
+        [
+          429,
+          {
+            'Content-Type' => 'application/json',
+            'Retry-After' => retry_after.to_s
+          },
+          [body]
+        ]
+      end
+
+      def text_response(retry_after)
+        [
+          429,
+          {
+            'Content-Type' => 'text/plain',
+            'Retry-After' => retry_after.to_s
+          },
+          ['Too many requests. Please try again later.']
+        ]
+      end
+    end
+  end
 end
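
Two ideas in the new `rack_attack.rb` can be checked without booting the app: how the per-token throttle picks its key, and how the 429 response switches between JSON and plain text. The sketch below reproduces both as plain methods. `FakeRequest`, `throttle_key`, and `throttled_response` are hypothetical names introduced only so the example runs without the `rack` gem. The header and path rules are copied from the diff.

```ruby
require 'json'

# Minimal stand-in for Rack::Request (hypothetical, for illustration only).
FakeRequest = Struct.new(:path, :headers) do
  def get_header(name)
    headers[name]
  end
end

# Bearer token from the Authorization header, or nil when absent or blank.
def token_from_header(req)
  header = req.get_header('HTTP_AUTHORIZATION')
  return unless header&.start_with?('Bearer ')

  token = header.split(' ', 2)[1]&.strip
  token unless token.nil? || token.empty?
end

# Token segment of /api/v1/feeds/<token>, or nil for any other path.
def token_from_path(req)
  match = req.path.match(%r{^/api/v1/feeds/([^/]+)})
  match && match[1]
end

# Throttle key: header token first, then path token. A nil key means the
# request is not counted against the per-token throttle.
def throttle_key(req)
  token_from_header(req) || token_from_path(req)
end

# 429 response as a Rack triplet: JSON for /api/ paths, plain text otherwise.
def throttled_response(path, retry_after)
  message = 'Too many requests. Please try again later.'
  headers = { 'Retry-After' => retry_after.to_s }
  if path.start_with?('/api/')
    body = { success: false, error: { code: 'TOO_MANY_REQUESTS', message: message } }.to_json
    [429, headers.merge('Content-Type' => 'application/json'), [body]]
  else
    [429, headers.merge('Content-Type' => 'text/plain'), [message]]
  end
end
```

The per-IP throttle from the diff still applies to every request. The per-token throttle adds a second, lower limit that follows the token across IPs.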

frontend/playwright.config.ts

Lines changed: 0 additions & 4 deletions
@@ -21,10 +21,6 @@ export default defineConfig({
   ...process.env,
   RACK_ENV: 'test',
   AUTO_SOURCE_ENABLED: 'true',
-  AUTO_SOURCE_USERNAME: 'admin',
-  AUTO_SOURCE_PASSWORD: 'changeme',
-  AUTO_SOURCE_ALLOWED_ORIGINS: '127.0.0.1:3000,localhost:3000',
-  AUTO_SOURCE_ALLOWED_URLS: 'https://example.com/*,https://test.com/*',
   HEALTH_CHECK_TOKEN: 'health-check-token-xyz789',
   HTML2RSS_SECRET_KEY: process.env.HTML2RSS_SECRET_KEY ?? 'test-secret-key-for-smoke',
 },

frontend/src/__tests__/App.contract.test.tsx

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@ describe('App contract', () => {
 const token = 'contract-token';

 const authenticate = () => {
-  window.localStorage.setItem('html2rss_username', username);
-  window.localStorage.setItem('html2rss_token', token);
+  window.sessionStorage.setItem('html2rss_username', username);
+  window.sessionStorage.setItem('html2rss_token', token);
 };

 it('shows feed result when API responds with success', async () => {

frontend/src/__tests__/App.test.tsx

Lines changed: 0 additions & 1 deletion
@@ -108,5 +108,4 @@ describe('App', () => {
     expect(screen.getByText('❌ Error')).toBeInTheDocument();
     expect(screen.getByText('Access Denied')).toBeInTheDocument();
   });
-
 });

frontend/src/__tests__/DemoButtons.test.tsx

Lines changed: 0 additions & 1 deletion
@@ -28,5 +28,4 @@ describe('DemoButtons', () => {
     expect(mockOnConvert).toHaveBeenCalledWith('https://www.chip.de/testberichte');
   });
 });
-
 });
