Fix SCRAM authentication DoS vulnerabilities and add IP-based rate limiting #520
Conversation
SCRAM authentication uses PBKDF2 with 4096 iterations, which takes 50-80ms of CPU time. Running this directly on the async runtime blocks worker threads and can cause a DoS under load. This change wraps the hash_password() call in tokio::task::block_in_place(), which lets the current worker run the expensive computation while the runtime moves its other tasks to different worker threads.
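A minimal sketch of the pattern, assuming a simplified hasher trait rather than pgdog's actual AuthenticationProvider API:

```rust
// Sketch only: `PasswordHasher` and `hash_password` are stand-ins for the real
// pgdog types. block_in_place() runs the CPU-bound PBKDF2 on the current worker
// thread while Tokio shifts that worker's other tasks elsewhere; spawn_blocking()
// is not a drop-in here because the surrounding trait is synchronous and cannot
// .await the returned JoinHandle.
trait PasswordHasher {
    fn hash_password(&self, password: &str) -> Vec<u8>;
}

fn hash_without_stalling_runtime(provider: &dyn PasswordHasher, password: &str) -> Vec<u8> {
    tokio::task::block_in_place(|| provider.hash_password(password))
}
```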
Implements rate limiting to prevent authentication spam attacks. Uses the governor crate to track attempts per IP address with a token bucket algorithm (10 attempts per minute by default). Rate-limited requests receive an authentication error and are logged.
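A small sketch of the underlying token bucket, assuming governor's public API (one such limiter is kept per tracked IP; function names are illustrative):

```rust
use std::num::NonZeroU32;
use governor::{DefaultDirectRateLimiter, Quota, RateLimiter};

// A quota of 10 per minute allows a burst of 10 attempts, then refills at that
// rate; check() consumes one token and returns Err once the bucket is empty.
fn new_ip_limiter() -> DefaultDirectRateLimiter {
    RateLimiter::direct(Quota::per_minute(NonZeroU32::new(10).expect("10 is non-zero")))
}

fn allow_attempt(limiter: &DefaultDirectRateLimiter) -> bool {
    limiter.check().is_ok()
}
```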
Adds 'auth_rate_limit' config option (default: 10 attempts/minute). Can be configured via TOML or PGDOG_AUTH_RATE_LIMIT env variable.
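A sketch of the environment-variable path only (the TOML path is omitted; the variable name and default come from the description above):

```rust
// Falls back to the default of 10 attempts per minute when the variable is
// unset or unparsable.
fn auth_rate_limit_from_env() -> u32 {
    std::env::var("PGDOG_AUTH_RATE_LIMIT")
        .ok()
        .and_then(|value| value.parse().ok())
        .unwrap_or(10)
}
```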
The test would take 1 minute to complete, which is too slow for regular test runs. Left as commented code for manual testing.
Add documentation for the new auth_rate_limit setting in the example configuration file.
Adds a comment explaining why block_in_place() is used instead of spawn_blocking(): the AuthenticationProvider trait is synchronous and cannot use .await.
The rate limiter was storing every IP address that ever connected in an unbounded HashMap, so memory grew without limit over time. Replace the HashMap with an LRU cache (max 10k IPs). When the cache is full, the least recently used IPs are evicted and start with a fresh quota if they reconnect.
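A sketch of the bounded map, assuming the `lru` crate (the value type is simplified here; the real cache stores per-IP limiter state):

```rust
use std::net::IpAddr;
use std::num::NonZeroUsize;
use std::time::Instant;
use lru::LruCache;

const MAX_TRACKED_IPS: usize = 10_000;

// put() evicts the least recently used IP once the cache holds
// MAX_TRACKED_IPS entries, so memory stays bounded.
fn new_tracker() -> LruCache<IpAddr, Instant> {
    LruCache::new(NonZeroUsize::new(MAX_TRACKED_IPS).expect("MAX_TRACKED_IPS is non-zero"))
}
```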
Replace .unwrap() with .expect() on compile-time constants to provide clear panic messages if invariants are violated. Changes:
- PBKDF2 iterations (always 4096)
- MAX_TRACKED_IPS constant (always 10,000)
Note: NonZeroU32::new(limit).unwrap_or() in rate_limit.rs is safe because it handles 0 values with the default, so there is no unwrap that can panic.
Ensure auth_rate_limit config value is always >= 1:
- Values of 0 are clamped to 1 (minimum viable rate limit)
- Prevents accidental disabling of DoS protection
- Added test to verify validation behavior
This makes the contract explicit: rate limiting cannot be disabled via configuration.
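A minimal sketch of the clamping, assuming the configured limit arrives as a plain u32:

```rust
use std::num::NonZeroU32;

// A configured value of 0 is raised to 1 so a non-zero quota can always be
// built and rate limiting cannot be disabled by accident.
fn effective_limit(configured: u32) -> NonZeroU32 {
    NonZeroU32::new(configured.max(1)).expect("clamped to at least 1")
}
```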
Enhanced server-side logging when rate limit is exceeded:
- Include user and database in log for better debugging
- Add comment clarifying security decision
- Client still receives generic auth error (no information leakage)
Log format: "Authentication rate limit exceeded for IP: X, user: Y, database: Z"
Client sees: "password for user Y and database Z is wrong, or the database does not exist"
This provides operators with debugging context while preventing attackers from learning they triggered rate limiting.
Explain that the governor crate uses DefaultClock (wall-clock time), which is not affected by Tokio's pause()/advance() time mocking. To support mocked-time testing would require:
- Custom clock implementation wrapping tokio::time
- Making rate limiter generic over clock type
- Significant refactoring
Not worth the complexity since:
- Governor crate itself is well-tested
- Our tests verify correct integration
- Core functionality (limiting, LRU) is already tested
Previously, the mutex was held during limiter.check(), serializing all authentication attempts and creating a performance bottleneck.
Optimization:
- Clone the Arc<RateLimiter> while holding the lock (cheap operation)
- Release lock before calling check()
- Allows concurrent authentication attempts from different IPs
This is safe because:
- Mutex protects LRU cache access only (fast)
- Governor's RateLimiter is internally thread-safe (uses atomics)
- Arc cloning is just a reference count increment (~1-2 CPU cycles)
Performance impact:
- Before: all auth attempts serialized
- After: concurrent auth attempts possible (parallel PBKDF2 + rate checks)
Added documentation explaining thread safety guarantees.
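A sketch of the narrowed lock scope (simplified signatures, using the governor and lru crates; not the exact pgdog code):

```rust
use std::net::IpAddr;
use std::num::NonZeroU32;
use std::sync::{Arc, Mutex};
use governor::{DefaultDirectRateLimiter, Quota, RateLimiter};
use lru::LruCache;

// The mutex guards only the LRU lookup/insert; the governor check runs after
// the lock is dropped, so attempts from different IPs proceed concurrently.
fn check(
    cache: &Mutex<LruCache<IpAddr, Arc<DefaultDirectRateLimiter>>>,
    ip: IpAddr,
    limit: NonZeroU32,
) -> bool {
    let limiter = {
        let mut cache = cache.lock().expect("rate limit cache lock poisoned");
        cache
            .get_or_insert(ip, || Arc::new(RateLimiter::direct(Quota::per_minute(limit))))
            .clone() // cheap: just bumps the Arc reference count
    }; // lock released here
    limiter.check().is_ok() // internally thread-safe; no lock held
}
```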
Attackers with IPv6 /64 blocks can trivially bypass per-IP rate limiting by rotating through addresses in their allocated block. Most ISPs and cloud providers allocate /64 or /48 blocks to users. This change normalizes IPv6 addresses to their /64 prefix before rate limiting, effectively treating all addresses in the same /64 block as a single source. IPv4 addresses are used as-is since they're typically more granular.
Tests verify:
- IPv6 addresses in same /64 are normalized to same prefix
- IPv6 addresses in different /64 are kept separate
- IPv4 addresses pass through unchanged
- Rate limiting correctly blocks entire /64 blocks
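A sketch of the normalization (the helper name is illustrative):

```rust
use std::net::IpAddr;

// IPv6 addresses are masked to their /64 network so every address in the same
// /64 allocation maps to one rate-limit key; IPv4 addresses pass through.
fn rate_limit_key(ip: IpAddr) -> IpAddr {
    match ip {
        IpAddr::V4(v4) => IpAddr::V4(v4),
        IpAddr::V6(v6) => {
            let masked = u128::from(v6) & 0xffff_ffff_ffff_ffff_0000_0000_0000_0000;
            IpAddr::V6(masked.into())
        }
    }
}
```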
The unused dashmap = "6" dependency was causing a version conflict:
- governor 0.6.3 depends on dashmap 5.5.3
- pgdog had an unused dashmap = "6" dependency
This resulted in both dashmap 5.5.3 and 6.1.0 in the dependency tree, increasing binary size and build time unnecessarily. Since dashmap is not used anywhere in the pgdog codebase, removing it entirely resolves the conflict.
Replaced manual HashMap+Mutex approach with governor's built-in keyed rate limiter. This eliminates mutex contention by using DashMap internally (a sharded concurrent hash map).
Benefits:
- No global mutex contention on every auth attempt
- Fine-grained per-IP locking via DashMap sharding
- Simpler code (50% reduction)
- Better concurrency under load
- No need to manually manage LRU eviction
The keyed limiter stores rate limit state per IP address, with governor managing memory automatically via DashMap.
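A sketch of the keyed limiter, assuming governor's keyed API (module layout and function names are illustrative):

```rust
use std::net::IpAddr;
use std::num::NonZeroU32;
use governor::{DefaultKeyedRateLimiter, Quota, RateLimiter};

// Per-IP state lives inside governor's DashMap-backed keyed store, so no
// external mutex or manual LRU eviction is needed.
fn new_limiter(per_minute: NonZeroU32) -> DefaultKeyedRateLimiter<IpAddr> {
    RateLimiter::keyed(Quota::per_minute(per_minute))
}

fn allow(limiter: &DefaultKeyedRateLimiter<IpAddr>, ip: IpAddr) -> bool {
    // check_key() consumes one token for this IP; Err means over the limit.
    limiter.check_key(&ip).is_ok()
}
```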
Wrapped rate limiter in Arc<RwLock<>> to allow runtime config updates. Previously the rate limit was read once at startup and never updated.
Changes:
- rate_limit::reload() function to rebuild limiter with new config
- Integrated into databases::reload() for SIGHUP/RELOAD command
- Added test to verify reload functionality
- Note: reloading resets rate limit state (intentional tradeoff)
Now auth_rate_limit config changes take effect via:
- SIGHUP signal on Unix
- RELOAD command in admin database
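A sketch of a reloadable global, assuming once_cell for the static and the keyed limiter from the previous sketch (the layout is an assumption, not pgdog's actual module structure):

```rust
use std::net::IpAddr;
use std::num::NonZeroU32;
use std::sync::{Arc, RwLock};
use governor::{DefaultKeyedRateLimiter, Quota, RateLimiter};
use once_cell::sync::Lazy;

static LIMITER: Lazy<RwLock<Arc<DefaultKeyedRateLimiter<IpAddr>>>> =
    Lazy::new(|| RwLock::new(Arc::new(build(10))));

fn build(per_minute: u32) -> DefaultKeyedRateLimiter<IpAddr> {
    let limit = NonZeroU32::new(per_minute.max(1)).expect("clamped to at least 1");
    RateLimiter::keyed(Quota::per_minute(limit))
}

// Swapping in a freshly built limiter applies the new quota and, as noted
// above, intentionally resets all per-IP state.
pub fn reload(per_minute: u32) {
    *LIMITER.write().expect("rate limiter lock poisoned") = Arc::new(build(per_minute));
}

pub fn check(ip: IpAddr) -> bool {
    let limiter = LIMITER.read().expect("rate limiter lock poisoned").clone();
    limiter.check_key(&ip).is_ok()
}
```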
Tests now create their own rate limiters with explicit limits rather than depending on config defaults. This makes tests: - Independent of PGDOG_AUTH_RATE_LIMIT env var - Faster (no config loading) - More robust (explicit test parameters) Changes: - Added check_with_limiter() internal function - Added test_limiter() helper for tests - Tests use unique IPs to avoid interference - Added edge case tests (limit=1, limit=3) Tests now work regardless of environment configuration.
Governor's DefaultKeyedStateStore (DashMap) grows indefinitely as new IPs are encountered. Reset the limiter every hour to clear accumulated state and prevent memory exhaustion. Trade-off: there is a brief window where previously rate-limited IPs get a fresh quota, but this is acceptable for defense-in-depth and better than unbounded growth.
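A sketch of the periodic reset, written against a rebuild callback (the spawning site and helper are assumptions; a reload()-style rebuild like the one sketched above would be passed in):

```rust
use std::time::Duration;

// Rebuilding the keyed limiter drops governor's DashMap of per-IP state, so
// accumulated entries cannot grow without bound.
async fn reset_hourly(mut rebuild: impl FnMut()) {
    let mut interval = tokio::time::interval(Duration::from_secs(60 * 60));
    interval.tick().await; // the first tick completes immediately
    loop {
        interval.tick().await;
        rebuild();
    }
}
```

This could be spawned once at startup, e.g. tokio::spawn(reset_hourly(move || reload(limit))).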
// Move expensive PBKDF2 computation to blocking thread pool
// to avoid blocking the async runtime.
// Note: Using block_in_place() because AuthenticationProvider trait is synchronous.
let hash = tokio::task::block_in_place(|| {
That's ... better than what was there, but ideally we run SCRAM in a background thread overall. Requires a bigger refactor which we don't have to do now.
};
if let Some(addr) = *stream.peer_addr() {
    if !rate_limit::check(addr.ip()) {
This should be optional. Default should be unlimited. It's completely reasonable to receive thousands of these per second in an internal setting, especially during incidents when databases go down and all apps reconnect.
if let Some(addr) = *stream.peer_addr() {
    if !rate_limit::check(addr.ip()) {
        error!(
            "Authentication rate limit exceeded for IP: {}, user: \"{}\", database: \"{}\"",
What's the unit? Per minute/second?
This PR addresses SCRAM authentication DoS vulnerabilities by fixing async runtime blocking and adding IP-based rate limiting.
Changes
1. Fix SCRAM PBKDF2 Blocking Async Runtime
2. Add Per-IP Authentication Rate Limiting (new feature)
Uses the governor crate.
Configuration
auth_rate_limit (default: 10 attempts per minute), configurable via TOML or the PGDOG_AUTH_RATE_LIMIT environment variable.
Follow-up Work
SCRAM verifier caching (pre-existing issue; this would be a separate fix): The current SCRAM implementation generates a new random salt on every authentication attempt, which violates the SCRAM-SHA-256 protocol design and wastes ~50-80ms of PBKDF2 computation per auth. Possible solution: pre-compute verifiers (salt, iterations, stored_key, server_key) at user creation time and store them in the config file, similar to how PostgreSQL stores them in pg_shadow.