Conversation

NikolayS (Contributor)
This PR addresses SCRAM authentication DoS vulnerabilities by fixing async runtime blocking and adding IP-based rate limiting.

Changes

1. Fix SCRAM PBKDF2 Blocking Async Runtime

  • Moved PBKDF2 computation (4096 iterations, ~50-80ms) to the blocking thread pool
  • Prevents attackers from exhausting async worker threads with authentication spam

2. Add Per-IP Authentication Rate Limiting (new feature)

  • Configurable rate limiting: 10 attempts per minute per IP by default
  • Token bucket algorithm using the governor crate
  • IPv6-aware: normalizes to /64 prefix to prevent bypass attacks
  • Memory-bounded: hourly state reset prevents unbounded growth

Configuration

# Authentication rate limit (attempts per minute per IP address)
# Default: 10
auth_rate_limit = 10

Follow-up Work

(this would be a separate fix)

SCRAM verifier caching (pre-existing issue): The current SCRAM implementation generates a new random salt on every authentication attempt, which violates the SCRAM-SHA-256 protocol design and wastes ~50-80ms of PBKDF2 computation per auth. Possible solution: pre-compute verifiers (salt, iterations, stored_key, server_key) at user creation time and store them in the config file, similar to how PostgreSQL stores them in pg_shadow.

SCRAM authentication uses PBKDF2 with 4096 iterations, which takes
50-80ms of CPU time. Running this on the async runtime blocks worker
threads and can cause DoS under load.

This change wraps the hash_password() call in tokio::task::block_in_place()
to move the expensive computation to Tokio's blocking thread pool.

Implements rate limiting to prevent authentication spam attacks.
Uses the governor crate to track attempts per IP address with a
token bucket algorithm (10 attempts per minute by default).

Rate-limited requests receive an authentication error and are logged.
Adds 'auth_rate_limit' config option (default: 10 attempts/minute).
Can be configured via TOML or PGDOG_AUTH_RATE_LIMIT env variable.

The test would take 1 minute to complete, which is too slow for
regular test runs. Left as commented code for manual testing.

Add documentation for the new auth_rate_limit setting in the
example configuration file.

Adds comment explaining why block_in_place() is used instead of
spawn_blocking() - the AuthenticationProvider trait is synchronous
and cannot use .await.

The rate limiter was storing every IP address that ever connected
in an unbounded HashMap, leading to memory growth over time.

Replace HashMap with LRU cache (max 10k IPs) to prevent unbounded
memory growth. When cache is full, least recently used IPs are
evicted and start fresh if they reconnect.

Replace .unwrap() with .expect() on compile-time constants
to provide clear panic messages if invariants are violated.

Changes:
- PBKDF2 iterations (always 4096)
- MAX_TRACKED_IPS constant (always 10,000)

Note: NonZeroU32::new(limit).unwrap_or(...) in rate_limit.rs is safe: it
falls back to the default when the value is 0, so no unwrap can panic.

Ensure auth_rate_limit config value is always >= 1:
- Values of 0 are clamped to 1 (minimum viable rate limit)
- Prevents accidental disabling of DoS protection
- Added test to verify validation behavior

This makes the contract explicit: rate limiting cannot be
disabled via configuration.

Enhanced server-side logging when rate limit is exceeded:
- Include user and database in log for better debugging
- Add comment clarifying security decision
- Client still receives generic auth error (no information leakage)

Log format: "Authentication rate limit exceeded for IP: X, user: Y, database: Z"
Client sees: "password for user Y and database Z is wrong, or the database does not exist"

This provides operators with debugging context while preventing
attackers from learning they triggered rate limiting.

Explain that the governor crate uses DefaultClock (wall-clock time),
which is not affected by Tokio's pause()/advance() time mocking.

To support mocked time testing would require:
- Custom clock implementation wrapping tokio::time
- Making rate limiter generic over clock type
- Significant refactoring

Not worth the complexity since:
- Governor crate itself is well-tested
- Our tests verify correct integration
- Core functionality (limiting, LRU) is already tested

Previously, the mutex was held during limiter.check(), serializing
all authentication attempts and creating a performance bottleneck.

Optimization:
- Clone the Arc<RateLimiter> while holding the lock (cheap operation)
- Release lock before calling check()
- Allows concurrent authentication attempts from different IPs

This is safe because:
- Mutex protects LRU cache access only (fast)
- Governor's RateLimiter is internally thread-safe (uses atomics)
- Arc cloning is just a reference count increment (~1-2 CPU cycles)

Performance impact:
- Before: All auth attempts serialized
- After: Concurrent auth attempts possible (parallel PBKDF2 + rate checks)

Added documentation explaining thread safety guarantees.

Attackers with IPv6 /64 blocks can trivially bypass per-IP rate limiting
by rotating through addresses in their allocated block. Most ISPs and
cloud providers allocate /64 or /48 blocks to users.

This change normalizes IPv6 addresses to their /64 prefix before rate
limiting, effectively treating all addresses in the same /64 block as
a single source. IPv4 addresses are used as-is since they're typically
more granular.

Tests verify:
- IPv6 addresses in same /64 are normalized to same prefix
- IPv6 addresses in different /64 are kept separate
- IPv4 addresses pass through unchanged
- Rate limiting correctly blocks entire /64 blocks

The unused dashmap = "6" dependency was causing a version conflict:
- governor 0.6.3 depends on dashmap 5.5.3
- pgdog had unused dashmap = "6" dependency

This resulted in both dashmap 5.5.3 and 6.1.0 in the dependency tree,
increasing binary size and build time unnecessarily.

Since dashmap is not used anywhere in the pgdog codebase, removing it
entirely resolves the conflict.

Replaced manual HashMap+Mutex approach with governor's built-in
keyed rate limiter. This eliminates mutex contention by using
DashMap internally (sharded concurrent hash map).

Benefits:
- No global mutex contention on every auth attempt
- Fine-grained per-IP locking via DashMap sharding
- Simpler code (50% reduction)
- Better concurrency under load
- No need to manually manage LRU eviction

The keyed limiter stores rate limit state per IP address with
governor managing memory automatically via DashMap.

Wrapped rate limiter in Arc<RwLock<>> to allow runtime config updates.
Previously the rate limit was read once at startup and never updated.

Changes:
- rate_limit::reload() function to rebuild limiter with new config
- Integrated into databases::reload() for SIGHUP/RELOAD command
- Added test to verify reload functionality
- Note: Reloading resets rate limit state (intentional tradeoff)

Now auth_rate_limit config changes take effect via:
- SIGHUP signal on Unix
- RELOAD command in admin database

Tests now create their own rate limiters with explicit limits rather
than depending on config defaults. This makes tests:
- Independent of PGDOG_AUTH_RATE_LIMIT env var
- Faster (no config loading)
- More robust (explicit test parameters)

Changes:
- Added check_with_limiter() internal function
- Added test_limiter() helper for tests
- Tests use unique IPs to avoid interference
- Added edge case tests (limit=1, limit=3)

Tests now work regardless of environment configuration.

Governor's DefaultKeyedStateStore (DashMap) grows indefinitely as new
IPs are encountered. Reset the limiter every hour to clear accumulated
state and prevent memory exhaustion.

Trade-off: Brief window where rate-limited IPs get fresh quota, but
this is acceptable for defense-in-depth and better than unbounded growth.

@CLAassistant commented Sep 30, 2025

CLA assistant check: all committers have signed the CLA.

// Move expensive PBKDF2 computation to blocking thread pool
// to avoid blocking the async runtime.
// Note: Using block_in_place() because AuthenticationProvider trait is synchronous.
let hash = tokio::task::block_in_place(|| {

Collaborator:

That's ... better than what was there, but ideally we run SCRAM in a background thread overall. Requires a bigger refactor which we don't have to do now.

};

if let Some(addr) = *stream.peer_addr() {
if !rate_limit::check(addr.ip()) {

Collaborator:

This should be optional. Default should be unlimited. It's completely reasonable to receive thousands of these per second in an internal setting, especially during incidents when databases go down and all apps reconnect.

if let Some(addr) = *stream.peer_addr() {
if !rate_limit::check(addr.ip()) {
error!(
"Authentication rate limit exceeded for IP: {}, user: \"{}\", database: \"{}\"",

Collaborator:

What's the unit? Per minute/second?

@NikolayS NikolayS changed the title Fix SCRAM authentication DoS vulnerabilities and add IP-based rate limiting #2 Fix SCRAM authentication DoS vulnerabilities and add IP-based rate limiting Oct 4, 2025