Skip to content

Conversation

@phacops
Copy link
Contributor

@phacops phacops commented Jan 22, 2026

Summary

  • Reduce the rate limiter Redis socket timeout from 0.5s to 0.1s to fail faster when Redis is slow or unavailable
  • Emit a new metric ratelimiter_redis_timeout when a Redis timeout occurs, tagged with the function (start_request or finish_request) where it happened

Test plan

  • Deploy to staging and monitor for ratelimiter_redis_timeout metric
  • Verify rate limiting continues to work when Redis is healthy
  • Confirm requests are not blocked when Redis times out (existing fail-open behavior preserved)

🤖 Generated with Claude Code

Reduce the rate limiter Redis socket timeout from 0.5s to 0.1s to fail
faster when Redis is slow or unavailable. Additionally, emit a metric
(ratelimiter_redis_timeout) when a timeout occurs to enable monitoring.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@phacops phacops requested a review from a team as a code owner January 22, 2026 21:13
pipe.expire(query_bucket, max_query_duration_s)
pipe.execute()
except Exception as ex:
logger.exception(ex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't want to log this exeption. it just becomes a sentry error and we want to avoid that.

Only emit metric for Redis timeout errors without logging, since
timeouts are expected when Redis is slow and logging would be noisy.
Other exceptions continue to be logged.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@volokluev
Copy link
Member

Also, I don't think this is where we want to try/catch things. All of these errors happen in the allocation policy and this is not the only rate limiter that we use. (We also use a sliding window rate limiter for different policies). I think this change is better made in allocation_policies/__init__.py

The metric is already emitted in the dedicated RedisTimeoutError handler,
so no need to check for it again in the generic Exception handler.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@phacops phacops requested a review from volokluev January 22, 2026 21:37
phacops and others added 2 commits January 22, 2026 13:38
Consolidate the error return path by using a finally block to check
if an exception occurred and return the fail-open stats.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@phacops phacops force-pushed the reduce-ratelimiter-redis-timeout branch from a3c3671 to c7f29b1 Compare January 22, 2026 21:50
phacops and others added 2 commits January 22, 2026 14:09
Move the RedisTimeoutError handling from rate_limit.py to
allocation_policies/__init__.py. This covers all rate limiters
(including sliding window) since they all go through the allocation
policy base class.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@phacops phacops enabled auto-merge (squash) January 22, 2026 22:33
@phacops phacops merged commit 8bb4b90 into master Jan 23, 2026
34 checks passed
@phacops phacops deleted the reduce-ratelimiter-redis-timeout branch January 23, 2026 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants