feat: add per-user rate limiting middleware for inference endpoints. And remove catch all. by PierreLeGuen · Pull Request #67 · nearai/chat-api

PierreLeGuen · 2025-12-04T02:07:29Z

- Introduced `rate_limit_middleware` to enforce a maximum of 2 concurrent requests and 1 request per second per user.
- Updated API routes to integrate rate limiting.
- Removed the unused `ALLOWED_PROXY_PATHS` constant and related tests.
- Added tests for rate limiting functionality to ensure correct behavior across different users and request scenarios.

Copilot

Pull request overview

This PR introduces per-user rate limiting middleware for inference endpoints and refactors the API routing from a catch-all pattern to explicit route handlers. The rate limiting enforces 2 concurrent requests and 1 request per second per user. However, there are critical bugs that must be addressed before merging.

Key Changes:

- Added `RateLimitState` middleware with per-user concurrency and time-based rate limiting
- Replaced catch-all proxy handler with explicit route handlers for `/v1/responses`, `/v1/model/list`, and `/v1/signature/{chat_id}`
- Removed `ALLOWED_PROXY_PATHS` constant and related whitelist validation logic
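
As a framework-independent illustration of the move from a catch-all handler to explicit routes, the sketch below accepts only the three paths listed above and rejects everything else. `ProxyRoute` and `match_route` are hypothetical names introduced for this example; the PR registers real handlers on the project's router, which is not shown here.

```rust
/// Hypothetical enum naming the three proxied routes from the list above.
#[derive(Debug, PartialEq)]
enum ProxyRoute {
    Responses,
    ModelList,
    Signature { chat_id: String },
}

/// Returns the matching route, or `None` for any path that is not
/// explicitly listed (there is no catch-all fallback).
fn match_route(path: &str) -> Option<ProxyRoute> {
    match path {
        "/v1/responses" => Some(ProxyRoute::Responses),
        "/v1/model/list" => Some(ProxyRoute::ModelList),
        _ => {
            // `/v1/signature/{chat_id}`: exactly one non-empty path segment.
            let chat_id = path.strip_prefix("/v1/signature/")?;
            if chat_id.is_empty() || chat_id.contains('/') {
                return None;
            }
            Some(ProxyRoute::Signature {
                chat_id: chat_id.to_string(),
            })
        }
    }
}
```

With explicit matching, a path that the old whitelist allowed but that has no handler here simply fails to match, which is the kind of gap the per-file summary flags for `crates/api/src/routes/api.rs`.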

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `crates/api/src/middleware/rate_limit.rs` | New rate limiting middleware implementation with per-user state tracking (contains critical semaphore leak bug) |
| `crates/api/src/middleware/mod.rs` | Exports rate limiting middleware components |
| `crates/api/src/routes/api.rs` | Replaced catch-all handler with explicit proxy routes; missing some previously allowed paths |
| `crates/api/src/routes/mod.rs` | Integrated `RateLimitState` into router creation |
| `crates/api/src/consts.rs` | Removed `ALLOWED_PROXY_PATHS` constant |
| `crates/api/src/lib.rs` | Removed `create_router` export, keeping only `create_router_with_cors` |
| `crates/api/tests/common.rs` | Updated test setup to use `create_router_with_cors` directly |
| `crates/api/tests/rate_limit_tests.rs` | Added integration tests for rate limiting behavior |
| `crates/api/tests/proxy_path_tests.rs` | Removed tests for catch-all proxy path validation |

Copilot · 2025-12-04T02:31:13Z

crates/api/src/middleware/rate_limit.rs

+    #[tokio::test]
+    async fn test_rate_limit_blocks_second_request_within_window() {
+        let state = RateLimitState::new();
+        let user = test_user_id(1);
+
+        // First request should succeed
+        let _guard1 = state.try_acquire(user.clone()).await.unwrap();
+
+        // Second request within the same second should fail (rate limit)
+        let result = state.try_acquire(user).await;
+        assert!(matches!(
+            result,
+            Err(RateLimitError::RateLimitExceeded { .. })
+        ));
+    }


Missing test coverage for the semaphore permit leak bug. The existing test `test_rate_limit_blocks_second_request_within_window` doesn't drop the first guard before the second request, so it won't catch the case where a permit is leaked when the rate limit (not the concurrency limit) is exceeded.

A test should verify that after a rate-limited request is rejected, the concurrency limit is not affected. For example:

1. Make a request and drop the guard
2. Make a second request immediately (should fail with `RateLimitExceeded`)
3. Wait for the window to expire
4. Make `max_concurrent` requests simultaneously (should all succeed, proving no permits were leaked)
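
The steps above can be sketched against a simplified, std-only model of the limiter. This is not the PR's implementation: the real `RateLimitState` is async and, per the review, semaphore-based, while everything below (the `UserState` fields, the `Guard` type, the injected `now` argument, the string user id) is an assumption made so the scenario runs deterministically without sleeping. One caveat: because the 1-request-per-window limit still applies, the final acquisitions here are spaced one window apart rather than issued at the same instant.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum RateLimitError {
    ConcurrencyExceeded,
    RateLimitExceeded,
}

#[derive(Default)]
struct UserState {
    in_flight: usize,
    last_request: Option<Instant>,
}

type Users = Arc<Mutex<HashMap<String, UserState>>>;

struct RateLimitState {
    max_concurrent: usize,
    window: Duration,
    users: Users,
}

/// Releases one concurrency slot when dropped.
struct Guard {
    user: String,
    users: Users,
}

impl Drop for Guard {
    fn drop(&mut self) {
        if let Some(state) = self.users.lock().unwrap().get_mut(&self.user) {
            state.in_flight = state.in_flight.saturating_sub(1);
        }
    }
}

impl RateLimitState {
    fn new(max_concurrent: usize, window: Duration) -> Self {
        Self { max_concurrent, window, users: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// `now` is passed in so tests can move time forward without sleeping.
    fn try_acquire(&self, user: &str, now: Instant) -> Result<Guard, RateLimitError> {
        let mut users = self.users.lock().unwrap();
        let state = users.entry(user.to_string()).or_default();
        // Both checks run before any slot is taken, so a rejection
        // cannot leave a slot held.
        if state.in_flight >= self.max_concurrent {
            return Err(RateLimitError::ConcurrencyExceeded);
        }
        if let Some(last) = state.last_request {
            if now.saturating_duration_since(last) < self.window {
                return Err(RateLimitError::RateLimitExceeded);
            }
        }
        state.in_flight += 1;
        state.last_request = Some(now);
        Ok(Guard { user: user.to_string(), users: Arc::clone(&self.users) })
    }
}

/// Runs the four review steps; returns true if no slot leaked.
fn no_leak_after_rate_limited_rejection() -> bool {
    let window = Duration::from_secs(1);
    let state = RateLimitState::new(2, window);
    let t0 = Instant::now();

    // 1. Make a request and drop the guard.
    drop(state.try_acquire("user-1", t0).unwrap());
    // 2. An immediate second request is rejected by the rate limit.
    let rejected = matches!(
        state.try_acquire("user-1", t0),
        Err(RateLimitError::RateLimitExceeded)
    );
    // 3 and 4. After the window, every concurrency slot can still be filled.
    let first = state.try_acquire("user-1", t0 + window);
    let second = state.try_acquire("user-1", t0 + window * 2);
    rejected && first.is_ok() && second.is_ok()
}
```

In an implementation that takes a semaphore permit first and checks the time window second, the rejection in step 2 is where a permit could be lost; the final two acquisitions would then expose the leak by failing with a concurrency error.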

crates/api/src/routes/api.rs

crates/api/src/middleware/rate_limit.rs

- Improved user rate limit state management by adding idle user cleanup and refining concurrency checks.
- Updated rate limit error handling to differentiate between concurrency and rate limit exceeded scenarios.
- Removed redundant comments and streamlined the rate limit middleware for clarity.
- Added tests to ensure proper behavior of rate limiting under various conditions.
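
The commit does not show how idle users are cleaned up, so the following is only a minimal sketch of one plausible policy: evict a user's entry once it has no in-flight requests and has been inactive longer than a timeout. `UserEntry`, its fields, `cleanup_idle_users`, and the timeout parameter are all hypothetical names chosen for this example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-user bookkeeping; the field names are assumptions.
struct UserEntry {
    in_flight: usize,
    last_seen: Instant,
}

/// Removes users that have no in-flight requests and whose last activity
/// is at least `idle_timeout` old. Returns how many entries were evicted.
fn cleanup_idle_users(
    users: &mut HashMap<String, UserEntry>,
    now: Instant,
    idle_timeout: Duration,
) -> usize {
    let before = users.len();
    users.retain(|_, entry| {
        // Keep anything still in use or recently active.
        entry.in_flight > 0
            || now.saturating_duration_since(entry.last_seen) < idle_timeout
    });
    before - users.len()
}
```

Keeping entries with in-flight requests matters: evicting them would reset the user's concurrency count while requests are still running.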

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

crates/api/src/routes/api.rs

crates/api/tests/rate_limit_tests.rs

- Introduced a new test to validate the behavior of the rate limiting middleware under concurrent requests.
- The test ensures that at least two out of three concurrent requests are rate limited, adhering to the defined rate limit configuration.

PierreLeGuen requested a review from think-in-universe December 4, 2025 02:07

PierreLeGuen self-assigned this Dec 4, 2025

think-in-universe requested review from Copilot and hanakannzashi December 4, 2025 02:26

Copilot started reviewing on behalf of think-in-universe December 4, 2025 02:27

Copilot finished reviewing on behalf of think-in-universe December 4, 2025 02:30

Copilot AI reviewed Dec 4, 2025

think-in-universe requested a review from Copilot December 4, 2025 03:01

Copilot started reviewing on behalf of think-in-universe December 4, 2025 03:02

Copilot finished reviewing on behalf of think-in-universe December 4, 2025 03:16

Copilot AI reviewed Dec 4, 2025

PierreLeGuen added 2 commits December 3, 2025 19:21

fix: clippy

2e375d5

think-in-universe approved these changes Dec 4, 2025

think-in-universe merged commit ed61228 into main Dec 4, 2025
1 check passed

think-in-universe deleted the rate-limit-catch-all branch December 4, 2025 03:52

think-in-universe merged 4 commits into main from rate-limit-catch-all
