feat(rate_limiter): add SlidingWindowRateLimiter for strict per-minute caps #20799

Open
dive2tech wants to merge 8 commits into run-llama:main from dive2tech:feat/sliding-window-rate-limiter

@dive2tech

Summary

Adds a new rate limiter implementation, SlidingWindowRateLimiter, as an alternative to the existing TokenBucketRateLimiter.

Motivation

Token-bucket limiters allow bursts: a full bucket can be spent all at once, so more requests than the nominal per-minute rate can land inside a single 60-second span. Some APIs enforce strict limits over a rolling 60-second window with no burst allowance. This implementation enforces a strict sliding window: only requests (or tokens) within the last 60 seconds count toward the limit.
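
To make the burst concern concrete, here is a small simulation. It is only a sketch of a generic token bucket (capacity 60, refilled at one token per second), not the existing TokenBucketRateLimiter; the function names `token_bucket_admitted` and `max_in_rolling_window` are hypothetical and exist only for this illustration. A client that sends as fast as it is allowed gets well over 60 requests into one rolling 60-second window.

```python
from collections import deque


def token_bucket_admitted(capacity, refill_per_sec, duration_s):
    """Timestamps (in whole seconds) admitted by a generic token bucket
    when the client sends as fast as the bucket allows."""
    tokens = float(capacity)  # assumption: the bucket starts full
    admitted = []
    for t in range(duration_s):
        if t > 0:
            # Refill once per simulated second, never above capacity.
            tokens = min(capacity, tokens + refill_per_sec)
        while tokens >= 1:
            tokens -= 1
            admitted.append(t)
    return admitted


def max_in_rolling_window(timestamps, window_s=60):
    """Largest number of timestamps inside any window (ts - window_s, ts]."""
    window = deque()
    worst = 0
    for ts in timestamps:
        window.append(ts)
        while window[0] <= ts - window_s:
            window.popleft()
        worst = max(worst, len(window))
    return worst


admitted = token_bucket_admitted(capacity=60, refill_per_sec=1.0, duration_s=120)
print(max_in_rolling_window(admitted))  # prints 119
```

The initial burst of 60 plus the 59 refilled tokens over the following seconds all fall inside the same 60-second span, which is what a strict rolling cap of 60 would reject.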

Changes

  • llama_index/core/rate_limiter.py: New SlidingWindowRateLimiter class implementing BaseRateLimiter with:
    • Optional requests-per-minute (RPM) and tokens-per-minute (TPM) limits (at least one required)
    • Thread-safe sync acquire() and async async_acquire()
    • Pruning of out-of-window entries and blocking until capacity is available
  • tests/test_rate_limiter.py: Tests for creation, validation, blocking behavior, pruning, TPM limiting, async/concurrent behavior, and LLM/embedding integration.
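
For readers who want a feel for the pruning and blocking behavior listed above, here is a minimal sketch of a requests-only sliding window. It is not the code in this PR: `SlidingWindowSketch`, `FakeClock`, and the injectable `clock`/`sleep` parameters are hypothetical names chosen for the illustration, and TPM limits and the async path are left out.

```python
import threading
import time
from collections import deque


class SlidingWindowSketch:
    """Illustrative only: allow at most `requests_per_minute` acquisitions
    in any rolling window of `window_s` seconds."""

    def __init__(self, requests_per_minute, window_s=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self._limit = requests_per_minute
        self._window_s = window_s
        self._clock = clock    # injectable so the demo needs no real waiting
        self._sleep = sleep
        self._stamps = deque()  # timestamps of admitted requests, oldest first
        self._lock = threading.Lock()

    def _prune(self, now):
        # Drop entries that have left the rolling window.
        while self._stamps and self._stamps[0] <= now - self._window_s:
            self._stamps.popleft()

    def acquire(self):
        while True:
            with self._lock:
                now = self._clock()
                self._prune(now)
                if len(self._stamps) < self._limit:
                    self._stamps.append(now)
                    return
                # Time until the oldest entry leaves the window.
                wait = self._stamps[0] + self._window_s - now
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(max(wait, 0.0))


class FakeClock:
    """Deterministic clock for the demo: sleeping just advances time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
limiter = SlidingWindowSketch(requests_per_minute=2,
                              clock=clock.time, sleep=clock.sleep)
limiter.acquire()
limiter.acquire()
limiter.acquire()  # window is full, so this "sleeps" until t = 60
print(clock.now)   # prints 60.0
```

Sleeping outside the lock is a deliberate choice in this sketch: a waiting caller should not prevent other threads from pruning and re-checking capacity.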

Usage

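from llama_index.core.rate_limiter import SlidingWindowRateLimiter

# Cap outgoing calls at 60 requests per rolling 60-second window.
limiter = SlidingWindowRateLimiter(requests_per_minute=60)
# SomeLLM is a placeholder for any LLM class that accepts a rate_limiter argument.
llm = SomeLLM(rate_limiter=limiter)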

All existing rate limiter tests pass; 13 new tests added for the sliding-window implementation.

…e caps

Add SlidingWindowRateLimiter as an alternative to TokenBucketRateLimiter.
It enforces a strict sliding 60-second window for RPM/TPM, with no burst
at window boundaries. Includes full sync/async support and tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@dosubot (bot) added the size:L label ("This PR changes 100-499 lines, ignoring generated files.") on Feb 25, 2026
Extend SlidingWindowRateLimiter with optional request_burst and token_burst
parameters so callers can configure limited burst headroom while keeping a
sliding 60s window model. Includes tests covering request and token bursts.

Made-with: Cursor
@dive2tech
Author

Hi @rootInfluence, thanks for your feedback.
I have considered your suggestions and reworked the code.

…window

Replace the SlidingWindowRateLimiter __init__ override with a model_validator
that enforces the RPM/TPM requirement and remove a redundant instance-type
test. This keeps the Pydantic API ergonomic while maintaining invariants.

Made-with: Cursor
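
The validator approach described in this commit message could look roughly like the following. This is a sketch under assumptions, not the PR's code: it assumes Pydantic v2, and `LimiterConfigSketch` and `is_valid` are hypothetical names; only the field names `requests_per_minute` and `tokens_per_minute` mirror the PR description.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class LimiterConfigSketch(BaseModel):
    """Illustrative model: at least one of the two limits must be set."""

    requests_per_minute: Optional[float] = None
    tokens_per_minute: Optional[float] = None

    @model_validator(mode="after")
    def _require_a_limit(self) -> "LimiterConfigSketch":
        # Runs after field validation, so both fields are already parsed.
        if self.requests_per_minute is None and self.tokens_per_minute is None:
            raise ValueError(
                "At least one of requests_per_minute or tokens_per_minute must be set"
            )
        return self


def is_valid(**kwargs) -> bool:
    """Helper for the demo: True if the model accepts these arguments."""
    try:
        LimiterConfigSketch(**kwargs)
        return True
    except ValidationError:
        return False


print(is_valid(requests_per_minute=60))  # prints True
print(is_valid())                        # prints False
```

Compared with overriding `__init__`, a validator keeps the normal keyword-argument construction and serialization behavior of the model while still rejecting a configuration with no limit at all.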
@dive2tech
Author

Hi @AstraBert,
Thank you for your feedback.
I’ve updated the code as per your suggestions. Could you kindly review it?
I appreciate your time and assistance.

@dosubot (bot) added the lgtm label ("This PR has been approved by a maintainer") on Feb 27, 2026
@AstraBert
Member

Linting is failing. You can make sure it passes by running:

uv pip install pre-commit
pre-commit install
pre-commit run -a

Run these from the root folder of the llama_index repo.

Apply ruff/black formatting to rate_limiter module so it passes the
standard pre-commit hooks used in this repository.

Made-with: Cursor
@dive2tech
Author

Fixed the lint error; ready to merge.

@dive2tech requested a review from AstraBert on February 27, 2026, 13:38
Add a small compatibility shim for NoneType so llama_index.core.base.llms.types
imports cleanly on Python 3.9 where types.NoneType is not available.

Made-with: Cursor
@dive2tech
Author

I've fixed the CI bot test error. Please try running it again.
Thank you for your time.

# NOTE:
# Python 3.9 does not expose `types.NoneType`, so we define a local alias that
# works across all supported versions instead of importing it from `types`.
NoneType = type(None)
Member

It is not necessary to stabilize integrations across versions: integrations will most probably fail every time a modification is made to core, mostly because of flakiness in their own tests. For 3.9 in particular, we are preparing to drop support, so there is no need to adapt.

Member

Please revert the last commit.

@AstraBert
Member

I will merge this one once the last commit is reverted and CI is done. Passing linting, type-checking, and the core tests on 3.12 and 3.14 is all I need to merge; it's not a problem if other CI checks fail.

@dive2tech
Author

Hi @AstraBert, thanks for your feedback.
I just reverted the last commit.
Unit Testing / test 3.12 and core-py314 have passed, so is it ready to merge?

Labels

lgtm: This PR has been approved by a maintainer
size:L: This PR changes 100-499 lines, ignoring generated files.
