
Serverless rate limiting for AI model endpoints #814

@ebubae

Description

Is your feature request related to a problem? Please describe.
The existing rate limiting solution requires the endpoint to be continually running to correctly limit requests, but our serverless infrastructure cannot easily maintain state between application teardowns on subsequent requests.

Describe the solution you'd like
Use Upstash rate limiting to limit incoming requests. More specifically, we should:

  • Rate limit incoming requests by developer token
  • Rate limit requests made with the internal main key by incoming IP address

Describe alternatives you've considered
The existing Redis solution doesn't scale as well as a serverless solution and lacks built-in rate limiting functionality.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
