
Serverless rate limiting for AI model endpoints #814

@ebubae

Description

Is your feature request related to a problem? Please describe.
The existing rate limiting solution requires the endpoint to be continually running to correctly limit requests, but our serverless infrastructure cannot easily maintain state between application teardowns on subsequent requests.

Describe the solution you'd like
Use Upstash rate limiting to limit incoming requests. More specifically, we should:

  • Rate limit incoming requests by developer token
  • Rate limit requests made with the internal main key by incoming IP address

Describe alternatives you've considered
The existing Redis solution doesn't scale as well as a serverless solution and lacks built-in rate limiting functionality.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
