Commit 3e2d673

Service rate limits (#2517)
1 parent 913da7d commit 3e2d673

16 files changed, +349 -30 lines changed

docs/docs/concepts/gateways.md

Lines changed: 1 addition & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,7 @@
11
# Gateways
22

33
Gateways manage the ingress traffic of running [services](services.md),
4-
provide an HTTPS endpoint mapped to your domain,
5-
and handle auto-scaling.
4+
provide an HTTPS endpoint mapped to your domain, handle auto-scaling and rate limits.
65

76
> If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
87
> the gateway is already set up for you.

docs/docs/concepts/services.md

Lines changed: 56 additions & 3 deletions
@@ -100,7 +100,7 @@ If [authorization](#authorization) is not disabled, the service endpoint require
100100

101101
However, you'll need a gateway in the following cases:
102102

103-
* To use auto-scaling
103+
* To use auto-scaling or rate limits
104104
* To enable HTTPS for the endpoint and map it to your domain
105105
* If your service requires WebSockets
106106
* If your service cannot work with a [path prefix](#path-prefix)
@@ -161,8 +161,7 @@ case `dstack` adjusts the number of replicas (scales up or down) automatically b
161161

162162
Setting the minimum number of replicas to `0` allows the service to scale down to zero when there are no requests.
163163

164-
>The `scaling` property currently requires creating a [gateway](gateways.md).
165-
This requirement is expected to be removed soon.
164+
> The `scaling` property requires creating a [gateway](gateways.md).
166165

167166
### Model
168167

@@ -238,6 +237,60 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
238237
If your app cannot be configured to work with a path prefix, you can host it
239238
on a dedicated domain name by setting up a [gateway](gateways.md).
240239

240+
### Rate Limits { #rate-limits }
241+
242+
If you have a [gateway](gateways.md), you can configure rate limits for your service
243+
using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
244+
245+
<div editor-title="service.dstack.yml">
246+
247+
```yaml
248+
type: service
249+
image: my-app:latest
250+
port: 80
251+
252+
rate_limits:
253+
# For /api/auth/* - 1 request per second, no bursts
254+
- prefix: /api/auth/
255+
rps: 1
256+
# For other URLs - 4 requests per second + bursts of up to 9 requests
257+
- rps: 4
258+
burst: 9
259+
```
260+
261+
</div>
262+
263+
The limit is specified in requests per second, but requests are tracked with millisecond
264+
granularity. For example, `rps: 4` means at most 1 request every 250 milliseconds.
265+
For most applications, it is recommended to set the `burst` property, which allows
266+
temporary bursts, but keeps the average request rate at the limit specified in `rps`.
267+
268+
Rate limits are applied to the entire service regardless of the number of replicas.
269+
They are applied to each client separately, as determined by the client's IP address.
270+
If a client violates a limit, it receives an error with status code `429`.
271+
272+
??? info "Partitioning key"
273+
Instead of partitioning requests by client IP address,
274+
you can choose to partition by the value of a header.
275+
276+
<div editor-title="service.dstack.yml">
277+
278+
```yaml
279+
type: service
280+
image: my-app:latest
281+
port: 80
282+
283+
rate_limits:
284+
- rps: 4
285+
burst: 9
286+
# Apply to each user, as determined by the `Authorization` header
287+
key:
288+
type: header
289+
header: Authorization
290+
```
291+
292+
</div>
293+
241294
### Resources
242295
243296
If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
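
The rate-limit semantics described in the `services.md` changes above (millisecond granularity, `burst`, per-client partitioning, `429` on violation) can be modeled with a small leaky-bucket sketch. This is a simplified illustration loosely based on how Nginx's `limit_req` behaves with `nodelay`, not the gateway's actual code; the `RateLimiter` class and its `allow` method are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class RateLimiter:
    """Simplified leaky-bucket limiter (hypothetical, for illustration only)."""

    rps: float      # sustained rate, in requests per second
    burst: int = 0  # extra requests allowed ahead of the rate
    # Per-client state: key -> (time of last accepted request in ms,
    #                           excess in 1/1000ths of a request)
    _state: dict = field(default_factory=dict)

    def allow(self, key: str, now_ms: int) -> bool:
        """Return True if the request passes, False if it would get a 429."""
        if key not in self._state:
            self._state[key] = (now_ms, 0)
            return True
        last_ms, excess = self._state[key]
        elapsed_ms = max(now_ms - last_ms, 0)
        # The bucket drains `rps` units per millisecond; each request adds 1000 units.
        drained = int(self.rps * elapsed_ms)
        new_excess = max(excess - drained + 1000, 0)
        if new_excess > self.burst * 1000:
            return False  # rejected requests leave the state unchanged
        self._state[key] = (now_ms, new_excess)
        return True


# `rps: 4` without bursts: at most 1 request every 250 ms per client.
strict = RateLimiter(rps=4)
print([strict.allow("10.0.0.1", t) for t in (0, 100, 250)])  # [True, False, True]

# `rps: 4` with `burst: 9`: one request plus nine burst requests pass at once.
bursty = RateLimiter(rps=4, burst=9)
print(sum(bursty.allow("10.0.0.1", 0) for _ in range(11)))  # 10
```

Keying the state by client IP or by a header value corresponds to the two partitioning keys in the docs; in this sketch the key is just an opaque string.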

docs/docs/quickstart.md

Lines changed: 3 additions & 1 deletion
@@ -191,7 +191,9 @@ $ dstack init
191191
</div>
192192

193193
!!! info "Gateway"
194-
To enable [auto-scaling](concepts/services.md#replicas-and-scaling), or use a custom domain with HTTPS,
194+
To enable [auto-scaling](concepts/services.md#replicas-and-scaling),
195+
[rate limits](concepts/services.md#rate-limits),
196+
or use a custom domain with HTTPS,
195197
set up a [gateway](concepts/gateways.md) before running the service.
196198
If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
197199
a gateway is pre-configured for you.

docs/docs/reference/dstack.yml/service.md

Lines changed: 32 additions & 0 deletions
@@ -74,6 +74,38 @@ The `service` configuration type allows running [services](../../concepts/servic
7474
type:
7575
required: true
7676

77+
### `rate_limits`
78+
79+
#### `rate_limits[n]`
80+
81+
#SCHEMA# dstack._internal.core.models.configurations.RateLimit
82+
overrides:
83+
show_root_heading: false
84+
type:
85+
required: true
86+
87+
##### `rate_limits[n].key` { data-toc-label="key" }
88+
89+
=== "IP address"
90+
91+
Partition requests by client IP address.
92+
93+
#SCHEMA# dstack._internal.core.models.configurations.IPAddressPartitioningKey
94+
overrides:
95+
show_root_heading: false
96+
type:
97+
required: true
98+
99+
=== "Header"
100+
101+
Partition requests by the value of a header.
102+
103+
#SCHEMA# dstack._internal.core.models.configurations.HeaderPartitioningKey
104+
overrides:
105+
show_root_heading: false
106+
type:
107+
required: true
108+
77109
### `retry`
78110

79111
#SCHEMA# dstack._internal.core.models.profiles.ProfileRetry

scripts/docs/gen_schema_reference.py

Lines changed: 1 addition & 5 deletions
@@ -76,11 +76,7 @@ def generate_schema_reference(
7676
# TODO: This is a dirty workaround
7777
if field_type:
7878
if field.annotation.__name__ == "Annotated":
79-
if field_type.__name__ == "Optional":
80-
field_type = get_args(field_type)[0]
81-
if field_type.__name__ == "List":
82-
field_type = get_args(field_type)[0]
83-
if field_type.__name__ == "Union":
79+
if field_type.__name__ in ["Optional", "List", "list", "Union"]:
8480
field_type = get_args(field_type)[0]
8581
base_model = (
8682
inspect.isclass(field_type)

src/dstack/_internal/core/models/configurations.py

Lines changed: 78 additions & 0 deletions
@@ -1,4 +1,5 @@
11
import re
2+
from collections import Counter
23
from enum import Enum
34
from typing import Any, Dict, List, Optional, Union
45

@@ -18,6 +19,7 @@
1819

1920
CommandsList = List[str]
2021
ValidPort = conint(gt=0, le=65536)
22+
MAX_INT64 = 2**63 - 1
2123
SERVICE_HTTPS_DEFAULT = True
2224
STRIP_PREFIX_DEFAULT = True
2325

@@ -85,6 +87,70 @@ class ScalingSpec(CoreModel):
8587
] = Duration.parse("10m")
8688

8789

90+
class IPAddressPartitioningKey(CoreModel):
91+
type: Annotated[Literal["ip_address"], Field(description="Partitioning type")] = "ip_address"
92+
93+
94+
class HeaderPartitioningKey(CoreModel):
95+
type: Annotated[Literal["header"], Field(description="Partitioning type")] = "header"
96+
header: Annotated[
97+
str,
98+
Field(
99+
description="Name of the header to use for partitioning",
100+
regex=r"^[a-zA-Z0-9-_]+$", # prevent Nginx config injection
101+
max_length=500, # chosen randomly, Nginx limit is higher
102+
),
103+
]
104+
105+
106+
class RateLimit(CoreModel):
107+
prefix: Annotated[
108+
str,
109+
Field(
110+
description=(
111+
"URL path prefix to which this limit is applied."
112+
" If an incoming request matches several prefixes, the longest prefix is applied"
113+
),
114+
max_length=4094, # Nginx limit
115+
regex=r"^/[^\s\\{}]*$", # prevent Nginx config injection
116+
),
117+
] = "/"
118+
key: Annotated[
119+
Union[IPAddressPartitioningKey, HeaderPartitioningKey],
120+
Field(
121+
discriminator="type",
122+
description=(
123+
"The partitioning key. Each incoming request belongs to a partition"
124+
" and rate limits are applied per partition."
125+
" Defaults to partitioning by client IP address"
126+
),
127+
),
128+
] = IPAddressPartitioningKey()
129+
rps: Annotated[
130+
float,
131+
Field(
132+
description=(
133+
"Max allowed number of requests per second."
134+
" Requests are tracked at millisecond granularity."
135+
" For example, `rps: 10` means at most 1 request per 100ms"
136+
),
137+
# should fit into Nginx limits after being converted to requests per minute
138+
ge=1 / 60,
139+
le=MAX_INT64 // 60,
140+
),
141+
]
142+
burst: Annotated[
143+
int,
144+
Field(
145+
ge=0,
146+
le=MAX_INT64, # Nginx limit
147+
description=(
148+
"Max number of requests that can be passed to the service ahead of the rate limit"
149+
),
150+
),
151+
] = 0
152+
153+
88154
class BaseRunConfiguration(CoreModel):
89155
type: Literal["none"]
90156
name: Annotated[
@@ -306,6 +372,7 @@ class ServiceConfigurationParams(CoreModel):
306372
Optional[ScalingSpec],
307373
Field(description="The auto-scaling rules. Required if `replicas` is set to a range"),
308374
] = None
375+
rate_limits: Annotated[list[RateLimit], Field(description="Rate limiting rules")] = []
309376

310377
@validator("port")
311378
def convert_port(cls, v) -> PortMapping:
@@ -358,6 +425,17 @@ def validate_scaling(cls, values):
358425
raise ValueError("To use `scaling`, `replicas` must be set to a range.")
359426
return values
360427

428+
@validator("rate_limits")
429+
def validate_rate_limits(cls, v: list[RateLimit]) -> list[RateLimit]:
430+
counts = Counter(limit.prefix for limit in v)
431+
duplicates = [prefix for prefix, count in counts.items() if count > 1]
432+
if duplicates:
433+
raise ValueError(
434+
f"Prefixes {duplicates} are used more than once."
435+
" Each rate limit should have a unique path prefix"
436+
)
437+
return v
438+
361439

362440
class ServiceConfiguration(
363441
ProfileParams, BaseRunConfigurationWithCommands, ServiceConfigurationParams
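
Two rules from the `configurations.py` changes above can be sketched in isolation: each rate limit must have a unique `prefix`, and when a request path matches several prefixes, the longest one applies. The helpers below are hypothetical and operate on plain strings rather than on `RateLimit` models; the longest-prefix selection is an illustration of the documented behavior, not the code path that performs it.

```python
from collections import Counter
from typing import List, Optional


def find_duplicate_prefixes(prefixes: List[str]) -> List[str]:
    """Return the prefixes that occur more than once, in first-seen order."""
    counts = Counter(prefixes)
    return [prefix for prefix, count in counts.items() if count > 1]


def longest_matching_prefix(path: str, prefixes: List[str]) -> Optional[str]:
    """Return the longest prefix that `path` starts with, or None if none match."""
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    return max(matches, key=len, default=None)


# Prefixes from the docs example; the second rule uses the default prefix "/".
prefixes = ["/api/auth/", "/"]
print(find_duplicate_prefixes(prefixes))                     # []
print(find_duplicate_prefixes(["/", "/api/", "/"]))          # ['/']
print(longest_matching_prefix("/api/auth/login", prefixes))  # /api/auth/
print(longest_matching_prefix("/index.html", prefixes))      # /
```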

src/dstack/_internal/proxy/gateway/resources/nginx/service.jinja2

Lines changed: 14 additions & 4 deletions
@@ -1,3 +1,7 @@
1+
{% for zone in limit_req_zones %}
2+
limit_req_zone {{ zone.key }} zone={{ zone.name }}:10m rate={{ zone.rpm }}r/m;
3+
{% endfor %}
4+
15
{% if replicas %}
26
upstream {{ run_name }} {
37
{% for replica in replicas %}
@@ -9,21 +13,27 @@ upstream {{ run_name }} {
913
{% endif %}
1014
server {
1115
server_name {{ domain }};
12-
16+
limit_req_status 429;
1317
access_log {{ access_log_path }} dstack_stat;
1418
client_max_body_size {{ client_max_body_size }};
1519

16-
location / {
20+
{% for location in locations %}
21+
location {{ location.prefix }} {
1722
{% if auth %}
18-
auth_request /auth;
23+
auth_request /_dstack_auth;
1924
{% endif %}
2025

2126
{% if replicas %}
2227
try_files /nonexistent @$http_upgrade;
2328
{% else %}
2429
return 503;
2530
{% endif %}
31+
32+
{% if location.limit_req %}
33+
limit_req zone={{ location.limit_req.zone }}{% if location.limit_req.burst %} burst={{ location.limit_req.burst }} nodelay{% endif %};
34+
{% endif %}
2635
}
36+
{% endfor %}
2737

2838
{% if replicas %}
2939
location @websocket {
@@ -44,7 +54,7 @@ server {
4454
{% endif %}
4555

4656
{% if auth %}
47-
location = /auth {
57+
location = /_dstack_auth {
4858
internal;
4959
if ($remote_addr = 127.0.0.1) {
5060
return 200;
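
To make the template changes above concrete, the sketch below reproduces the two rate-limit directives with plain Python string formatting. It mirrors the `limit_req_zone` and `limit_req` lines of the template, but several details are assumptions: the zone names are made up, `$binary_remote_addr` is a standard Nginx variable chosen here as a plausible IP-based key, and rounding `rps * 60` to an integer is only one possible conversion to requests per minute. The part of the commit that builds these values is not shown in this section.

```python
def render_limit_req_zone(key: str, name: str, rps: float) -> str:
    """Render a `limit_req_zone` directive; the rate is expressed per minute."""
    rpm = round(rps * 60)  # assumption: the real conversion may round differently
    return f"limit_req_zone {key} zone={name}:10m rate={rpm}r/m;"


def render_limit_req(zone: str, burst: int) -> str:
    """Render a `limit_req` directive; `burst ... nodelay` is added only if burst > 0."""
    burst_part = f" burst={burst} nodelay" if burst else ""
    return f"limit_req zone={zone}{burst_part};"


# Hypothetical zone names for the two rules of the docs example.
print(render_limit_req_zone("$binary_remote_addr", "zone_auth", rps=1))
# limit_req_zone $binary_remote_addr zone=zone_auth:10m rate=60r/m;
print(render_limit_req("zone_auth", burst=0))
# limit_req zone=zone_auth;
print(render_limit_req("zone_default", burst=9))
# limit_req zone=zone_default burst=9 nodelay;
```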

src/dstack/_internal/proxy/gateway/routers/registry.py

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ async def register_service(
3030
run_name=body.run_name.lower(),
3131
domain=body.domain.lower(),
3232
https=body.https,
33+
rate_limits=body.rate_limits,
3334
auth=body.auth,
3435
client_max_body_size=body.client_max_body_size,
3536
model=body.options.openai.model if body.options.openai is not None else None,

src/dstack/_internal/proxy/gateway/schemas/registry.py

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,7 @@
33
from pydantic import BaseModel, Field
44

55
from dstack._internal.core.models.instances import SSHConnectionParams
6+
from dstack._internal.proxy.lib.models import RateLimit
67

78

89
class BaseChatModel(BaseModel):
@@ -42,6 +43,7 @@ class RegisterServiceRequest(BaseModel):
4243
client_max_body_size: int
4344
options: Options
4445
ssh_private_key: str
46+
rate_limits: tuple[RateLimit, ...] = ()
4547

4648

4749
class RegisterReplicaRequest(BaseModel):

src/dstack/_internal/proxy/gateway/services/nginx.py

Lines changed: 19 additions & 0 deletions
@@ -3,6 +3,7 @@
33
import tempfile
44
from asyncio import Lock
55
from pathlib import Path
6+
from typing import Optional
67

78
import jinja2
89
from pydantic import BaseModel
@@ -38,13 +39,31 @@ class ReplicaConfig(BaseModel):
3839
socket: Path
3940

4041

42+
class LimitReqZoneConfig(BaseModel):
43+
name: str
44+
key: str
45+
rpm: int
46+
47+
48+
class LimitReqConfig(BaseModel):
49+
zone: str
50+
burst: int
51+
52+
53+
class LocationConfig(BaseModel):
54+
prefix: str
55+
limit_req: Optional[LimitReqConfig]
56+
57+
4158
class ServiceConfig(SiteConfig):
4259
type: Literal["service"] = "service"
4360
project_name: str
4461
run_name: str
4562
auth: bool
4663
client_max_body_size: int
4764
access_log_path: Path
65+
limit_req_zones: list[LimitReqZoneConfig]
66+
locations: list[LocationConfig]
4867
replicas: list[ReplicaConfig]
4968

5069
