Commit d347fbf

Adds minimal example of http_server (#1514)
* adds minimal http_server example
* fix URL
* fix URL
1 parent 43d2512 commit d347fbf

1 file changed: +109 -0 lines changed

07_web_endpoints/http_server.py

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# ---
# mypy: ignore-errors
# ---

# # Deploy HTTP servers with ultra low latency on Modal

# Modal offers a primitive for edge-deployed, low latency web services:
# the Modal HTTP Server.

# Modal HTTP Servers are designed for applications with very demanding
# latency requirements, where a few tens of milliseconds of round-trip latency is unacceptable,
# like [low latency LLM inference](https://modal.com/docs/guide/high-performance-llm-inference).
# Meeting those requirements means users and clients have to do more work.
# For Modal's higher-level primitives for web serving, see
# [this guide](https://modal.com/docs/guide/webhooks).

# This example documents a minimal Modal HTTP Server and client.

# ## How to define a Modal HTTP Server

from pathlib import Path

import modal
import modal.experimental

# Notice that we imported `modal.experimental` above.
# Modal HTTP Servers are still under development,
# so the interface is subject to change.

# To make a Modal HTTP Server, define a Python class
# with a [`modal.enter`-decorated](https://modal.com/docs/guide/lifecycle-functions) method
# that creates a subtask (thread or process) that listens for HTTP requests on some port.

# Then wrap that class in the `modal.experimental.http_server` decorator,
# passing in the `port` your server task is listening on
# and a list of `proxy_regions` where Modal should add your server to an edge proxy
# that communicates directly with the containers running your server.
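
# As a rough illustration of the "subtask that listens on a port" idea, independent of Modal,
# the sketch below serves a directory from a background thread using only the standard library.
# The helper name `start_background_server` is hypothetical and not part of Modal's API,
# and it binds to the loopback address for local experimentation only.

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_background_server(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP from a daemon thread and return the server."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # Port 0 asks the OS for any free port; a deployed server would use a fixed port.
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # The daemon thread keeps listening while the caller carries on with other work.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

# The chosen port is available as `server.server_address[1]`,
# and `server.shutdown()` stops the listener.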

# Finally, add one more decorator, `app.cls`, with the rest of your resource definitions,
# like [distributed Volume storage](https://modal.com/docs/guide/volumes),
# [CPU/memory resources](https://modal.com/docs/guide/resources),
# and [GPU type and count](https://modal.com/docs/guide/gpu).
# To reduce end-to-end latency, include a [Region](https://modal.com/docs/guide/region-selection)
# in this decorator that matches the proxy region, and containers will be deployed into that Region.
# Note that region-pinning has cost and resource availability implications!
# See [the guide](https://modal.com/docs/guide/region-selection)
# for details.

# Altogether, the minimal version of a Modal HTTP Server looks something like:

PORT = 8000
REGION = "us"
PROXY_REGION = "us-east"

app = modal.App("example-http-server")


@app.cls(region=REGION)
@modal.experimental.http_server(port=PORT, proxy_regions=[PROXY_REGION])
class FileServer:
    @modal.enter()
    def start(self):
        import subprocess

        subprocess.Popen(["python", "-m", "http.server", f"{PORT}"])


# ## How to write a client and tests for a Modal HTTP Server

# We test the file server defined above by requesting a file from it.
# This file will do nicely.

# We put the test in a `local_entrypoint` so that we can execute it from the command line:

# ```bash
# modal run http_server.py
# ```


@app.local_entrypoint()
def ping():
    from urllib.error import HTTPError
    from urllib.request import urlopen

    url = FileServer._experimental_get_flash_urls()[0]  # one URL per proxy region

    this = Path(__file__).name

    print(f"requesting {this} from Modal HTTP Server at {url}")

    while True:
        try:
            print(urlopen(url + f"/{this}").read().decode("utf-8"))
            break
        except HTTPError as e:
            if e.code == 503:
                continue
            else:
                raise e


# Notice the retry loop! Modal Clses and Functions are serverless and scale to zero by default.
# When a Modal HTTP Server has scaled to zero, clients will get a
# [503 Service Unavailable](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/503)
# error response from Modal. Those requests still trigger the underlying Modal Cls to scale up,
# and once a container is ready, the 503s will stop and clients will receive the server's responses.
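
# The loop in `ping` retries immediately and without limit. A client that wants to bound its wait
# could, as one possible sketch, back off between attempts and give up after a deadline.
# The helper `get_with_retry` and its parameters are hypothetical and chosen for illustration;
# only the 503-while-scaling-up behavior comes from the description above.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def get_with_retry(
    url: str,
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
) -> bytes:
    """GET `url`, retrying on 503 with exponential backoff until `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            with urlopen(url) as response:
                return response.read()
        except HTTPError as e:
            # Only 503 is treated as "still scaling up"; other errors propagate.
            # Also give up if the next sleep would run past the deadline.
            if e.code != 503 or time.monotonic() + delay > deadline:
                raise
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

# Backing off trades a little extra cold-start latency for fewer wasted requests,
# so a latency-sensitive client may prefer a small `initial_delay_s` and `max_delay_s`.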

# Modal HTTP Servers also support "sticky routing" for improved cache locality within client sessions.
# For details, see [this example](https://modal.com/docs/examples/http_server_sticky).
