Commit f0ea3b5

kemingy and lkevinzc authored
feat: add http/2 support (#568)
* feat: add http/2 support
  Signed-off-by: Keming <kemingy94@gmail.com>
* update readme
  Signed-off-by: Keming <kemingy94@gmail.com>
* Update README.md
  Signed-off-by: zclzc <38581401+lkevinzc@users.noreply.github.com>
* fix sd1.5 link, add http/2 test
  Signed-off-by: Keming <kemingy94@gmail.com>

---------

Signed-off-by: Keming <kemingy94@gmail.com>
Signed-off-by: zclzc <38581401+lkevinzc@users.noreply.github.com>
Co-authored-by: zclzc <38581401+lkevinzc@users.noreply.github.com>
1 parent 4ce3e1a commit f0ea3b5

9 files changed: +143 -90 lines changed

Cargo.lock

Lines changed: 107 additions & 73 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 4 deletions
@@ -1,6 +1,6 @@
 [package]
 name = "mosec"
-version = "0.8.7"
+version = "0.8.8"
 authors = ["Keming <kemingy94@gmail.com>", "Zichen <lkevinzc@gmail.com>"]
 edition = "2021"
 license = "Apache-2.0"
@@ -10,9 +10,7 @@ description = "Model Serving made Efficient in the Cloud."
 documentation = "https://docs.rs/mosec"
 exclude = ["target", "examples", "tests", "scripts"]

-# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 [dependencies]
-hyper = { version = "1", features = ["http1", "server"] }
 bytes = "1"
 tracing = "0.1"
 tracing-subscriber = { version = "0.3", features = ["local-time", "json"] }
@@ -21,7 +19,7 @@ derive_more = { version = "1", features = ["display", "error", "from"] }
 # MPMS that only one consumer sees each message & async
 async-channel = "2.2"
 prometheus-client = "0.22"
-axum = "0.7"
+axum = { version = "0.7", default-features = false, features = ["matched-path", "original-uri", "query", "tokio", "http1", "http2"]}
 async-stream = "0.3.5"
 serde = "1.0"
 serde_json = "1.0"

README.md

Lines changed: 4 additions & 4 deletions
@@ -91,7 +91,7 @@ from mosec.mixin import MsgpackMixin
 logger = get_logger()
 ```

-Then, we **build an API** for clients to query a text prompt and obtain an image based on the [stable-diffusion-v1-5 model](https://huggingface.co/runwayml/stable-diffusion-v1-5) in just 3 steps.
+Then, we **build an API** for clients to query a text prompt and obtain an image based on the [stable-diffusion-v1-5 model](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) in just 3 steps.

 1) Define your service as a class which inherits `mosec.Worker`. Here we also inherit `MsgpackMixin` to employ the [msgpack](https://msgpack.org/index.html) serialization format<sup>(a)</sup></a>.

@@ -104,10 +104,9 @@ Then, we **build an API** for clients to query a text prompt and obtain an image
 class StableDiffusion(MsgpackMixin, Worker):
     def __init__(self):
         self.pipe = StableDiffusionPipeline.from_pretrained(
-            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
+            "sd-legacy/stable-diffusion-v1-5", torch_dtype=torch.float16
         )
-        device = "cuda" if torch.cuda.is_available() else "cpu"
-        self.pipe = self.pipe.to(device)
+        self.pipe.enable_model_cpu_offload()
         self.example = ["useless example prompt"] * 4  # warmup (batch_size=4)

     def forward(self, data: List[str]) -> List[memoryview]:
@@ -229,6 +228,7 @@ More ready-to-use examples can be found in the [Example](https://mosecorg.github
 - For multi-stage services, note that the data passing through different stages will be serialized/deserialized by the `serialize_ipc/deserialize_ipc` methods, so extremely large data might make the whole pipeline slow. The serialized data is passed to the next stage through rust by default, you could enable shared memory to potentially reduce the latency (ref [RedisShmIPCMixin](https://mosecorg.github.io/mosec/examples/ipc.html#redis-shm-ipc-py)).
 - You should choose appropriate `serialize/deserialize` methods, which are used to decode the user request and encode the response. By default, both are using JSON. However, images and embeddings are not well supported by JSON. You can choose msgpack which is faster and binary compatible (ref [Stable Diffusion](https://mosecorg.github.io/mosec/examples/stable_diffusion.html)).
 - Configure the threads for OpenBLAS or MKL. It might not be able to choose the most suitable CPUs used by the current Python process. You can configure it for each worker by using the [env](https://mosecorg.github.io/mosec/reference/interface.html#mosec.server.Server.append_worker) (ref [custom GPU allocation](https://mosecorg.github.io/mosec/examples/env.html)).
+- Enable HTTP/2 from client side. `mosec` automatically adapts to user's protocol (e.g., HTTP/2) since v0.8.8.

 ## Adopters
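
The README bullet added in this hunk says `mosec` adapts to the client's protocol, and the new test in this commit forces HTTP/2 over a plain `http://` URL (`http1=False, http2=True`), which is HTTP/2 with prior knowledge. One way a server can tell such a client from an HTTP/1.x client is to inspect the first bytes on the connection: an HTTP/2 client opens with a fixed 24-octet connection preface (RFC 9113). The sketch below illustrates only that idea. `sniff_protocol` is a hypothetical helper introduced for illustration, and nothing here is taken from the mosec, axum, or hyper sources.

```python
# HTTP/2 client connection preface defined by RFC 9113 (24 octets).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def sniff_protocol(first_bytes: bytes) -> str:
    """Guess the protocol from the first bytes of a cleartext connection.

    Hypothetical helper for illustration; not part of mosec.
    """
    if first_bytes.startswith(H2_PREFACE):
        # The client sent the full HTTP/2 preface (prior knowledge).
        return "HTTP/2"
    if H2_PREFACE.startswith(first_bytes):
        # Only a prefix of the preface has arrived so far: keep reading.
        return "unknown"
    # Anything else is treated as an HTTP/1.x request line.
    return "HTTP/1.x"


http1_guess = sniff_protocol(b"GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n")
http2_guess = sniff_protocol(H2_PREFACE + b"\x00\x00\x00\x04\x00")
partial_guess = sniff_protocol(b"PRI * HT")
```

The "unknown" branch matters in practice because a TCP read may deliver fewer than 24 bytes, so a decision made on a short read could misclassify an HTTP/2 client.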

examples/stable_diffusion/server.py

Lines changed: 3 additions & 3 deletions
@@ -27,10 +27,10 @@
 class StableDiffusion(MsgpackMixin, Worker):
     def __init__(self):
         self.pipe = StableDiffusionPipeline.from_pretrained(
-            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
+            "sd-legacy/stable-diffusion-v1-5",
+            torch_dtype=torch.float16,
         )
-        device = "cuda" if torch.cuda.is_available() else "cpu"
-        self.pipe = self.pipe.to(device)  # type: ignore
+        self.pipe.enable_model_cpu_offload()
         self.example = ["useless example prompt"] * 4  # warmup (bs=4)

     def forward(self, data: List[str]) -> List[memoryview]:

requirements/dev.txt

Lines changed: 1 addition & 1 deletion
@@ -5,5 +5,5 @@ mypy~=1.11
 pyright~=1.1
 ruff~=0.6
 pre-commit>=2.15.0
-httpx==0.27.2
+httpx[http2]==0.27.2
 httpx-sse==0.4.0

src/main.rs

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ fn main() {
         .with(
             output
                 .with_filter(filter::filter_fn(|metadata| {
-                    !metadata.target().starts_with("hyper")
+                    !metadata.target().starts_with("h2")
                 }))
                 .with_filter(filter::LevelFilter::DEBUG),
         )
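
The hunk above changes a target-based log filter so that events whose target starts with `h2` are dropped instead of those starting with `hyper`. As a rough analogue only, the same prefix-filtering idea can be sketched with Python's standard `logging` module. This is not the `tracing-subscriber` API used in the commit, and `DropTargetPrefix`, `make_record`, and the logger names are hypothetical names chosen for illustration.

```python
import logging


class DropTargetPrefix(logging.Filter):
    """Reject records whose logger name starts with a given prefix."""

    def __init__(self, prefix: str) -> None:
        super().__init__()
        self.prefix = prefix

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; True lets it through.
        return not record.name.startswith(self.prefix)


def make_record(name: str) -> logging.LogRecord:
    """Build a bare DEBUG record for the given logger name (demo helper)."""
    return logging.LogRecord(name, logging.DEBUG, "demo.py", 0, "msg", None, None)


drop_h2 = DropTargetPrefix("h2")
# Hypothetical logger names, written in the style of Rust module paths.
kept = drop_h2.filter(make_record("mosec::tasks"))
dropped = not drop_h2.filter(make_record("h2::codec::framed_read"))
```

A prefix match is coarse: it also drops any other target that happens to begin with the same characters, which is worth keeping in mind when picking a short prefix such as `h2`.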

src/routes.rs

Lines changed: 2 additions & 3 deletions
@@ -15,12 +15,11 @@
 use std::time::Duration;

 use axum::body::{to_bytes, Body};
-use axum::http::Uri;
+use axum::http::header::{HeaderValue, CONTENT_TYPE};
+use axum::http::{Request, Response, StatusCode, Uri};
 use axum::response::sse::{Event, KeepAlive, Sse};
 use axum::response::IntoResponse;
 use bytes::Bytes;
-use hyper::header::{HeaderValue, CONTENT_TYPE};
-use hyper::{Request, Response, StatusCode};
 use prometheus_client::encoding::text::encode;
 use tracing::warn;
 use utoipa::OpenApi;

src/tasks.rs

Lines changed: 1 addition & 1 deletion
@@ -18,8 +18,8 @@ use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::{Arc, Mutex, OnceLock};
 use std::time::{Duration, Instant};

+use axum::http::StatusCode;
 use bytes::Bytes;
-use hyper::StatusCode;
 use tokio::sync::{mpsc, oneshot, Barrier};
 use tokio::time;
 use tracing::{debug, error, info, warn};

tests/test_service.py

Lines changed: 22 additions & 0 deletions
@@ -40,6 +40,15 @@ def http_client():
         yield client


+@pytest.fixture
+def http2_client():
+    # force to use HTTP/2
+    with httpx.Client(
+        base_url=f"http://127.0.0.1:{TEST_PORT}", http1=False, http2=True
+    ) as client:
+        yield client
+
+
 @pytest.fixture(scope="session")
 def mosec_service(request):
     params = request.param.split(" ")
@@ -54,6 +63,19 @@ def mosec_service(request):
     assert wait_for_port_free(port=TEST_PORT), "service failed to stop"


+@pytest.mark.parametrize(
+    "mosec_service, http2_client",
+    [
+        pytest.param("square_service", "", id="HTTP/2"),
+    ],
+    indirect=["mosec_service", "http2_client"],
+)
+def test_http2_service(mosec_service, http2_client):
+    resp = http2_client.get("/")
+    assert resp.status_code == HTTPStatus.OK
+    assert resp.http_version == "HTTP/2"
+
+
 @pytest.mark.parametrize(
     "mosec_service, http_client",
     [
