Commit 74f41f6

feat: support gzip & zstd compression (#599)
* feat: support gzip & zstd compression

Signed-off-by: Keming <kemingy94@gmail.com>

* fix args help

Signed-off-by: Keming <kemingy94@gmail.com>

* add compression example

Signed-off-by: Keming <kemingy94@gmail.com>

---------

Signed-off-by: Keming <kemingy94@gmail.com>
1 parent 03a5fdd commit 74f41f6

14 files changed: +323 -5 lines changed

Cargo.lock

Lines changed: 98 additions & 1 deletion
Generated file; diff not rendered.

Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "mosec"
-version = "0.8.9"
+version = "0.9.0"
 authors = ["Keming <kemingy94@gmail.com>", "Zichen <lkevinzc@gmail.com>"]
 edition = "2021"
 license = "Apache-2.0"
@@ -25,3 +25,5 @@ serde = "1.0"
 serde_json = "1.0"
 utoipa = "5"
 utoipa-swagger-ui = { version = "8", features = ["axum"] }
+tower = "0.5.1"
+tower-http = {version = "0.6.1", features = ["compression-zstd", "decompression-zstd", "compression-gzip", "decompression-gzip"]}

README.md

Lines changed: 1 addition & 0 deletions
@@ -193,6 +193,7 @@ More ready-to-use examples can be found in the [Example](https://mosecorg.github
 - [Customized GPU allocation](https://mosecorg.github.io/mosec/examples/env.html): deploy multiple replicas, each using different GPUs.
 - [Customized metrics](https://mosecorg.github.io/mosec/examples/metric.html): record your own metrics for monitoring.
 - [Jax jitted inference](https://mosecorg.github.io/mosec/examples/jax.html): just-in-time compilation speeds up the inference.
+- [Compression](https://mosecorg.github.io/mosec/examples/compression.html): enable request/response compression.
 - PyTorch deep learning models:
   - [sentiment analysis](https://mosecorg.github.io/mosec/examples/pytorch.html#natural-language-processing): infer the sentiment of a sentence.
   - [image recognition](https://mosecorg.github.io/mosec/examples/pytorch.html#computer-vision): categorize a given image.
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Compression
+
+This example demonstrates how to use the `--compression` feature for segmentation tasks. It is based on the image predictor example from [Segment Anything Model 2](https://github.com/facebookresearch/sam2/blob/main/notebooks/image_predictor_example.ipynb). The request includes an image and its low-resolution mask; the response is the final mask. Because the mask contains many duplicate values, `gzip` or `zstd` can compress it effectively.
+
+## Server
+
+```shell
+python examples/segment/server.py --compression
+```
+
+<details>
+<summary>server.py</summary>
+
+```{include} ../../../examples/segment/server.py
+:code: python
+```
+
+</details>
+
+## Client
+
+```shell
+python examples/segment/client.py
+```
+
+<details>
+<summary>client.py</summary>
+
+```{include} ../../../examples/segment/client.py
+:code: python
+```
+
+</details>
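
To illustrate why a mostly constant mask compresses well, here is a minimal sketch that uses only the Python standard library. It assumes the raw buffer of a 256×256 float64 all-zero mask (matching `np.zeros((256, 256))` in the client) and ignores the numbin/msgpack framing, so the real payload sizes will differ. The names `mask_bytes` and `noise_bytes` are illustrative and not part of the commit.

```python
import gzip
import os

# Stand-in for the raw buffer of np.zeros((256, 256)): 256 * 256 float64 zeros.
# Assumption: raw bytes only, without the numbin/msgpack framing of the example.
mask_bytes = bytes(256 * 256 * 8)

# Random bytes of the same size, as a high-entropy contrast.
noise_bytes = os.urandom(len(mask_bytes))

mask_gz = gzip.compress(mask_bytes)
noise_gz = gzip.compress(noise_bytes)

print(f"mask:  {len(mask_bytes)} -> {len(mask_gz)} bytes")
print(f"noise: {len(noise_bytes)} -> {len(noise_gz)} bytes")

# Decompression restores the original buffer exactly.
assert gzip.decompress(mask_gz) == mask_bytes
```

The repeated zeros shrink by orders of magnitude, while the random buffer barely changes in size, which is why compression pays off for masks but much less for already-compact data.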

docs/source/examples/index.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ pytorch
 rerank
 stable_diffusion
 validate
+compression
 ```

 We provide examples across different ML frameworks and for various tasks in this section.

examples/segment/client.py

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# Copyright 2023 MOSEC Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import gzip
+from http import HTTPStatus
+from io import BytesIO
+
+import httpx
+import msgpack  # type: ignore
+import numbin
+import numpy as np
+from PIL import Image  # type: ignore
+
+truck_image = Image.open(
+    BytesIO(
+        httpx.get(
+            "https://raw.githubusercontent.com/facebookresearch/sam2/main/notebooks/images/truck.jpg"
+        ).content
+    )
+)
+array = np.array(truck_image.convert("RGB"))
+# assume we have obtained the low resolution mask from the previous step
+mask = np.zeros((256, 256))
+
+resp = httpx.post(
+    "http://127.0.0.1:8000/inference",
+    content=gzip.compress(
+        msgpack.packb(  # type: ignore
+            {
+                "image": numbin.dumps(array),
+                "mask": numbin.dumps(mask),
+                "labels": [1, 1],
+                "point_coords": [[500, 375], [1125, 625]],
+            }
+        )
+    ),
+    headers={"Accept-Encoding": "gzip", "Content-Encoding": "gzip"},
+)
+assert resp.status_code == HTTPStatus.OK, resp.status_code
+res = numbin.loads(msgpack.loads(resp.content))
+assert res.shape == array.shape[:2], f"expect {array.shape[:2]}, got {res.shape}"
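
The client sets `Content-Encoding` to describe the request body and `Accept-Encoding` to tell the server which encodings it can read back, and relies on httpx to decode the gzip response transparently. The sketch below shows the same round trip with the standard library only. `encode_body` and `decode_body` are hypothetical helpers for illustration, not part of mosec or httpx, and only `gzip` is handled; `zstd` would need a third-party package such as `zstandard`.

```python
import gzip


def encode_body(payload: bytes) -> tuple[bytes, dict[str, str]]:
    """Gzip a request body and build the matching HTTP headers."""
    headers = {
        "Content-Encoding": "gzip",  # how the request body is encoded
        "Accept-Encoding": "gzip",  # what the client can decode in the response
    }
    return gzip.compress(payload), headers


def decode_body(body: bytes, headers: dict[str, str]) -> bytes:
    """Undo the Content-Encoding of a body; pass unencoded bodies through."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {encoding}")


payload = b"mask-bytes" * 1000
body, headers = encode_body(payload)
# The receiving side reverses the encoding named in the header.
assert decode_body(body, headers) == payload
print(len(payload), "->", len(body), "bytes on the wire")
```

A server that answers without a `Content-Encoding` header has sent the body unencoded, which is why `decode_body` falls back to `identity`.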

examples/segment/server.py

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+# Copyright 2023 MOSEC Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# refer to https://github.com/facebookresearch/sam2/blob/main/notebooks/image_predictor_example.ipynb
+
+import numbin
+import torch  # type: ignore
+from sam2.sam2_image_predictor import SAM2ImagePredictor  # type: ignore
+
+from mosec import Server, Worker, get_logger
+from mosec.mixin import MsgpackMixin
+
+logger = get_logger()
+MIN_TF32_MAJOR = 8
+
+
+class SegmentAnything(MsgpackMixin, Worker):
+    def __init__(self):
+        # select the device for computation
+        if torch.cuda.is_available():
+            device = torch.device("cuda")
+        elif torch.backends.mps.is_available():
+            device = torch.device("mps")
+        else:
+            device = torch.device("cpu")
+        logger.info("using device: %s", device)
+
+        self.predictor = SAM2ImagePredictor.from_pretrained(
+            "facebook/sam2-hiera-large", device=device
+        )
+
+        if device.type == "cuda":
+            # use bfloat16
+            torch.autocast("cuda", dtype=torch.bfloat16).__enter__()
+            # turn on tf32 for Ampere GPUs (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices)
+            if torch.cuda.get_device_properties(0).major >= MIN_TF32_MAJOR:
+                torch.backends.cuda.matmul.allow_tf32 = True
+                torch.backends.cudnn.allow_tf32 = True
+
+    def forward(self, data: dict) -> bytes:
+        with torch.inference_mode():
+            self.predictor.set_image(numbin.loads(data["image"]))
+            masks, _, _ = self.predictor.predict(
+                point_coords=data["point_coords"],
+                point_labels=data["labels"],
+                mask_input=numbin.loads(data["mask"])[None, :, :],
+                multimask_output=False,
+            )
+            return numbin.dumps(masks[0])
+
+
+if __name__ == "__main__":
+    server = Server()
+    server.append_worker(SegmentAnything, num=1, max_batch_size=1)
+    server.run()

mosec/args.py

Lines changed: 7 additions & 0 deletions
@@ -134,6 +134,13 @@ def build_arguments_parser() -> argparse.ArgumentParser:
         "This will omit the worker number for each stage.",
         action="store_true",
     )
+
+    parser.add_argument(
+        "--compression",
+        help="Enable `zstd` & `gzip` compression for the request & response",
+        action="store_true",
+    )
+
     return parser
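
Because the flag uses `action="store_true"`, compression is off unless `--compression` is passed. The following stand-alone sketch reproduces only this one argument; `build_parser` and the `demo-server` program name are hypothetical, and mosec's real parser defines many more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-alone parser that mirrors only the new flag.
    parser = argparse.ArgumentParser(prog="demo-server")
    parser.add_argument(
        "--compression",
        help="Enable `zstd` & `gzip` compression for the request & response",
        action="store_true",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).compression)  # False: disabled by default
print(parser.parse_args(["--compression"]).compression)  # True: opted in
```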

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ classifiers = [
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
     "Programming Language :: Python :: Implementation :: CPython",
     "Programming Language :: Rust",
     "Topic :: Scientific/Engineering :: Artificial Intelligence",

requirements/dev.txt

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ ruff>=0.7
 pre-commit>=2.15.0
 httpx[http2]==0.27.2
 httpx-sse==0.4.0
+zstandard~=0.23
