Commit 8ad74f7 (1 parent: b9efcb9)

Switch to tritonclient in OVMS adapter (#206)

* Switch to tritonclient in OVMS adapter
* Fix linter
* Update ovms docs
* Update OVMS address
* Update OVMS launch configs
* Update docs

File tree: 7 files changed (+115 −100 lines)

.github/workflows/test_precommit.yml (1 addition, 1 deletion)

@@ -146,6 +146,6 @@ jobs:
           python -m pip install --upgrade pip
           python -m pip install model_api/python/[ovms,tests]
           python -c "from model_api.models import DetectionModel; DetectionModel.create_model('ssd_mobilenet_v1_fpn_coco').save('ovms_models/ssd_mobilenet_v1_fpn_coco/1/ssd_mobilenet_v1_fpn_coco.xml')"
-          docker run -d --rm -v $GITHUB_WORKSPACE/ovms_models/:/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest --model_path /models/ssd_mobilenet_v1_fpn_coco/ --model_name ssd_mobilenet_v1_fpn_coco --port 9000 --rest_port 8000 --log_level DEBUG --target_device CPU
+          docker run -d --rm -v $GITHUB_WORKSPACE/ovms_models/:/models -p 8000:8000 openvino/model_server:latest --model_path /models/ssd_mobilenet_v1_fpn_coco/ --model_name ssd_mobilenet_v1_fpn_coco --rest_port 8000 --log_level DEBUG --target_device CPU
           python tests/cpp/precommit/prepare_data.py -d data -p tests/cpp/precommit/public_scope.json
           python examples/python/serving_api/run.py data/coco128/images/train2017/000000000009.jpg # detects 4 objects
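Editor's note: the workflow starts the container detached (`docker run -d`), so the REST endpoint may not be up the instant a later step queries it. A minimal sketch of a readiness wait using the same `tritonclient` package the adapter now depends on; the address, model name, and timeout are illustrative assumptions, not part of this commit:

```python
import time

import tritonclient.http as httpclient

# Hypothetical values -- match them to your own OVMS launch command.
client = httpclient.InferenceServerClient("localhost:8000")
deadline = time.monotonic() + 30  # wait up to ~30 s for the container
while time.monotonic() < deadline:
    try:
        if client.is_model_ready("ssd_mobilenet_v1_fpn_coco"):
            break
    except Exception:  # server may not be accepting connections yet
        pass
    time.sleep(1)
else:
    raise RuntimeError("OVMS did not become ready in time")
```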

examples/python/serving_api/README.md (1 addition, 1 deletion)

@@ -28,7 +28,7 @@ This example demonstrates how to use a Python API of OpenVINO Model API for a re
 - Run docker with OVMS server:
 
   ```bash
-  docker run -d -v /home/user/models:/models -p 9000:9000 openvino/model_server:latest --model_path /models/ssd_mobilenet_v1_fpn_coco --model_name ssd_mobilenet_v1_fpn_coco --port 9000 --shape auto --nireq 4 --target_device CPU --plugin_config "{\"CPU_THROUGHPUT_STREAMS\": \"CPU_THROUGHPUT_AUTO\"}"
+  docker run -d -v /home/user/models:/models -p 8000:8000 openvino/model_server:latest --model_path /models/ssd_mobilenet_v1_fpn_coco --model_name ssd_mobilenet_v1_fpn_coco --rest_port 8000 --nireq 4 --target_device CPU
   ```
 
 ## Run example

examples/python/serving_api/run.py (1 addition, 1 deletion)

@@ -20,7 +20,7 @@ def main():
 
     # Create Object Detection model specifying the OVMS server URL
     model = DetectionModel.create_model(
-        "localhost:9000/models/ssd_mobilenet_v1_fpn_coco", model_type="ssd"
+        "localhost:8000/v2/models/ssd_mobilenet_v1_fpn_coco", model_type="ssd"
     )
     detections = model(image)
     print(f"Detection results: {detections}")

model_api/python/model_api/adapters/ovms_adapter.md (4 additions, 4 deletions)

@@ -7,15 +7,15 @@ The `OVMSAdapter` implements `InferenceAdapter` interface. The `OVMSAdapter` mak
 `OVMSAdapter` enables inference via gRPC calls to OpenVINO Model Server, so in order to use it you need two things:
 
 - OpenVINO Model Server that serves your model
-- [`ovmsclient`](https://pypi.org/project/ovmsclient/) package installed to enable communication with the model server: `python3 -m pip install ovmsclient`
+- [`tritonclient[http]`](https://pypi.org/project/tritonclient/) package installed to enable communication with the model server: `python3 -m pip install tritonclient[http]`
 
 ### Deploy OpenVINO Model Server
 
 Model Server is distributed as a docker image and it's available in DockerHub, so you can use it with `docker run` command. See [model server documentation](https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md) to learn how to deploy OpenVINO optimized models with OpenVINO Model Server.
 
 ## Model configuration
 
-When using OpenVINO Model Server model cannot be directly accessed from the client application (like OMZ demos). Therefore any configuration must be done on model server side or before starting the server: see [Prepare a model for `InferenceAdapter`](../../../../../README.md#prepare-a-model-for-inferenceadapter).
+When using OpenVINO Model Server model cannot be directly accessed from the client application. Therefore any configuration must be done on model server side or before starting the server: see [Prepare a model for `InferenceAdapter`](../../../../../README.md#prepare-a-model-for-inferenceadapter).
 
 ### Input reshaping
 
@@ -51,8 +51,8 @@ To run the demo with model served in OpenVINO Model Server, you would have to pr
 
 Assuming that model server runs on the same machine as the demo, exposes gRPC service on port 9000 and serves model called `model1`, the value of `-m` parameter would be:
 
-- `localhost:9000/models/model1` - requesting latest model version
-- `localhost:9000/models/model1:2` - requesting model version number 2
+- `localhost:9000/v2/models/model1` - requesting latest model version
+- `localhost:9000/v2/models/model1:2` - requesting model version number 2
 
 ## See Also
 
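Editor's note: for reference, a short sketch of how those `-m` values plug into Model API. It assumes a reachable server; `model1` and the port are the placeholders from the doc above, and `model_type="ssd"` mirrors the serving example:

```python
from model_api.models import DetectionModel

# Latest served version:
model = DetectionModel.create_model(
    "localhost:9000/v2/models/model1", model_type="ssd"
)

# Pinned to version 2:
model_v2 = DetectionModel.create_model(
    "localhost:9000/v2/models/model1:2", model_type="ssd"
)
```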
model_api/python/model_api/adapters/ovms_adapter.py (105 additions, 89 deletions)

@@ -11,75 +11,104 @@
 import numpy as np
 
 from .inference_adapter import InferenceAdapter, Metadata
-from .utils import Layout
+from .utils import Layout, get_rt_info_from_dict
 
 
 class OVMSAdapter(InferenceAdapter):
-    """Class that allows working with models served by the OpenVINO Model Server"""
+    """Inference adapter that allows working with models served by the OpenVINO Model Server"""
 
     def __init__(self, target_model: str):
-        """Expected format: <address>:<port>/models/<model_name>[:<model_version>]"""
-        import ovmsclient
+        """
+        Initializes OVMS adapter.
+
+        Args:
+            target_model (str): Model URL. Expected format: <address>:<port>/v2/models/<model_name>[:<model_version>]
+        """
+        import tritonclient.http as httpclient
 
         service_url, self.model_name, self.model_version = _parse_model_arg(
             target_model,
         )
-        self.client = ovmsclient.make_grpc_client(url=service_url)
-        _verify_model_available(self.client, self.model_name, self.model_version)
+        self.client = httpclient.InferenceServerClient(service_url)
+        if not self.client.is_model_ready(self.model_name, self.model_version):
+            msg = f"Requested model: {self.model_name}, version: {self.model_version} is not accessible"
+            raise RuntimeError(msg)
 
         self.metadata = self.client.get_model_metadata(
             model_name=self.model_name,
             model_version=self.model_version,
         )
+        self.inputs = self.get_input_layers()
+
+    def get_input_layers(self) -> dict[str, Metadata]:
+        """
+        Retrieves information about remote model's inputs.
 
-    def get_input_layers(self):
+        Returns:
+            dict[str, Metadata]: metadata for each input.
+        """
         return {
-            name: Metadata(
-                {name},
+            meta["name"]: Metadata(
+                {meta["name"]},
                 meta["shape"],
                 Layout.from_shape(meta["shape"]),
-                _tf2ov_precision.get(meta["dtype"], meta["dtype"]),
+                meta["datatype"],
             )
-            for name, meta in self.metadata["inputs"].items()
+            for meta in self.metadata["inputs"]
         }
 
-    def get_output_layers(self):
+    def get_output_layers(self) -> dict[str, Metadata]:
+        """
+        Retrieves information about remote model's outputs.
+
+        Returns:
+            dict[str, Metadata]: metadata for each output.
+        """
         return {
-            name: Metadata(
-                {name},
+            meta["name"]: Metadata(
+                {meta["name"]},
                 shape=meta["shape"],
-                precision=_tf2ov_precision.get(meta["dtype"], meta["dtype"]),
+                precision=meta["datatype"],
            )
-            for name, meta in self.metadata["outputs"].items()
+            for meta in self.metadata["outputs"]
         }
 
-    def infer_sync(self, dict_data):
-        inputs = _prepare_inputs(dict_data, self.metadata["inputs"])
-        raw_result = self.client.predict(
-            inputs,
+    def infer_sync(self, dict_data: dict) -> dict:
+        """
+        Performs the synchronous model inference. The infer is a blocking method.
+
+        Args:
+            dict_data (dict): data for each input layer.
+
+        Returns:
+            dict: model raw outputs.
+        """
+        inputs = _prepare_inputs(dict_data, self.inputs)
+        raw_result = self.client.infer(
             model_name=self.model_name,
             model_version=self.model_version,
+            inputs=inputs,
         )
-        # For models with single output ovmsclient returns ndarray with results,
-        # so the dict must be created to correctly implement interface.
-        if isinstance(raw_result, np.ndarray):
-            output_name = next(iter(self.metadata["outputs"].keys()))
-            return {output_name: raw_result}
-        return raw_result
-
-    def infer_async(self, dict_data, callback_data):
-        inputs = _prepare_inputs(dict_data, self.metadata["inputs"])
-        raw_result = self.client.predict(
-            inputs,
+
+        inference_results = {}
+        for output in self.metadata["outputs"]:
+            inference_results[output["name"]] = raw_result.as_numpy(output["name"])
+
+        return inference_results
+
+    def infer_async(self, dict_data: dict, callback_data: Any):
+        """A stub method imitating async inference with a blocking call."""
+        inputs = _prepare_inputs(dict_data, self.inputs)
+        raw_result = self.client.infer(
             model_name=self.model_name,
             model_version=self.model_version,
+            inputs=inputs,
         )
-        # For models with single output ovmsclient returns ndarray with results,
-        # so the dict must be created to correctly implement interface.
-        if isinstance(raw_result, np.ndarray):
-            output_name = next(iter(self.metadata["outputs"].keys()))
-            raw_result = {output_name: raw_result}
-        self.callback_fn(raw_result, (lambda x: x, callback_data))
+        inference_results = {}
+        for output in self.metadata["outputs"]:
+            inference_results[output["name"]] = raw_result.as_numpy(output["name"])
+
+        self.callback_fn(inference_results, (lambda x: x, callback_data))
 
     def set_callback(self, callback_fn: Callable):
         self.callback_fn = callback_fn

@@ -118,97 +147,84 @@ def embed_preprocessing(
     ):
         pass
 
-    def reshape_model(self, new_shape):
-        raise NotImplementedError
-
-    def get_rt_info(self, path):
-        msg = "OVMSAdapter does not support RT info getting"
+    def reshape_model(self, new_shape: dict):
+        """OVMS adapter can not modify the remote model. This method raises an exception."""
+        msg = "OVMSAdapter does not support model reshaping"
         raise NotImplementedError(msg)
 
+    def get_rt_info(self, path: list[str]) -> Any:
+        """Returns an attribute stored in model info."""
+        return get_rt_info_from_dict(self.metadata["rt_info"], path)
+
     def update_model_info(self, model_info: dict[str, Any]):
+        """OVMS adapter can not update the source model info. This method raises an exception."""
         msg = "OVMSAdapter does not support updating model info"
         raise NotImplementedError(msg)
 
     def save_model(self, path: str, weights_path: str | None = None, version: str | None = None):
+        """OVMS adapter can not retrieve the source model. This method raises an exception."""
         msg = "OVMSAdapter does not support saving a model"
         raise NotImplementedError(msg)
 
 
-_tf2ov_precision = {
-    "DT_INT64": "I64",
-    "DT_UINT64": "U64",
-    "DT_FLOAT": "FP32",
-    "DT_UINT32": "U32",
-    "DT_INT32": "I32",
-    "DT_HALF": "FP16",
-    "DT_INT16": "I16",
-    "DT_INT8": "I8",
-    "DT_UINT8": "U8",
-}
-
-
-_tf2np_precision = {
-    "DT_INT64": np.int64,
-    "DT_UINT64": np.uint64,
-    "DT_FLOAT": np.float32,
-    "DT_UINT32": np.uint32,
-    "DT_INT32": np.int32,
-    "DT_HALF": np.float16,
-    "DT_INT16": np.int16,
-    "DT_INT8": np.int8,
-    "DT_UINT8": np.uint8,
+_triton2np_precision = {
+    "INT64": np.int64,
+    "UINT64": np.uint64,
+    "FLOAT": np.float32,
+    "UINT32": np.uint32,
+    "INT32": np.int32,
+    "HALF": np.float16,
+    "INT16": np.int16,
+    "INT8": np.int8,
+    "UINT8": np.uint8,
+    "FP32": np.float32,
 }
 
 
 def _parse_model_arg(target_model: str):
+    """Parses OVMS model URL."""
     if not isinstance(target_model, str):
         msg = "target_model must be str"
         raise TypeError(msg)
     # Expected format: <address>:<port>/models/<model_name>[:<model_version>]
     if not re.fullmatch(
-        r"(\w+\.*\-*)*\w+:\d+\/models\/[a-zA-Z0-9._-]+(\:\d+)*",
+        r"(\w+\.*\-*)*\w+:\d+\/v2/models\/[a-zA-Z0-9._-]+(\:\d+)*",
         target_model,
     ):
         msg = "invalid --model option format"
         raise ValueError(msg)
-    service_url, _, model = target_model.split("/")
+    service_url, _, _, model = target_model.split("/")
     model_spec = model.split(":")
     if len(model_spec) == 1:
         # model version not specified - use latest
-        return service_url, model_spec[0], 0
+        return service_url, model_spec[0], ""
     if len(model_spec) == 2:
-        return service_url, model_spec[0], int(model_spec[1])
-    msg = "invalid target_model format"
+        return service_url, model_spec[0], model_spec[1]
+    msg = "Invalid target_model format"
     raise ValueError(msg)
 
 
-def _verify_model_available(client, model_name, model_version):
-    import ovmsclient
+def _prepare_inputs(dict_data: dict, inputs_meta: dict[str, Metadata]):
+    """Converts raw model inputs into OVMS-specific representation."""
+    import tritonclient.http as httpclient
 
-    version = "latest" if model_version == 0 else model_version
-    try:
-        model_status = client.get_model_status(model_name, model_version)
-    except ovmsclient.ModelNotFoundError as e:
-        msg = f"Requested model: {model_name}, version: {version} has not been found"
-        raise RuntimeError(msg) from e
-    target_version = max(model_status.keys())
-    version_status = model_status[target_version]
-    if version_status["state"] != "AVAILABLE" or version_status["error_code"] != 0:
-        msg = f"Requested model: {model_name}, version: {version} is not in available state"
-        raise RuntimeError(msg)
-
-
-def _prepare_inputs(dict_data, inputs_meta):
-    inputs = {}
+    inputs = []
     for input_name, input_data in dict_data.items():
         if input_name not in inputs_meta:
             msg = "Input data does not match model inputs"
             raise ValueError(msg)
         input_info = inputs_meta[input_name]
-        model_precision = _tf2np_precision[input_info["dtype"]]
+        model_precision = _triton2np_precision[input_info.precision]
         if isinstance(input_data, np.ndarray) and input_data.dtype != model_precision:
             input_data = input_data.astype(model_precision)
         elif isinstance(input_data, list):
             input_data = np.array(input_data, dtype=model_precision)
-        inputs[input_name] = input_data
+
+        infer_input = httpclient.InferInput(
+            input_name,
+            input_data.shape,
+            input_info.precision,
+        )
+        infer_input.set_data_from_numpy(input_data)
+        inputs.append(infer_input)
     return inputs
model_api/python/model_api/models/model.py (2 additions, 3 deletions)

@@ -175,7 +175,7 @@ def create_model(
         if isinstance(model, InferenceAdapter):
             inference_adapter = model
         elif isinstance(model, str) and re.compile(
-            r"(\w+\.*\-*)*\w+:\d+\/models\/[a-zA-Z0-9._-]+(\:\d+)*",
+            r"(\w+\.*\-*)*\w+:\d+\/v2/models\/[a-zA-Z0-9._-]+(\:\d+)*",
         ).fullmatch(model):
             inference_adapter = OVMSAdapter(model)
         else:

@@ -268,8 +268,7 @@ def _load_config(self, config: dict[str, Any]) -> None:
                 self.__setattr__(name, value)
             except RuntimeError as error:
                 missing_rt_info = "Cannot get runtime attribute. Path to runtime attribute is incorrect." in str(error)
-                is_OVMSAdapter = str(error) == "OVMSAdapter does not support RT info getting"
-                if not missing_rt_info and not is_OVMSAdapter:
+                if not missing_rt_info:
                     raise
 
         for name, value in config.items():
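Editor's note: this widened regex is what routes a plain string either to `OVMSAdapter` or to a local model file. A quick check of what it accepts, with illustrative strings only:

```python
import re

# Same pattern as in create_model above.
ovms_url = re.compile(r"(\w+\.*\-*)*\w+:\d+\/v2/models\/[a-zA-Z0-9._-]+(\:\d+)*")

assert ovms_url.fullmatch("localhost:8000/v2/models/ssd_mobilenet_v1_fpn_coco")
assert ovms_url.fullmatch("ovms.example.com:9000/v2/models/model1:2")
# A bare model name or file path falls through to the local adapters:
assert not ovms_url.fullmatch("ssd_mobilenet_v1_fpn_coco.xml")
```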

model_api/python/pyproject.toml (1 addition, 1 deletion)

@@ -34,7 +34,7 @@ dependencies = [
 
 [project.optional-dependencies]
 ovms = [
-    "ovmsclient",
+    "tritonclient[http]",
 ]
 tests = [
     "pre-commit",
