Description
I’m using the Triton python_backend to run the PyTorch example from the python_backend repo. I packaged the PyTorch dependencies into a conda environment and can load the model successfully. However, when I run the client inference script provided in the repo, I hit the following error while extracting the outputs from the httpclient response. It seems that the response contains no output data.
ValueError Traceback (most recent call last)
Cell In[20], line 33
30 response = client.infer(model_name, inputs, request_id=str(1), outputs=outputs)
32 result = response.get_response()
---> 33 output0_data = response.as_numpy("OUTPUT0")
34 output1_data = response.as_numpy("OUTPUT1")
36 print(
37 "INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
38 input0_data, input1_data, output0_data
39 )
40 )
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python310/lib/python3.10/site-packages/tritonclient/http/_infer_result.py:208, in InferResult.as_numpy(self, name)
204 if not has_binary_data:
205 np_array = np.array(
206 output["data"], dtype=triton_to_np_dtype(datatype)
207 )
--> 208 np_array = np_array.reshape(output["shape"])
209 return np_array
210 return None
ValueError: cannot reshape array of size 0 into shape (4,)
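To confirm that the outputs really come back empty, the raw JSON response can be inspected directly. A minimal sketch, using the response object from the cell above and assuming get_response() on the HTTP InferResult returns the parsed response dict:
raw = response.get_response()
# Print each output's reported shape and how many data elements actually arrived.
for out in raw.get("outputs", []):
    print(out["name"], "shape:", out.get("shape"), "data elements:", len(out.get("data", [])))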
For comparison, I also ran the plain add_sub example (the non-PyTorch one), and there response.as_numpy("OUTPUT0") returned the expected output.
Triton Information
What version of Triton are you using?
server_version 2.41.0
Are you using the Triton container or did you build it yourself?
I’m using a SageMaker Docker image for Triton server: 763104351884.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:23.12-py3
To Reproduce
The model.py:
import json
# triton_python_backend_utils is available in every Triton Python model. You
# need to use this module to create inference requests and responses. It also
# contains some utility functions for extracting information from model_config
# and converting Triton input/output types to numpy types.
import triton_python_backend_utils as pb_utils
from torch import nn
class AddSubNet(nn.Module):
"""
Simple AddSub network in PyTorch. This network outputs the sum and
subtraction of the inputs.
"""
def __init__(self):
super(AddSubNet, self).__init__()
def forward(self, input0, input1):
return (input0 + input1), (input0 - input1)
class TritonPythonModel:
"""Your Python model must use the same class name. Every Python model
that is created must have "TritonPythonModel" as the class name.
"""
def initialize(self, args):
"""`initialize` is called only once when the model is being loaded.
Implementing `initialize` function is optional. This function allows
the model to initialize any state associated with this model.
Parameters
----------
args : dict
Both keys and values are strings. The dictionary keys and values are:
* model_config: A JSON string containing the model configuration
* model_instance_kind: A string containing model instance kind
* model_instance_device_id: A string containing model instance device ID
* model_repository: Model repository path
* model_version: Model version
* model_name: Model name
"""
# You must parse model_config. JSON string is not parsed here
self.model_config = model_config = json.loads(args["model_config"])
# Get OUTPUT0 configuration
output0_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT0")
# Get OUTPUT1 configuration
output1_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT1")
# Convert Triton types to numpy types
self.output0_dtype = pb_utils.triton_string_to_numpy(
output0_config["data_type"]
)
self.output1_dtype = pb_utils.triton_string_to_numpy(
output1_config["data_type"]
)
# Instantiate the PyTorch model
self.add_sub_model = AddSubNet()
def execute(self, requests):
"""`execute` must be implemented in every Python model. `execute`
function receives a list of pb_utils.InferenceRequest as the only
argument. This function is called when an inference is requested
for this model. Depending on the batching configuration (e.g. Dynamic
Batching) used, `requests` may contain multiple requests. Every
Python model, must create one pb_utils.InferenceResponse for every
pb_utils.InferenceRequest in `requests`. If there is an error, you can
set the error argument when creating a pb_utils.InferenceResponse.
Parameters
----------
requests : list
A list of pb_utils.InferenceRequest
Returns
-------
list
A list of pb_utils.InferenceResponse. The length of this list must
be the same as `requests`
"""
output0_dtype = self.output0_dtype
output1_dtype = self.output1_dtype
responses = []
# Every Python backend must iterate over everyone of the requests
# and create a pb_utils.InferenceResponse for each of them.
for request in requests:
# Get INPUT0
in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
# Get INPUT1
in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")
out_0, out_1 = self.add_sub_model(in_0.as_numpy(), in_1.as_numpy())
# Create output tensors. You need pb_utils.Tensor
# objects to create pb_utils.InferenceResponse.
out_tensor_0 = pb_utils.Tensor("OUTPUT0", out_0.astype(output0_dtype))
out_tensor_1 = pb_utils.Tensor("OUTPUT1", out_1.astype(output1_dtype))
# Create InferenceResponse. You can set an error here in case
# there was a problem with handling this inference request.
# Below is an example of how you can set errors in inference
# response:
#
# pb_utils.InferenceResponse(
# output_tensors=..., TritonError("An error occurred"))
inference_response = pb_utils.InferenceResponse(
output_tensors=[out_tensor_0, out_tensor_1]
)
responses.append(inference_response)
# You should return a list of pb_utils.InferenceResponse. Length
# of this list must match the length of `requests` list.
return responses
def finalize(self):
"""`finalize` is called only once when the model is being unloaded.
Implementing `finalize` function is optional. This function allows
the model to perform any necessary clean ups before exit.
"""
print("Cleaning up...")
config.pbtxt:
name: "add_sub"
backend: "python"
input [
{
name: "INPUT0"
data_type: TYPE_FP32
dims: [ 4 ]
}
]
input [
{
name: "INPUT1"
data_type: TYPE_FP32
dims: [ 4 ]
}
]
output [
{
name: "OUTPUT0"
data_type: TYPE_FP32
dims: [ 4 ]
}
]
output [
{
name: "OUTPUT1"
data_type: TYPE_FP32
dims: [ 4 ]
}
]
instance_group [{ kind: KIND_CPU }]
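For completeness, the conda environment mentioned in the description is attached to the model through the python backend's EXECUTION_ENV_PATH parameter, roughly like the following (the archive name here is a placeholder, not my exact path):
parameters: {
  key: "EXECUTION_ENV_PATH",
  # placeholder tarball name; the actual packaged environment differs
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/pytorch_env.tar.gz"}
}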
The client.py:
import sys
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import *
model_name = "add_sub"
shape = [4]
with httpclient.InferenceServerClient("localhost:8000") as client:
input0_data = np.random.rand(*shape).astype(np.float32)
input1_data = np.random.rand(*shape).astype(np.float32)
inputs = [
httpclient.InferInput(
"INPUT0", input0_data.shape, np_to_triton_dtype(input0_data.dtype)
),
httpclient.InferInput(
"INPUT1", input1_data.shape, np_to_triton_dtype(input1_data.dtype)
),
]
inputs[0].set_data_from_numpy(input0_data)
inputs[1].set_data_from_numpy(input1_data)
outputs = [
httpclient.InferRequestedOutput("OUTPUT0"),
httpclient.InferRequestedOutput("OUTPUT1"),
]
response = client.infer(model_name, inputs, request_id=str(1), outputs=outputs)
result = response.get_response()
output0_data = response.as_numpy("OUTPUT0")
output1_data = response.as_numpy("OUTPUT1")
print(
"INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
input0_data, input1_data, output0_data
)
)
print(
"INPUT0 ({}) - INPUT1 ({}) = OUTPUT1 ({})".format(
input0_data, input1_data, output1_data
)
)
if not np.allclose(input0_data + input1_data, output0_data):
print("add_sub example error: incorrect sum")
sys.exit(1)
if not np.allclose(input0_data - input1_data, output1_data):
print("add_sub example error: incorrect difference")
sys.exit(1)
print("PASS: add_sub")
sys.exit(0)
Expected behavior
I expect that the client script
output0_data = response.as_numpy("OUTPUT0")
output1_data = response.as_numpy("OUTPUT1")
will convert the output tensors in the HTTP response into numpy arrays.
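Concretely, for the shape [4] FP32 inputs above, something like the following should hold (a sketch of the expected result, mirroring the checks already in the client script):
output0_data = response.as_numpy("OUTPUT0")
output1_data = response.as_numpy("OUTPUT1")
# Both should be float32 arrays of shape (4,) rather than empty data,
# matching the element-wise sum and difference of the inputs.
assert output0_data is not None and output0_data.shape == (4,)
assert output1_data is not None and output1_data.shape == (4,)
assert np.allclose(output0_data, input0_data + input1_data)
assert np.allclose(output1_data, input0_data - input1_data)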