This repository was archived by the owner on Feb 3, 2025. It is now read-only.

Commit bd74e79

Added dynamic shapes and deployment on triton examples (#293)
1 parent c431aed commit bd74e79

5 files changed, +915 -0 lines changed

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# TensorFlow with TensorRT (TF-TRT) to Triton

This README showcases how to deploy a simple ResNet model, accelerated with TensorFlow-TensorRT, on Triton Inference Server.

## Step 1: Optimize your model with TensorFlow-TensorRT

If you are unfamiliar with TensorFlow-TensorRT, please refer to this [video](https://www.youtube.com/watch?v=w7871kMiAs8&ab_channel=NVIDIADeveloper). The first step in this pipeline is to accelerate your model. If you use TensorFlow as your framework of choice for training, you can use either TensorRT or TensorFlow-TensorRT, depending on your model's operations.

To use TensorFlow-TensorRT, first pull the NGC TensorFlow Docker container, which comes with both TensorRT and TensorFlow-TensorRT installed. You may need to create an account and get the API key from [here](https://ngc.nvidia.com/setup/). Sign up and log in with your key (follow the instructions [here](https://ngc.nvidia.com/setup/api-key) after signing up).

```
# <xx.xx> is the yy.mm release tag of NVIDIA's TensorFlow
# container; e.g. 22.04

docker run -it --gpus all -v /path/to/this/folder:/resnet50_eg nvcr.io/nvidia/tensorflow:<xx.xx>-tf2-py3
```

We have already made a sample that uses TensorFlow-TensorRT: `tf_trt_resnet50.py`. This sample downloads a ResNet model from Keras and then optimizes it with TensorFlow-TensorRT. For more examples, visit the TF-TRT [GitHub repository](https://github.com/tensorflow/tensorrt).

```
python tf_trt_resnet50.py

# you can exit out of this container now
exit
```

## Step 2: Set Up Triton Inference Server

If you are new to the Triton Inference Server and want to learn more, we highly recommend checking out our [GitHub repository](https://github.com/triton-inference-server).

To use Triton, we need to make a model repository. The structure of the repository should look something like this:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.savedmodel
            |
            +-- saved_model.pb
            +-- variables
                |
                +-- variables.data-00000-of-00001
                +-- variables.index
```
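
For reference, one way to build this layout out of the Step 1 output might look like the following (a minimal sketch; it assumes you are working in this example's folder, and it reuses the `resnet50_saved_model_TFTRT_FP32` directory written by `tf_trt_resnet50.py` and the `config.pbtxt` shipped with this demo, described below):

```
# create the version directory and copy the config and the TF-TRT SavedModel into place
mkdir -p model_repository/resnet50/1
cp config.pbtxt model_repository/resnet50/
cp -r resnet50_saved_model_TFTRT_FP32 model_repository/resnet50/1/model.savedmodel
```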

A sample configuration for the model is included with this demo as `config.pbtxt`. If you are new to Triton, we highly encourage you to check out this [section of our documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md) for more details. Once you have the model repository set up, it is time to launch the Triton server! You can do that with the docker command below.
```
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models --backend-config=tensorflow,version=2
```
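
Before moving on, you can optionally confirm that the server came up. Triton exposes HTTP health endpoints on the port mapped to 8000 above; for example, run the following from the host (assuming the default port mapping):

```
# returns HTTP 200 once the server is ready to serve inference requests
curl -v localhost:8000/v2/health/ready
```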

## Step 3: Using a Triton Client to Query the Server

Download an example image to test inference.

```
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
```

Install dependencies.
```
pip install --upgrade tensorflow
pip install pillow
pip install nvidia-pyindex
pip install tritonclient[all]
```

Run the client.
```
python3 triton_client.py
```

The output should look something like this:
```
[b'0.301167:90' b'0.169790:14' b'0.161309:92' b'0.093105:94'
b'0.058743:136' b'0.050185:11' b'0.033802:91' b'0.011760:88'
b'0.008309:989' b'0.004927:95' b'0.004905:13' b'0.004095:317'
b'0.004006:96' b'0.003694:12' b'0.003526:42' b'0.003390:313'
...
b'0.000001:751' b'0.000001:685' b'0.000001:408' b'0.000001:116'
b'0.000001:627' b'0.000001:933' b'0.000000:661' b'0.000000:148']
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md).
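
Each printed entry pairs a confidence score with a class index, so if you would rather work with the values programmatically than as raw bytes, they can simply be split on the colon. A minimal sketch of what could be appended to `triton_client.py` (the `entry`, `confidence`, and `class_index` names are illustrative, not part of the script):

```
# each element of test_output_fin looks like b'<confidence>:<class_index>'
for entry in test_output_fin.flatten():
    confidence, class_index = entry.decode().split(':')
    print(float(confidence), int(class_index))
```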
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: "resnet50"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ -1, 224, 224, 3 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ -1, 1000 ]
  }
]
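
The `input_1` and `predictions` names and shapes above must match what the client sends and requests. If you want to double-check what Triton actually loaded, the HTTP client can report the model's metadata; a small sketch, assuming the server from Step 2 is running on localhost:8000:

```
import tritonclient.http as httpclient

# print the loaded model's input/output names, datatypes, and shapes
client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.get_model_metadata(model_name="resnet50"))
```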
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.keras.applications.resnet50 import ResNet50

# Load the pretrained ResNet-50 from Keras and export it as a SavedModel
model = ResNet50(weights='imagenet')
model.save('resnet50_saved_model')

# Optimize the SavedModel with TF-TRT (FP32 precision)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='resnet50_saved_model',
    precision_mode=trt.TrtPrecisionMode.FP32,
    max_workspace_size_bytes=8000000000)
converter.convert()

# Save the optimized model
converter.save(output_saved_model_dir='resnet50_saved_model_TFTRT_FP32')
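
The script above only calls `convert()`, so TensorRT engines are built lazily the first time each input shape is seen at inference. If you prefer to pre-build engines for a representative shape, the same converter object also exposes `build()`, which takes a generator of sample inputs. A hedged sketch of what could be added between `convert()` and `save()` (not part of the committed script; the `input_fn` name and the random sample batch are illustrative):

```
def input_fn():
    # one representative batch matching the model's 224x224x3 input
    yield [tf.random.normal((1, 224, 224, 3), dtype=tf.float32)]

converter.build(input_fn=input_fn)
```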
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
import tensorflow as tf

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype

# Load the test image and apply the same preprocessing ResNet-50 expects
def process_image(image_path="img1.jpg"):
    img = image.load_img(image_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    return preprocess_input(x)

transformed_img = process_image()

# Setting up client
triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the model input and attach the preprocessed image
test_input = httpclient.InferInput("input_1", transformed_img.shape, datatype="FP32")
test_input.set_data_from_numpy(transformed_img, binary_data=True)

# Request the output as <confidence>:<class_index> pairs for the top 1000 classes
test_output = httpclient.InferRequestedOutput("predictions", binary_data=True, class_count=1000)

# Querying the server
results = triton_client.infer(model_name="resnet50", inputs=[test_input], outputs=[test_output])

test_output_fin = results.as_numpy('predictions')
print(test_output_fin)
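
One optional addition, not in the committed client: the HTTP client can confirm that the server and model are ready before sending the request, which gives a clearer failure mode than an error from `infer()`. A minimal sketch reusing the `triton_client` object from above:

```
# fail fast if the server or the resnet50 model has not finished loading
if not triton_client.is_server_ready() or not triton_client.is_model_ready("resnet50"):
    raise RuntimeError("Triton server or resnet50 model is not ready")
```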
