# DeepRec Serving

DeepRec Serving is a high-performance serving system for DeepRec, built on top of TensorFlow Serving. It can significantly improve inference performance and CPU/GPU utilization through features such as SessionGroup and CUDA multi-stream.

A few features of DeepRec Serving:

- SessionGroup: a shared-variable architecture (only variables are shared) that runs multiple sessions in a single serving process.
- CUDA multi-stream: can significantly improve QPS and GPU utilization in GPU inference.

## Installation

### Prepare for build

**CPU Dev Docker**

| GCC Version | Python Version | IMAGE |
| ----------- | -------------- | ----- |
| 9.4.0 | 3.8.10 | alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 |

**GPU (CUDA 11.6) Dev Docker**

| GCC Version | Python Version | CUDA Version | IMAGE |
| ----------- | -------------- | ------------ | ----- |
| 9.4.0 | 3.8.10 | CUDA 11.6.2 | alideeprec/deeprec-build:deeprec-dev-gpu-py38-cu116-ubuntu20.04 |

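These images provide the full build toolchain. A minimal sketch of entering a dev container (the mount path is illustrative, not prescribed by the docs):

```bash
# Start a CPU dev container with the source tree mounted at /workspace;
# for GPU builds, use the GPU image above and add `--gpus all`.
docker run -it --rm \
    -v "$PWD":/workspace -w /workspace \
    alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 bash
```
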
### Build from source

Development branch: master. Latest release branch: deeprec2302.

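For example, to build from the latest release branch (assuming the project's GitHub repository URL; adjust if you build from a fork):

```bash
# Clone the source and switch to the latest release branch.
git clone https://github.com/DeepRec-AI/serving.git
cd serving
git checkout deeprec2302
```
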
**Build Package Builder-CPU**

```bash
bazel build -c opt tensorflow_serving/...
```

**Build CPU Package Builder with OneDNN + Eigen Threadpool**

```bash
bazel build -c opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true tensorflow_serving/...
```

**Build Package Builder-GPU**

```bash
bazel build -c opt --config=cuda tensorflow_serving/...
```

**Build Package**

```bash
bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package /tmp/tf_serving_client_whl
```

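The wheel written to that directory can then be installed with pip; a sketch, assuming the default client wheel name produced by the script:

```bash
# Install the generated client API wheel.
pip install /tmp/tf_serving_client_whl/tensorflow_serving_api-*.whl
```
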
**Server Bin**

The server binary is generated at the following path:

```bash
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
```

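A minimal launch sketch using the standard TensorFlow Serving flags that DeepRec Serving inherits (the model name and path are illustrative):

```bash
# Serve a SavedModel over gRPC (port 8500) and REST (port 8501).
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=8500 --rest_api_port=8501 \
    --model_name=my_model --model_base_path=/models/my_model
```
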
## More details

* [SessionGroup](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/SessionGroup.md)
* [CUDA MultiStream](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/GPU-MultiStream.md)
* [Device Placement Optimization](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/Device-Placement.md)