# DeepRec Serving

DeepRec Serving is a high-performance serving system for DeepRec, built on top of TensorFlow Serving. It can significantly improve inference performance and CPU/GPU utilization through features such as SessionGroup and CUDA multi-stream.

A few features of DeepRec Serving:

- SessionGroup: a shared-variable architecture (only variables are shared) that runs multiple sessions in a single serving process.
- CUDA multi-stream: can significantly improve QPS and GPU utilization in GPU inference.

## Installation

### Prepare for build

**CPU Dev Docker**

| GCC Version | Python Version | IMAGE |
| ----------- | -------------- | ----- |
| 9.4.0 | 3.8.10 | alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 |

**GPU (CUDA 11.6) Dev Docker**

| GCC Version | Python Version | CUDA Version | IMAGE |
| ----------- | -------------- | ------------ | ----- |
| 9.4.0 | 3.8.10 | CUDA 11.6.2 | alideeprec/deeprec-build:deeprec-dev-gpu-py38-cu116-ubuntu20.04 |

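These images provide the full build toolchain. A minimal sketch of entering a dev container (the mount path is illustrative, not prescribed by the docs):

```bash
# Start a CPU dev container with the source tree mounted at /workspace;
# for GPU builds, use the GPU image above and add `--gpus all`.
docker run -it --rm \
    -v "$PWD":/workspace -w /workspace \
    alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 bash
```
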
### Build from source

Development branch: master. Latest release branch: deeprec2302.

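For example, to build from the latest release branch (assuming the project's GitHub repository URL; adjust if you build from a fork):

```bash
# Clone the source and switch to the latest release branch.
git clone https://github.com/DeepRec-AI/serving.git
cd serving
git checkout deeprec2302
```
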
**Build Package Builder-CPU**

```bash
bazel build -c opt tensorflow_serving/...
```

**Build CPU Package Builder with OneDNN + Eigen Threadpool**

```bash
bazel build -c opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true tensorflow_serving/...
```

**Build Package Builder-GPU**

```bash
bazel build -c opt --config=cuda tensorflow_serving/...
```

**Build Package**

```bash
bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package /tmp/tf_serving_client_whl
```

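The wheel written to that directory can then be installed with pip; a sketch, assuming the default client wheel name produced by the script:

```bash
# Install the generated client API wheel.
pip install /tmp/tf_serving_client_whl/tensorflow_serving_api-*.whl
```
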
**Server Bin**

The server binary is generated at the following path:

```bash
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
```

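A minimal launch sketch using the standard TensorFlow Serving flags that DeepRec Serving inherits (the model name and path are illustrative):

```bash
# Serve a SavedModel over gRPC (port 8500) and REST (port 8501).
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=8500 --rest_api_port=8501 \
    --model_name=my_model --model_base_path=/models/my_model
```
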
## More details

* [SessionGroup](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/SessionGroup.md)
* [CUDA MultiStream](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/GPU-MultiStream.md)
* [Device Placement Optimization](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/Device-Placement.md)