Commit f3fb6e5

[Docs] Update Serving README.md. (#22)
Signed-off-by: Tongxuan Liu <[email protected]>
1 parent: 9838f1e

1 file changed: README.md (43 additions, 97 deletions)

```diff
@@ -1,118 +1,64 @@
-# TensorFlow Serving
-
-[![Ubuntu Build Status](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/ubuntu.svg)](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/ubuntu.html)
-[![Ubuntu Build Status at TF HEAD](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/ubuntu-tf-head.svg)](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/ubuntu-tf-head.html)
-![Docker CPU Nightly Build Status](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/docker-cpu-nightly.svg)
-![Docker GPU Nightly Build Status](https://storage.googleapis.com/tensorflow-serving-kokoro-build-badges/docker-gpu-nightly.svg)
-
-----
-TensorFlow Serving is a flexible, high-performance serving system for
-machine learning models, designed for production environments. It deals with
-the *inference* aspect of machine learning, taking models after *training* and
-managing their lifetimes, providing clients with versioned access via
-a high-performance, reference-counted lookup table.
-TensorFlow Serving provides out-of-the-box integration with TensorFlow models,
-but can be easily extended to serve other types of models and data.
-
-To note a few features:
-
-- Can serve multiple models, or multiple versions of the same model
-  simultaneously
-- Exposes both gRPC as well as HTTP inference endpoints
-- Allows deployment of new model versions without changing any client code
-- Supports canarying new versions and A/B testing experimental models
-- Adds minimal latency to inference time due to efficient, low-overhead
-  implementation
-- Features a scheduler that groups individual inference requests into batches
-  for joint execution on GPU, with configurable latency controls
-- Supports many *servables*: Tensorflow models, embeddings, vocabularies,
-  feature transformations and even non-Tensorflow-based machine learning
-  models
-
-## Serve a Tensorflow model in 60 seconds
-```bash
-# Download the TensorFlow Serving Docker image and repo
-docker pull tensorflow/serving
-
-git clone https://github.com/tensorflow/serving
-# Location of demo models
-TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
-
-# Start TensorFlow Serving container and open the REST API port
-docker run -t --rm -p 8501:8501 \
-  -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
-  -e MODEL_NAME=half_plus_two \
-  tensorflow/serving &
+# DeepRec Serving

-# Query the model using the predict API
-curl -d '{"instances": [1.0, 2.0, 5.0]}' \
-  -X POST http://localhost:8501/v1/models/half_plus_two:predict
-
-# Returns => { "predictions": [2.5, 3.0, 4.5] }
-```
+DeepRec Serving is a high-performance serving system for DeepRec models, based on TensorFlow Serving.
+It can greatly improve inference performance and CPU/GPU utilization through features such as SessionGroup and CUDA multi-stream.

-## End-to-End Training & Serving Tutorial
+A few features of DeepRec Serving:
+- SessionGroup: a shared-variable architecture (only variables are shared) for running multiple sessions in a single serving process.
+- CUDA multi-stream: can greatly improve QPS and GPU utilization in GPU inference.

-Refer to the official Tensorflow documentations site for [a complete tutorial to train and serve a Tensorflow Model](https://www.tensorflow.org/tfx/tutorials/serving/rest_simple).
+## Installation

+### Prepare for build

-## Documentation
+**CPU Dev Docker**

-### Set up
+| GCC Version | Python Version | IMAGE                                                     |
+| ----------- | -------------- | --------------------------------------------------------- |
+| 9.4.0       | 3.8.10         | alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 |

-The easiest and most straight-forward way of using TensorFlow Serving is with
-Docker images. We highly recommend this route unless you have specific needs
-that are not addressed by running in a container.
+**GPU (CUDA 11.6) Dev Docker**

-* [Install Tensorflow Serving using Docker](tensorflow_serving/g3doc/docker.md)
-  *(Recommended)*
-* [Install Tensorflow Serving without Docker](tensorflow_serving/g3doc/setup.md)
-  *(Not Recommended)*
-* [Build Tensorflow Serving from Source with Docker](tensorflow_serving/g3doc/building_with_docker.md)
-* [Deploy Tensorflow Serving on Kubernetes](tensorflow_serving/g3doc/serving_kubernetes.md)
+| GCC Version | Python Version | CUDA Version | IMAGE                                                           |
+| ----------- | -------------- | ------------ | --------------------------------------------------------------- |
+| 9.4.0       | 3.8.10         | CUDA 11.6.2  | alideeprec/deeprec-build:deeprec-dev-gpu-py38-cu116-ubuntu20.04 |

-### Use
+### Build from source

-#### Export your Tensorflow model
+Development branch: master; latest release branch: deeprec2302

-In order to serve a Tensorflow model, simply export a SavedModel from your
-Tensorflow program.
-[SavedModel](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md)
-is a language-neutral, recoverable, hermetic serialization format that enables
-higher-level systems and tools to produce, consume, and transform TensorFlow
-models.
+**Build Package Builder-CPU**

-Please refer to [Tensorflow documentation](https://www.tensorflow.org/guide/saved_model#save_and_restore_models)
-for detailed instructions on how to export SavedModels.
-
-#### Configure and Use Tensorflow Serving
-
-* [Follow a tutorial on Serving Tensorflow models](tensorflow_serving/g3doc/serving_basic.md)
-* [Configure Tensorflow Serving to make it fit your serving use case](tensorflow_serving/g3doc/serving_config.md)
-* Read the [REST API Guide](tensorflow_serving/g3doc/api_rest.md) or [gRPC API definition](https://github.com/tensorflow/serving/tree/master/tensorflow_serving/apis)
-* [Use SavedModel Warmup if initial inference requests are slow due to lazy initialization of graph](tensorflow_serving/g3doc/saved_model_warmup.md)
-* [If encountering issues regarding model signatures, please read the SignatureDef documentation](tensorflow_serving/g3doc/signature_defs.md)
-* If using a model with custom ops, [learn how to serve models with custom ops](tensorflow_serving/g3doc/custom_op.md)
+```bash
+bazel build -c opt tensorflow_serving/...
+```

-### Extend
+**Build CPU Package Builder with OneDNN + Eigen Threadpool**

-Tensorflow Serving's architecture is highly modular. You can use some parts
-individually (e.g. batch scheduling) and/or extend it to serve new use cases.
+```bash
+bazel build -c opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true tensorflow_serving/...
+```

-* [Ensure you are familiar with building Tensorflow Serving](tensorflow_serving/g3doc/building_with_docker.md)
-* [Learn about Tensorflow Serving's architecture](tensorflow_serving/g3doc/architecture.md)
-* [Explore the Tensorflow Serving C++ API reference](https://www.tensorflow.org/tfx/serving/api_docs/cc/)
-* [Create a new type of Servable](tensorflow_serving/g3doc/custom_servable.md)
-* [Create a custom Source of Servable versions](tensorflow_serving/g3doc/custom_source.md)
+**Build Package Builder-GPU**

-## Contribute
+```bash
+bazel build -c opt --config=cuda tensorflow_serving/...
+```

+**Build Package**

-**If you'd like to contribute to TensorFlow Serving, be sure to review the
-[contribution guidelines](CONTRIBUTING.md).**
+```bash
+bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package /tmp/tf_serving_client_whl
+```

+**Server Bin**

-## For more information
+The server binary is generated at the following path:
+```bash
+bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
+```
+## More details

-Please refer to the official [TensorFlow website](http://tensorflow.org) for
-more information.
+* [SessionGroup](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/SessionGroup.md)
+* [CUDA MultiStream](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/GPU-MultiStream.md)
+* [Device Placement Optimization](https://github.com/DeepRec-AI/DeepRec/blob/main/docs/docs_en/Device-Placement.md)
```
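The updated README's build steps are meant to run inside one of the dev images listed in its tables. A minimal sketch of that workflow, assuming the serving repository lives at https://github.com/DeepRec-AI/serving (the URL is not stated in this diff); the image name and bazel command come from the README itself:

```bash
# Enter the CPU dev image from the README's table (assumed workflow).
docker run -it --rm alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 /bin/bash

# Inside the container: clone and build.
# The repo URL (DeepRec-AI/serving) is an assumption, not confirmed by this diff.
git clone https://github.com/DeepRec-AI/serving.git
cd serving
bazel build -c opt tensorflow_serving/...
```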

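The `build_pip_package` step writes a client wheel into the output directory given on its command line (`/tmp/tf_serving_client_whl` in the README). A hedged install sketch; the `tensorflow_serving_api-*` filename pattern follows upstream TensorFlow Serving naming and is an assumption here:

```bash
# Install the client wheel produced by build_pip_package.
# The tensorflow_serving_api-* name is assumed from upstream TensorFlow
# Serving and may differ in this fork.
pip install /tmp/tf_serving_client_whl/tensorflow_serving_api-*.whl
```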

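Finally, since DeepRec Serving is based on TensorFlow Serving, the generated server binary should accept the standard upstream flags. A sketch that starts the server and replays the REST query from the removed quick-start section; the model name and path are placeholders:

```bash
# Start the freshly built server. --port, --rest_api_port, --model_name and
# --model_base_path are standard TensorFlow Serving flags; my_model and its
# path below are placeholders.
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --port=8500 --rest_api_port=8501 \
  --model_name=my_model --model_base_path=/models/my_model &

# Query it over REST, mirroring the predict example from the old README.
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
  -X POST http://localhost:8501/v1/models/my_model:predict
```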