# Serving BERT models with Triton Server and Intel® Extension for PyTorch optimizations
## Description

This sample provides code that integrates Intel® Extension for PyTorch with the Triton Inference Server framework. The project provides a custom Python backend for Intel® Extension for PyTorch and an additional dynamic batching algorithm to improve performance. The code can also be used as a performance benchmark for the Bert-Base and Bert-Large models.
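For orientation, the sketch below illustrates the general shape of a Triton Python backend that applies Intel® Extension for PyTorch optimizations. It is only a simplified illustration of the approach, not the model.py shipped with this sample; the checkpoint and the tensor names used here are assumptions.

```python
# Minimal, illustrative sketch of a Triton Python backend using Intel(R)
# Extension for PyTorch. Not the model.py shipped with this sample; the
# checkpoint and tensor names (input_ids, attention_mask, start_logits,
# end_logits) are assumptions.
import torch
import intel_extension_for_pytorch as ipex
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForQuestionAnswering


class TritonPythonModel:
    def initialize(self, args):
        # Load a SQuAD-finetuned BERT checkpoint and let IPEX optimize it for CPU.
        model = AutoModelForQuestionAnswering.from_pretrained(
            "csarron/bert-base-uncased-squad-v1"
        )
        model.eval()
        self.model = ipex.optimize(model)

    def execute(self, requests):
        responses = []
        for request in requests:
            input_ids = pb_utils.get_input_tensor_by_name(request, "input_ids").as_numpy()
            attention_mask = pb_utils.get_input_tensor_by_name(request, "attention_mask").as_numpy()
            with torch.no_grad():
                out = self.model(
                    input_ids=torch.from_numpy(input_ids),
                    attention_mask=torch.from_numpy(attention_mask),
                )
            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("start_logits", out.start_logits.numpy()),
                pb_utils.Tensor("end_logits", out.end_logits.numpy()),
            ]))
        return responses
```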
## Preparation

Make sure that Docker is installed on both the host and the client instances.

If running on two separate instances, edit config.properties and provide the required variables.
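As a purely hypothetical illustration of the kind of values involved (the key names below are invented and may differ from the actual file), config.properties collects settings such as the model to serve and where the Triton Server Host can be reached:

```properties
# Hypothetical example only - use the key names already present in the
# config.properties shipped with this sample.
# Model to serve: bert_base or bert_large.
model_name=bert_base
# IP of the Triton Server Host; 127.0.0.1 for a localhost run, the remote
# host's IP for a two-instance run.
triton_host_ip=127.0.0.1
# Default Triton HTTP port.
triton_http_port=8000
```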
## Supported models

Currently, the AI Inference samples support the following BERT models fine-tuned on the SQuAD dataset (a short standalone usage sketch follows the list):

- bert_base - PyTorch+Intel® Extension for PyTorch [Bert Base uncased](https://huggingface.co/csarron/bert-base-uncased-squad-v1 "Bert Base uncased")
- bert_large - PyTorch+Intel® Extension for PyTorch [Bert Large uncased](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad "Bert Large uncased")
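For reference, both checkpoints are ordinary Hugging Face models, so they can also be exercised directly with `transformers` outside of Triton; the following is a minimal sketch (the question/context pair is just an example):

```python
# Minimal sketch: run one of the supported checkpoints directly to see the
# question-answering input/output format. Illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_id = "csarron/bert-base-uncased-squad-v1"  # bert_base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id).eval()

question = "Who wrote Hamlet?"
context = "Hamlet is a tragedy written by William Shakespeare."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The answer span is taken from the highest-scoring start/end positions.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```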
## Possible run scenarios

The AI Inference samples allow the user to run inference on localhost or on a remote Triton Server Host.

By default, config.properties is filled in for the localhost run option.
### Execution on localhost

To build and start the Docker containers, run the tests, and then stop and clean up on localhost, execute the scripts in the following order:

`$ bash build.sh` - builds the Docker images for the Triton Server Client and Host with the names specified in config.properties

`$ bash start.sh` - runs the Docker containers for the Triton Server Client and Host for the model specified in config.properties

`$ bash run_test.sh` - sends requests to the Triton Server Host for the model specified in config.properties. Values for the sequence length, number of iterations, and run mode can be passed as arguments (a minimal client-side sketch follows these steps).

`$ sudo bash stop.sh` - stops the Docker containers for the Triton Server Client and Host and removes temporary files.
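For reference, the sketch below shows roughly what a single request to the Triton Server Host looks like with the Triton Python HTTP client. It is an illustration under stated assumptions, not the contents of run_test.sh: the model name, tensor names, and port are placeholders.

```python
# Minimal sketch of one request to the Triton Server Host via the Triton HTTP
# client. The model name and tensor names (input_ids, attention_mask,
# start_logits, end_logits) are assumptions; run_test.sh may differ.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 128
input_ids = np.zeros((1, seq_len), dtype=np.int64)   # tokenized question + context
attention_mask = np.ones((1, seq_len), dtype=np.int64)

inputs = [
    httpclient.InferInput("input_ids", list(input_ids.shape), "INT64"),
    httpclient.InferInput("attention_mask", list(attention_mask.shape), "INT64"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(attention_mask)

result = client.infer(model_name="bert_base", inputs=inputs)
print(result.as_numpy("start_logits").shape, result.as_numpy("end_logits").shape)
```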
### Execution on two separate instances

##### DISCLAIMER: This deployment is designed to be carried out on two distinct machines.

Make sure that the IP address of the Triton Server Host instance is provided in config.properties on the instance running the Triton Server Client.
Scripts to run on the Triton Server Host instance:

`$ bash build.sh host` - builds the Docker image for the Triton Server Host with the name specified in config.properties

`$ bash start.sh host` - runs the Docker container for the Triton Server Host for the model specified in config.properties

`$ bash stop.sh host` - (**run after inference is finished**) stops the Docker container for the Triton Server Host and removes temporary files.
Scripts to run on the Triton Server Client instance:

`$ bash build.sh client` - builds the Docker image for the Triton Server Client with the name specified in config.properties

`$ bash start.sh client` - runs the Docker container for the Triton Server Client for the model specified in config.properties

`$ bash run_test.sh` - sends requests to the remote Triton Server Host for the model specified in config.properties. Values for the sequence length, number of iterations, and run mode can be passed as arguments.

`$ bash stop.sh client` - (**run after inference is finished**) stops the Docker container for the Triton Server Client.
## Additional info

Downloading and loading the models takes some time, so please wait for this to finish before running the run_test.sh script.
Model loading progress can be tracked by following the Triton Server Host Docker container logs.
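If you prefer to check readiness programmatically instead of watching the logs, the Triton HTTP client exposes readiness calls; a minimal sketch follows (the model name and port are assumptions):

```python
# Minimal sketch: poll the Triton Server Host until the model is loaded
# before starting run_test.sh. The model name "bert_base" is an assumption.
import time
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
while not (client.is_server_ready() and client.is_model_ready("bert_base")):
    time.sleep(5)  # model may still be downloading or loading
print("Model is ready, run_test.sh can be started.")
```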
## License

The AI Inference samples project is licensed under Apache License Version 2.0. Refer to the [LICENSE](../LICENSE) file for the full license text and copyright notice.

This distribution includes third party software governed by separate license terms.

3-clause BSD license:

- [model.py](./model_utils/bert_common/1/model.py) - for the Intel® Extension for PyTorch optimized workload

This third party software, even if included with the distribution of the Intel software, may be governed by separate license terms, including without limitation, third party license terms, other Intel software license terms, and open source software license terms. These separate license terms govern your use of the third party programs as set forth in the [THIRD-PARTY-PROGRAMS](./THIRD-PARTY-PROGRAMS) file.
## Trademark Information
Intel, the Intel logo and Intel Xeon are trademarks of Intel Corporation or its subsidiaries.
* Other names and brands may be claimed as the property of others.