
Commit 7d9f8ee

Merge pull request #2432 from othakkar/othakkar/remove_tf_serving_dev

[development] Remove TF Serving

2 parents aa8c9b7 + 9ceed9f

4 files changed: +100 -178 lines changed

AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Enabling_Auto_Mixed_Precision_for_TransferLearning/README.md

Lines changed: 65 additions & 5 deletions
@@ -2,7 +2,7 @@
 
 The `Enable Auto-Mixed Precision for Transfer Learning with TensorFlow*` sample guides you through the process of enabling auto-mixed precision to use low-precision datatypes, like bfloat16, for transfer learning with TensorFlow* (TF).
 
-The sample demonstrates the end-to-end pipeline tasks typically performed in a deep learning use-case: training (and retraining), inference optimization, and serving the model with TensorFlow Serving.
+The sample demonstrates the tasks typically performed in a deep learning use-case: training (and retraining), and inference optimization. The sample also includes tips and boilerplate code for serving the model with TensorFlow Serving.
 
 | Area | Description
 |:--- |:---
@@ -37,10 +37,6 @@ You will need to download and install the following toolkits, tools, and components
 
   Install using PIP: `$pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.
 
-- **TensorFlow Serving**
-
-  See *TensorFlow Serving* [*Installation*](https://www.tensorflow.org/tfx/serving/setup) for detailed installation options.
-
 - **Other dependencies**
 
   Install using PIP and the `requirements.txt` file supplied with the sample: `$pip install -r requirements.txt --no-deps`. <br> The `requirements.txt` file contains the necessary dependencies to run the Notebook.
@@ -112,6 +108,70 @@ You will see diagrams comparing performance and analysis. This includes performa
 
 For performance analysis, you will see histograms showing different Tensorflow* operations in the analyzed pre-trained model pb file.
 
+## Serve the model with TensorFlow Serving
+
+### Installation
+See *TensorFlow Serving* [*Installation*](https://www.tensorflow.org/tfx/serving/setup) for detailed installation options.
+
+### Example Code
+
+Create a copy of the optimized model in a well-defined directory hierarchy with a version number "1".
+
+```
+!mkdir serving
+!cp -r models/my_optimized_model serving/1
+```
+
+```
+os.environ["MODEL_DIR"] = os.getcwd() + "/serving"
+```
+
+This is where we start running TensorFlow Serving and load our model. After it loads we can start making inference requests using REST. There are some important parameters:
+- **rest_api_port**: The port that you'll use for REST requests.
+- **model_name**: You'll use this in the URL of REST requests. It can be anything.
+- **model_base_path**: This is the path to the directory where you've saved your model.
+
+```
+%%bash --bg
+nohup tensorflow_model_server --rest_api_port=8501 --model_name=rn50 --model_base_path=${MODEL_DIR} > server.log 2>&1
+```
+
+#### Prepare the testing data for prediction
+
+```
+for image_batch, labels_batch in val_ds:
+    print(image_batch.shape)
+    print(labels_batch.shape)
+    break
+test_data, test_labels = image_batch.numpy(), labels_batch.numpy()
+```
+
+#### Make REST requests
+
+Now let's create the JSON object for a batch of three inference requests and we'll send a predict request as a POST to our server's REST endpoint, and pass it three examples.
+
+```
+import json
+import matplotlib.pyplot as plt
+
+def show(idx, title):
+    plt.figure()
+    plt.imshow(test_data[idx])
+    plt.axis('off')
+    plt.title('\n\n{}'.format(title), fontdict={'size': 16})
+
+data = json.dumps({"signature_name": "serving_default", "instances": test_data[0:3].tolist()})
+print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
+
+headers = {"content-type": "application/json"}
+json_response = requests.post('http://localhost:8501/v1/models/rn50:predict', data=data, headers=headers)
+predictions = json.loads(json_response.text)['predictions']
+
+for i in range(0,3):
+    show(i, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
+        class_names[np.argmax(predictions[i])], np.argmax(predictions[i]), class_names[test_labels[i]], test_labels[i]))
+```
+
 ## License
 
 Code samples are licensed under the MIT license. See
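
The serving walkthrough added above launches `tensorflow_model_server` in the background and then sends predict requests. A quick way to confirm the server has actually loaded the model before predicting is to query TensorFlow Serving's model status endpoint; the sketch below assumes the same port (8501) and model name (`rn50`) used in the snippets above.

```python
import requests

# Query the model status endpoint exposed by TensorFlow Serving.
# A healthy server reports the served version with state "AVAILABLE".
status = requests.get("http://localhost:8501/v1/models/rn50")
print(status.json())
```

If the request fails or the reported state is not "AVAILABLE", the `server.log` file written by the `nohup` command above is the first place to look.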

AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Enabling_Auto_Mixed_Precision_for_TransferLearning/enabling_automixed_precision_for_transfer_learning_with_tensorflow.ipynb

Lines changed: 31 additions & 170 deletions
@@ -32,7 +32,6 @@
    "import tensorflow_hub as hub\n",
    "from datetime import datetime\n",
    "import requests\n",
-    "from copy import deepcopy\n",
    "print(\"We are using Tensorflow version: \", tf.__version__)"
   ]
  },
@@ -443,19 +442,33 @@
   "id": "8a03faef",
   "metadata": {},
   "source": [
-    "Let's measure the performance of the model we just saved using the `tf_benchmark.py` script that runs inference on dummy data."
+    "Let's measure the performance of the model we just saved using the `tf_benchmark.py` script that runs inference on dummy data.\n",
+    "\n",
+    "_Note: We only use the auto-mixed precision policy if the underlying system is the 4th Gen Intel® Xeon® scalable processor (codenamed Sapphire Rapids)_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "db6aa4b4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if arch == 'SPR':\n",
+    "    PRECISION = \"bfloat16\"\n",
+    "else:\n",
+    "    PRECISION = \"float32\"\n",
+    "print(\"Precision for inference: \", PRECISION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd855747",
-   "metadata": {
-    "scrolled": false
-   },
+   "metadata": {},
   "outputs": [],
   "source": [
-    "run scripts/tf_benchmark.py --model_path models/my_saved_model --num_warmup 5 --num_iter 50 --precision float32 --batch_size 32 --disable_optimize"
+    "!python scripts/tf_benchmark.py --model_path models/my_saved_model --num_warmup 5 --num_iter 50 --precision PRECISION --batch_size 32 --disable_optimize"
   ]
  },
  {
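
The new `db6aa4b4` cell branches on an `arch` value that is defined elsewhere in the notebook. Given that the updated requirements pin `py-cpuinfo`, one plausible way to derive it is from CPU feature flags; the sketch below is an illustrative assumption (the flag name and the `'SPR'` label mapping are guesses), not code from this notebook.

```python
import cpuinfo

# Assumed detection: treat a CPU that advertises AMX bfloat16 support
# (a 4th Gen Xeon / Sapphire Rapids feature) as "SPR"; otherwise keep float32.
flags = cpuinfo.get_cpu_info().get("flags", [])
arch = "SPR" if "amx_bf16" in flags else "OTHER"
PRECISION = "bfloat16" if arch == "SPR" else "float32"
print("Precision for inference: ", PRECISION)
```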
@@ -486,7 +499,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "run scripts/freeze_optimize_v2.py --input_saved_model_dir=models/my_saved_model --output_saved_model_dir=models/my_optimized_model"
+    "!python scripts/freeze_optimize_v2.py --input_saved_model_dir=models/my_saved_model --output_saved_model_dir=models/my_optimized_model"
   ]
  },
  {
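
After `freeze_optimize_v2.py` writes `models/my_optimized_model`, a simple sanity check is to reload the SavedModel and confirm it still exposes a serving signature. This uses standard TensorFlow API and is shown only as an illustrative aside, not a cell from this notebook.

```python
import tensorflow as tf

# Reload the optimized SavedModel and list its signatures;
# a servable model should expose 'serving_default'.
loaded = tf.saved_model.load("models/my_optimized_model")
print(list(loaded.signatures.keys()))
```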
@@ -501,12 +514,10 @@
   "cell_type": "code",
   "execution_count": null,
   "id": "480dddda",
-   "metadata": {
-    "scrolled": false
-   },
+   "metadata": {},
   "outputs": [],
   "source": [
-    "run scripts/tf_benchmark.py --model_path models/my_optimized_model --num_warmup 5 --num_iter 50 --precision float32 --batch_size 32"
+    "!python scripts/tf_benchmark.py --model_path models/my_optimized_model --num_warmup 5 --num_iter 50 --precision PRECISION --batch_size 32"
   ]
  },
  {
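
Both rewritten benchmark cells pass `--precision PRECISION` to a `!python` command. For the value of the Python `PRECISION` variable to reach the script, IPython has to expand it, which it only does for `{var}` or `$var` references inside `!` commands; the cell below is a minimal sketch of that interpolation pattern, an assumption about the intended behavior rather than a line from this notebook.

```
PRECISION = "bfloat16"
# Braces ask IPython to substitute the Python variable into the shell command.
!python scripts/tf_benchmark.py --model_path models/my_optimized_model --num_warmup 5 --num_iter 50 --precision {PRECISION} --batch_size 32
```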
@@ -526,174 +537,24 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "run scripts/plot.py"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8157a5ec",
-   "metadata": {},
-   "source": [
-    "### TensorFlow Serving\n",
-    "\n",
-    "In this section, we will initialize and run TensorFlow Serving natively to serve our retrained model."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6a00c32d",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!mkdir serving\n",
-    "!cp -r models/my_optimized_model serving/1"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a45b5438",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "os.environ[\"MODEL_DIR\"] = os.getcwd() + \"/serving\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "edcd77c4",
-   "metadata": {},
-   "source": [
-    "This is where we start running TensorFlow Serving and load our model. After it loads we can start making inference requests using REST. There are some important parameters:\n",
-    "- **rest_api_port**: The port that you'll use for REST requests.\n",
-    "- **model_name**: You'll use this in the URL of REST requests. It can be anything.\n",
-    "- **model_base_path**: This is the path to the directory where you've saved your model."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "34aee14f",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%bash --bg\n",
-    "nohup tensorflow_model_server --rest_api_port=8501 --model_name=rn50 --model_base_path=${MODEL_DIR} > server.log 2>&1"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e486894a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!tail server.log"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7dc7606d",
-   "metadata": {},
-   "source": [
-    "**Prepare the testing data for prediction**"
+    "!python scripts/plot.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
-   "id": "c9dfa9d8",
+   "id": "7c1bd119-ffc1-4761-a614-c2ffd83e6b4c",
   "metadata": {},
   "outputs": [],
-   "source": [
-    "for image_batch, labels_batch in val_ds:\n",
-    "    print(image_batch.shape)\n",
-    "    print(labels_batch.shape)\n",
-    "    break\n",
-    "test_data, test_labels = image_batch.numpy(), labels_batch.numpy()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5d4e5f62",
-   "metadata": {},
-   "source": [
-    "First, let's take a look at a random example from our test data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e2761dcf",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "def show(idx, title):\n",
-    "    plt.figure()\n",
-    "    plt.imshow(test_data[idx])\n",
-    "    plt.axis('off')\n",
-    "    plt.title('\\n\\n{}'.format(title), fontdict={'size': 16})\n",
-    "\n",
-    "import random\n",
-    "rando = random.randint(0,test_data.shape[0]-1)\n",
-    "show(rando, 'An Example Image:')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3b362658",
-   "metadata": {},
-   "source": [
-    "#### Make a request to your model in TensorFlow Serving\n",
-    "\n",
-    "Now let's create the JSON object for a batch of three inference requests, and see how well our model recognizes things:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "831bf2d1",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "import json\n",
-    "data = json.dumps({\"signature_name\": \"serving_default\", \"instances\": test_data[0:3].tolist()})\n",
-    "print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "427f3c8b",
-   "metadata": {},
-   "source": [
-    "#### Make REST requests\n",
-    "\n",
-    "We'll send a predict request as a POST to our server's REST endpoint, and pass it three examples."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3d7f5e5e",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "headers = {\"content-type\": \"application/json\"}\n",
-    "json_response = requests.post('http://localhost:8501/v1/models/rn50:predict', data=data, headers=headers)\n",
-    "predictions = json.loads(json_response.text)['predictions']\n",
-    "\n",
-    "for i in range(0,3):\n",
-    "    show(i, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(\n",
-    "        class_names[np.argmax(predictions[i])], np.argmax(predictions[i]), class_names[test_labels[i]], test_labels[i]))"
-   ]
+   "source": []
  }
 ],
 "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
@@ -704,7 +565,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Enabling_Auto_Mixed_Precision_for_TransferLearning/requirements.txt

Lines changed: 3 additions & 2 deletions

@@ -1,4 +1,5 @@
-notebook
+neural_compressor==2.4.1
 Pillow
-tensorflow_hub
+py-cpuinfo
 requests
+tensorflow_hub==0.16.0

AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Enabling_Auto_Mixed_Precision_for_TransferLearning/scripts/tf_benchmark.py

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ def run_benchmark(model_details, args, find_graph_def):
     throughput = 1.0 / avg_time * args.batch_size
     print('Batch size = %d' % args.batch_size)
     print("Latency: {:.3f} ms".format(latency))
-    print("Throughput: {:.2f} fps".format(throughput))
+    print("Throughput: {:.2f} images per sec".format(throughput))
 
     # Logging to a file
     log_file = open("log.txt", "a")
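
The relabeled metric comes from `throughput = 1.0 / avg_time * args.batch_size`, so the unit is images per second rather than frames. A short worked example with illustrative numbers:

```python
batch_size = 32
avg_time = 0.020  # seconds per batch, illustrative value only

# 1 / 0.020 = 50 batches per second; 50 * 32 = 1600 images per second.
throughput = 1.0 / avg_time * batch_size
print("Throughput: {:.2f} images per sec".format(throughput))  # 1600.00
```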
