The `Intel® Neural Compressor TensorFlow* Getting Started*` Sample demonstrates using the Intel® Neural Compressor, which is part of the Intel® AI Tools, with Intel® Optimizations for TensorFlow* to speed up inference by simplifying the process of converting the FP32 model to INT8/BF16.
| Property | Description
|:--- |:---
| Category | Getting Started
| What you will learn | How to use the Intel® Neural Compressor tool to quantize a TensorFlow*-based AI model and speed up inference on Intel® Xeon® CPUs
| Time to complete | 10 minutes
## Purpose
This sample shows the process of building a convolutional neural network (CNN) model to recognize handwritten numbers and demonstrates how to increase the inference performance by using Intel® Neural Compressor. Low-precision optimizations can speed up inference. Intel® Neural Compressor simplifies the process of converting the FP32 model to INT8/BF16. At the same time, Intel® Neural Compressor tunes the quantization method to reduce the accuracy loss, which is a major obstacle for low-precision inference.
You can achieve higher inference performance by converting the FP32 model to an INT8 or BF16 model. Additionally, Intel® Deep Learning Boost (Intel® DL Boost) in Intel® Xeon® Scalable processors and Xeon® processors provides hardware acceleration for INT8 and BF16 models.
You will learn how to train a CNN model with Keras and TensorFlow*, use Intel® Neural Compressor to quantize the model, and compare the performance to see the benefit of Intel® Neural Compressor.
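As a preview of the quantization step, the following is a minimal sketch using the Intel® Neural Compressor 2.x post-training quantization API. The model path `./fp32_model`, the calibration slice size, and the small dataset wrapper are illustrative assumptions; the notebook contains the exact code.

```python
import numpy as np
import tensorflow as tf
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.data import DataLoader
from neural_compressor.quantization import fit

# Build a small calibration set from MNIST, the dataset this sample uses.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
calib_images = (x_train[:100] / 255.0).astype("float32").reshape(-1, 28, 28, 1)

class CalibDataset:
    """Minimal dataset wrapper yielding (input, label) pairs for calibration."""
    def __init__(self, images):
        self.images = images
    def __len__(self):
        return len(self.images)
    def __getitem__(self, index):
        return self.images[index], 0  # the label is unused during calibration

calib_loader = DataLoader(framework="tensorflow", dataset=CalibDataset(calib_images))

# Quantize the trained FP32 model; accuracy-driven tuning is the default.
q_model = fit(
    model="./fp32_model",            # assumed path to the trained Keras model
    conf=PostTrainingQuantConfig(),
    calib_dataloader=calib_loader,
)
q_model.save("./int8_model")         # save the quantized INT8 model
```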
## Prerequisites
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 20.04 (or newer) <br> Windows* 10, 11
| Software | Intel® Neural Compressor, Intel® Optimization for TensorFlow*
### Intel® Neural Compressor and Sample Code Versions
>**Note**: See the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) GitHub repository for more information and recent changes.
This sample is updated regularly to match the Intel® Neural Compressor version in the latest Intel® AI Tools release. If you want to get the sample code for an earlier toolkit release, check out the corresponding git tag.
1. List the available git tags.
   ```
   git tag
   ```
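2. Check out the tag that matches your installed toolkit release; `<tag_name>` is a placeholder for one of the tags listed by the previous command.
   ```
   git checkout <tag_name>
   ```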
The sample demonstrates how to:
- Use Keras from TensorFlow* to build and train a CNN model (a minimal sketch follows this list).
- Define a function and class for Intel® Neural Compressor to quantize the CNN model.
  - The Intel® Neural Compressor can run on any Intel® CPU to quantize the AI model.
  - The quantized AI model has better inference performance than the FP32 model on Intel CPUs.
  - Specifically, the latest Intel® Xeon® Scalable processors and Xeon® processors provide hardware acceleration for such tasks.
- Test the performance of the FP32 model and INT8 (quantization) model.
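As referenced above, here is a minimal sketch of the kind of Keras CNN the sample trains on the MNIST handwritten-digit dataset; the exact architecture, training settings, and save path in the notebook may differ.

```python
import tensorflow as tf

# Load and normalize the MNIST handwritten-digit dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = (x_train / 255.0).astype("float32").reshape(-1, 28, 28, 1)
x_test = (x_test / 255.0).astype("float32").reshape(-1, 28, 28, 1)

# A small CNN: two convolution/pooling stages and a dense classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))

# Save the FP32 model so it can be quantized later (path is illustrative).
model.save("./fp32_model")
```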
## Environment Setup
If you have already set up the PIP or Conda environment and installed AI Tools, go directly to Run the Notebook.
### On Linux* (Only applicable to AI Tools Offline Installer)
When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)* or *[Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html)*.
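For a default installation, sourcing the script typically looks like the following; adjust the path if you installed the toolkits elsewhere.

```
source /opt/intel/oneapi/setvars.sh
```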
#### Set Up Conda Environment
You can list the available conda environments using a command similar to the following.
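With a standard conda installation, that command is typically:

```
conda env list
```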
##### Option 1: Clone Conda Environment from AI Toolkit Conda Environment
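A typical clone looks like the following sketch. The source environment name `tensorflow` and the target name `usr_tensorflow` are assumptions; substitute the environment names that `conda env list` reports on your system.

```
conda create --name usr_tensorflow --clone tensorflow
conda activate usr_tensorflow
```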
### Example Output
You should see log output and images showing the performance comparison, with absolute and relative data and analysis, between the FP32 and INT8 models.
The following is an example; your data will differ.
```
#absolute data
throughputs_times [1, 2.51508607887295]
latencys_times [1, 0.38379207710795576]
accuracys_times [0, -0.009999999999990905]
#relative data
throughputs_times [1, 2.51508607887295]
latencys_times [1, 0.38379207710795576]
accuracys_times [0, -0.009999999999990905]
```
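In this example run, the INT8 model reaches about 2.5 times the throughput of the FP32 baseline, its latency falls to roughly 0.38 of FP32 (about a 2.6x per-inference speedup), and accuracy drops by only about 0.01.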


#### Troubleshooting
If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.
## Related Samples
[PyTorch `Getting Started with Intel® Neural Compressor for Quantization` Sample](../INC-Quantization-Sample-for-PyTorch)