# PyTorch* Inference Optimizations with Advanced Matrix Extensions Bfloat16 Integer8 Sample

The `PyTorch* Inference Optimizations with Advanced Matrix Extensions Bfloat16 Integer8` sample demonstrates how to perform inference with the ResNet50 and BERT models using the Intel® Extension for PyTorch* (IPEX).

The Intel® Extension for PyTorch* (IPEX) extends PyTorch* with optimizations for an extra performance boost on Intel® hardware. While most of the optimizations will be included in future PyTorch* releases, the extension delivers up-to-date features and optimizations for PyTorch on Intel® hardware. For example, newer optimizations include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

| Property | Description |
|:--- |:--- |
| Category | Code Optimization |
| What you will learn | How to start using Intel® Extension for PyTorch* with Intel® AMX BF16/INT8 for inference performance improvements. |
| Time to complete | 5 minutes |

## Purpose

The Intel® Extension for PyTorch* allows you to speed up inference on Intel® Xeon® Scalable processors with lower-precision data formats and specialized computer instructions. The bfloat16 (BF16) data format uses half the bit width of floating-point-32 (FP32), which lessens the amount of memory needed and the execution time to process data. Likewise, the integer8 (INT8) data format uses half the bit width of BF16. You should notice a performance improvement with the Intel® AMX instruction set compared to Intel® Vector Neural Network Instructions (Intel® VNNI).
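
As a quick illustration of the memory savings (this snippet is illustrative and not part of the sample), the following PyTorch code compares the per-element storage of FP32, BF16, and INT8 tensors of the same shape:

```
import torch

x_fp32 = torch.randn(1024, 1024, dtype=torch.float32)
x_bf16 = x_fp32.to(torch.bfloat16)
x_int8 = torch.randint(-128, 127, (1024, 1024), dtype=torch.int8)

# Bytes per element and total bytes: FP32 uses 4 bytes, BF16 uses 2, INT8 uses 1.
for name, t in [("fp32", x_fp32), ("bf16", x_bf16), ("int8", x_int8)]:
    print(name, t.element_size(), t.nelement() * t.element_size())
```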

## Prerequisites

| Optimized for | Description |
|:--- |:--- |
| OS | Ubuntu* 22.04 or newer |
| Hardware | 4th Gen Intel® Xeon® Scalable Processors or newer |
| Software | Intel® Extension for PyTorch* |

> **Note**: AI and Analytics samples are validated on AI Tools Offline Installer. For the full list of validated platforms refer to [Platform Validation](https://github.com/oneapi-src/oneAPI-samples/tree/master?tab=readme-ov-file#platform-validation).

## Key Implementation Details

- This code sample performs inference on the ResNet50 and BERT models using Intel® Extension for PyTorch*. For each pretrained model, there is a warm-up run of 20 samples before inference is run on the specified number of samples (for example, 1000) to record the time. Intel® AMX supports the BF16 and INT8 data types and is available starting with 4th Gen Intel® Xeon® Scalable Processors. The inference times are compared to showcase the speedup over FP32 when using VNNI and Intel® AMX with both BF16 and INT8; a minimal sketch of such a timed run appears after this list.

- The following run cases are executed:
  1. FP32 (baseline)
  2. BF16 using AVX512_CORE_AMX
  3. INT8 using AVX512_CORE_VNNI
  4. INT8 using AVX512_CORE_AMX

- The Intel® oneAPI Deep Neural Network Library (oneDNN) reference guide contains a page about [CPU Dispatcher Control](https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/performance-profiling-and-inspection/cpu-dispatcher-control.html), which describes how to restrict the instruction set to AVX-512 or Intel® AMX at runtime; earlier instruction sets are also available. See the dispatcher sketch after this list.

- To run with INT8, the model is quantized using the quantization feature from Intel® Extension for PyTorch*. TorchScript is also used in all inference run cases to deploy the model in graph mode instead of imperative mode for faster runtime. A minimal quantization sketch is included after this list.
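
The snippet below is a minimal sketch of such a timed BF16 run. It is not the sample's exact code: it assumes torchvision is available for the ResNet50 model, uses random weights and a random input batch, and follows the common Intel® Extension for PyTorch* pattern of `ipex.optimize()` plus TorchScript tracing.

```
import time

import torch
import intel_extension_for_pytorch as ipex
from torchvision import models

model = models.resnet50().eval()   # the sample uses pretrained weights; omitted here
data = torch.rand(1, 3, 224, 224)  # hypothetical single-image input batch

# Apply IPEX optimizations for BF16 inference.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    # Trace and freeze the model so it runs in TorchScript graph mode.
    model = torch.jit.trace(model, data)
    model = torch.jit.freeze(model)

    # Warm-up runs before timing, mirroring the sample's 20 warm-up samples.
    for _ in range(20):
        model(data)

    start = time.time()
    for _ in range(1000):
        model(data)
    print(f"BF16 inference time: {time.time() - start:.2f} s")
```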
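
To reproduce the VNNI-only run case on an Intel® AMX capable machine, one common approach (assumed here, not taken from the sample) is to cap the instruction set with the `ONEDNN_MAX_CPU_ISA` environment variable described in the oneDNN CPU Dispatcher Control page linked above, before the first oneDNN-backed operator executes:

```
import os

# Cap oneDNN at AVX-512 VNNI so the INT8 run does not use Intel AMX.
# Use "AVX512_CORE_AMX" (or leave the variable unset) to allow Intel AMX.
os.environ["ONEDNN_MAX_CPU_ISA"] = "AVX512_CORE_VNNI"

import torch  # imported after setting the variable so the limit takes effect
```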
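
The outline below sketches how such an INT8 post-training static quantization flow can look with the Intel® Extension for PyTorch* quantization API (`prepare`/`convert`). It assumes torchvision for the model, and the exact helper names (for example, `default_static_qconfig`) may differ between IPEX releases, so treat it as an illustration rather than the sample's exact code.

```
import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert
from torchvision import models

model = models.resnet50().eval()
data = torch.rand(1, 3, 224, 224)  # hypothetical calibration/input batch

# Configure post-training static quantization.
qconfig = ipex.quantization.default_static_qconfig
prepared_model = prepare(model, qconfig, example_inputs=data, inplace=False)

# Calibrate with a few representative batches, then convert to INT8.
for _ in range(10):
    prepared_model(data)
quantized_model = convert(prepared_model)

# Deploy in TorchScript graph mode for faster runtime.
with torch.no_grad():
    quantized_model = torch.jit.trace(quantized_model, data)
    quantized_model = torch.jit.freeze(quantized_model)
```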

## Environment Setup

You will need to download and install the following toolkits, tools, and components to use the sample.

**1. Get Intel® AI Tools**

Required AI Tools: Intel® Extension for PyTorch* (CPU)

If you have not already done so, select and install these Tools via the [AI Tools Selector](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-tools-selector.html). AI and Analytics samples are validated on the AI Tools Offline Installer, so it is recommended to select the Offline Installer option in the AI Tools Selector.

> **Note**: If the Docker option is chosen in the AI Tools Selector, refer to [Working with Preset Containers](https://github.com/intel/ai-containers/tree/main/preset) to learn how to run the Docker container and samples.

**2. (Offline Installer) Activate the AI Tools bundle base environment**

If the default path is used during the installation of AI Tools:
```
source $HOME/intel/oneapi/intelpython/bin/activate
```
If a non-default path is used:
```
source <custom_path>/bin/activate
```

**3. (Offline Installer) Activate the relevant Conda environment**
```
conda activate pytorch
```

**4. Clone the GitHub repository**
```
git clone https://github.com/oneapi-src/oneAPI-samples.git
cd oneAPI-samples/AI-and-Analytics/Features-and-Functionality/IntelPyTorch_InferenceOptimizations_AMX_BF16_INT8
```

**5. Install dependencies**

> **Note**: Before running the following commands, make sure your Conda/Python environment with AI Tools installed is activated.

```
pip install -r requirements.txt
pip install notebook
```
For Jupyter Notebook, refer to [Installing Jupyter](https://jupyter.org/install) for detailed installation instructions.
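
Optionally, you can confirm from the activated environment that PyTorch* and the extension import correctly before launching the notebook (version numbers will vary):

```
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)   # PyTorch version
print(ipex.__version__)    # Intel Extension for PyTorch version
```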

## Run the Sample

> **Note**: Before running the sample, make sure [Environment Setup](https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality/IntelPyTorch_InferenceOptimizations_AMX_BF16_INT8#environment-setup) is completed.

Go to the section that corresponds to the installation method chosen in the [AI Tools Selector](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-tools-selector.html) to see the relevant instructions:
* [AI Tools Offline Installer (Validated)](#ai-tools-offline-installer-validated)
* [Conda/PIP](#condapip)
* [Docker](#docker)

### AI Tools Offline Installer (Validated)

**1. Register the Conda kernel as a Jupyter Notebook kernel**

If the default path is used during the installation of AI Tools:
```
$HOME/intel/oneapi/intelpython/envs/pytorch/bin/python -m ipykernel install --user --name=pytorch
```
If a non-default path is used:
```
<custom_path>/bin/python -m ipykernel install --user --name=pytorch
```

**2. Launch Jupyter Notebook**
```
jupyter notebook --ip=0.0.0.0 --port 8888 --allow-root
```

**3. Follow the instructions to open the URL with the token in your browser**

**4. Select the Notebook**
```
IntelPyTorch_InferenceOptimizations_AMX_BF16_INT8.ipynb
```

**5. Change the kernel to `pytorch`**

**6. Run every cell in the Notebook in sequence**

### Conda/PIP

> **Note**: Before running the instructions below, make sure your Conda/Python environment with AI Tools installed is activated.

**1. Register the Conda/Python kernel as a Jupyter Notebook kernel**

For Conda:
```
<CONDA_PATH_TO_ENV>/bin/python -m ipykernel install --user --name=<your-env-name>
```
To find `<CONDA_PATH_TO_ENV>`, run `conda env list` and locate your Conda environment path.

For PIP:
```
python -m ipykernel install --user --name=<your-env-name>
```

**2. Launch Jupyter Notebook**
```
jupyter notebook --ip=0.0.0.0 --port 8888 --allow-root
```

**3. Follow the instructions to open the URL with the token in your browser**

**4. Select the Notebook**
```
IntelPyTorch_InferenceOptimizations_AMX_BF16_INT8.ipynb
```

**5. Change the kernel to `<your-env-name>`**

**6. Run every cell in the Notebook in sequence**

### Docker

AI Tools Docker images already have Get Started samples pre-installed. Refer to [Working with Preset Containers](https://github.com/intel/ai-containers/tree/main/preset) to learn how to run the Docker container and samples.

## Example Output

If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample prints the runtimes and charts of relative performance, using the FP32 model without any optimizations as the baseline.

The performance speedups shown with Intel® AMX BF16 and INT8 on ResNet50 and BERT are approximate; performance will vary based on your hardware and software versions. To see a larger performance gap between VNNI and Intel® AMX, increase the batch size (see the short illustration below). For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html).
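
For instance, in a timing sketch like the BF16 example shown earlier, a larger hypothetical batch gives each forward pass more work, which tends to make the VNNI versus Intel® AMX gap more visible:

```
import torch

batch_size = 64  # hypothetical value chosen for illustration
data = torch.rand(batch_size, 3, 224, 224)  # larger batch for the timed inference loop
```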

## Related Samples

* [PyTorch Training Optimizations with Advanced Matrix Extensions Bfloat16](https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16)
* [Intel PyTorch GPU Inference Optimization with AMP](https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality/IntelPyTorch_GPU_InferenceOptimization_with_AMP)

## License

Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)

*Other names and brands may be claimed as the property of others. [Trademarks](https://www.intel.com/content/www/us/en/legal/trademarks.html)