PyTorch 2 Export Quantization with NNCF quantization and OpenVINO runtime
===========================================================================

**Author**: dlyakhov, asuslov, aamir, # TODO: add required authors

Introduction
--------------

This tutorial introduces the steps for utilizing the `Neural Network Compression Framework (nncf) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ to generate a quantized model customized
for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.

The PyTorch 2 export quantization flow uses ``torch.export`` to capture the model into a graph and performs quantization transformations on top of the ATen graph.
This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
OpenVINO is the backend that compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.

The quantization flow mainly includes four steps:

- Step 1: Install OpenVINO and NNCF.
- Step 2: Capture the FX Graph from the eager model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
- Step 3: Apply the quantization flow based on the captured FX Graph.
- Step 4: Lower the quantized model into the OpenVINO representation with the ``torch.compile`` API.

The high-level architecture of this flow could look like this:
::

    float_model(Python)                           Example Input
        \                                              /
         \                                            /
    ---------------------------------------------------------
    |                        export                         |
    ---------------------------------------------------------
                                |
                        FX Graph in ATen
                                |
                                |
    ---------------------------------------------------------
    |                     nncf.quantize                     |
    ---------------------------------------------------------
                                |
                         Quantized Model
                                |
    ---------------------------------------------------------
    |        Lower into OpenVINO with torch.compile         |
    ---------------------------------------------------------
                                |
                         OpenVINO model
Post Training Quantization
----------------------------

Now, we will walk you through a step-by-step tutorial on how to use it with the `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_
for post training quantization.
1. OpenVINO and NNCF installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenVINO and NNCF can be easily installed via `pip distribution <https://docs.openvino.ai/2024/get-started/install-openvino.html>`_:

.. code-block:: bash

   pip install -U pip
   pip install openvino nncf
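
To sanity-check the installation, you can print the versions of both packages. This is an optional verification sketch; the exact helper for querying the OpenVINO version (``openvino.get_version()`` here) may differ between releases.

.. code-block:: python

   import nncf
   import openvino as ov

   # Print the installed versions to confirm both packages are importable
   print("NNCF version:", nncf.__version__)
   print("OpenVINO version:", ov.get_version())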
2. Capture FX Graph
^^^^^^^^^^^^^^^^^^^^^

We will start by performing the necessary imports and capturing the FX Graph from the eager module.

.. code-block:: python

   import torch
   import torchvision.models as models
   import copy
   import openvino.torch

   import nncf
   from nncf.torch import disable_patching

   # Create the Eager Model
   model_name = "resnet18"
   model = models.__dict__[model_name](pretrained=True)

   # Set the model to eval mode
   model = model.eval()

   # Create the data, using the dummy data here as an example
   traced_bs = 50
   x = torch.randn(traced_bs, 3, 224, 224).contiguous(memory_format=torch.channels_last)
   example_inputs = (x,)

   # Capture the FX Graph to be quantized
   with torch.no_grad():
       with disable_patching():
           exported_model = torch.export.export(model, example_inputs).module()

Next, we have the FX module to be quantized.
3. Apply Quantization
^^^^^^^^^^^^^^^^^^^^^^^

Before the quantization, we need to create an instance of the ``nncf.Dataset`` class that represents the calibration dataset.
The ``nncf.Dataset`` class can be a wrapper over the framework dataset object that is used for model training or validation.
The class constructor receives the dataset object and an optional transformation function.

The transformation function takes a sample from the dataset and returns data that can be passed to the model for inference.
For example, this function can take a tuple of a data tensor and labels tensor and return the former while ignoring the latter.
The transformation function is used to avoid modifying the dataset code to make it compatible with the quantization API.
The function is applied to each sample from the dataset before passing it to the model for inference.
The following code snippet shows how to create an instance of the ``nncf.Dataset`` class:
.. code-block:: python

   calibration_loader = torch.utils.data.DataLoader([example_inputs])

   def transform_fn(data_item):
       # In the transformation function,
       # the user can separate labels and input data
       # from the given data item:
       # images, _ = data_item
       return data_item

   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
If there is no framework dataset object, you can create your own entity that implements the Iterable interface in Python,
for example, a list of images, and returns data samples suitable for inference. In this case, a transformation function is not required.
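
For instance, a calibration dataset can be built directly from a plain Python list of input tensors. The sketch below uses randomly generated data purely for illustration; in practice you would use real samples that match the model's expected input.

.. code-block:: python

   # A plain list of example inputs is itself an Iterable,
   # so it can be passed to nncf.Dataset without a transformation function.
   random_images = [torch.randn(1, 3, 224, 224) for _ in range(10)]
   calibration_dataset = nncf.Dataset(random_images)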
Once the dataset is ready and the model object is instantiated, you can apply 8-bit quantization to it.

.. code-block:: python

   with disable_patching():
       quantized_model = nncf.quantize(exported_model, calibration_dataset)
The ``nncf.quantize()`` function has several optional parameters that allow tuning the quantization process to get a more accurate model.
Below is the list of parameters and their descriptions:
* ``model_type`` - used to specify the quantization scheme required for a specific type of model. ``Transformer`` is the only supported special quantization scheme, used to preserve accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). ``None`` is the default, i.e. no specific scheme is defined.

  .. code-block:: python

     nncf.quantize(model, dataset, model_type=nncf.ModelType.Transformer)
* ``preset`` - defines the quantization scheme for the model. Two types of presets are available:

  * ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations

  * ``MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This preset is recommended for models with non-ReLU and asymmetric activation functions, e.g. ELU, PReLU, GELU, etc.

  .. code-block:: python

     nncf.quantize(model, dataset, preset=nncf.QuantizationPreset.MIXED)
* ``fast_bias_correction`` - when set to ``False``, enables a more accurate bias (error) correction algorithm that can be used to improve the accuracy of the model. ``True`` is used by default to minimize quantization time.

  .. code-block:: python

     nncf.quantize(model, dataset, fast_bias_correction=False)
* ``subset_size`` - defines the number of samples from the calibration dataset that will be used to estimate the quantization parameters of activations. The default value is 300.

  .. code-block:: python

     nncf.quantize(model, dataset, subset_size=1000)
* ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. For example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:

  .. code-block:: python

     # Exclude by layer name:
     names = ['layer_1', 'layer_2', 'layer_3']
     nncf.quantize(model, dataset, ignored_scope=nncf.IgnoredScope(names=names))

     # Exclude by layer type:
     types = ['Conv2d', 'Linear']
     nncf.quantize(model, dataset, ignored_scope=nncf.IgnoredScope(types=types))

     # Exclude by regular expression:
     regex = '.*layer_.*'
     nncf.quantize(model, dataset, ignored_scope=nncf.IgnoredScope(patterns=regex))

     # Exclude by subgraphs:
     # In this case, all nodes along all simple paths in the graph
     # from input to output nodes will be excluded from the quantization process.
     subgraph = nncf.Subgraph(inputs=['layer_1', 'layer_2'], outputs=['layer_3'])
     nncf.quantize(model, dataset, ignored_scope=nncf.IgnoredScope(subgraphs=[subgraph]))
* ``target_device`` - defines the target device, the specificity of which will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``NPU``.

  .. code-block:: python

     nncf.quantize(model, dataset, target_device=nncf.TargetDevice.CPU)
* ``advanced_parameters`` - used to specify advanced quantization parameters for fine-tuning the quantization algorithm. These parameters are defined in the ``nncf.quantization.advanced_parameters`` NNCF submodule. ``None`` is the default. See the sketch below for an example.
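
As an illustration only, the following sketch assumes that the ``AdvancedQuantizationParameters`` dataclass and the ``OverflowFix`` enum are importable from the ``nncf.quantization.advanced_parameters`` submodule in your NNCF version; consult the NNCF documentation for the exact set of supported fields.

.. code-block:: python

   from nncf.quantization.advanced_parameters import (
       AdvancedQuantizationParameters,
       OverflowFix,
   )

   # Example: tweak one advanced knob (the overflow fix mode) while keeping defaults for the rest
   advanced_parameters = AdvancedQuantizationParameters(overflow_fix=OverflowFix.DISABLE)
   nncf.quantize(model, dataset, advanced_parameters=advanced_parameters)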
After these steps, we have finished running the quantization flow and obtained the quantized model.
4. Lower into OpenVINO representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After that, the FX Graph can utilize OpenVINO optimizations using the `torch.compile(..., backend="openvino") <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ functionality.
.. code-block:: python

   with torch.no_grad():
       optimized_model = torch.compile(quantized_model, backend="openvino")

       # Running some benchmark
       optimized_model(*example_inputs)
The optimized model uses low-level kernels designed specifically for Intel CPUs.
This should significantly speed up inference time in comparison with the eager model.
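
To quantify the speedup on your own machine, a simple latency comparison between the eager model and the compiled quantized model can be used. The ``benchmark`` helper below is an illustrative sketch rather than part of the tutorial code; absolute numbers will depend on your hardware and OpenVINO version.

.. code-block:: python

   import time

   def benchmark(fn, inputs, warmup=10, iters=50):
       # Warm-up iterations let torch.compile finish compilation before timing
       for _ in range(warmup):
           fn(*inputs)
       start = time.perf_counter()
       for _ in range(iters):
           fn(*inputs)
       return (time.perf_counter() - start) / iters

   with torch.no_grad():
       eager_latency = benchmark(model, example_inputs)
       optimized_latency = benchmark(optimized_model, example_inputs)

   print(f"Eager: {eager_latency * 1000:.2f} ms, quantized + OpenVINO: {optimized_latency * 1000:.2f} ms")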
Conclusion
------------

In this tutorial, we introduced how to use ``torch.compile`` with the OpenVINO backend together with models quantized via ``nncf.quantize``.
For further information, please visit the `complete example on the resnet18 model <https://github.com/openvinotoolkit/nncf/tree/v2.14.0/examples/post_training_quantization/torch_fx/resnet18>`_.
