Changes from all commits (27 commits)
f0ab805
WIP
daniil-lyakhov Nov 21, 2024
acf1647
OpenVINOQuantizer
daniil-lyakhov Jan 28, 2025
5b1c99a
Apply suggestions from code review
daniil-lyakhov Feb 7, 2025
b2eaa82
Comments
daniil-lyakhov Feb 7, 2025
810899a
NNCF API docs
daniil-lyakhov Feb 20, 2025
82a47a5
Comments
daniil-lyakhov Feb 24, 2025
26f044b
fold_quantize=False
daniil-lyakhov Feb 24, 2025
75d3549
Update prototype_source/openvino_quantizer.rst
daniil-lyakhov Apr 11, 2025
e8e94d3
Merge branch 'main' into dl/fx/openvino_quantizer
svekars Apr 12, 2025
f09a85f
Spelling / comments
daniil-lyakhov Apr 14, 2025
2c766e7
Merge branch 'main' into dl/fx/openvino_quantizer
svekars Apr 15, 2025
b424f92
Merge branch 'main' into dl/fx/openvino_quantizer
svekars Apr 15, 2025
f3137be
prototype_index.rst is updated
daniil-lyakhov Apr 16, 2025
b7d2781
Apply suggestions from code review
daniil-lyakhov Apr 16, 2025
bb3c2f8
Merge remote-tracking branch 'origin/main' into dl/fx/openvino_quantizer
daniil-lyakhov Apr 22, 2025
c093c76
Update prototype_source/openvino_quantizer.rst
daniil-lyakhov Apr 22, 2025
ccc02d6
Remove Docs Survey Banner (#3340)
sekyondaMeta Apr 22, 2025
090823f
Merge branch 'main' into dl/fx/openvino_quantizer
svekars Apr 22, 2025
71695c7
Fix code snippet format issue in inductor_windows (#3339)
ZhaoqiongZ Apr 22, 2025
35c68ea
Add a note that foreach feature is a prototype (#3341)
svekars Apr 22, 2025
a5632da
Updating tutorials for 2.7. (#3338)
AlannaBurke Apr 23, 2025
0a422c2
Merge branch 'main' into dl/fx/openvino_quantizer
svekars Apr 23, 2025
7fc877b
Adjust torch.compile() best practices (#3336)
punkeel Apr 28, 2025
bdeca26
fix index format (#3343)
ZhaoqiongZ Apr 28, 2025
1988e26
fix a typo in optimization_tutorial.py (#3333)
partev Apr 28, 2025
70d2154
fix a typo in zeroing_out_gradients.py (#3337)
partev Apr 28, 2025
7e97977
Merge branch 'main' into dl/fx/openvino_quantizer
svekars Apr 28, 2025
9 changes: 0 additions & 9 deletions _templates/layout.html
@@ -211,14 +211,5 @@

<img height="1" width="1" style="border-style:none;" alt="" src="https://www.googleadservices.com/pagead/conversion/795629140/?label=txkmCPmdtosBENSssfsC&amp;guid=ON&amp;script=0"/>

<script>
//temporarily add a link to survey
var survey = '<div class="survey-banner"><p><i class="fas fa-poll" aria-hidden="true">&nbsp </i> Take the <a href="https://forms.gle/KZ4xGL65VRMYNbbG6">PyTorch Docs/Tutorials survey</a>.</p></div>'
if ($(".pytorch-call-to-action-links").length) {
$(".pytorch-call-to-action-links").before(survey);
} else {
$("#pytorch-article").prepend(survey);
}
</script>

{% endblock %}
2 changes: 1 addition & 1 deletion beginner_source/basics/optimization_tutorial.py
@@ -76,7 +76,7 @@ def forward(self, x):
# (`read more <https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html>`__ about hyperparameter tuning)
#
# We define the following hyperparameters for training:
# - **Number of Epochs** - the number times to iterate over the dataset
# - **Number of Epochs** - the number of times to iterate over the dataset
# - **Batch Size** - the number of data samples propagated through the network before the parameters are updated
# - **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
#
11 changes: 11 additions & 0 deletions en-wordlist.txt
@@ -698,3 +698,14 @@ TorchServe
Inductor’s
onwards
recompilations
BiasCorrection
ELU
GELU
NNCF
OpenVINO
OpenVINOQuantizer
PReLU
Quantizer
SmoothQuant
quantizer
quantizers
12 changes: 5 additions & 7 deletions index.rst
@@ -3,13 +3,11 @@ Welcome to PyTorch Tutorials

**What's new in PyTorch tutorials?**

* `Dynamic Compilation Control with torch.compiler.set_stance <https://pytorch.org/tutorials/recipes/torch_compiler_set_stance_tutorial.html>`__
* `Accelerating PyTorch Transformers by replacing nn.Transformer with Nested Tensors and torch.compile() <https://pytorch.org/tutorials/intermediate/transformer_building_blocks.html>`__
* `Understanding the torch.export Flow and Solutions to Common Challenges <https://pytorch.org/tutorials/recipes/torch_export_challenges_solutions.html>`__
* Updated `torch.export Tutorial <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html#constraints-dynamic-shapes>`__ with automatic dynamic shapes ``Dim.AUTO``
* Updated `torch.export AOTInductor Tutorial for Python runtime <https://pytorch.org/tutorials/recipes/torch_export_aoti_python.html>`__
* Updated `Using User-Defined Triton Kernels with torch.compile <https://pytorch.org/tutorials/recipes/torch_compile_user_defined_triton_kernel_tutorial.html#composability>`__ with new ``torch.library.triton_op``
* Updated `Compile Time Caching in torch.compile <https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html>`__ with new ``Mega-Cache``
* `Utilizing Torch Function modes with torch.compile <https://pytorch.org/tutorials/recipes/torch_compile_torch_function_modes.html>`__
* `Context Parallel Tutorial <https://pytorch.org/tutorials/prototype/context_parallel.html>`__
* `PyTorch 2 Export Quantization with Intel GPU Backend through Inductor <https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html>`__
* `(beta) Explicit horizontal fusion with foreach_map and torch.compile <https://pytorch.org/tutorials/recipes/foreach_map.html>`__
* Updated `Inductor Windows CPU Tutorial <https://pytorch.org/tutorials/prototype/inductor_windows.html>`__

.. raw:: html

17 changes: 13 additions & 4 deletions intermediate_source/torch_compile_tutorial.py
@@ -101,8 +101,11 @@ def forward(self, x):
return torch.nn.functional.relu(self.lin(x))

mod = MyModule()
opt_mod = torch.compile(mod)
print(opt_mod(t))
mod.compile()
print(mod(t))
## or:
# opt_mod = torch.compile(mod)
# print(opt_mod(t))

######################################################################
# torch.compile and Nested Calls
@@ -135,8 +138,8 @@ def forward(self, x):
return torch.nn.functional.relu(self.outer_lin(x))

outer_mod = OuterModule()
opt_outer_mod = torch.compile(outer_mod)
print(opt_outer_mod(t))
outer_mod.compile()
print(outer_mod(t))

######################################################################
# We can also disable some functions from being compiled by using
@@ -197,6 +200,12 @@ def outer_function():
# 4. **Compile Leaf Functions First:** In complex models with multiple nested
# functions and modules, start by compiling the leaf functions or modules first.
# For more information see `TorchDynamo APIs for fine-grained tracing <https://pytorch.org/docs/stable/torch.compiler_fine_grain_apis.html>`__.
#
# 5. **Prefer ``mod.compile()`` over ``torch.compile(mod)``:** Compiling in place avoids ``_orig_`` prefix issues in the module's ``state_dict`` keys (see the sketch after this list).
#
# 6. **Use ``fullgraph=True`` to catch graph breaks:** Helps ensure end-to-end compilation, maximizing speedup
# and compatibility with ``torch.export``.
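#
# A minimal sketch of points 5 and 6 above (a toy ``torch.nn.Linear`` module
# stands in for a real model; exact key names and exception types may vary
# across PyTorch versions):

import torch

lin = torch.nn.Linear(4, 4)
wrapped = torch.compile(lin)                # returns an OptimizedModule wrapper
print(list(wrapped.state_dict().keys()))    # keys gain an ``_orig_mod.`` prefix

lin2 = torch.nn.Linear(4, 4)
lin2.compile()                              # compiles the module in place
print(list(lin2.state_dict().keys()))       # keys stay ['weight', 'bias']

@torch.compile(fullgraph=True)
def graph_break_fn(x):
    torch._dynamo.graph_break()             # deliberate graph break
    return x + 1

try:
    graph_break_fn(torch.randn(2))
except Exception as e:
    # With fullgraph=True, the graph break surfaces as an error instead of
    # silently splitting the graph.
    print(type(e).__name__)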


######################################################################
# Demonstrating Speedups
28 changes: 15 additions & 13 deletions prototype_source/inductor_windows.rst
@@ -22,10 +22,9 @@ Install a Compiler

A C++ compiler is required for TorchInductor optimization. Let's take Microsoft Visual C++ (MSVC) as an example.

#. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.

#. During installation, select **Workloads** and then **Desktop & Mobile**. Check **Desktop Development with C++** and install.

.. image:: ../_static/img/install_msvc.png

@@ -44,18 +43,21 @@ Next, let's configure our environment.

"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
#. Create and activate a virtual environment: ::

#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later for CPU usage. For XPU usage, install PyTorch 2.7 or later, following `Getting Started on Intel GPU <https://pytorch.org/docs/main/notes/get_start_xpu.html>`_.

#. Here is an example of how to use TorchInductor on Windows:

   .. code-block:: python

      import torch
      device = "cpu"  # or "xpu" for XPU
      def foo(x, y):
          a = torch.sin(x)
          b = torch.cos(y)
          return a + b
      opt_foo1 = torch.compile(foo)
      print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device)))

#. Below is the output of the above example::

250 changes: 250 additions & 0 deletions prototype_source/openvino_quantizer.rst
@@ -0,0 +1,250 @@
PyTorch 2 Export Quantization for OpenVINO torch.compile Backend
===========================================================================

**Authors**: `Daniil Lyakhov <https://github.com/daniil-lyakhov>`_, `Aamir Nazir <https://github.com/anzr299>`_, `Alexander Suslov <https://github.com/alexsu52>`_, `Yamini Nimmagadda <https://github.com/ynimmaga>`_, `Alexander Kozlov <https://github.com/AlexKoff88>`_

Prerequisites
--------------
- `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
- `How to Write a Quantizer for PyTorch 2 Export Quantization <https://pytorch.org/tutorials/prototype/pt2e_quantizer.html>`_

Introduction
--------------

.. note::

This is an experimental feature, and the quantization API is subject to change.

This tutorial demonstrates how to use ``OpenVINOQuantizer`` from the `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in the PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_, and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
``OpenVINOQuantizer`` unlocks the full potential of low-precision OpenVINO kernels thanks to quantizer placement designed specifically for OpenVINO.

The PyTorch 2 export quantization flow uses ``torch.export`` to capture the model into a graph and performs quantization transformations on top of the ATen graph.
This approach is expected to have significantly higher model coverage, improved flexibility, and a simplified UX.
The OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.

The quantization flow mainly includes four steps:

- Step 1: Capture the FX Graph from the eager model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
- Step 2: Apply the PyTorch 2 Export Quantization flow with ``OpenVINOQuantizer`` based on the captured FX Graph.
- Step 3: Lower the quantized model into the OpenVINO representation with the `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ API.
- Step 4 (optional): Improve the quantized model's metrics with the `quantize_pt2e <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_ method.

The high-level architecture of this flow could look like this:

::

    float_model(Python)                         Example Input
         \                                           /
          \                                         /
    ----------------------------------------------------------
    |                         export                         |
    ----------------------------------------------------------
                               |
                       FX Graph in ATen
                               |
                               |      OpenVINOQuantizer
                               |     /
    ----------------------------------------------------------
    |                      prepare_pt2e                      |
    |                            |                           |
    |                        Calibrate                       |
    |                            |                           |
    |                      convert_pt2e                      |
    ----------------------------------------------------------
                               |
                        Quantized Model
                               |
    ----------------------------------------------------------
    |                   Lower into OpenVINO                  |
    ----------------------------------------------------------
                               |
                        OpenVINO model
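
A condensed sketch of these four steps is shown below (``float_model`` and ``example_inputs`` are placeholders for your model and sample inputs; each step is covered in detail in the following sections):

.. code-block:: python

    import openvino.torch  # registers the "openvino" torch.compile backend
    import torch
    import nncf.torch
    from nncf.experimental.torch.fx import OpenVINOQuantizer
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

    with torch.no_grad(), nncf.torch.disable_patching():
        # Step 1: capture the FX Graph
        exported_model = torch.export.export(float_model, example_inputs).module()
        # Step 2: quantize with OpenVINOQuantizer
        prepared_model = prepare_pt2e(exported_model, OpenVINOQuantizer())
        prepared_model(*example_inputs)  # calibrate on representative data
        quantized_model = convert_pt2e(prepared_model, fold_quantize=False)
        # Step 3: lower into OpenVINO via torch.compile
        optimized_model = torch.compile(quantized_model, backend="openvino")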

Post Training Quantization
----------------------------

Now, we will walk you step by step through using it with the `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_
for post-training quantization.

Prerequisite: OpenVINO and NNCF installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OpenVINO and NNCF can be easily installed via the `pip distribution <https://docs.openvino.ai/2024/get-started/install-openvino.html>`_:

.. code-block:: bash

pip install -U pip
pip install openvino nncf


1. Capture FX Graph
^^^^^^^^^^^^^^^^^^^^^

We will start by performing the necessary imports and capturing the FX Graph from the eager module.

.. code-block:: python

import copy
import openvino.torch
import torch
import torchvision.models as models
from torch.ao.quantization.quantize_pt2e import convert_pt2e
from torch.ao.quantization.quantize_pt2e import prepare_pt2e

import nncf.torch

# Create the Eager Model
model_name = "resnet18"
model = models.__dict__[model_name](pretrained=True)

# Set the model to eval mode
model = model.eval()

# Create the data, using the dummy data here as an example
traced_bs = 50
x = torch.randn(traced_bs, 3, 224, 224)
example_inputs = (x,)

# Capture the FX Graph to be quantized
with torch.no_grad(), nncf.torch.disable_patching():
exported_model = torch.export.export(model, example_inputs).module()



2. Apply Quantization
^^^^^^^^^^^^^^^^^^^^^^^

After we capture the FX Module to be quantized, we will import the OpenVINOQuantizer.


.. code-block:: python

from nncf.experimental.torch.fx import OpenVINOQuantizer

quantizer = OpenVINOQuantizer()

``OpenVINOQuantizer`` has several optional parameters that allow tuning the quantization process to get a more accurate model.
Below is the list of essential parameters and their description:


* ``preset`` - defines quantization scheme for the model. Two types of presets are available:

* ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations

* ``MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This preset is recommended for models with non-ReLU and asymmetric activation functions, such as ELU, PReLU, and GELU.

.. code-block:: python

OpenVINOQuantizer(preset=nncf.QuantizationPreset.MIXED)

* ``model_type`` - specifies a quantization scheme tailored to a specific type of model. ``Transformer`` is the only supported special quantization scheme; it preserves accuracy after quantization of Transformer models (BERT, Llama, and so on). The default is ``None``, meaning no model-specific scheme is applied.

.. code-block:: python

OpenVINOQuantizer(model_type=nncf.ModelType.Transformer)

* ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy, for example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:

.. code-block:: python

# Exclude by layer name:
names = ['layer_1', 'layer_2', 'layer_3']
OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(names=names))

# Exclude by layer type:
types = ['Conv2d', 'Linear']
OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(types=types))

# Exclude by regular expression:
regex = '.*layer_.*'
OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(patterns=regex))

# Exclude by subgraphs:
# In this case, all nodes along all simple paths in the graph
# from input to output nodes will be excluded from the quantization process.
subgraph = nncf.Subgraph(inputs=['layer_1', 'layer_2'], outputs=['layer_3'])
OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(subgraphs=[subgraph]))

Reviewer: Where can I find more information about OpenVINOQuantizer parameters?

Owner Author: That's a good question. We don't have a dedicated page about the OpenVINOQuantizer yet. We have a dedicated page for nncf.quantize and its parameters, but the subset of parameters is not equivalent.

Owner Author: I've added a link to the NNCF API docs, which should be updated with this PR: openvinotoolkit/nncf#3277

* ``target_device`` - defines the target device whose specifics will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``NPU``.

.. code-block:: python

OpenVINOQuantizer(target_device=nncf.TargetDevice.CPU)

For further details on ``OpenVINOQuantizer``, please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.OpenVINOQuantizer>`_.

After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.

.. code-block:: python

prepared_model = prepare_pt2e(exported_model, quantizer)

Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model.

.. code-block:: python

# We use the dummy data as an example here
prepared_model(*example_inputs)
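
In practice, calibration should run the prepared model over a representative dataset rather than a single dummy batch. A minimal sketch, assuming a hypothetical ``DataLoader`` named ``calibration_loader`` that yields ``(images, labels)`` pairs:

.. code-block:: python

    # Hypothetical calibration loop; each batch must match the shape of
    # the example inputs used during export.
    with torch.no_grad():
        for images, _ in calibration_loader:
            prepared_model(images)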

Finally, we will convert the calibrated model to a quantized model. ``convert_pt2e`` takes a calibrated model and produces a quantized model.

.. code-block:: python

quantized_model = convert_pt2e(prepared_model, fold_quantize=False)

After these steps, the quantization flow is complete, and we have the quantized model.
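
As a quick sanity check, the result of ``convert_pt2e`` is a regular ``torch.fx.GraphModule``, so it can be executed and inspected directly (a sketch; the printed graph shows the inserted quantize/dequantize operations):

.. code-block:: python

    # Run the quantized model eagerly and inspect the transformed graph.
    out = quantized_model(*example_inputs)
    quantized_model.print_readable()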


3. Lower into OpenVINO representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After that, the FX Graph can utilize OpenVINO optimizations using the `torch.compile(…, backend="openvino") <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ functionality.

.. code-block:: python

with torch.no_grad(), nncf.torch.disable_patching():
optimized_model = torch.compile(quantized_model, backend="openvino")

# Running some benchmark
optimized_model(*example_inputs)



The optimized model uses low-level kernels designed specifically for Intel CPUs.
This should significantly speed up inference time in comparison with the eager model.
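
A minimal timing sketch for comparing the eager and optimized models (numbers depend on your hardware; the first call to the compiled model triggers compilation, so a warm-up run is excluded from the measurement):

.. code-block:: python

    import time

    def benchmark(fn, args, runs=50):
        fn(*args)  # warm-up; triggers compilation for the compiled model
        start = time.perf_counter()
        for _ in range(runs):
            fn(*args)
        return (time.perf_counter() - start) / runs

    with torch.no_grad(), nncf.torch.disable_patching():
        print(f"eager:     {benchmark(model, example_inputs):.4f} s/iter")
        print(f"optimized: {benchmark(optimized_model, example_inputs):.4f} s/iter")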

4. Optional: Improve quantized model metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

NNCF implements advanced quantization algorithms like `SmoothQuant <https://arxiv.org/abs/2211.10438>`_ and `BiasCorrection <https://arxiv.org/abs/1906.04721>`_, which help
to improve the quantized model metrics while minimizing the output discrepancies between the original and compressed models.
These advanced NNCF algorithms can be accessed via the NNCF ``quantize_pt2e`` API:

.. code-block:: python

from nncf.experimental.torch.fx import quantize_pt2e

calibration_loader = torch.utils.data.DataLoader(...)


def transform_fn(data_item):
images, _ = data_item
return images


calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
quantized_model = quantize_pt2e(
exported_model, quantizer, calibration_dataset, smooth_quant=True, fast_bias_correction=False
)


For further details, please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_
and a complete `example on Resnet18 quantization <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch_fx/resnet18/README.md>`_.

Conclusion
------------

This tutorial introduces how to use ``torch.compile`` with the OpenVINO backend and the OpenVINO quantizer.
For more details on NNCF and the NNCF Quantization Flow for PyTorch models, refer to the `NNCF Quantization Guide <https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/quantizing-models-post-training/basic-quantization-flow.html>`_.
For additional information, check out the `OpenVINO Deployment via torch.compile Documentation <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_.