[Tutorial] OpenVINOQuantizer #2
Conversation
# Create the data, using the dummy data here as an example
traced_bs = 50
x = torch.randn(traced_bs, 3, 224, 224).contiguous(memory_format=torch.channels_last)
Reviewer: Why do we need the memory format to be channels_last?
Author: This is a copy-paste from the original tutorial; removed, thanks!
example_inputs = (x,)

# Capture the FX Graph to be quantized
with torch.no_grad(), disable_patching():
Reviewer: Is disable_patching() needed both during export and during inference with torch.compile?
Author: Unfortunately, yes: without it, export fails with an error, and the performance of the compiled model is ruined.
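To make the scope of disable_patching() concrete, here is a minimal sketch, assuming NNCF and the OpenVINO backend are installed; the function name and the two-phase structure are illustrative, not the tutorial's final text:

```python
def export_and_run(model, example_inputs):
    """Sketch: disable_patching() wraps BOTH graph capture and inference
    with the compiled model, per the author's reply above. Imports are
    placed inside the function so the sketch is self-contained."""
    import torch
    from nncf.torch import disable_patching

    with torch.no_grad(), disable_patching():
        # Capture: torch.export fails with an error if NNCF patching is active.
        exported_model = torch.export.export(model, example_inputs).module()

    compiled_model = torch.compile(exported_model, backend="openvino")

    with torch.no_grad(), disable_patching():
        # Inference: performance of the compiled model degrades without it.
        return compiled_model(*example_inputs)
```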
===========================================================================

**Author**: dlyakhov, asuslov, aamir, # TODO: add required authors
Author: Done.
import nncf
from nncf.torch import disable_patching
Reviewer (suggested change): drop the second import line and keep only `import nncf`.
Author: Unfortunately, that does not work. We can do `import nncf.torch` and then use `nncf.torch.disable_patching`.
Author: `import nncf.torch` is now introduced; please check.
# from input to output nodes will be excluded from the quantization process.
subgraph = nncf.Subgraph(inputs=['layer_1', 'layer_2'], outputs=['layer_3'])
OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(subgraphs=[subgraph]))
Reviewer: Where can I find more information about OpenVINOQuantizer parameters?
Author: That's a good question; we don't have a dedicated page about OpenVINOQuantizer yet. We have a dedicated page for nncf.quantize and its parameters, but the subset of parameters is not equivalent.
Author: I've added a link to the NNCF API docs, which should be updated by openvinotoolkit/nncf#3277.
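Until that docs page lands, it may help to note that `nncf.IgnoredScope` accepts more than subgraphs. A hedged sketch of the commonly used parameters, where every node name, pattern, and type below is hypothetical:

```python
def build_ignored_scope():
    """Sketch of nncf.IgnoredScope options; all names/patterns/types here
    are hypothetical examples, not taken from the tutorial's model."""
    import nncf

    return nncf.IgnoredScope(
        names=["conv2d_0"],          # exact node names to exclude (hypothetical)
        patterns=[".*attention.*"],  # regexes matched against node names
        types=["Softmax"],           # operation types to exclude
        subgraphs=[nncf.Subgraph(inputs=["layer_1", "layer_2"],
                                 outputs=["layer_3"])],
    )
```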
Conclusion
------------

With this tutorial, we showed how to use torch.compile with the OpenVINO backend and the OpenVINO quantizer.
Reviewer: I would suggest adding something like: "For more information about NNCF and the NNCF Quantization Flow for PyTorch models, please visit
Author: Done, please check.
The quantization flow mainly includes four steps:

- Step 1: Install OpenVINO and NNCF.
Reviewer: I think the quantization flow itself does not include Step 1; it is just a prerequisite.
Author: Agree, fixed.
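The remaining steps of the flow are elided in this diff. Assuming the tutorial follows the standard PyTorch 2 Export (PT2E) quantization flow, the pipeline can be sketched as below; the `OpenVINOQuantizer` import path and the `openvino` backend string are assumptions to verify against the NNCF API docs:

```python
def quantize_for_openvino(model, calibration_batches):
    """Sketch of the PT2E quantization flow with OpenVINOQuantizer.
    The OpenVINOQuantizer import path is an assumption; the prepare/convert
    steps follow torch.ao.quantization.quantize_pt2e."""
    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from nncf.torch import disable_patching
    from nncf.experimental.torch.fx import OpenVINOQuantizer  # assumed path

    example_inputs = (calibration_batches[0],)
    with torch.no_grad(), disable_patching():
        # Step: capture the FX graph to be quantized.
        captured = torch.export.export(model, example_inputs).module()
        # Step: insert observers and calibrate on representative data.
        prepared = prepare_pt2e(captured, OpenVINOQuantizer())
        for batch in calibration_batches:
            prepared(batch)
        # Step: fold observers into quantize/dequantize ops.
        quantized = convert_pt2e(prepared)
    # Step: lower with the OpenVINO torch.compile backend.
    return torch.compile(quantized, backend="openvino")
```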
Introduction
--------------

This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in the PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_, and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
Reviewer: It would be more attractive to give the user an idea of why they may need to use OpenVINOQuantizer (e.g., it is more accurate, more performant, etc.).
Author: Makes sense! A description of the advantages of OpenVINOQuantizer has been added.