Commit 127ae4e

Merge remote-tracking branch 'origin/develop' into dl/quantization/passes_for_splitted_graphs
2 parents: c4c1c91 + 7af953a

144 files changed: +2535 −1384 lines


codecov.yml

Lines changed: 62 additions & 5 deletions
@@ -5,8 +5,7 @@ ignore:
 
 codecov:
   notify:
-    after_n_builds: 2
-    wait_for_ci: no
+    wait_for_ci: true
   max_report_age: off
 
 coverage:
@@ -15,6 +14,7 @@ coverage:
     default:
       branches:
         - develop
+      target: 90%
       informational: true
       only_pulls: true
       paths:
@@ -23,15 +23,72 @@ coverage:
     default:
       branches:
         - develop
+      target: 90%
       informational: true
       only_pulls: true
       paths:
-        - "nncf/onnx"
-        - "nncf/common" # extend this once we collect coverage reports for more than just onnx and common part of precommit
+        - "nncf"
 
 comment:
-  layout: "diff, flags, files"
+  layout: "reach, diff, files, flags, components"
   require_changes: false
 
   require_head: false
   require_base: false
+
+flag_management:
+  # Flag coverage percentage seems to show the "percentage of lines under the flag path covered as reported ONLY
+  # by the upload with the corresponding flag", so e.g. for COMMON the flag coverage percentage will report the
+  # percentage of common code tested ONLY by the common tests, and e.g. not by backend-specific precommit parts
+  # (which also run common code and are therefore indirectly providing coverage). Ideally each flag-specific path
+  # would be described below with the corresponding flag and provide valuable information on whether the test code base
+  # is written efficiently, e.g. that the backend-specific tests predominantly validate backend-specific code and the
+  # common tests completely cover the common code on their own. However, if we set all flags with paths here, then the
+  # total repo coverage percentage will sink, because codecov currently reports the overall coverage based on the union
+  # of the "flag" coverages - not the "component" coverages (see below) - and currently NNCF's precommit tests are
+  # biased toward validating common code via backend-specific tests. In the future the tests will be gradually
+  # refactored to have more "locality" in what each precommit section tests.
+  individual_flags:
+    - name: COMMON
+      paths:
+        - nncf/common
+        - nncf/quantization
+
+component_management:
+  # In contrast to the "flag" coverage above, the "component" display seems to calculate percentage based on the
+  # coverage information from ALL uploads for the code in the specified path. With this, the "component" coverage
+  # percentage is a better representation of what sub-paths in the NNCF code base are covered with at least one test,
+  # without distinction whether the test was run in the
+  individual_components:
+    - component_id: common
+      name: common
+      paths:
+        - nncf/common
+        - "!nncf/**/torch_*.py"
+        - "!nncf/**/tensorflow_*.py"
+        - "!nncf/**/onnx_*.py"
+        - "!nncf/**/openvino_*.py"
+    - component_id: torch
+      name: torch
+      paths:
+        - nncf/torch
+        - nncf/**/torch_*.py
+    - component_id: tensorflow
+      name: tensorflow
+      paths:
+        - nncf/tensorflow
+        - nncf/**/tensorflow_*.py
+    - component_id: onnx
+      name: onnx
+      paths:
+        - nncf/onnx
+        - nncf/**/onnx_*.py
+    - component_id: openvino
+      name: openvino
+      paths:
+        - nncf/openvino
+        - nncf/**/openvino_*.py
+    - component_id: quantization
+      name: ptq
+      paths:
+        - nncf/quantization
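
To make the flag-vs-component distinction described in the comments above concrete, here is an illustrative-only Python sketch (not Codecov's API, just sets of hypothetical covered line numbers for one file under a flagged path):

```python
# Hypothetical covered-line sets for one file under nncf/common, reported by two uploads.
common_upload_lines = {1, 2, 3}      # lines hit by the COMMON-flagged (common tests) upload
backend_upload_lines = {3, 4, 5, 6}  # lines hit by a backend-specific precommit upload
total_lines = 6

# "Flag" coverage for COMMON counts only the COMMON-flagged upload.
flag_coverage = len(common_upload_lines) / total_lines  # 0.5

# "Component" coverage for the path unions ALL uploads that touch it.
component_coverage = len(common_upload_lines | backend_upload_lines) / total_lines  # 1.0

print(flag_coverage, component_coverage)
```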

docs/compression_algorithms/CompressWeights.md

Lines changed: 20 additions & 12 deletions
@@ -8,22 +8,30 @@ The Weights Compression algorithm is aimed at compressing the weights of the mod
 
 #### Supported modes
 
-By default, weights are compressed to 8-bit integer data type - "INT8" mode.
+By default, weights are compressed asymmetrically to 8-bit integer data type - "INT8_ASYM" mode.
 OpenVINO backend also supports 3 modes of mixed precision weight quantization with a 4-bit data type as a primary precision - INT4_SYM, INT4_ASYM and NF4. The primary precision in case of INT4_SYM mode is unsigned 4-bit integer and weights are quantized to it [symmetrically](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#symmetric-quantization) with a fixed zero point equals to 8. In case of INT4_ASYM mode - also unsigned 4-bit integer, but weight are quantized to it [asymmetrically](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#asymmetric-quantization) with a typical non-fixed zero point. In case of NF4 mode - [nf4](https://arxiv.org/pdf/2305.14314v1.pdf) data type without zero point.
 All 4-bit modes have a grouped quantization support, when small group of weights (e.g. 128) in the channel dimension share quantization parameters (scale).
 All embeddings and last linear layers are always compressed to 8-bit integer data type.
-Percent of the rest layers compressed to 4-bit can be configured by "ratio" parameter. E.g. ratio=0.9 means 90% of layers compressed to the corresponding 4-bit data type and the rest to 8-bit integer data type.
+Percent of the rest layers compressed to 4-bit can be configured by "ratio" parameter. E.g. ratio=0.9 means 90% of layers compressed to the corresponding 4-bit data type and the rest to 8-bit asymmetric integer data type.
 
 #### User guide
 
-- Compress weights to 8-bit integer data type.
+- Compress weights asymmetrically to 8-bit integer data type.
 
 ```python
 from nncf import compress_weights
 compressed_model = compress_weights(model)
 ```
 
-- Compress weights symmetrically to 4-bit integer data type with group size = 128, except embeddings and last linear layers - they are compressed to 8-bit integer data type.
+- Compress weights symmetrically to 8-bit integer data type.
+
+```python
+from nncf import compress_weights
+from nncf import CompressWeightsMode
+compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM)
+```
+
+- Compress weights symmetrically to 4-bit integer data type with group size = 128, except embeddings and last linear layers - they are compressed asymmetrically to 8-bit integer data type.
 
 ```python
 from nncf import compress_weights
@@ -36,7 +44,7 @@ compressed_model = compress_weights(model, mode=CompressWeightsMode.INT4_SYM)
 If the accuracy or perplexity is still not satisfying, there are 2 more hyper-parameters to tune: `group_size` and `ratio`.
 Lower group size and less ratio of 4-bit layers usually improve accuracy at the sacrifice of inference speed.
 Below is the example how to compress weights of 90% of layers to 4-bit integer asymmetrically with the group size 64, and
-the rest of layers to 8-bit integer data type. The same parametrization is applicable for `INT4_SYM` mode.
+the rest of layers to 8-bit asymmetric integer data type. The same parametrization is applicable for `INT4_SYM` mode.
 
 ```python
 from nncf import compress_weights
@@ -45,7 +53,7 @@ compressed_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, g
 ```
 
 - `NF4` mode can be considered for improving accuracy, but currently models quantized to nf4 should not be faster models
-quantized to 8-bit integer. Here's the example how to compress weights to nf4 data type with group size = 128.
+quantized to 8-bit asymmetric integer. Here's the example how to compress weights to nf4 data type with group size = 128.
 Different `group_size` and `ratio` are also supported.
 
 ```python
@@ -79,7 +87,7 @@ Here is the perplexity and model size before and after weight compression for di
 </tr>
 <tr>
 <td class="tg-0pky">databricks/dolly-v2-3b</td>
-<td class="tg-0pky">int8</td>
+<td class="tg-0pky">int8_asym</td>
 <td class="tg-0pky">5.07</td>
 <td class="tg-0pky">0.05</td>
 <td class="tg-0pky">2.6</td>
@@ -107,7 +115,7 @@ Here is the perplexity and model size before and after weight compression for di
 </tr>
 <tr>
 <td class="tg-0pky">facebook/opt-6.7b</td>
-<td class="tg-0pky">int8</td>
+<td class="tg-0pky">int8_asym</td>
 <td class="tg-0pky">4.27</td>
 <td class="tg-0pky">0.01</td>
 <td class="tg-0pky">6.2</td>
@@ -135,7 +143,7 @@ Here is the perplexity and model size before and after weight compression for di
 </tr>
 <tr>
 <td class="tg-0pky">meta-llama/Llama-2-7b-chat-hf</td>
-<td class="tg-0pky">int8</td>
+<td class="tg-0pky">int8_asym</td>
 <td class="tg-0pky">3.29</td>
 <td class="tg-0pky">0.01</td>
 <td class="tg-0pky">6.3</td>
@@ -163,7 +171,7 @@ Here is the perplexity and model size before and after weight compression for di
 </tr>
 <tr>
 <td class="tg-0pky">togethercomputer/RedPajama-INCITE-7B-Instruct</td>
-<td class="tg-0pky">int8</td>
+<td class="tg-0pky">int8_asym</td>
 <td class="tg-0pky">4.17</td>
 <td class="tg-0pky">0.02</td>
 <td class="tg-0pky">6.4</td>
@@ -191,7 +199,7 @@ Here is the perplexity and model size before and after weight compression for di
 </tr>
 <tr>
 <td class="tg-0pky">meta-llama/Llama-2-13b-chat-hf</td>
-<td class="tg-0pky">int8</td>
+<td class="tg-0pky">int8_asym</td>
 <td class="tg-0pky">2.91</td>
 <td class="tg-0pky">0</td>
 <td class="tg-0pky">12.1</td>
@@ -218,7 +226,7 @@ Here is the perplexity and model size before and after weight compression for di
 - The algorithm is supported for OpenVINO and PyTorch models.
 - The compression applies in-place.
 - The compressed model is not trainable.
-- INT4_SYM, INT4_ASYM and NF4 modes, grouped quantization and mixed precision selection is available for OpenVINO backend only.
+- INT8_SYM, INT4_SYM, INT4_ASYM and NF4 modes, grouped quantization and mixed precision selection is available for OpenVINO backend only.
 - NF4 support is experimental - models quantized to nf4 should not be faster models quantized to 8-bit integer.
 
 #### Additional resources
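
The INT4_ASYM and NF4 snippets referenced above are cut off at the hunk boundaries. A minimal sketch of the calls the surrounding prose describes, assuming the keyword arguments are named `group_size` and `ratio` as the visible `CompressWeightsMode.INT4_ASYM, g...` fragment suggests:

```python
from nncf import compress_weights
from nncf import CompressWeightsMode

# `model` is an OpenVINO or PyTorch model prepared beforehand, as in the README's own examples.

# 90% of eligible layers to 4-bit asymmetric integer with group size 64,
# the rest to 8-bit asymmetric integer (sketch of the truncated INT4_ASYM example).
compressed_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=64, ratio=0.9)

# nf4 data type with group size 128 (sketch of the truncated NF4 example; OpenVINO backend only).
compressed_model = compress_weights(model, mode=CompressWeightsMode.NF4, group_size=128)
```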

examples/tensorflow/object_detection/main.py

Lines changed: 4 additions & 6 deletions
@@ -323,8 +323,7 @@ def run(config):
 
     # Training parameters
     epochs = config.epochs
-    steps_per_epoch = train_builder.steps_per_epoch
-    num_test_batches = test_builder.steps_per_epoch
+    steps_per_epoch, num_test_batches = train_builder.steps_per_epoch, test_builder.steps_per_epoch
 
     # Create model builder
     model_builder = get_model_builder(config)
@@ -336,10 +335,7 @@ def run(config):
     )
 
     resume_training = config.ckpt_path is not None
-
-    compression_state = None
-    if resume_training:
-        compression_state = load_compression_state(config.ckpt_path)
+    compression_state = load_compression_state(config.ckpt_path) if resume_training else None
 
     with TFModelManager(model_builder.build_model, config.nncf_config, weights=config.get("weights", None)) as model:
         with strategy.scope():
@@ -384,6 +380,8 @@ def run(config):
     test_step = create_test_step_fn(strategy, compress_model, predict_post_process_fn)
 
     if "train" in config.mode:
+        if config.weights is None and not resume_training:
+            logger.warning("Pretrained checkpoint is not provided. This may lead to poor training results!")
         if is_accuracy_aware_training(config):
             train_summary_writer = SummaryWriter(config.log_dir, "train")
             timer = Timer()

examples/torch/classification/README.md

Lines changed: 4 additions & 2 deletions
@@ -64,7 +64,9 @@ python main.py \
 - Use the `--resume` flag with the path to a previously saved model to resume training.
 - For Torchvision-supported image classification models, set `"pretrained": true` inside the NNCF config JSON file supplied via `--config` to initialize the model to be compressed with Torchvision-supplied pretrained weights, or, alternatively:
 - Use the `--weights` flag with the path to a compatible PyTorch checkpoint in order to load all matching weights from the checkpoint into the model - useful if you need to start compression-aware training from a previously trained uncompressed (FP32) checkpoint instead of performing compression-aware training from scratch.
-- Use the `--no_strip_on_export` to export not stripped model.
+- Use `--export-model-path` to specify the path to export the model in OpenVINO or ONNX format by using the .xml or .onnx suffix, respectively.
+- Use the `--no-strip-on-export` to export not stripped model.
+- Use the `--export-to-ir-via-onnx` to to export to OpenVINO, will produce the serialized OV IR object by first exporting the torch model object to an .onnx file and then converting that .onnx file to an OV IR file.
 
 ### Validate Your Model Checkpoint
 
@@ -86,7 +88,7 @@ To export trained model to the ONNX format, use the following command:
 python main.py -m export \
 --config=configs/quantization/mobilenet_v2_imagenet_int8.json \
 --resume=../../results/quantization/mobilenet_v2_int8/6/checkpoints/epoch_1.pth \
---to-onnx=../../results/mobilenet_v2_int8.onnx
+--to-ir=../../results
 ```
 
 ### Export to OpenVINO™ Intermediate Representation (IR)

examples/torch/classification/configs/binarization/resnet18_imagenet_binarization_dorefa.json

Lines changed: 2 additions & 1 deletion
@@ -27,5 +27,6 @@
                 "{re}ResNet/Sequential\\[layer4\\]/BasicBlock\\[0\\]/Sequential\\[downsample\\]/.*"]
         }
     ],
-    "no_strip_on_export": true
+    "no_strip_on_export": true,
+    "export_to_ir_via_onnx": true
 }

examples/torch/classification/configs/binarization/resnet18_imagenet_binarization_xnor.json

Lines changed: 2 additions & 1 deletion
@@ -27,5 +27,6 @@
                 "{re}ResNet/Sequential\\[layer4\\]/BasicBlock\\[0\\]/Sequential\\[downsample\\]/.*"]
         }
     ],
-    "no_strip_on_export": true
+    "no_strip_on_export": true,
+    "export_to_ir_via_onnx": true
 }

examples/torch/classification/configs/mixed_precision/mobilenet_v2_imagenet_mixed_int_autoq_staged.json

Lines changed: 2 additions & 1 deletion
@@ -40,5 +40,6 @@
             "lr_poly_drop_duration_epochs": 10
         }
     },
-    "no_strip_on_export": true
+    "no_strip_on_export": true,
+    "export_to_ir_via_onnx": true
 }

examples/torch/classification/configs/mixed_precision/mobilenet_v2_imagenet_mixed_int_hawq.json

Lines changed: 2 additions & 1 deletion
@@ -35,5 +35,6 @@
             }
         }
     },
-    "no_strip_on_export": true
+    "no_strip_on_export": true,
+    "export_to_ir_via_onnx": true
 }

examples/torch/classification/configs/mixed_precision/mobilenet_v2_imagenet_mixed_int_manual_staged.json

Lines changed: 2 additions & 1 deletion
@@ -166,5 +166,6 @@
             "disable_wd_start_epoch": 50
         }
     },
-    "no_strip_on_export": true
+    "no_strip_on_export": true,
+    "export_to_ir_via_onnx": true
 }

examples/torch/classification/configs/mixed_precision/resnet50_imagenet_mixed_int_autoq_staged.json

Lines changed: 2 additions & 1 deletion
@@ -45,5 +45,6 @@
             "lr_poly_drop_duration_epochs": 10
         }
     },
-    "no_strip_on_export": true
+    "no_strip_on_export": true,
+    "export_to_ir_via_onnx": true
 }
