---

### AutoQ

NNCF provides an alternative mode, AutoQ, for mixed-precision automation. It is an AutoML-based technique that automatically learns layer-wise bitwidths from explored experiences. Based on [HAQ](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_HAQ_Hardware-Aware_Automated_Quantization_With_Mixed_Precision_CVPR_2019_paper.pdf), AutoQ uses an actor-critic algorithm, Deep Deterministic Policy Gradient (DDPG), for efficient search over the bitwidth space. DDPG is trained in an episodic fashion, converging to a deterministic mixed-precision policy after a number of episodes. Within an episode, the DDPG agent steps from quantizer to quantizer sequentially, predicting a precision for each layer. Each quantizer essentially denotes a state in the RL framework and is represented by attributes of the associated layers. For example, a quantizer for a 2D convolution is represented by its quantizer id (integer), input and output channel sizes, feature map dimension, stride size, whether it is depthwise, number of parameters, etc. See `_get_layer_attr` in [quantization_env.py](/src/nncf/torch/automl/environment/quantization_env.py#L370) for the featurization of different network layer types.

When the agent enters a state/quantizer, it receives the state features and forward-passes them through its network. The output of the forward pass is a continuous scalar action, which is subsequently mapped to the bitwidth options of that particular quantizer. The episode terminates after the prediction for the last quantizer, yielding a complete layer-wise mixed-precision policy. To ensure the policy fits within the user-specified compression ratio, it is post-processed by reducing the precision sequentially, starting from the last quantizer, until the compression ratio is met.

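The mapping from a continuous action to a discrete bitwidth can be sketched as below. This is an illustrative simplification, not NNCF's exact implementation; the function name `action_to_bitwidth` and the linear bucketing scheme are assumptions.

```python
# Illustrative only: map a continuous DDPG action in [0, 1] onto one of the
# discrete bitwidth options of a quantizer. NNCF's exact mapping may differ.

def action_to_bitwidth(action, bit_options):
    """Linearly bucket a continuous action into a discrete bitwidth choice."""
    action = min(max(action, 0.0), 1.0)             # clamp to the valid range
    index = round(action * (len(bit_options) - 1))  # nearest option index
    return sorted(bit_options)[index]

# With options [2, 4, 8]: actions near 0 pick 2 bits, actions near 1 pick 8.
print(action_to_bitwidth(0.1, [2, 4, 8]))  # → 2
print(action_to_bitwidth(0.9, [2, 4, 8]))  # → 8
```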
To evaluate the goodness of a policy, the NNCF backend quantizes the workload accordingly and evaluates it with the user-registered function. The evaluated score, together with the state embeddings and predicted actions, is appended to an experience vault that serves DDPG learning. Learning is carried out by sampling data points from the experience vault for supervised training of the DDPG network. This typically happens at a fixed interval; in the current implementation, it is performed after each episode evaluation. For bootstrapping, exploration, and diversity of experience, noise is added to the action output. As the episodes progress, the noise magnitude is gradually reduced to zero, so that a deterministic mixed-precision policy is converged upon by the end of the episodes. NNCF keeps track of the best policy found and uses it for fine-tuning.

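The experience vault described above is essentially a replay buffer. A minimal sketch, assuming a fixed-capacity buffer with uniform random sampling (the class name `ExperienceVault`, the capacity, and the stored tuple layout are all illustrative, not NNCF's implementation):

```python
# Simplified experience vault: store (state, action, score) tuples and
# sample mini-batches for supervised training of the DDPG network.
import random
from collections import deque

class ExperienceVault:
    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted

    def append(self, state, action, score):
        self.buffer.append((state, action, score))

    def sample(self, batch_size):
        """Uniform random mini-batch, capped at the current buffer size."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

vault = ExperienceVault()
for episode in range(5):
    vault.append(state=[episode], action=0.5, score=70.0 + episode)
print(len(vault.sample(3)))  # → 3
```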
```json5
{
    "target_device": "NPU",
    "compression": {
        "algorithm": "quantization",
        "initializer": {
            "precision": {
                "type": "autoq",
                "bits": [2, 4, 8],
                "iter_number": 300,
                "compression_ratio": 0.15,
                "eval_subset_ratio": 0.20,
                "dump_init_precision_data": true
            }
        }
    }
}
```

The snippet above demonstrates the specification of AutoQ in the NNCF config. `target_device` determines the bitwidth choices available for a particular layer. `bits` also defines the precision space of the quantizers, but it is only active in the absence of a target device.

`iter_number` is synonymous with the number of episodes. A good choice depends on the number of quantizers in the workload and the number of bitwidth choices: the larger these are, the more episodes are required.

`compression_ratio` is the target model size after quantization, relative to the total parameter size in FP32. For example, a uniformly INT8-quantized model has a compression ratio of 0.25, and uniform INT4 quantization gives 0.125.

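The ratio for a mixed-precision policy follows the same arithmetic: a parameter-count-weighted average of bitwidth over 32. A small sketch (the function and the layer sizes are illustrative):

```python
# Illustrative: compute the compression ratio of a mixed-precision policy
# relative to FP32 (32 bits per parameter).

def compression_ratio(param_counts, bitwidths):
    """Weighted average of (bitwidth / 32) over all parameters."""
    total_params = sum(param_counts)
    quantized_bits = sum(n * b for n, b in zip(param_counts, bitwidths))
    return quantized_bits / (total_params * 32)

# Two layers of 1M params each, at 4 and 8 bits: (4 + 8) / 2 / 32 = 0.1875
print(compression_ratio([1_000_000, 1_000_000], [4, 8]))  # → 0.1875
```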
`eval_subset_ratio` is the ratio of the dataset to be used for evaluation in each iteration. It is used by the callback function (see below).

`dump_init_precision_data` dumps AutoQ's episodic metrics as TensorBoard events, viewable in TensorBoard.

As briefly mentioned earlier, the user is required to register a callback function for policy evaluation. The callback receives a model object and a torch data loader object and must return a scalar metric. The callback function and the torch data loader are registered via `register_default_init_args`.

The following is an example of wrapping an ImageNet validation loop as a callback. Top-5 accuracy is chosen as the scalar objective metric. `autoq_eval_fn` and `val_loader` are registered in the call to `register_default_init_args`.
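A sketch of such a callback is shown below. The `top5_correct` helper is a simplified stand-in for a real ImageNet validation routine, and the registration call is commented out because it requires an actual NNCF config and data loaders; it only mirrors the parameter names mentioned above.

```python
# Illustrative AutoQ evaluation callback returning Top-5 accuracy.

def top5_correct(logits, label):
    """Return True if `label` is among the 5 highest-scoring classes."""
    top5 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:5]
    return label in top5

def autoq_eval_fn(model, eval_loader):
    """Callback AutoQ invokes to score a candidate mixed-precision policy."""
    correct, total = 0, 0
    for images, labels in eval_loader:
        outputs = model(images)  # forward pass on the evaluation subset
        for logits, label in zip(outputs, labels):
            correct += top5_correct(logits, label)
            total += 1
    return 100.0 * correct / total  # scalar Top-5 accuracy

# Registration, mirroring the names used in the text above:
# nncf_config = register_default_init_args(
#     nncf_config, train_loader,
#     autoq_eval_fn=autoq_eval_fn, val_loader=val_loader)
```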
>_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_.