
Commit 9d389fb

doc: add compiler features and LP support doc
Signed-off-by: Prashant Gaikwad <pgaikwad@nvidia.com>
1 parent 9860451 commit 9d389fb

File tree: 2 files changed, +133 −0 lines changed

CompilerFeatures.md

Lines changed: 52 additions & 0 deletions
# DLA Compiler

### Layers and features support

|Layer|Feature|FP16|INT8|
|-----------|---------------|-------|-------|
|**Convolution**||&#10004;|&#10004;|
||Dilation|&#10004;|&#10004;|
||Winograd|&#10004;|Not implemented in SW|
|**Deconvolution**||&#10004;|&#10004;|
||With padding|Not implemented in SW|Not implemented in SW|
||Winograd|Not implemented in SW|Not implemented in SW|
|**Fully Connected**||&#10004;|&#10004;|
||Winograd|Not implemented in SW|Not implemented in SW|
|**Group Convolution**||&#10004;|Not implemented in SW|
||Winograd|&#10004;|Not implemented in SW|
|**Pooling**||&#10004;|&#10004;|
||Max|&#10004;|&#10004;|
||Min|&#10004;|&#10004;|
||Avg|&#10004;|&#10004;|
||Inclusive padding|&#10004;|&#10004;|
||Exclusive padding|Not supported in HW|Not supported in HW|
|**Activation**||||
||Bias|&#10004;|&#10004;|
||BatchNorm|&#10004;|&#10004;|
||Scale|&#10004;|&#10004;|
||Sigmoid|&#10004;|Not implemented in SW|
||Tanh|&#10004;|Not implemented in SW|
||EltWise SUM|&#10004;|&#10004;|
||EltWise SUB|Not supported in HW|Not supported in HW|
||EltWise MIN|&#10004;|Not implemented in SW|
||EltWise MAX|&#10004;|Not implemented in SW|
|**LRN**||&#10004;|Not implemented in SW|

### Networks verification report

|Network|Configuration|FP16|INT8|
|-------|----|----|----|
|MNIST|nv_full, nv_large, nv_small|Verified|Verified|
|ResNet-18|nv_full, nv_large, nv_small|Verified|Verified|
|ResNet-50|nv_full, nv_large, nv_small|Verified|Verified|

### Known limitations

- Not supported in HW
  - Dilation with Winograd
  - EltWise SUB
  - Pooling and convolution layers where pad size is greater than kernel size
- Not implemented in SW
  - Deconvolution with strides > 32
  - Deconvolution with input/output padding

LowPrecision.md

Lines changed: 81 additions & 0 deletions
# Low precision support in NVDLA

The use of low precision, such as 8-bit, 4-bit, or even fewer bits, for inference is one of the optimization methods used in deep learning. The NVDLA architecture includes INT8 (8-bit) precision support. It helps compress the model, reducing the memory footprint, and improves performance with only a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming the converters in NVDLA for scaling/re-scaling tensors.
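The quantize/re-scale round trip described above can be sketched in a few lines. This is a minimal illustration assuming symmetric per-tensor scaling; `quantize_int8` and `dequantize_int8` are hypothetical helpers, not NVDLA or compiler APIs.

```python
import numpy as np

def quantize_int8(x, scale, offset=0):
    """Map a float tensor to INT8: divide by the scale, shift by the offset
    (0 for symmetric scaling), then clip to the INT8 range."""
    q = np.round(x / scale) + offset
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, offset=0):
    """Approximately recover the float tensor from its INT8 representation."""
    return (q.astype(np.float32) - offset) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
scale = np.abs(x).max() / 127.0   # per-tensor symmetric scale factor
q = quantize_int8(x, scale)       # INT8 values in [-127, 127]
x_hat = dequantize_int8(q, scale) # close to x, up to quantization error
```

The recovered tensor differs from the original by at most half a quantization step, which is the "small degradation in accuracy" the text refers to.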

### NVDLA architecture support for INT8 precision includes the following:

- INT8 input/output data read/write
- 32-bit internal pipeline, which avoids saturation in mathematical computations
- Per-tensor input scaling using input converters
- Per-tensor and per-kernel output re-scaling using output converters

### Steps to generate an INT8 quantized model:

- Analyze the dynamic range of per-layer tensors and calculate scale factors
- Quantize model weights and determine the converter parameters using the scale factors

#### Analyze dynamic range of per-layer tensors and calculate scale factors

A calibration tool can collect the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. The NVDLA Compiler uses the following JSON schema to import scale factors.
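The calibration step can be sketched as follows. This is a simplified max-abs calibration over in-memory activations with hypothetical layer names; production calibration tools typically use more robust statistics (e.g. percentile or entropy-based range selection).

```python
import numpy as np

def calibrate(layer_outputs):
    """Collect the per-layer dynamic range over a calibration set and derive
    symmetric scale factors (offset 0), in the shape of the calibration table."""
    table = {}
    for name, batches in layer_outputs.items():
        lo = min(float(b.min()) for b in batches)
        hi = max(float(b.max()) for b in batches)
        # Symmetric scaling: map the largest observed magnitude onto 127.
        scale = max(abs(lo), abs(hi)) / 127.0
        table[name] = {"scale": scale, "min": lo, "max": hi, "offset": 0}
    return table

# Hypothetical activations for two layers over two calibration batches.
outputs = {
    "conv1": [np.array([-3.0, 1.5]), np.array([2.5, -1.0])],
    "pool1": [np.array([0.0, 4.0]), np.array([1.0, 2.0])],
}
table = calibrate(outputs)
```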

##### JSON schema for calibration table

```
{
    "type" : "object",
    "description": "JSON schema for calibration table",
    "layer" : {
        "type": "array",
        "description": "per-layer scale factor for the output tensor; the scale factor can be described using either scale or min/max",
        "oneOf": ["scale", ["min", "max"]],
        "scale": {
            "type": "float",
            "description": "scale value calibrated for the output tensor of the layer"
        },
        "min": {
            "type": "float",
            "description": "minimum value of the source precision dynamic range for the output tensor of the layer"
        },
        "max": {
            "type": "float",
            "description": "maximum value of the source precision dynamic range for the output tensor of the layer"
        },
        "offset": {
            "type" : "integer",
            "description": "offset used for asymmetric scaling; it should be 0 for symmetric scaling"
        }
    }
}
```

##### Sample calibration table for the first few layers of ResNet-50 using symmetric scaling

```
{
    "data" : {
        "scale": 0.00781453,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "conv1" : {
        "scale": 0.0891214,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "pool1" : {
        "scale": 0.0891214,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "res2a_branch1" : {
        "scale": 0.119546,
        "min": 0,
        "max": 0,
        "offset": 0
    }
}
```
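A consumer of this table can read it with any JSON parser. The sketch below loads two entries from the sample above and, under the symmetric-scaling assumption (offset 0), recovers the dynamic range each scale implies; the variable names are illustrative only.

```python
import json

# Two entries copied from the sample calibration table above.
calib_json = """
{
  "data":  {"scale": 0.00781453, "min": 0, "max": 0, "offset": 0},
  "conv1": {"scale": 0.0891214,  "min": 0, "max": 0, "offset": 0}
}
"""

table = json.loads(calib_json)

# With symmetric scaling, a scale s implies the calibrated dynamic range
# of the tensor was approximately [-127*s, 127*s].
implied_range = {name: 127.0 * entry["scale"] for name, entry in table.items()}
```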

#### Quantize model weights and determine the converter parameters

The NVDLA Compiler can quantize model weights and determine the converter parameters using the scale factors from the calibration table.
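The per-kernel path can be sketched as follows. This is an illustration under the assumption of symmetric per-kernel weight scales, not the compiler's actual implementation; rounding and converter encoding in the real toolchain may differ.

```python
import numpy as np

def quantize_weights(weights, act_scale):
    """Quantize weights per output kernel and derive per-kernel output
    re-scale factors from the activation scale (a sketch only)."""
    q = np.empty(weights.shape, dtype=np.int8)
    rescale = np.empty(weights.shape[0], dtype=np.float64)
    for k in range(weights.shape[0]):           # one scale per output kernel
        w_scale = np.abs(weights[k]).max() / 127.0
        q[k] = np.clip(np.round(weights[k] / w_scale), -128, 127)
        # The wide-accumulator results are re-scaled by act_scale * w_scale
        # via the output converter.
        rescale[k] = act_scale * w_scale
    return q, rescale

w = np.array([[0.5, -1.0], [2.0, 0.25]], dtype=np.float32)  # 2 kernels, 2 taps
q, rescale = quantize_weights(w, act_scale=0.00781453)
```

Per-kernel scaling keeps a kernel with small weights from losing all its precision to a sibling kernel with large weights, which is why the output converters support per-kernel in addition to per-tensor re-scaling.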
