Commit d6735fb

QUANTIZED_OP.md fix markdown syntax
1 parent 7d8de8a commit d6735fb

1 file changed: +3 / -3 lines changed

QUANTIZED_OP.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -16,8 +16,8 @@ Some operators (in some frameworks) would invoke intermediate floating-point rep
 ONNXruntime generally handles output_scale by [MlasRequantizeOutput](https://github.com/microsoft/onnxruntime/blob/8d737f977056444a307f1b7f0bcd402fba62d790/onnxruntime/core/mlas/lib/quantize.cpp#L357)(int Input, int Output, float scale); which uses intermediate floating-point representation -- `float`.
 
 ## Quantized Convolutions
-`output_multiplier` = `input_scale` * `weight_scale` / `output_scale`
-Reminded that TFLite uses <double>, while ONNXruntime and Caffe2 use <float> for scales.
+`output_multiplier` = `input_scale` * `weight_scale` / `output_scale`.
+Reminded that TFLite uses `double`, while ONNXruntime and Caffe2 use `float` for scales.
 ### TFLite
 The quantized multiplier is calculated as (the `shift` is a power-of-two normalizer to normalize output_multiplier in [0.5,1) )
 ```cpp=
@@ -68,7 +68,7 @@ But it applies the `Roundings` in **A1**.
 When I try to match bit-exactness result, the combination of `PerTensor-A1` and `PerChannel-B2` is found by brute-force.
 
 ### ONNX runtime
-It casts `<int>acc` to `<float>`, multiply by <float>output_multiplier, and requantize the result.
+It casts `<int>acc` to `<float>`, multiply by `<float>output_multiplier`, and requantize the result.
 
 ### Caffe2
 It uses single-precision scales, the computation is the same as mentioned **A2**.
```
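
The TFLite path in the diff computes `output_multiplier` in `double` and encodes it as an integer multiplier plus a power-of-two `shift` that normalizes the mantissa into [0.5, 1). Below is a minimal C++ sketch of that decomposition and of applying it to an `int32` accumulator. It follows the spirit of TFLite's `QuantizeMultiplier` helper but is not copied from it: the struct and function names are hypothetical, the Q31 encoding and round-half-away-from-zero behavior are assumptions of this sketch, and only multipliers below 1 are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical names; illustrates "double multiplier -> Q31 mantissa + shift".
struct FixedPointMultiplier {
  int32_t mantissa_q31;  // round(m * 2^31) with m in [0.5, 1)
  int shift;             // real_multiplier ~= m * 2^shift
};

// output_multiplier = input_scale * weight_scale / output_scale, in double.
double OutputMultiplier(double input_scale, double weight_scale,
                        double output_scale) {
  return input_scale * weight_scale / output_scale;
}

// Split a positive real multiplier into a Q31 mantissa and a power-of-two shift.
FixedPointMultiplier DecomposeMultiplier(double real_multiplier) {
  assert(real_multiplier > 0.0);
  int shift = 0;
  const double m = std::frexp(real_multiplier, &shift);  // m in [0.5, 1)
  int64_t q = static_cast<int64_t>(std::round(m * (1LL << 31)));
  if (q == (1LL << 31)) {  // m rounded up to 1.0: renormalize
    q /= 2;
    ++shift;
  }
  return {static_cast<int32_t>(q), shift};
}

// Returns round(a * b / 2^31), ties away from zero; saturates the only
// overflowing input pair (INT32_MIN, INT32_MIN).
int32_t RoundingHighMulQ31(int32_t a, int32_t b) {
  if (a == std::numeric_limits<int32_t>::min() && b == a) {
    return std::numeric_limits<int32_t>::max();
  }
  const int64_t ab = static_cast<int64_t>(a) * b;
  const int64_t nudge = ab >= 0 ? (1LL << 30) : (1 - (1LL << 30));
  return static_cast<int32_t>((ab + nudge) / (1LL << 31));  // truncating divide
}

// Divide by 2^exponent, rounding to nearest with ties away from zero.
int32_t RoundingShiftRight(int32_t x, int exponent) {
  assert(exponent >= 0 && exponent <= 31);
  const int32_t mask = static_cast<int32_t>((int64_t{1} << exponent) - 1);
  const int32_t remainder = x & mask;
  const int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
  return (x >> exponent) + (remainder > threshold ? 1 : 0);
}

// Rescale an int32 accumulator; this sketch assumes a multiplier < 1 (shift <= 0).
int32_t ApplyFixedPointMultiplier(int32_t acc, FixedPointMultiplier fpm) {
  assert(fpm.shift <= 0);
  return RoundingShiftRight(RoundingHighMulQ31(acc, fpm.mantissa_q31),
                            -fpm.shift);
}
```

With made-up scales `input_scale = 0.02`, `weight_scale = 0.005` and `output_scale = 0.1`, the multiplier 0.001 decomposes into a mantissa of about 0.512 with `shift = -9`, and an accumulator of 12345 rescales to 12.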

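The ONNX runtime behavior described in the diff (cast the `int` accumulator to `float`, multiply by a `float` `output_multiplier`, then requantize) can be sketched as below. This is not the MLAS API: `OutputMultiplierF`, `RequantizeWithFloat`, the `uint8` output type, the zero-point handling and the use of `std::nearbyint` for rounding are all assumptions made for illustration. Per the diff, Caffe2 likewise keeps its scales in single precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Scales kept in single precision (hypothetical helper name).
float OutputMultiplierF(float input_scale, float weight_scale,
                        float output_scale) {
  return input_scale * weight_scale / output_scale;
}

// Hypothetical helper (not the MLAS signature): requantize one int32
// accumulator to uint8 through an intermediate float.
uint8_t RequantizeWithFloat(int32_t acc, float output_multiplier,
                            int32_t output_zero_point) {
  assert(output_multiplier > 0.0f);
  // 1. Cast the integer accumulator to float and scale it.
  const float scaled = static_cast<float>(acc) * output_multiplier;
  // 2. Round to nearest; std::nearbyint follows the current rounding mode
  //    (round-half-to-even by default). The real kernel may round differently.
  const float rounded = std::nearbyint(scaled);
  // 3. Add the zero point and clamp to [0, 255] while still in float,
  //    so a large accumulator cannot overflow an integer conversion.
  const float shifted = rounded + static_cast<float>(output_zero_point);
  const float clamped = std::min(std::max(shifted, 0.0f), 255.0f);
  return static_cast<uint8_t>(clamped);
}
```

Because the product is formed in `float`, a value that lands near a rounding tie can end up on a different integer than a `double` or fixed-point computation of the same multiplier would give. That is the kind of off-by-one difference that matters when matching bit-exact results across frameworks.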