1- Arithmetic Details
2- ------------------
1+ .. _mli_fpd_fmt:
2+
3+ MLI Fixed-Point Data Format
4+ ---------------------------
35
46 The MLI Library targets ARCv2DSP-based platforms and relies on
57 efficient use of their DSP features. For this reason, its basic data
68 types and the arithmetic operations on them differ in some respects
79 from operations on floating-point values.
810
9- .. _mli_fpd_fmt :
10-
11- MLI Fixed-Point Data Format
12- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
13-
1411 The default MLI fixed-point data format (represented by tensors of
1512 ``MLI_EL_FX_8`` and ``MLI_EL_FX_16`` element types) reflects general signed
1613 values interpreted by typical Q notation [1,2]. The following
@@ -23,14 +20,23 @@ MLI Fixed-Point Data Format
2320 non-sign bits are assumed to hold an integer part.
2421
2522.. note::
26- For more information regarding Q notation, see entries [1] & [2] of :ref: `refs `.
27-
23+ For more information regarding Q notation, see
24+
25+ - `Q Notation`_
26+
27+ - `Q Notation tips and tricks`_
28+
29+ .. _Q notation: https://en.wikipedia.org/wiki/Q_(number_format)
30+
31+ .. _Q Notation tips and tricks: http://x86asm.net/articles/fixed-point-arithmetic-and-tricks/
32+
33+ ..
2834
2935Data storage
30- ^^^^^^^^^^^^
36+ ~~~~~~~~~~~~
3137
3238 The container of the tensor’s values is always signed two’s
33- complemented integer numbers: 8 bit for ``MLI_EL_FX_8 `` (also referred to as ``fx8 ``) and
39+ complemented integer numbers: 8 bit for ``MLI_EL_FX_8 `` (also referred to as ``fx8 ``) and
3440 16 bit for ``MLI_EL_FX_16`` (also referred to as ``fx16``). ``mli_tensor`` keeps only the number
3541 of fractional bits (see ``fx.frac_bits`` in :ref:`mli_el_prm_u`),
3642 which corresponds to the second designation above.
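
As a minimal sketch of how a stored container value and its number of fractional bits map to a real value, assuming the usual Q-notation reading (real value = raw value / 2^frac_bits). The helper names ``fx16_to_real`` and ``fx8_to_real`` are hypothetical and are not part of the MLI Library API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not MLI Library functions. */

/* Interpret a raw fx16 container value that has `frac_bits` fractional bits. */
float fx16_to_real(int16_t raw, unsigned frac_bits)
{
    assert(frac_bits < 31u); /* keep the scale factor inside int32_t */
    return (float)raw / (float)((int32_t)1 << frac_bits);
}

/* Same interpretation for the 8-bit container. */
float fx8_to_real(int8_t raw, unsigned frac_bits)
{
    assert(frac_bits < 31u);
    return (float)raw / (float)((int32_t)1 << frac_bits);
}
```

For instance, the fx16 value ``0x24`` with 4 fractional bits reads as 2.25, and the fx8 value ``-64`` with 7 fractional bits reads as -0.5.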
@@ -82,7 +88,7 @@ Data storage
8288.. _op_fx_val:
8389
8490Operations on FX values
85- ^^^^^^^^^^^^^^^^^^^^^^^
91+ ~~~~~~~~~~~~~~~~~~~~~~~
8692
8793 Arithmetic operations are actually performed on signed integers
8894 according to the rules for two’s complement integer numbers. Q
@@ -92,7 +98,7 @@ Operations on FX values
9298.. _data_fmt_conv:
9399
94100Data Format Conversion
95- ''''''''''''''''''''''
101+ ^^^^^^^^^^^^^^^^^^^^^^
96102
97103 Conversion between real values and fx values can be performed
98104 according to the following formula:
@@ -163,15 +169,15 @@ Where:
163169 ``Round(0x24>>(4-1)) = Round(0x24>>3) = (0x24 + (1<<(3-1))) >> 3 = 0x28>>3 = 0x5`` in Q.1 (2.5)
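
The rounding shift in the line above can be sketched in C as follows. This is an illustration only: ``round_shift_right`` is a hypothetical helper, not an MLI Library function, and reading the example as a conversion from 4 fractional bits to 1 fractional bit is an assumption based on the ``4-1`` shift amount.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an MLI API): shift right by `shift` bits with
 * rounding, done by adding half of the discarded range before shifting.
 * Note: right-shifting a negative signed value is implementation-defined
 * in C; most compilers perform an arithmetic shift. The addition is also
 * assumed not to overflow int32_t. */
int32_t round_shift_right(int32_t value, unsigned shift)
{
    assert(shift < 31u);
    if (shift == 0u) {
        return value; /* nothing is discarded, so nothing to round */
    }
    return (value + ((int32_t)1 << (shift - 1u))) >> shift;
}
```

With ``value = 0x24`` and ``shift = 3`` the helper computes ``(0x24 + 4) >> 3 = 0x28 >> 3 = 0x5``, matching the worked example.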
164170
165171Addition and Subtraction
166- ''''''''''''''''''''''''
172+ ^^^^^^^^^^^^^^^^^^^^^^^^
167173
168174 In fixed point arithmetic, addition and subtraction are performed as
169175 they are for general integer values but only when the input values
170176 are in the same format. Otherwise, convert the input values
171177 to a common format before the operation.
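
A minimal sketch of that alignment step, assuming 16-bit operands and a 32-bit result. The function ``fx_add_aligned`` and its signature are hypothetical and only illustrate the rule. Here both operands are brought to the larger number of fractional bits, so no precision is lost; aligning to the smaller number with a rounding shift is an equally valid choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an MLI API): add two fx16 values that may have
 * different numbers of fractional bits. Both are first brought to the
 * larger number of fractional bits, then added as plain integers. */
int32_t fx_add_aligned(int16_t a, unsigned a_frac,
                       int16_t b, unsigned b_frac,
                       unsigned *out_frac)
{
    unsigned frac = (a_frac > b_frac) ? a_frac : b_frac;

    /* Only one operand is actually scaled; limit the scale so the
     * 32-bit result cannot overflow. */
    assert(frac - a_frac <= 15u && frac - b_frac <= 15u);

    /* Multiply by a power of two instead of left-shifting, because
     * left-shifting a negative signed value is undefined in C. */
    int32_t a_aligned = (int32_t)a * ((int32_t)1 << (frac - a_frac));
    int32_t b_aligned = (int32_t)b * ((int32_t)1 << (frac - b_frac));

    *out_frac = frac;
    return a_aligned + b_aligned;
}
```

For example, adding ``0x24`` with 4 fractional bits (2.25) to ``0x3`` with 1 fractional bit (1.5) gives 60 with 4 fractional bits, that is, 3.75.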
172178
173179Multiplication
174- ''''''''''''''
180+ ^^^^^^^^^^^^^^
175181
176182 For multiplication, input operands do not have to be of the same
177183 format. The width of the integer part of the result is the sum of
@@ -203,7 +209,7 @@ Multiplication
203209 result.
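
As an illustration of the format rule for products, here is a sketch that assumes the standard Q-notation behavior: the raw product of two fixed-point values carries the sum of the operands' fractional bits. ``fx_mul`` is a hypothetical helper, not an MLI Library function, and it does not reproduce any shifting or rounding the library may apply afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an MLI API): multiply two fx16 values.
 * The 32-bit product always fits, and its number of fractional bits
 * is the sum of the operands' fractional bits. */
int32_t fx_mul(int16_t a, unsigned a_frac,
               int16_t b, unsigned b_frac,
               unsigned *out_frac)
{
    *out_frac = a_frac + b_frac;
    return (int32_t)a * (int32_t)b;
}
```

For example, ``0x24`` with 4 fractional bits (2.25) times ``0x3`` with 1 fractional bit (1.5) gives 108 with 5 fractional bits, that is, 3.375.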
204210
205211Division
206- ''''''''
212+ ^^^^^^^^
207213
208214 For division, input operands also do not have to be of the same
209215 format. The result has a format containing the difference of bits in
@@ -214,8 +220,6 @@ Division
214220 - For a dividend ``x`` in Q16.16 format and a divisor ``y`` in Q7.10 format,
215221 the format of the result ``x/y`` is Q(16-7).(16-10), or Q9.6 format.
216222
217- \
218-
219223 - For a dividend ``x`` in Q7.8 format and a divisor ``y`` in Q3.12 format, the
220224 format of the result ``x/y`` is Q(7-3).(8-12), or Q4.-4 format.
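
The two examples above can be checked with a short sketch. The ``q_format`` type and ``q_div_format`` function are hypothetical names used only for illustration; they apply the difference rule to the integer and fractional bit counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a Q format (not an MLI type). Signed
 * fields are used because the result may have negative fractional bits. */
typedef struct {
    int int_bits;
    int frac_bits;
} q_format;

/* Format of x / y: subtract the divisor's bit counts from the dividend's. */
q_format q_div_format(q_format x, q_format y)
{
    q_format result;
    result.int_bits = x.int_bits - y.int_bits;
    result.frac_bits = x.frac_bits - y.frac_bits;
    return result;
}

/* Raw integer division of the containers; the caller interprets the
 * quotient using the format returned by q_div_format(). */
int32_t fx_div_raw(int32_t x, int32_t y)
{
    assert(y != 0);
    return x / y;
}
```

As a numeric check, 2.0 in Q16.16 is 131072 and 0.5 in Q7.10 is 512; the raw quotient is 256, which read with 6 fractional bits is 4.0.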
221225
@@ -229,7 +233,7 @@ Division
229233 significant bits) is required.
230234
231235Accumulation
232- ''''''''''''
236+ ^^^^^^^^^^^^
233237
234238 Even a single addition might result in overflow if all bits of the operands
235239 are used and both of them hold the maximum (or minimum) values. It
@@ -258,14 +262,14 @@ Accumulation
258262 operation.
259263
260264ARCv2DSP Implementation Specifics
261- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
265+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
262266
263267 The MLI Library is designed with performance as one of its
264268 main goals. This section deals with manual adaptation of a model
265269 to the MLI Library.
266270
267271Bias for MAC-based Kernels
268- ''''''''''''''''''''''''''
272+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
269273
270274 MAC-based kernels (convolutions, fully connected, recurrent, etc.)
271275 typically use several input tensors including input feature map,
@@ -285,7 +289,7 @@ Bias for MAC-based Kernels
285289 must be less than or equal to 10 (since 7+3=10) for correct bias.
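
A sketch of this constraint, assuming that the 7 and 3 in the example are the fractional bits of the input and weights tensors, and that the bias is scaled up to the accumulator format (the sum of the two) before it is added. Both function names are hypothetical and the scaling step is an assumption about the mechanism, not a description of the library's internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check (not an MLI API): the bias may not have more
 * fractional bits than the input and weights fractional bits combined. */
int bias_frac_bits_ok(unsigned in_frac, unsigned w_frac, unsigned bias_frac)
{
    return bias_frac <= in_frac + w_frac;
}

/* Assumed mechanism: bring the bias to the accumulator format by scaling
 * it up by the missing fractional bits. A bias with too many fractional
 * bits would need a right shift instead and would lose precision. */
int32_t bias_to_accumulator(int16_t bias, unsigned bias_frac,
                            unsigned in_frac, unsigned w_frac)
{
    assert(bias_frac_bits_ok(in_frac, w_frac, bias_frac));
    unsigned shift = in_frac + w_frac - bias_frac;
    assert(shift <= 15u); /* keep the scaled fx16 bias inside int32_t */
    return (int32_t)bias * ((int32_t)1 << shift);
}
```

With 7 input and 3 weights fractional bits, a bias with 10 fractional bits passes the check and one with 11 does not.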
286290
287291Configurability of Output Tensors Fractional Bits
288- ''''''''''''''''''''''''''''''''''''''''''''''''''
292+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
289293
290294 Not all primitives make it possible to configure the output tensor
291295 format: some of them derive it from the inputs or the algorithm used,
@@ -311,7 +315,7 @@ Configurability of Output Tensors Fractional Bits
311315 Output configurability is specified in the description of each primitive.
312316
313317Quantization: Influence of Accumulator Bit Depth
314- ''''''''''''''''''''''''''''''''''''''''''''''''
318+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
315319
316320 The MLI Library applies neither saturation nor post-multiplication
317321 shift with rounding in accumulation. Saturation is performed only for