Commit ed323b6

Merge pull request #345 from foss-for-synopsys-dwc-arc-processors/mli2_doc_recent_chng
Recent changes of MLI 2 docs
2 parents af53d0b + f6979d1 commit ed323b6

33 files changed

+695
-242
lines changed

doc/documents/data_movement/data_movement.rst

Lines changed: 7 additions & 6 deletions
@@ -120,8 +120,9 @@ The size of the array is defined by ``MLI_MAX_RANK``.
120120
| ``perm_dim`` | ``uint8_t[]`` | Array to specify reordering of dimensions. For example, to convert |
121121
| | | from CHW layout to HWC layout this array would be {1, 2, 0}. |
122122
+---------------------+----------------+---------------------------------------------------------------------+
123-
| ``padding_pre`` | ``uint8_t[]`` | Number of padded samples before the data for each dimension. |
124-
| | | Padded samples is set to zero. |
123+
| ``padding_pre`` | ``uint8_t[]`` | Number of padded samples before the input data for each dimension. |
124+
| | | Padding is a virtual extension of the input tensor. |
125+
| | | Padded samples are set to zero. |
125126
+---------------------+----------------+---------------------------------------------------------------------+
126127
| ``padding_post`` | ``uint8_t[]`` | Number of padded samples after the data for each dimension. |
127128
| | | Padded samples are set to zero. |
@@ -361,7 +362,7 @@ initialization. Table :ref:`t_mli_mov_prep` describes the parameters of this fun
361362
.. code:: c
362363
363364
mli_status
364-
mli_mov_prepare(mli_mov_handle_t* h, mli_tensor* src, mli_mov_cfg_t* cfg, mli_tensor* dst);
365+
mli_mov_prepare(mli_mov_handle_t* h, const mli_tensor* src, const mli_mov_cfg_t* cfg, mli_tensor* dst);
365366
..
366367
367368
.. _t_mli_mov_prep:
@@ -399,7 +400,7 @@ an assert is triggered.
399400
.. code:: c
400401
401402
mli_status
402-
mli_mov_start(mli_mov_handle_t* h, mli_tensor* src, mli_mov_cfg_t* cfg, mli_tensor* dst);
403+
mli_mov_start(mli_mov_handle_t* h, const mli_tensor* src, const mli_mov_cfg_t* cfg, mli_tensor* dst);
403404
..
404405
405406
.. _t_mli_mov_start:
@@ -493,8 +494,8 @@ This function takes a pointer to the handle used for ``mli_mov_prepare`` and ret
493494
after the transaction completes or in case of an error.
494495

495496

496-
Restrictions for source and destination tensors
497-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
497+
Restrictions for Source and Destination Tensors
498+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
498499

499500
``src`` and ``dst`` tensors for all functions in the asynchronous data move set must comply with the following conditions:
500501


doc/documents/mli_api_data/data_formats.rst

Lines changed: 15 additions & 27 deletions
@@ -180,6 +180,8 @@ specific scale ratios:
180180
Round\left( \left( \frac{x_{fp32}}{(s_{fx}*2^{- n})} \right) + z \right) = \ Round\left( \left( \frac{x_{fp32}}{(1*2^{- n})} \right) + 0 \right) = Round\left( x_{fp32}*2^{n} \right) = x_{{fx}}
181181
..
182182
183+
.. _quant_accum_infl:
184+
183185
Quantization: Influence of Accumulator Bit Depth
184186
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
185187

@@ -195,17 +197,19 @@ Number of available bits depends on the operands’ types and the platform.
195197

196198
- For ``sa8`` operands, the 32-bit accumulator uses 1 sign bit and 31 significant bits. ``sa8`` operands
197199
have 1 sign and 7 significant bits. Single multiplication of such operands results in
198-
7 + 7 = 14 significant bits for output. Thus for MAC-based kernels, 17 accumulation bits
199-
(as 31-(7+7)=17) are available which can be used to perform up to 2^17 = 131072 operations
200-
without overflow. For simple accumulation, 31 – 7 = 24 bits are available which are guaranteed
201-
to perform up to 2^24 = 16777216 operations without overflow.
200+
7 + 7 + 1 = 15 significant bits for output. Here one extra bit is required to handle multiplication
201+
of max negative values (-128 * -128 = 16384, a value that needs 15 bits).
202+
Thus for MAC-based kernels, 16 accumulation bits (as 31-(7+7+1)=16) are available which can be used to
203+
perform up to 2^16 = 65536 operations without overflow. For simple accumulation, 31 – 7 = 24 bits are
204+
available which are guaranteed to perform up to 2^24 = 16777216 operations without overflow.
202205

203206
- For ``fx16`` operands, the 40-bit accumulator uses 1 sign bit and 39 significant bits. ``fx16``
204207
operands have 1 sign and 15 significant bits. A multiplication of such operands results in
205-
15 + 15 = 30 significant bits for output. For MAC-based kernels, 39 – (15+15) = 9 accumulation
206-
bits are available, which can be used to perform up to 2^9 = 512 operations without overflow.
207-
For simple accumulation, 39 – 15 = 24 bits are available which perform up to 2^24 = 16777216
208-
operations without overflow.
208+
15 + 15 + 1 = 31 significant bits for output. Here one extra bit is required to handle multiplication
209+
of max negative values (-32768 * -32768 = 1073741824, a value that needs 31 bits). For MAC-based kernels,
210+
39 – (15+15+1) = 8 accumulation bits are available, which can be used to perform up to 2^8 = 256
211+
operations without overflow. For simple accumulation, 39 – 15 = 24 bits are available which
212+
perform up to 2^24 = 16777216 operations without overflow.
209213
..
210214
211215
In general, the number of accumulations required for one output value calculation can be
@@ -220,23 +224,7 @@ estimated in advance.
220224
between operands.
221225
..
222226
223-
The file ``mli_config.h`` exports a set of defines that hold the number of accumulator bits
224-
for the different operand combinations. These values can vary depending upon the selected
225-
hardware platform. :ref:`pf_sp_acc_def` lists the defines.
226-
227-
.. _pf_sp_acc_def:
228-
.. table:: Platform Specific Accumulator Bit Defines
229-
:align: center
230-
:widths: 60, 30
231-
232-
+-------------------------+---------------+
233-
| **Define** | **Operands** |
234-
+=========================+===============+
235-
| MLI_ACCU_BITS_SA8_SA8 | sa8 x sa8 |
236-
+-------------------------+---------------+
237-
| MLI_ACCU_BITS_FX16_FX16 | fx16 x fx16 |
238-
+-------------------------+---------------+
239-
| MLI_ACCU_BITS_FX16_FX8 | fx16 x fx8 |
240-
+-------------------------+---------------+
241-
..
227+
Special functions are provided to determine the number of available accumulator guard bits for the different operand
228+
combinations. These values can differ depending on the platform the library is compiled for.
229+
These functions are defined in the :ref:`num_of_accu_bits` section.
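
The bit budget described in the accumulator section of this file can be reproduced with a few lines of C. This is a minimal sketch of the arithmetic only, not the library's API: the function names ``mac_guard_bits``, ``sum_guard_bits`` and ``max_safe_accumulations`` are hypothetical, and the actual platform-specific values should be taken from the functions in the :ref:`num_of_accu_bits` section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: guard bits left in the accumulator for MAC-based
 * kernels. All widths are total bit widths, including the sign bit. */
int mac_guard_bits(int acc_bits, int a_bits, int b_bits)
{
    assert(acc_bits > 1 && a_bits > 1 && b_bits > 1);
    int acc_significant = acc_bits - 1;
    /* One extra bit covers the product of the two most negative operands. */
    int product_significant = (a_bits - 1) + (b_bits - 1) + 1;
    assert(acc_significant >= product_significant);
    return acc_significant - product_significant;
}

/* Hypothetical helper: guard bits for simple accumulation of one operand type. */
int sum_guard_bits(int acc_bits, int operand_bits)
{
    assert(acc_bits >= operand_bits && operand_bits > 1);
    return (acc_bits - 1) - (operand_bits - 1);
}

/* Number of accumulations that cannot overflow with the given guard bits. */
uint64_t max_safe_accumulations(int guard_bits)
{
    assert(guard_bits >= 0 && guard_bits < 64);
    return (uint64_t)1 << guard_bits;
}
```

With these helpers, ``mac_guard_bits(32, 8, 8)`` gives 16 guard bits (65536 operations) for ``sa8``, and ``mac_guard_bits(40, 16, 16)`` gives 8 guard bits (256 operations) for ``fx16``, matching the figures in the text. Both operand types leave 24 guard bits for simple accumulation.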
242230

doc/documents/mli_api_data/kernel_sp_conf_struct.rst

Lines changed: 3 additions & 2 deletions
@@ -30,8 +30,9 @@ describe fields of existing MLI configuration structures:
3030
- Table :ref:`t_mli_prelu_cfg_desc`
3131

3232
- Table :ref:`t_mli_mov_cfg_desc`
33-
34-
- Table :ref:`t_mli_sub_tensor_cfg_desc`
33+
34+
..
35+
- Table :ref:`t_mli_sub_tensor_cfg_desc`
3536
3637

3738

doc/documents/mli_api_data/mli_api_data.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ describe how function parameters are grouped into structures.
1616
data_formats.rst
1717
data_layouts.rst
1818
mli_tensor_data_struct.rst
19+
mli_lut_data_struct.rst
1920
kernel_sp_conf_struct.rst
2021
error_codes.rst
2122
debug_modes.rst
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
.. _mli_lut_data_struct:
2+
3+
mli_lut Data Structure
4+
--------------------------
5+
6+
Several functions use a look-up table (LUT) to perform data transformation. The LUT represents a function in a
7+
table form that can be used to transform input values (function argument) to output values (function result).
8+
The ``mli_lut`` structure is a representation of such a table.
9+
10+
The ``mli_lut`` struct describes the data in the LUT, including the format of its input and output.
11+
12+
13+
.. code:: c
14+
15+
typedef struct _mli_lut{
16+
mli_data_container data;
17+
mli_element_type type;
18+
int32_t length;
19+
int32_t in_frac_bits;
20+
int32_t out_frac_bits;
21+
int32_t input_offset;
22+
int32_t output_offset;
23+
} mli_lut;
24+
..
25+
26+
See :ref:`mli_tens_data_struct` for the definition of ``mli_data_container`` and ``mli_element_type`` structures.
27+
The following table describes the fields in the mli_lut structure.
28+
29+
.. _mli_lut_struct_table:
30+
.. table:: mli_lut Structure Field Descriptions
31+
:align: center
32+
:widths: 50, 50, 130
33+
34+
+-------------------+------------------------+-----------------------------------------------------------------------------+
35+
| **Field name** | **type** | **Comment** |
36+
+===================+========================+=============================================================================+
37+
| | | This field has a union of different possible data container types. |
38+
| ``data`` | ``mli_data_container`` | Pointer of specified type (see the type field in this table) should point |
39+
| | | to an array with the LUT table data. |
40+
+-------------------+------------------------+-----------------------------------------------------------------------------+
41+
| ``data.capacity`` | ``uint32_t`` | Size in bytes of the allocated memory that the data field points to. |
42+
+-------------------+------------------------+-----------------------------------------------------------------------------+
43+
| ``type`` | ``mli_element_type`` | Enum depicting the type of the element stored in the data field. |
44+
| | | Values in this enum are listed in section :ref:`mli_tens_data_struct`. |
45+
| | | Only ``MLI_EL_FX_8`` and ``MLI_EL_FX_16`` entities are supported. |
46+
+-------------------+------------------------+-----------------------------------------------------------------------------+
47+
| ``length`` | ``int32_t`` | Number of values stored in the LUT table |
48+
+-------------------+------------------------+-----------------------------------------------------------------------------+
49+
| ``in_frac_bits`` | ``int32_t`` | Number of fractional bits for the LUT input (argument) |
50+
+-------------------+------------------------+-----------------------------------------------------------------------------+
51+
| ``out_frac_bits`` | ``int32_t`` | Number of fractional bits for the LUT output (result) |
52+
+-------------------+------------------------+-----------------------------------------------------------------------------+
53+
| ``input_offset`` | ``int32_t`` | Offset of input argument which is added before applying the LUT function. |
54+
+-------------------+------------------------+-----------------------------------------------------------------------------+
55+
| ``output_offset`` | ``int32_t`` | Offset of output which is subtracted from LUT function result. |
56+
+-------------------+------------------------+-----------------------------------------------------------------------------+
57+
58+
..
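
The field descriptions above imply a few consistency conditions that can be checked before a LUT is used. The following is a minimal sketch under stated assumptions: the ``ex_`` types are simplified stand-ins written for this example (the real ``mli_data_container`` and ``mli_element_type`` are defined in :ref:`mli_tens_data_struct`), ``ex_lut_is_consistent`` is a hypothetical helper, and the rule that ``data.capacity`` must cover ``length`` elements is an assumption rather than a documented requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only; the real MLI headers define these types. */
typedef enum { EX_EL_FX_8, EX_EL_FX_16, EX_EL_OTHER } ex_element_type;

typedef struct {
    uint32_t capacity;  /* size in bytes of the attached buffer */
    const void *mem;    /* simplified: the documented field is a union */
} ex_data_container;

typedef struct {
    ex_data_container data;
    ex_element_type type;
    int32_t length;
    int32_t in_frac_bits;
    int32_t out_frac_bits;
    int32_t input_offset;
    int32_t output_offset;
} ex_lut;

/* Bytes per LUT element; 0 for element types the LUT does not support. */
size_t ex_lut_element_size(ex_element_type type)
{
    switch (type) {
    case EX_EL_FX_8:  return sizeof(int8_t);
    case EX_EL_FX_16: return sizeof(int16_t);
    default:          return 0;
    }
}

/* Checks the supported element types and that the buffer can hold
 * `length` values (the capacity rule is an assumption of this sketch). */
bool ex_lut_is_consistent(const ex_lut *lut)
{
    assert(lut != NULL);
    size_t elem_size = ex_lut_element_size(lut->type);
    if (elem_size == 0 || lut->length <= 0 || lut->data.mem == NULL) {
        return false;
    }
    return (uint64_t)lut->length * elem_size <= lut->data.capacity;
}
```

A table of four 16-bit values with an 8-byte buffer passes this check; declaring a fifth value without growing the buffer, or using an element type other than the two supported ones, does not.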
59+

doc/documents/mli_api_data/mli_tensor_data_struct.rst

Lines changed: 27 additions & 10 deletions
@@ -12,7 +12,7 @@ shape of this array, its data format, and the way it is organized in memory.
1212
typedef struct mli_tensor {
1313
mli_data_container data;
1414
uint32_t shape[MLI_MAX_RANK];
15-
uint32_t mem_stride[MLI_MAX_RANK];
15+
int32_t mem_stride[MLI_MAX_RANK];
1616
uint32_t rank;
1717
mli_element_type el_type;
1818
mli_element_params el_params;
@@ -86,7 +86,7 @@ and ``mli_data_container`` is defined as follows:
8686
..
8787
8888

89-
Table :ref:`mli_tnsr_struc` describes the fields in the mli_tensor structure.
89+
The table :ref:`mli_tnsr_struc` describes the fields in the mli_tensor structure.
9090

9191
.. _mli_tnsr_struc:
9292
.. table:: mli_tensor Structure Field Descriptions
@@ -109,23 +109,40 @@ Table :ref:`mli_tnsr_struc` describes the fields in the mli_tensor structure.
109109
| | | scalar tensors (tensors with a single element), this field is not a |
110110
| | | pointer, but it contains the data itself. |
111111
+-------------------+------------------------+-----------------------------------------------------------------------------+
112-
| ``data.capacity`` | ``unit32_t`` | Size in bytes of the allocated memory that the data field points to. In |
112+
| ``data.capacity`` | ``uint32_t`` | Size in bytes of the allocated memory that the data field points to. In |
113113
| | | case there is no buffer attached (``rank == 0``), the capacity is set to 0. |
114114
+-------------------+------------------------+-----------------------------------------------------------------------------+
115-
| ``shape`` | ``unit32_t[]`` | Array with tensor dimensions. Dimensions are stored in order starting from |
115+
| ``shape`` | ``uint32_t[]`` | Array with tensor dimensions. Dimensions are stored in order starting from |
116116
| | | the one with the largest stride between the data portions. |
117117
| | | For example, for tensor T of size (channels, height, width) stored in HWC |
118118
| | | layout, shape[0] = height, shape[1] = width, shape[2] = channels. Shape[3] |
119119
| | | is unused. The size of the array is defined by ``MLI_MAX_RANK*``. |
120120
+-------------------+------------------------+-----------------------------------------------------------------------------+
121-
| ``mem_stride`` | ``unit32_t[]`` | Array with the distance (in elements) to the next element in the same |
122-
| | | dimension. To compute the size in bytes, the number of elements needs to be |
121+
| ``mem_stride`` | ``int32_t[]`` | Array with the distance (in elements) to the next element in the same |
122+
| | | dimension. Only positive values are supported. |
123+
| | | To compute the size in bytes, the number of elements needs to be |
123124
| | | multiplied by the bytes per element. For example, for a matrix |
124125
| | | A(rows,columns), ``mem_stride[1]`` contains the distance to the next |
125126
| | | element (=1 in this example), and ``mem_stride[0]`` contains the distance |
126127
| | | from one row to the next (=columns in this example). The size of the array |
127128
| | | is defined by ``MLI_MAX_RANK*``. If the mem_stride is set to 0, it is |
128129
| | | computed from the shape. |
130+
| | | |
131+
| | | Manually set values of the ``mem_stride`` array must be non-increasing and |
132+
| | | must not be smaller than the values computed from the shape. For |
133+
| | | example, for a tensor of shape :math:`[Height, Width, Channels]`: |
134+
| | | |
135+
| | | - ``mem_stride[0] >= 1 x Channels x Width`` |
136+
| | | AND ``mem_stride[0] >= mem_stride[1]`` |
137+
| | | |
138+
| | | - ``mem_stride[1] >= 1*Channels`` AND ``mem_stride[1] >= mem_stride[2]`` |
139+
| | | |
140+
| | | - ``mem_stride[2] >= 1`` |
141+
| | | |
142+
| | | In case the mem_stride is computed from the shape, the kernel does not |
143+
| | | update this field in the tensor struct. The only exception is the |
144+
| | | ``mli_move`` function, which can write the ``mem_stride`` field of the |
145+
| | | ``dst`` tensor. |
129146
+-------------------+------------------------+-----------------------------------------------------------------------------+
130147
| ``rank`` | ``uint32_t`` | Number of dimensions of this tensor (must be less than or equal to |
131148
| | | ``MLI_MAX_RANK*``) |
@@ -169,12 +186,12 @@ channels in the tensor ``(array_size = shape[dim])``.
169186
| | | - ``sa.dim >= 0``: Pointer to an array of zero points relating to |
170187
| | | configured dimension (``sa.dim``). |
171188
+------------------------+------------------------+-----------------------------------------------------------------------------+
172-
| ``sa.scale`` | ``mli_data_container`` | 16-bit signed scale factors. |
189+
| ``sa.scale`` | ``mli_data_container`` | 16-bit signed scale factors. Only positive scale factors are supported. |
173190
| | | |
174-
| | | - ``sa.dim < 0``: Single value for all data in tensor |
191+
| | | - If ``sa.dim < 0``: ``sa.scale`` is a single value for all data in tensor |
175192
| | | |
176-
| | | - ``sa.dim >= 0``: Pointer to an array of scale factors related to |
177-
| | | configured dimension (``sa.dim``). |
193+
| | | - If ``sa.dim >= 0``: ``sa.scale`` is a pointer to an array of |
194+
| | | scale factors related to configured dimension (``sa.dim``). |
178195
+------------------------+------------------------+-----------------------------------------------------------------------------+
179196
| ``sa.dim`` | ``int32_t`` | Tensor dimension to which the arrays of quantization parameters apply |
180197
+------------------------+------------------------+-----------------------------------------------------------------------------+
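
The ``mem_stride`` rules in the table above can be expressed as a short check. This is a minimal sketch, not library code: ``default_mem_strides`` and ``mem_strides_are_valid`` are hypothetical helpers, ``EXAMPLE_MAX_RANK`` is an assumed stand-in for ``MLI_MAX_RANK``, and the validator applies only to manually set strides (a stride of 0, which asks the library to compute the value from the shape, is rejected here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_RANK 4  /* assumed stand-in for MLI_MAX_RANK */

/* Strides (in elements) of a contiguous tensor: the innermost dimension
 * has stride 1, each outer stride is the product of the inner sizes. */
void default_mem_strides(const uint32_t *shape, uint32_t rank, int32_t *stride)
{
    assert(rank >= 1 && rank <= EXAMPLE_MAX_RANK);
    int32_t step = 1;
    for (uint32_t i = rank; i-- > 0;) {
        stride[i] = step;
        step *= (int32_t)shape[i];
    }
}

/* Manually set strides must be positive, must not be smaller than the
 * shape-derived strides, and must not increase toward inner dimensions. */
bool mem_strides_are_valid(const uint32_t *shape, uint32_t rank,
                           const int32_t *stride)
{
    int32_t min_stride[EXAMPLE_MAX_RANK];
    default_mem_strides(shape, rank, min_stride);
    for (uint32_t i = 0; i < rank; ++i) {
        if (stride[i] <= 0 || stride[i] < min_stride[i]) {
            return false;
        }
        if (i + 1 < rank && stride[i] < stride[i + 1]) {
            return false;
        }
    }
    return true;
}
```

For a tensor of shape [4, 5, 3] (height, width, channels), the computed strides are {15, 3, 1}. Padded strides such as {32, 4, 1} pass the check, while {14, 3, 1} fails because ``mem_stride[0]`` is smaller than ``Channels x Width``.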
