foss-for-synopsys-dwc-arc-processors
diff --git a/doc/documents/data_movement/data_movement.rst b/doc/documents/data_movement/data_movement.rst
Lines changed: 7 additions & 6 deletions
diff --git a/doc/documents/images/gru_schematic.png b/doc/documents/images/gru_schematic.png
-15 KB
diff --git a/doc/documents/images/lstm_schematic.png b/doc/documents/images/lstm_schematic.png
-10.6 KB
diff --git a/doc/documents/mli_api_data/data_formats.rst b/doc/documents/mli_api_data/data_formats.rst
Lines changed: 15 additions & 27 deletions
diff --git a/doc/documents/mli_api_data/kernel_sp_conf_struct.rst b/doc/documents/mli_api_data/kernel_sp_conf_struct.rst
Lines changed: 3 additions & 2 deletions
diff --git a/doc/documents/mli_api_data/mli_api_data.rst b/doc/documents/mli_api_data/mli_api_data.rst
Lines changed: 1 addition & 0 deletions
diff --git a/doc/documents/mli_api_data/mli_lut_data_struct.rst b/doc/documents/mli_api_data/mli_lut_data_struct.rst
Lines changed: 59 additions & 0 deletions
diff --git a/doc/documents/mli_api_data/mli_tensor_data_struct.rst b/doc/documents/mli_api_data/mli_tensor_data_struct.rst
Lines changed: 27 additions & 10 deletions
@@ -120,8 +120,9 @@ The size of the array is defined by ``MLI_MAX_RANK``.
    | ``perm_dim``        | ``uint8_t[]``  | Array to specify reordering of dimensions. For example, to convert  |
    |                     |                | from CHW layout to HWC layout this array would be {1, 2, 0}.        |
    +---------------------+----------------+---------------------------------------------------------------------+
-   | ``padding_pre``     | ``uint8_t[]``  | Number of padded samples before the data for each dimension.        |
-   |                     |                | Padded samples is set to zero.                                      |
+   | ``padding_pre``     | ``uint8_t[]``  | Number of padded samples before the input data for each dimension.  |
+   |                     |                | Padding is a virtual extension of the input tensor.                 |
+   |                     |                | Padded samples are set to zero.                                     |
    +---------------------+----------------+---------------------------------------------------------------------+
    | ``padding_post``    | ``uint8_t[]``  | Number of padded samples after the data for each dimension.         |
    |                     |                | Padded samples are set to zero.                                     |
@@ -361,7 +362,7 @@ initialization. Table :ref:`t_mli_mov_prep` describes the parameters of this fun
 .. code:: c
 
    mli_status
-   mli_mov_prepare(mli_mov_handle_t* h, mli_tensor* src, mli_mov_cfg_t* cfg, mli_tensor* dst);
+   mli_mov_prepare(mli_mov_handle_t* h, const mli_tensor* src, const mli_mov_cfg_t* cfg, mli_tensor* dst);
 ..
 
 .. _t_mli_mov_prep:
@@ -399,7 +400,7 @@ an assert is triggered.
 .. code:: c
 
    mli_status
-   mli_mov_start(mli_mov_handle_t* h, mli_tensor* src, mli_mov_cfg_t* cfg, mli_tensor* dst);
+   mli_mov_start(mli_mov_handle_t* h, const mli_tensor* src, const mli_mov_cfg_t* cfg, mli_tensor* dst);
 ..
 
 .. _t_mli_mov_start:
@@ -493,8 +494,8 @@ This function takes a pointer to the handle used for ``mli_mov_prepare`` and ret
 after the transaction completes or in case of an error.
 
 
-Restrictions for source and destination tensors
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Restrictions for Source and Destination Tensors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The ``src`` and ``dst`` tensors for all functions in the asynchronous data move set must comply with the following conditions:
 
 
@@ -180,6 +180,8 @@ specific scale ratios:
    Round\left( \left( \frac{x_{fp32}}{(s_{fx}*2^{- n})} \right) + z \right) = \ Round\left( \left( \frac{x_{fp32}}{(1*2^{- n})} \right) + 0 \right) = Round\left( x_{fp32}*2^{n} \right) = x_{{fx}}
 ..
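
The special case above, where the scale is 1 and the zero point is 0, reduces quantization to ``Round(x_fp32 * 2^n)``. The following is a minimal sketch of that mapping for a 16-bit fixed-point value. The helper names ``float_to_fx16`` and ``fx16_to_float`` are hypothetical and are not part of the MLI API. Rounding half away from zero and saturating to the ``int16_t`` range are assumptions, since the text does not fix either behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an MLI function): x_fx = Round(x_fp32 * 2^n).
 * Assumptions: round half away from zero, saturate to the int16_t range. */
static int16_t float_to_fx16(float x, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits <= 15);
    float scaled = x * (float)(1 << frac_bits);              /* x * 2^n */
    float rounded = scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f;
    if (rounded >= 32768.0f) return INT16_MAX;               /* saturate high */
    if (rounded <= -32769.0f) return INT16_MIN;              /* saturate low */
    return (int16_t)rounded;  /* truncation toward zero completes the rounding */
}

/* Hypothetical inverse mapping: x_fp32 = x_fx * 2^(-n). */
static float fx16_to_float(int16_t x, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits <= 15);
    return (float)x / (float)(1 << frac_bits);
}
```

For example, with 15 fractional bits the value 0.75 maps to 24576, and 1.0 saturates to 32767 because 32768 does not fit in a signed 16-bit value.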
 
+.. _quant_accum_infl:
+
 Quantization: Influence of Accumulator Bit Depth   
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -195,17 +197,19 @@ Number of available bits depends on the operands’ types and the platform.
 
    - A 32-bit accumulator for ``sa8`` operands uses 1 sign bit and 31 significant bits. ``sa8`` operands 
      have 1 sign bit and 7 significant bits. A single multiplication of such operands results in 
-     7 + 7 = 14 significant bits for output. Thus for MAC-based kernels, 17 accumulation bits 
-     (as 31-(7+7)=17) are available which can be used to perform up to 2^17 = 131072 operations 
-     without overflow. For simple accumulation, 31 – 7 = 24 bits are available which are guaranteed 
-     to perform up to 2^24 = 16777216 operations without overflow.
+     7 + 7 + 1 = 15 significant bits for output. One extra bit is required to handle multiplication 
+     of the maximum negative values (-128 * -128 = 16384, a value that needs 15 bits). 
+     Thus, for MAC-based kernels, 16 accumulation bits (31 - (7+7+1) = 16) are available, which allows
+     up to 2^16 = 65536 operations without overflow. For simple accumulation, 31 - 7 = 24 bits are
+     available, which guarantees up to 2^24 = 16777216 operations without overflow.
 
    - A 40-bit accumulator for ``fx16`` operands uses 1 sign bit and 39 significant bits. ``fx16`` 
      operands have 1 sign bit and 15 significant bits. A multiplication of such operands results in 
-     15 + 15 = 30 significant bits for output. For MAC-based kernels, 39 – (15+15) = 9 accumulation 
-     bits are available, which can be used to perform up to 2^9 = 512 operations without overflow. 
-     For simple accumulation, 39 – 15 = 24 bits are available which perform up to 2^24 = 16777216 
-     operations without overflow.
+     15 + 15 + 1 = 31 significant bits for output. One extra bit is required to handle multiplication 
+     of the maximum negative values (-32768 * -32768 = 1073741824, a value that needs 31 bits). 
+     For MAC-based kernels, 39 - (15+15+1) = 8 accumulation bits are available, which allows up to 
+     2^8 = 256 operations without overflow. For simple accumulation, 39 - 15 = 24 bits are available, 
+     which guarantees up to 2^24 = 16777216 operations without overflow.
 ..
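
The arithmetic in the two cases above can be sketched as follows. The helpers ``mac_guard_bits`` and ``max_safe_accumulations`` are hypothetical names used only for illustration. They are not the platform-specific MLI functions for querying accumulator bits, and they assume the bit counts are passed without the sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: spare accumulator bits for a MAC-based kernel.
 * All arguments are significant bits (sign bit excluded). One extra bit
 * covers the product of the two most negative operand values. */
static int mac_guard_bits(int acc_bits, int a_bits, int b_bits)
{
    assert(acc_bits >= a_bits + b_bits + 1);
    return acc_bits - (a_bits + b_bits + 1);
}

/* Hypothetical helper: number of accumulations that cannot overflow,
 * which is 2^guard_bits. */
static uint64_t max_safe_accumulations(int guard_bits)
{
    assert(guard_bits >= 0 && guard_bits < 64);
    return (uint64_t)1 << guard_bits;
}
```

With these helpers, ``mac_guard_bits(31, 7, 7)`` gives 16 for ``sa8`` operands and ``mac_guard_bits(39, 15, 15)`` gives 8 for ``fx16`` operands, matching the 65536 and 256 operation limits stated above.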
 
 In general, the number of accumulations required for one output value calculation can be  
@@ -220,23 +224,7 @@ estimated in advance.
      between operands.
 ..
 
-The file ``mli_config.h`` exports a set of defines that hold the number of accumulator bits 
-for the different operand combinations. These values can vary depending upon the selected
-hardware platform. :ref:`pf_sp_acc_def` lists the defines. 
-
-.. _pf_sp_acc_def:
-.. table:: Platform Specific Accumulator Bit Defines
-   :align: center
-   :widths: 60, 30 
-   
-   +-------------------------+---------------+
-   | **Define**              | **Operands**  |
-   +=========================+===============+
-   | MLI_ACCU_BITS_SA8_SA8   | sa8 x sa8     |
-   +-------------------------+---------------+
-   | MLI_ACCU_BITS_FX16_FX16 | fx16 x fx16   |
-   +-------------------------+---------------+
-   | MLI_ACCU_BITS_FX16_FX8  | fx16 x fx8    |
-   +-------------------------+---------------+   
-..
+Functions are provided to determine the number of available accumulator guard bits for each operand 
+combination. The returned values can differ depending on the platform the library is compiled for. 
+These functions are described in the :ref:`num_of_accu_bits` section.
 
@@ -30,8 +30,9 @@ describe fields of existing MLI configuration structures:
  - Table :ref:`t_mli_prelu_cfg_desc`
 
  - Table :ref:`t_mli_mov_cfg_desc`
- 
- - Table :ref:`t_mli_sub_tensor_cfg_desc`
+
+..
+   - Table :ref:`t_mli_sub_tensor_cfg_desc`
 
 
 
 
@@ -16,6 +16,7 @@ describe how function parameters are grouped into structures.
    data_formats.rst
    data_layouts.rst
    mli_tensor_data_struct.rst
+   mli_lut_data_struct.rst
    kernel_sp_conf_struct.rst
    error_codes.rst
    debug_modes.rst   
 
@@ -0,0 +1,59 @@
+.. _mli_lut_data_struct:
+
+mli_lut Data Structure
+--------------------------
+
+Several functions use a look-up table (LUT) to perform data transformation.  The LUT represents a function in a 
+table form that can be used to transform input values (function argument) to output values (function result). 
+The ``mli_lut`` structure is a representation of such a table.
+
+The ``mli_lut`` struct describes the data in the LUT, including the format of its input and output.
+
+
+.. code:: c
+
+   typedef struct _mli_lut{
+      mli_data_container data;
+      mli_element_type type;
+      int32_t length;
+      int32_t in_frac_bits;
+      int32_t out_frac_bits;
+      int32_t input_offset;
+      int32_t output_offset;
+   } mli_lut;
+..
+
+See :ref:`mli_tens_data_struct` for the definitions of the ``mli_data_container`` and ``mli_element_type`` types. 
+The following table describes the fields in the ``mli_lut`` structure.
+   
+.. _mli_lut_struct_table:  
+.. table:: mli_lut Structure Field Descriptions
+   :align: center
+   :widths: 50, 50, 130 
+   
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | **Field name**    | **type**               | **Comment**                                                                 |
+   +===================+========================+=============================================================================+
+   |                   |                        | This field has a union of different possible data container types.          |
+   |   ``data``        | ``mli_data_container`` | Pointer of specified type (see the type field in this table) should point   |
+   |                   |                        | to an array with the LUT table data.                                        |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``data.capacity`` | ``uint32_t``           | Size in bytes of the allocated memory that the data field points to.        |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``type``          | ``mli_element_type``   | Enum depicting the type of the element stored in the data field.            |
+   |                   |                        | Values in this enum are listed in section :ref:`mli_tens_data_struct`.      |
+   |                   |                        | Only ``MLI_EL_FX_8`` and ``MLI_EL_FX_16`` entities are supported.           |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``length``        | ``int32_t``            | Number of values stored in the LUT table                                    |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``in_frac_bits``  | ``int32_t``            | Number of fractional bits for the LUT input (argument)                      |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``out_frac_bits`` | ``int32_t``            | Number of fractional bits for the LUT output (result)                       |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``input_offset``  | ``int32_t``            | Offset of input argument which is added before applying the LUT function.   |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+   | ``output_offset`` | ``int32_t``            | Offset of output which is subtracted from LUT function result.              |
+   +-------------------+------------------------+-----------------------------------------------------------------------------+
+     
+..
+   
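
The role of the two offset fields can be illustrated with a simplified sketch. Everything here is an assumption made for illustration: ``lut_sketch`` is a reduced stand-in for ``mli_lut`` that holds a plain 16-bit table, the offset-adjusted input is used directly as a table index, and out-of-range indices are clamped. The real kernels may map inputs to table entries differently, for example by using the fractional-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for mli_lut (not the real definition). */
typedef struct {
    const int16_t *data;     /* table values; replaces mli_data_container */
    int32_t length;          /* number of values in the table */
    int32_t input_offset;    /* added to the input before the lookup */
    int32_t output_offset;   /* subtracted from the looked-up value */
} lut_sketch;

/* Assumption: the offset-adjusted input directly indexes the table and is
 * clamped to the valid range. */
static int32_t lut_sketch_apply(const lut_sketch *lut, int32_t x)
{
    assert(lut != NULL && lut->data != NULL && lut->length > 0);
    int32_t idx = x + lut->input_offset;
    if (idx < 0) idx = 0;
    if (idx >= lut->length) idx = lut->length - 1;
    return (int32_t)lut->data[idx] - lut->output_offset;
}
```

In this sketch, a table ``{0, 10, 20, 30}`` with ``input_offset = 2`` and ``output_offset = 5`` maps the input -2 to -5 and the input 1 to 25.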
@@ -12,7 +12,7 @@ shape of this array, its data format, and the way it is organized in memory.
    typedef struct mli_tensor {
       mli_data_container data;
       uint32_t shape[MLI_MAX_RANK];
-      uint32_t mem_stride[MLI_MAX_RANK];
+      int32_t mem_stride[MLI_MAX_RANK];
       uint32_t rank;
       mli_element_type el_type;
       mli_element_params el_params;
@@ -86,7 +86,7 @@ and ``mli_data_container`` is defined as follows:
 ..
 
 
-Table :ref:`mli_tnsr_struc` describes the fields in the mli_tensor structure.
+The table :ref:`mli_tnsr_struc` describes the fields in the mli_tensor structure.
 
 .. _mli_tnsr_struc:  
 .. table:: mli_tensor Structure Field Descriptions
@@ -109,23 +109,40 @@ Table :ref:`mli_tnsr_struc` describes the fields in the mli_tensor structure.
    |                   |                        | scalar tensors (tensors with a single element), this field is not a         |
    |                   |                        | pointer, but it contains the data itself.                                   |
    +-------------------+------------------------+-----------------------------------------------------------------------------+
-   | ``data.capacity`` | ``unit32_t``           | Size in bytes of the allocated memory that the data field points to. In     |
+   | ``data.capacity`` | ``uint32_t``           | Size in bytes of the allocated memory that the data field points to. In     |
    |                   |                        | case there is no buffer attached (``rank == 0``), the capacity is set to 0. |
    +-------------------+------------------------+-----------------------------------------------------------------------------+
-   | ``shape``         | ``unit32_t[]``         | Array with tensor dimensions. Dimensions are stored in order starting from  |
+   | ``shape``         | ``uint32_t[]``         | Array with tensor dimensions. Dimensions are stored in order starting from  |
    |                   |                        | the one with the largest stride between the data portions.                  |
    |                   |                        | For example, for tensor T of size (channels, height, width) stored in HWC   |
    |                   |                        | layout, shape[0] = height, shape[1] = width, shape[2] = channels. Shape[3]  |
    |                   |                        | is unused. The size of the array is defined by ``MLI_MAX_RANK*``.           |
    +-------------------+------------------------+-----------------------------------------------------------------------------+
-   | ``mem_stride``    | ``unit32_t[]``         | Array with the distance (in elements) to the next element in the same       |
-   |                   |                        | dimension. To compute the size in bytes, the number of elements needs to be |
+   | ``mem_stride``    | ``int32_t[]``          | Array with the distance (in elements) to the next element in the same       |
+   |                   |                        | dimension. Only positive values are supported.                              |
+   |                   |                        | To compute the size in bytes, the number of elements needs to be            |
    |                   |                        | multiplied by the bytes per element. For example, for a matrix              |
    |                   |                        | A(rows,columns), ``mem_stride[1]`` contains the distance to the next        |
    |                   |                        | element (=1 in this example), and ``mem_stride[0]`` contains the distance   |
    |                   |                        | from one row to the next (=columns in this example). The size of the array  |
    |                   |                        | is defined by ``MLI_MAX_RANK*``. If the mem_stride is set to 0, it is       |
    |                   |                        | computed from the shape.                                                    |
+   |                   |                        |                                                                             |
+   |                   |                        | Manually set ``mem_stride`` values must not increase from one dimension to  |
+   |                   |                        | the next and must not be smaller than the values computed from the shape.   |
+   |                   |                        | For example, for a tensor of shape :math:`[Height, Width, Channels]`:       |
+   |                   |                        |                                                                             |
+   |                   |                        |  - ``mem_stride[0] >= 1 x Channels x Width``                                |
+   |                   |                        |    AND ``mem_stride[0] >= mem_stride[1]``                                   |
+   |                   |                        |                                                                             |
+   |                   |                        |  - ``mem_stride[1] >= 1*Channels`` AND ``mem_stride[1] >= mem_stride[2]``   |
+   |                   |                        |                                                                             |
+   |                   |                        |  - ``mem_stride[2] >= 1``                                                   |
+   |                   |                        |                                                                             |
+   |                   |                        | In case the mem_stride is computed from the shape, the kernel does not      |
+   |                   |                        | update this field in the tensor struct. The only exception is the           |
+   |                   |                        | ``mli_move`` function, which can write the ``mem_stride`` field of the      |
+   |                   |                        | ``dst`` tensor.                                                             |
    +-------------------+------------------------+-----------------------------------------------------------------------------+
    | ``rank``          | ``uint32_t``           | Number of dimensions of this tensor (Must be less than or equal to          |
    |                   |                        | ``MLI_MAX_RANK*``)                                                          |
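
The ``mem_stride`` rules from the field table can be sketched as a pair of helpers. These are hypothetical functions, not part of the MLI API, and ``SKETCH_MAX_RANK`` is an assumed stand-in for ``MLI_MAX_RANK``. The first helper computes the dense strides implied by a shape, and the second checks manually set strides against the two conditions listed in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_RANK 4  /* assumed stand-in for MLI_MAX_RANK */

/* Fill stride[] with the dense strides (in elements) implied by shape[]:
 * the last dimension has stride 1, each earlier one multiplies by the
 * size of the dimension after it. */
static void strides_from_shape(const uint32_t *shape, uint32_t rank,
                               int32_t *stride)
{
    assert(rank >= 1 && rank <= SKETCH_MAX_RANK);
    int32_t step = 1;
    for (uint32_t i = rank; i-- > 0;) {
        stride[i] = step;
        step *= (int32_t)shape[i];
    }
}

/* Check manually set strides: each must be at least the dense value and
 * must not be smaller than the stride of the next dimension. */
static bool strides_are_valid(const uint32_t *shape, uint32_t rank,
                              const int32_t *stride)
{
    int32_t dense[SKETCH_MAX_RANK];
    strides_from_shape(shape, rank, dense);
    for (uint32_t i = 0; i < rank; ++i) {
        if (stride[i] < dense[i]) return false;
        if (i + 1 < rank && stride[i] < stride[i + 1]) return false;
    }
    return true;
}
```

For a shape of ``{4, 8, 3}`` (height, width, channels) the dense strides are ``{24, 3, 1}``. Strides of ``{32, 4, 1}`` pass the check, while ``{20, 3, 1}`` fail it. The check covers manually set values only, so a stride of 0, which requests computation from the shape, is reported as invalid here.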
@@ -169,12 +186,12 @@ channels in the tensor ``(array_size = shape[dim])``.
    |                        |                        | - ``sa.dim >= 0``: Pointer to an array of zero points relating to           |
    |                        |                        |   configured dimension (``sa.dim``).                                        |
    +------------------------+------------------------+-----------------------------------------------------------------------------+
-   | ``sa.scale``           | ``mli_data_container`` | 16-bit signed scale factors.                                                |
+   | ``sa.scale``           | ``mli_data_container`` | 16-bit signed scale factors. Only positive scale factors are supported.     |
    |                        |                        |                                                                             |
-   |                        |                        | - ``sa.dim < 0``: Single value for all data in tensor                       |
+   |                        |                        | - If ``sa.dim < 0``: ``sa.scale`` is a single value for all data in tensor  |
    |                        |                        |                                                                             |
-   |                        |                        | - ``sa.dim >= 0``:  Pointer to an array of scale factors related to         |
-   |                        |                        |   configured dimension (``sa.dim``).                                        |
+   |                        |                        | - If ``sa.dim >= 0``:  ``sa.scale`` is a pointer to an array of             |
+   |                        |                        |   scale factors related to configured dimension (``sa.dim``).               |
    +------------------------+------------------------+-----------------------------------------------------------------------------+
    | ``sa.dim``             | ``int32_t``            | Tensor dimension to which the arrays of quantization parameters apply       |
    +------------------------+------------------------+-----------------------------------------------------------------------------+