Commit 2b79e39

review comments from Dmitry incorporated
1 parent b769f1c commit 2b79e39

doc/documents/Examples_Tutorials/Examples_Tutorials.rst

Lines changed: 21 additions & 24 deletions
@@ -18,25 +18,25 @@ Our assumption is that you are familiar with:
 
 - **Caffe framework basics**
 
-The development process of MLI-based embedded application is depicted with diagram:
+The proposed development process of an MLI-based embedded application is depicted in the following diagram:
 
 .. image:: ../images/1_depl_process.png
    :align: center
    :alt: MLI-Based Application Development Process
 
 ..
 
-1. Model definition and training in some appropriate framework. Ensure that you consider all limitations of the target platform here including memory restriction, MHz budget, and quantization effect in some cases.
+1. Model definition and training in an appropriate framework. Ensure that you consider all limitations of the target platform here, including memory restrictions and the frequency budget.
 
-2. Model deployment implies construction of tested and verified ML module with a defined interface. Hence, wrap the module into file-to-file application for convenient debugging and verification.
+2. Model deployment implies construction of a tested and verified ML module with a defined interface. It is recommended to wrap the module into a file-to-file application for convenient debugging and verification.
    The MLI CIFAR-10 example is exactly this kind of “unit-testing” application.
 
 3. Integrate this module into the target embedded application code with real data.
 
 This tutorial focuses on the second step – model deployment.
 Manual deployment consists of two main parts:
 
-- Deploying data — This is obvious because training implies tuning of model parameters.
+- Deploying data — Training implies tuning of model parameters.
 
 - Deploying operations — The model consists of not only parameters but also an algorithm that uses basic operations or machine learning primitives.
 
@@ -121,7 +121,7 @@ Using defined pieces of Python code, you can extract all the required data from
 Collect Data Range Statistic for Each Layer
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Quantization process is not only meant to convert weights data to fixed point representation, but also meant to define ranges of all the intermediate data for each layer. For this purpose, run the model on some representative data subset and gather statistics for all intermediate results. It is better to use all training subsets, or even all the dataset.
+The quantization process is not only meant to convert weights data to fixed-point representation, but also to define the ranges of all the intermediate data for each layer. For this purpose, run the model on some representative data subset and gather statistics for all intermediate results. It is recommended to use the full training subset.
 
 To accomplish this using previously defined instruments, see this sample code:
 
@@ -180,10 +180,10 @@ MLI supports fixed point format defined by Q-notation (see section MLI Fixed-Poi
    :widths: auto
 
 +---------------+---------------------------------------------------------------+---------------------------------------------------------------+
-|               | **Maximum abs values of tensors**                             | **Maximum abs values of tensors**                             |
+|               | **Maximum abs values of tensors**                             | **Number of integer bits of tensors**                         |
 | **CIFAR10**   +---------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
-|               | Layer input   | Layer weights | Layer bias    | Layer out     | Layer input   | Layer weights | Layer bias    | Layer out     |
-|               | Max ABS value | Max ABS value | Max ABS value | Max ABS value | Max ABS value | Max ABS value | Max ABS value | Max ABS value |
+|               | Layer Input   | Layer Weights | Layer Bias    | Layer Out     | Layer Input   | Layer Weights | Layer Bias    | Layer Out     |
+|               | Max ABS Value | Max ABS Value | Max ABS Value | Max ABS Value | Bits          | Bits          | Bits          | Bits          |
 +===============+===============+===============+===============+===============+===============+===============+===============+===============+
 | Layer 1_conv  | 0.99          | 0.49          | 0.73          | 7.03          | 0             | -1            | 0             | 3             |
 +---------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
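
Editorial cross-check (not part of the commit): the bit counts in the right half of this table follow from the max abs values in the left half. A minimal sketch for the Layer 1_conv row, assuming 16-bit signed operands with 15 data bits (Qm.n with m + n = 15):

.. code:: c++

   #include <cmath>
   #include <cstdio>

   // Integer bits needed to cover magnitudes up to max_abs; the remaining
   // data bits of a 16-bit signed container hold the fraction.
   static int int_bits(double max_abs) {
       return (int)std::ceil(std::log2(max_abs));
   }

   int main() {
       const double max_abs[] = {0.99, 0.49, 0.73, 7.03}; // input, weights, bias, output
       for (double v : max_abs) {
           int m = int_bits(v);
           std::printf("max|x| = %.2f -> %d integer bits -> Q%d.%d\n", v, m, m, 15 - m);
       }
       return 0; // prints 0, -1, 0, 3 integer bits, matching the table row
   }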
@@ -245,20 +245,11 @@ Consider a small example not directly related to the CIFAR-10:
 |                              | (updated)             |                   |                  |
 +------------------------------+-----------------------+-------------------+------------------+
 
-
-
-
-
-Ensure that you follow these steps:
-
-1. For a convolution layer, define the number of integer bits as in the previous example.
-
-2. For each output value, the compute the number of required sequential accumulations: 32[number of channels] * (5*5) [kernel size] +1 [bias] = 801 operations. Hence, 10 extra bits are required for accumulation.
-
-3. Since the number of extra bits is less than the allocated bits for integer - 9, increase number of integer bits for layer input.
-
-For the following fully connected layer, 11 extra bits are required and you need to distribute 2 bits. It’s recommended to do it evenly between operands. Note that number of convolution’s output fractional bits also needs to be changed to be aligned with next fully connected input.
+..
+
+For convolution layer X, the number of integer bits is defined as before. For each output value, the following number of sequential accumulations is required: 32 [number of channels] * (5*5) [kernel size] + 1 [bias] = 801 operations. Hence, 10 extra bits are required for accumulation, while only 9 are available. For this reason, the number of integer bits for the layer input is increased.
+
+For the following fully connected layer, 11 extra bits are required and 2 bits need to be distributed. It is recommended to do this evenly between operands. Note that the number of the convolution’s output fractional bits also needs to be changed to align with the next fully connected layer’s input.
 
 For 8-bit operands, you do not need to perform this adjustment unless your MAC series is more than 131072 operations, in which case apply a similar approach. After considering accumulator restrictions for the CIFAR-10 example with 16-bit operands, you get the following table:
 
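Editorial aside (not from the commit): the “10 extra bits” figure is just ceil(log2(N)) for N sequential accumulations, and the “9 bits (512 MACs)” in the note below follows the same rule. A small sketch of the arithmetic:

.. code:: c++

   #include <cmath>
   #include <cstdio>

   // Worst case: n sequential additions of full-scale products need
   // ceil(log2(n)) extra accumulator bits to stay saturation-free.
   static int guard_bits(long n) {
       return (int)std::ceil(std::log2((double)n));
   }

   int main() {
       long conv_ops = 32 * (5 * 5) + 1; // channels * kernel size + bias = 801
       std::printf("conv: %ld ops -> %d extra bits\n", conv_ops, guard_bits(conv_ops)); // 10
       std::printf("fc:   %d ops -> %d extra bits\n", 512, guard_bits(512));            // 9
       return 0;
   }
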
@@ -293,15 +284,15 @@ For 8-bit operands, you do not need to perform this adjustment unless your MAC se
 
 
 .. note::
-   Defining Q format in this way, you can guarantee that accumulator is not saturated while a single output is being calculated. But the restriction may be loosened if you are sure about your data. For example, look at the final fully connected layer above: 9 bits are enough if we do not consider bias addition. Analyze how likely is it that for 1 extra addition result will overflow the defined range. Moreover, saturation of results might have a minor effect on the network accuracy.
+   By defining the Q format in this way, you can guarantee that the accumulator is not saturated while a single output is being calculated. However, the restriction may be loosened if you are sure about your data. For example, look at the final fully connected layer above: 9 bits (512 MACs) are enough if we do not consider the bias addition. Analyze how likely it is that one extra addition will overflow the defined range. Moreover, saturation of results might have only a minor effect on the network accuracy.
 ..
 
 Quantize Weights According to Defined Q-Format
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 After extracting coefficients into numpy array objects and defining the Qm.n format for the data, define MLI structures for kernels and export the quantized data.
 
-Consider a static allocation of data. To extract weights, you may make pre-processor quantize data for you in compile-time by wrapping each coefficient into some macro. It is slower and uses more memory resources of your machine for compilation, but it is worth if the model is not so big.
+Consider a static allocation of data. To extract weights, you may have the preprocessor quantize the data for you at compile time by wrapping each coefficient into a macro function. Compilation is slower and uses more of your machine’s memory, but it is worthwhile if the model is not too big.
 
 .. code:: c++
 
@@ -321,7 +312,7 @@ Consider a static allocation of data. To extract weights, you may make pre-proce
    };
 ..
 
-Alternatively, you may quantize data externally Layer 1_conv in the same way and just put it into code.
+Alternatively, you can quantize the data externally in the same way and just put it into the code.
 
 .. code:: c++
 
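The macro-based code block itself is elided from this hunk. Purely as an illustration of the compile-time approach described above (hypothetical names, assuming Q0.15 weights; not the example’s actual code), such a macro could look like:

.. code:: c++

   #include <stdint.h>

   // Hypothetical quantization macro: the compiler folds each wrapped
   // floating-point coefficient into an int16_t constant at compile time.
   #define W_FRAC_BITS 15  // assumed Q0.15 weight format
   #define QW(val) ((int16_t)((val) * (1 << W_FRAC_BITS) + ((val) >= 0 ? 0.5 : -0.5)))

   static const int16_t conv1_w[] = {
       QW(-0.0123), QW(0.2541), QW(-0.1337), // ... remaining coefficients
   };
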
@@ -413,6 +404,8 @@ Transpose data by permute layer with appropriate parameters:
 |                           | ..                                                            |
 +---------------------------+---------------------------------------------------------------+
 
+
+
 Next, consider convolution and ReLU layers:
 
 .. image:: ../images/4_op_map_step2.png
    :align: center
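
Editorial sketch of the permute step above (generic C++, not the MLI permute kernel’s actual signature): Caffe stores feature maps channel-first (CHW), so the permute layer reorders elements, shown here as a CHW-to-HWC conversion; the exact order must come from the permute parameters in the table above.

.. code:: c++

   #include <stdint.h>

   // Generic CHW -> HWC reorder of an 8-bit feature map.
   static void chw_to_hwc(const int8_t* src, int8_t* dst, int ch, int h, int w) {
       for (int c = 0; c < ch; ++c)
           for (int y = 0; y < h; ++y)
               for (int x = 0; x < w; ++x)
                   dst[(y * w + x) * ch + c] = src[(c * h + y) * w + x];
   }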
@@ -472,6 +465,8 @@ Parameters of all convolutions in the model are the same, so you may use the onl
 
 ..
 
+MLI pooling behavior differs from the default Caffe behavior: in Caffe, padding is implied for some combinations of layer parameters even if it is not specified, so you should state the padding explicitly where Caffe implies it. This was done for compatibility with other frameworks.
+
 .. table:: Example Pooling Layer with Padding
    :widths: 20, 130
:widths: 20, 130
477472

@@ -534,6 +529,8 @@ Consider the last two operations:
 
 ..
 
+Fully connected (referred to as Inner Product in Caffe) and softmax layers don’t require any specific analysis.
+
 .. table:: Example of Function Choosing Optimal Specialization
    :widths: 20, 130
