chapter_model_deployment/Model_Inference.md

5. Instruction prefetching: Load the required data from the main memory
to the cache in advance to reduce the access latency.

**2. Algorithm optimization**

For most AI models, 90% or more of the total inference time of the
network is spent computing convolution and matrix multiplication
operators. This section focuses on optimizing convolution operator
algorithms, and the techniques apply to a wide range of hardware
devices. A convolution can be converted into the multiplication of two
matrices, and the optimization of the GEMM algorithm was elaborated in
Section :ref:`ch-deploy/parallel-inference`. For a given piece of
hardware, appropriate matrix blocking improves data load/store
efficiency and instruction-level parallelism, which helps to maximize
the utilization of the hardware's computing power and thereby improves
inference performance.
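The effect of matrix blocking can be sketched with a minimal NumPy
example (the block size and matrix shapes below are illustrative
assumptions, not values from the text, and a production GEMM would tune
the tile sizes to the cache hierarchy):

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    """Multiply A (M x K) by B (K x N) one bs x bs tile at a time.

    Working on small tiles keeps each tile's operands resident in
    cache while they are reused, which is the core idea of GEMM
    blocking; NumPy slicing clips tiles at the matrix edges.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, bs):
        for j in range(0, N, bs):
            for k in range(0, K, bs):
                # Accumulate the contribution of one tile pair into C.
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 96)).astype(np.float32)
B = rng.standard_normal((96, 80)).astype(np.float32)
assert np.allclose(blocked_matmul(A, B), A @ B, atol=1e-4)
```

The triple loop computes exactly the same result as an unblocked
multiplication; only the order in which memory is touched changes.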
**(1) Img2col**

Img2col is commonly used to convert a convolution into a matrix
multiplication. Convolutional layers typically operate on 4D inputs in
NHWC format. Figure :numref:`ch-deploy/conv_nhwc` shows a general
convolution: the input shape is (1, IH, IW, IC), the convolution kernel
shape is (OC, KH, KW, IC), and the output shape is (1, OH, OW, OC).
![General convolution](../img/ch08/ch09-conv_nhwc.png)
:label:`ch-deploy/conv_nhwc`
As shown in Figure :numref:`ch-deploy/img2col_input`, Img2col
rearranges the convolution input as follows: the input is reordered
into the matrix on the right, whose number of rows equals the number of
outputs, OH \* OW. Within each row vector, Img2col lays out the KH \*
KW data points of each input channel in sequence, from the first
channel through channel IC.
![Img2col on the convolution input](../img/ch08/ch09-img2col_input.png)
:label:`ch-deploy/img2col_input`
As shown in Figure :numref:`ch-deploy/img2col_weight`, the weights are
rearranged as well: each convolution kernel is expanded into one column
of the weight matrix, so there are OC columns in total. Within each
column vector, the KH \* KW values of the first input channel are
arranged first, followed by those of the subsequent channels through
channel IC. In this manner, the convolution operation is converted into
the multiplication of two matrices. In practice, the data rearrangement
of Img2col and the GEMM computation are performed simultaneously to
save time.
![Img2col on the convolution kernel](../img/ch08/ch09-img2col_weight.png)
:label:`ch-deploy/img2col_weight`
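The input and weight rearrangements described above can be sketched in
NumPy (a minimal illustration assuming stride 1 and no padding; the
function names are ours, not from any library):

```python
import numpy as np

def img2col(x, KH, KW):
    """Rearrange an NHWC input (1, IH, IW, IC) into an (OH*OW, IC*KH*KW)
    matrix: one row per output position, holding the KH*KW patch of
    channel 1 first, then channel 2, ..., up to channel IC."""
    _, IH, IW, IC = x.shape
    OH, OW = IH - KH + 1, IW - KW + 1
    cols = np.empty((OH * OW, IC * KH * KW), dtype=x.dtype)
    for oh in range(OH):
        for ow in range(OW):
            patch = x[0, oh:oh + KH, ow:ow + KW, :]   # (KH, KW, IC)
            # Transpose to (IC, KH, KW) so each channel's KH*KW block
            # is contiguous, matching the ordering described in the text.
            cols[oh * OW + ow] = patch.transpose(2, 0, 1).reshape(-1)
    return cols

def conv_via_gemm(x, w):
    """Convolution as GEMM: kernels (OC, KH, KW, IC) become the columns
    of an (IC*KH*KW, OC) weight matrix, one kernel per column."""
    OC, KH, KW, IC = w.shape
    _, IH, IW, _ = x.shape
    OH, OW = IH - KH + 1, IW - KW + 1
    wmat = w.transpose(0, 3, 1, 2).reshape(OC, -1).T  # (IC*KH*KW, OC)
    out = img2col(x, KH, KW) @ wmat                   # (OH*OW, OC)
    return out.reshape(1, OH, OW, OC)
```

Because the input rows and the weight columns use the same
channel-major ordering, each row-by-column dot product reproduces one
output value of the direct convolution exactly.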