
Commit 0e2060b

debug
1 parent 930fb88 commit 0e2060b

File tree: 3 files changed, +102 −0 lines changed


.DS_Store (2 KB): binary file not shown.

.gitignore (1 addition & 0 deletions)

@@ -1 +1,2 @@
 _build
+.DS_Store

chapter_model_deployment/Model_Inference.md (101 additions & 0 deletions)

@@ -243,3 +243,104 @@ performed simultaneously to save time.

![Img2col on the convolution kernel](../img/ch08/ch09-img2col_weight.png)
:label:`ch-deploy/img2col_weight`

**(2) Winograd**

Convolution can essentially be regarded as matrix multiplication. Multiplying two $n \times n$ matrices directly has a time complexity of $O(n^3)$. The Winograd algorithm reduces the cost of this matrix multiplication by lowering the number of multiplications it requires.

Assume that a 1D convolution operation is denoted as ***F***($m$, $r$), where $m$ indicates the number of outputs and $r$ indicates the size of the convolution kernel. The input is $\textit{\textbf{d}}=[d_0 \ d_1 \ d_2 \ d_3]$, and the convolution kernel is $g=[g_0 \ g_1 \ g_2]^{\rm T}$. The convolution operation can be written in matrix form as Equation :eqref:`ch-deploy/conv-matmul-one-dimension`, which contains six multiplications and four additions.

$$
\textit{\textbf{F}}(2, 3)=
\left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
\left[ \begin{matrix} y_0 \\ y_1 \end{matrix} \right]
$$
:eqlabel:`equ:ch-deploy/conv-matmul-one-dimension`

In the preceding equation, the input matrix contains the repeated elements $d_1$ and $d_2$. As a result, the matrix multiplication converted from convolution has room for optimization that general matrix multiplication does not. The multiplication result can be obtained by computing the intermediate variables $m_0, m_1, m_2, m_3$, as shown in Equation :eqref:`ch-deploy/conv-2-winograd`:

$$
\textit{\textbf{F}}(2, 3)=
\left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
\left[ \begin{matrix} m_0+m_1+m_2 \\ m_1-m_2-m_3 \end{matrix} \right]
$$
:eqlabel:`equ:ch-deploy/conv-2-winograd`

where $m_0, m_1, m_2, m_3$ are computed as shown in Equation :eqref:`ch-deploy/winograd-param`:

$$
\begin{aligned}
m_0&=(d_0-d_2) \times g_0 \\
m_1&=(d_1+d_2) \times \left(\frac{g_0+g_1+g_2}{2}\right) \\
m_2&=(d_2-d_1) \times \left(\frac{g_0-g_1+g_2}{2}\right) \\
m_3&=(d_1-d_3) \times g_2
\end{aligned}
$$
:eqlabel:`equ:ch-deploy/winograd-param`

Computing $y_0$ and $y_1$ indirectly through $m_0, m_1, m_2, m_3$ involves four additions on the input $d$, four multiplications, and four additions to combine the intermediate results $m$. Because the weights are constant during inference, the transformations of the convolution kernel (for example, computing $\frac{g_0+g_1+g_2}{2}$) can be performed during graph compilation and excluded from the online runtime. In total, the method therefore uses four multiplications and eight additions: fewer multiplications and more additions than direct computation, which uses six multiplications and four additions. In computer systems, multiplications are generally more time-consuming than additions, so decreasing the number of multiplications at the cost of a few extra additions accelerates the computation.
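
To make the arithmetic count concrete, below is a minimal sketch in NumPy (the input and kernel values are made up for illustration) that computes ***F***(2, 3) through $m_0, m_1, m_2, m_3$ and checks the result against direct convolution:

```python
import numpy as np

def winograd_f23(d, g):
    """F(2, 3): two outputs of a 1D convolution with a 3-tap kernel,
    using 4 multiplications instead of the 6 of direct computation."""
    # Input transform (the rows of B^T applied to d): 4 additions.
    t0 = d[0] - d[2]
    t1 = d[1] + d[2]
    t2 = d[2] - d[1]
    t3 = d[1] - d[3]
    # Kernel transform (G applied to g): the weights are constant during
    # inference, so these values can be precomputed at graph compilation.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Element-wise products: the 4 multiplications.
    m0, m1, m2, m3 = t0 * u0, t1 * u1, t2 * u2, t3 * u3
    # Output transform (the rows of A^T): 4 more additions.
    return np.array([m0 + m1 + m2, m1 - m2 - m3])

d = np.array([1.0, 2.0, 3.0, 4.0])           # input
g = np.array([0.5, 1.0, -1.0])               # convolution kernel
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # direct sliding-window result
assert np.allclose(winograd_f23(d, g), direct)
```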

In matrix form, the computation can be written as Equation :eqref:`ch-deploy/winograd-matrix`, where $\odot$ indicates element-wise multiplication of corresponding locations, and ***A***, ***B***, and ***G*** are all constant matrices. The matrix form is shown for clarity; in real-world use, faster computation is achieved by directly evaluating the expanded form given in Equation :eqref:`ch-deploy/winograd-param`.

$$\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g) \odot (\textit{\textbf{B}}^{\rm T}d)\right]$$
:eqlabel:`equ:ch-deploy/winograd-matrix`

$$\textit{\textbf{B}}^{\rm T}=
\left[ \begin{matrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{matrix} \right]$$
:eqlabel:`equ:ch-deploy/winograd-matrix-bt`

$$\textit{\textbf{G}}=
\left[ \begin{matrix} 1 & 0 & 0 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & -0.5 & 0.5 \\ 0 & 0 & 1 \end{matrix} \right]$$
:eqlabel:`equ:ch-deploy/winograd-matrix-g`

$$\textit{\textbf{A}}^{\rm T}=
\left[ \begin{matrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{matrix} \right]$$
:eqlabel:`equ:ch-deploy/winograd-matrix-at`
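
The matrix form can be verified numerically as well. The following sketch evaluates Equation :eqref:`ch-deploy/winograd-matrix` with the constant matrices above, reusing the same made-up inputs:

```python
import numpy as np

# The constant transform matrices B^T, G, and A^T from the equations above.
B_T = np.array([[1., 0., -1., 0.],
                [0., 1., 1., 0.],
                [0., -1., 1., 0.],
                [0., 1., 0., -1.]])
G = np.array([[1., 0., 0.],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0., 0., 1.]])
A_T = np.array([[1., 1., 1., 0.],
                [0., 1., -1., -1.]])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])

# Y = A^T [ (G g) ⊙ (B^T d) ]; note that G @ g depends only on the
# weights and would be computed once, ahead of time, in a deployment.
y = A_T @ ((G @ g) * (B_T @ d))
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(y, direct)
```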

In deep learning, 2D convolution is typically used. When ***F***(2, 3) is extended to ***F***(2$\times$2, 3$\times$3), it can be written in matrix form, as shown in Equation :eqref:`ch-deploy/winograd-two-dimension-matrix`. In this case, Winograd uses 16 multiplications, a 2.25-fold reduction in computation compared with the 36 multiplications of direct convolution.

$$\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g\textit{\textbf{G}}^{\rm T}) \odot (\textit{\textbf{B}}^{\rm T}d\textit{\textbf{B}})\right]\textit{\textbf{A}}$$
:eqlabel:`equ:ch-deploy/winograd-two-dimension-matrix`
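
A sketch of the 2D case follows, applying the same constant matrices on both sides of the kernel and the input tile; the 4$\times$4 tile and 3$\times$3 kernel values are made up for illustration:

```python
import numpy as np

# The same constant F(2, 3) transform matrices as above.
B_T = np.array([[1., 0., -1., 0.], [0., 1., 1., 0.],
                [0., -1., 1., 0.], [0., 1., 0., -1.]])
G = np.array([[1., 0., 0.], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0., 0., 1.]])
A_T = np.array([[1., 1., 1., 0.], [0., 1., -1., -1.]])

def winograd_f2x2_3x3(d, g):
    """F(2x2, 3x3) on one 4x4 input tile: 16 element-wise
    multiplications versus 36 for direct 2D convolution."""
    u = G @ g @ G.T      # transformed kernel (4x4), precomputable offline
    v = B_T @ d @ B_T.T  # transformed input tile (4x4)
    return A_T @ (u * v) @ A_T.T  # inverse transform -> 2x2 output tile

d = np.arange(16, dtype=float).reshape(4, 4)  # one 4x4 input tile
g = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0, 0.5],
              [0.0, 1.0, 0.0]])
# Direct 2D convolution (cross-correlation) for reference.
direct = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```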

The logical process of Winograd can be divided into four steps, as shown in Figure :numref:`ch-deploy/winograd`.

![Winograd steps](../img/ch08/ch09-winograd.png)
:label:`ch-deploy/winograd`
