![Img2col on the convolution kernel](../img/ch08/ch09-img2col_weight.png)
:label:`ch-deploy/img2col_weight`

**(2) Winograd**

Convolution can essentially be treated as matrix multiplication. The
time complexity of multiplying two $n \times n$ matrices is $O(n^3)$.
The Winograd algorithm reduces the number of multiplications required
for this computation.

Assume that a 1D convolution operation is denoted as ***F***($m$, $r$),
where $m$ indicates the number of outputs, and $r$ indicates the size
of the convolution kernel. The input is
$\textit{\textbf{d}}=[d_0 \ d_1 \ d_2 \ d_3]$, and the convolution
kernel is $g=[g_0 \ g_1 \ g_2]^{\rm T}$. The convolution operation can
be written as the matrix multiplication in Equation
:eqref:`ch-deploy/conv-matmul-one-dimension`, which contains six
multiplications and four additions.

$$
\textit{\textbf{F}}(2, 3)=
\left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
\left[ \begin{matrix} y_0 \\ y_1 \end{matrix} \right]
$$
:eqlabel:`equ:ch-deploy/conv-matmul-one-dimension`

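As a quick illustration, the following NumPy sketch (the values of $d$
and $g$ are arbitrary example data) stacks the two sliding windows of
the input into the $2\times 3$ matrix above and multiplies it by the
kernel, using six multiplications in total.

```python
import numpy as np

# Example input d = [d0, d1, d2, d3] and kernel g = [g0, g1, g2]
# (values chosen arbitrarily for illustration).
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])

# Stack the two sliding windows of d into a 2x3 matrix and multiply by g:
# six multiplications and four additions in total.
D = np.stack([d[0:3], d[1:4]])       # [[d0, d1, d2], [d1, d2, d3]]
y = D @ g                            # [y0, y1]

# Same result by sliding the kernel directly over the input.
y_direct = np.array([d[i:i + 3] @ g for i in range(2)])
assert np.allclose(y, y_direct)
```
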
In the preceding equation, the input matrix contains repeated elements
$d_1$ and $d_2$. As a result, the matrix multiplication converted from
convolution leaves room for optimization that general matrix
multiplication does not. The result can be obtained by computing the
intermediate variables $m_0$ to $m_3$, as shown in Equation
:eqref:`ch-deploy/conv-2-winograd`:

$$
\textit{\textbf{F}}(2, 3)=
\left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
\left[ \begin{matrix} m_0+m_1+m_2 \\ m_1-m_2-m_3 \end{matrix} \right]
$$
:eqlabel:`equ:ch-deploy/conv-2-winograd`

where $m_0$ to $m_3$ are computed as in Equation
:eqref:`ch-deploy/winograd-param`:

$$
\begin{aligned}
m_0&=(d_0-d_2) \times g_0 \\
m_1&=(d_1+d_2) \times \left(\frac{g_0+g_1+g_2}{2}\right) \\
m_2&=(d_2-d_1) \times \left(\frac{g_0-g_1+g_2}{2}\right) \\
m_3&=(d_1-d_3) \times g_2
\end{aligned}
$$
:eqlabel:`equ:ch-deploy/winograd-param`

Computing $y_0$ and $y_1$ indirectly through $m_0$ to $m_3$ involves
four additions on the input $d$, plus four multiplications and four
additions to form the outputs from $m$. Because the weights are
constant during inference, the operations on the convolution kernel can
be performed during graph compilation and excluded from the online
runtime. In total, there are four multiplications and eight additions:
fewer multiplications and more additions compared with direct
computation (which has six multiplications and four additions). In
computer systems, a multiplication is generally more time-consuming
than an addition, so decreasing the number of multiplications at the
cost of a small number of extra additions accelerates computation.

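The following minimal Python sketch (again with arbitrary example
values) reproduces this count: the kernel-side terms are precomputed,
as they would be at graph-compilation time, and the online part
performs only four multiplications and eight additions while matching
the direct result.

```python
import numpy as np

d0, d1, d2, d3 = 1.0, 2.0, 3.0, 4.0   # input d (example values)
g0, g1, g2 = 0.5, -1.0, 2.0           # kernel g (example values)

# Kernel-side terms: the weights are constant during inference, so these
# additions and divisions can be done once at graph-compilation time.
u0 = g0
u1 = (g0 + g1 + g2) / 2
u2 = (g0 - g1 + g2) / 2
u3 = g2

# Online part: four input additions, four multiplications,
# and four output additions.
m0 = (d0 - d2) * u0
m1 = (d1 + d2) * u1
m2 = (d2 - d1) * u2
m3 = (d1 - d3) * u3
y0 = m0 + m1 + m2
y1 = m1 - m2 - m3

# Direct computation (six multiplications, four additions) for comparison.
assert np.isclose(y0, d0 * g0 + d1 * g1 + d2 * g2)
assert np.isclose(y1, d1 * g0 + d2 * g1 + d3 * g2)
```
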
In matrix form, the computation can be written as Equation
:eqref:`ch-deploy/winograd-matrix`, where $\odot$ denotes element-wise
multiplication, and ***A***, ***B***, and ***G*** are all constant
matrices. The matrix form is used here for clarity; in practice, faster
computation is achieved by expanding the matrix operations into the
handwritten form given in Equation
:eqref:`ch-deploy/winograd-param`.

$$ \textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g) \odot (\textit{\textbf{B}}^{\rm T}d)\right] $$
:eqlabel:`equ:ch-deploy/winograd-matrix`

$$ \textit{\textbf{B}}^{\rm T}=
\left[ \begin{matrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{matrix} \right] $$
:eqlabel:`equ:ch-deploy/winograd-matrix-bt`

$$ \textit{\textbf{G}}=
\left[ \begin{matrix} 1 & 0 & 0 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & -0.5 & 0.5 \\ 0 & 0 & 1 \end{matrix} \right] $$
:eqlabel:`equ:ch-deploy/winograd-matrix-g`

$$ \textit{\textbf{A}}^{\rm T}=
\left[ \begin{matrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{matrix} \right] $$
:eqlabel:`equ:ch-deploy/winograd-matrix-at`

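As a sanity check, the following NumPy sketch builds the constant
matrices above and evaluates Equation
:eqref:`ch-deploy/winograd-matrix` directly; the input and kernel
values are arbitrary example data.

```python
import numpy as np

# Constant transform matrices B^T, G, and A^T for F(2, 3).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0, 0, 1]], dtype=float)
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # example input
g = np.array([0.5, -1.0, 2.0])       # example kernel

U = G @ g          # kernel transform, precomputable offline
V = B_T @ d        # input transform
Y = A_T @ (U * V)  # element-wise product, then output transform

# Compare with the direct sliding-window computation.
y_direct = np.array([d[i:i + 3] @ g for i in range(2)])
assert np.allclose(Y, y_direct)
```
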
In deep learning, 2D convolution is typically used. When ***F***(2, 3)
is extended to ***F***(2$\times$2, 3$\times$3), it can be written in
matrix form, as shown in Equation
:eqref:`ch-deploy/winograd-two-dimension-matrix`. In this case,
Winograd requires 16 multiplications, a 2.25-fold reduction compared
with the 36 multiplications of direct convolution.

$$ \textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g\textit{\textbf{G}}^{\rm T}) \odot (\textit{\textbf{B}}^{\rm T}d\textit{\textbf{B}})\right]\textit{\textbf{A}} $$
:eqlabel:`equ:ch-deploy/winograd-two-dimension-matrix`

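A NumPy sketch of Equation
:eqref:`ch-deploy/winograd-two-dimension-matrix` is shown below (the
$4\times 4$ input tile and $3\times 3$ kernel are arbitrary example
data): the weight transform, input transform, element-wise product, and
output transform yield the same $2\times 2$ output tile as direct 2D
convolution, with the $4\times 4$ element-wise product contributing the
16 multiplications.

```python
import numpy as np

# Constant transform matrices, identical to the 1D F(2, 3) case.
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0, 0, 1]], dtype=float)
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

d = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input tile (example)
g = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0, 0.0],
              [0.0, -1.0, 1.0]])               # 3x3 kernel (example)

U = G @ g @ G.T        # weight transform (precomputable offline)
V = B_T @ d @ B_T.T    # input transform
M = U * V              # element-wise product: the 16 multiplications
Y = A_T @ M @ A_T.T    # output transform -> 2x2 output tile

# Direct 2D convolution over the tile (36 multiplications) for comparison.
Y_direct = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                     for i in range(2)])
assert np.allclose(Y, Y_direct)
```
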
The logical process of Winograd can be divided into four steps, as
shown in Figure :numref:`ch-deploy/winograd`.

![Winograd steps](../img/ch08/ch09-winograd.png)
:label:`ch-deploy/winograd`