
Commit c578d7a

Refine equation labeling
1 parent ef4297a commit c578d7a

4 files changed: 16 additions, 32 deletions


chapter_model_deployment/Advanced_Efficient_Techniques.md

Lines changed: 3 additions & 6 deletions
@@ -52,8 +52,7 @@ $M_{\text{target}}(\text{prefix} + [x_1 + ... + x_{\gamma}])$. If the
 condition $q(x) < p(x)$ is met, the token is retained. In contrast, if
 not met, the token faces a rejection chance of $1 - \frac{p(x)}{q(x)}$,
 following which it is reselected from an adjusted distribution:
-$$
-p'(x) = norm(max(0, p(x) - q(x)))$$
+$$p'(x) = norm(max(0, p(x) - q(x)))$$
 :eqlabel:`equ:sd_adjusted` In the paper [@leviathan2023fast],
 Leviathan et al. have proved the correctness of this adjusted
 distribution for resampling.
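
A minimal NumPy sketch of the accept/reject step and the adjusted resampling described in this hunk (the function name and toy distributions are our own illustration, not from the chapter):

```python
import numpy as np

def accept_or_resample(p, q, token, rng):
    """One speculative-decoding verification step for a draft token.

    p, q: target- and draft-model distributions over the vocabulary.
    token: index sampled from q by the draft model.
    """
    # Keep the token with probability min(1, p(x)/q(x)); otherwise
    # it is rejected with chance 1 - p(x)/q(x) ...
    if rng.random() < min(1.0, p[token] / q[token]):
        return token
    # ... and reselected from p'(x) = norm(max(0, p(x) - q(x))).
    residual = np.maximum(0.0, p - q)
    return rng.choice(len(p), p=residual / residual.sum())

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # target model
q = np.array([0.2, 0.5, 0.3])  # draft model
print(accept_or_resample(p, q, token=1, rng=rng))
```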
@@ -160,14 +159,12 @@ reading and writing the large attention matrix to and from HBM. And
 perform computation in SRAM as much as possible.
 
 The standard Scaled Dot-Product Attention [@attention] formula is
-$$
-\textbf{A} = Softmax(\frac{\textbf{QK}^T}{\sqrt{d_k}})\textbf{V}$$
+$$\textbf{A} = Softmax(\frac{\textbf{QK}^T}{\sqrt{d_k}})\textbf{V}$$
 :eqlabel:`equ:std_attn`
 
 As $d_k$ is a scalar, we can simplify it into three parts:
 
-$$
-\begin{aligned}
+$$\begin{aligned}
 \textbf{S} = \textbf{QK}^T\\
 \textbf{P} = Softmax(\textbf{S})\\
 \textbf{O} = \textbf{PV}
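
The motivation in this hunk is easier to see against a naive implementation that materializes $\textbf{S}$ and $\textbf{P}$ in full, which is exactly the HBM traffic FlashAttention tiles away. A sketch of ours, for a single head with row-wise softmax:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Materializes S and P in full; FlashAttention tiles this instead."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])   # S = QK^T / sqrt(d_k)
    S -= S.max(axis=-1, keepdims=True)   # for numerical stability
    P = np.exp(S)
    P /= P.sum(axis=-1, keepdims=True)   # P = Softmax(S), row-wise
    return P @ V                         # O = PV

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(naive_attention(Q, K, V).shape)    # (4, 8)
```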

chapter_model_deployment/Conversion_to_Inference_Model_and_Model_Optimization.md

Lines changed: 4 additions & 8 deletions
@@ -90,8 +90,7 @@ understood as the simplification of an equation. The computation of
 Convolution is expressed as Equation
 :eqref:`ch-deploy/conv-equation`.
 
-$$
-\bm{Y_{\rm conv}}=\bm{W_{\rm conv}}\cdot\bm{X_{\rm conv}}+\bm{B_{\rm conv}}$$
+$$\bm{Y_{\rm conv}}=\bm{W_{\rm conv}}\cdot\bm{X_{\rm conv}}+\bm{B_{\rm conv}}$$
 :eqlabel:`equ:ch-deploy/conv-equation`
 
 Here, we do not need to understand what each variable means. Instead, we
@@ -104,8 +103,7 @@ Equation
 :eqref:`ch-deploy/bn-equation` is about the computation of
 Batchnorm:
 
-$$
-\bm{Y_{\rm bn}}=\gamma\frac{\bm{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
+$$\bm{Y_{\rm bn}}=\gamma\frac{\bm{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
 :eqlabel:`equ:ch-deploy/bn-equation`
 
 Similarly, it is an equation for $\bm{Y_{\rm bn}}$ with respect to
@@ -119,8 +117,7 @@ After substituting $\bm{Y_{\rm conv}}$ into $\bm{X_{\rm bn}}$ and
 uniting and extracting the constants, we obtain Equation
 :eqref:`ch-deploy/conv-bn-equation-3`.
 
-$$
-\bm{Y_{\rm bn}}=\bm{A}\cdot\bm{X_{\rm conv}}+\bm{B}$$
+$$\bm{Y_{\rm bn}}=\bm{A}\cdot\bm{X_{\rm conv}}+\bm{B}$$
 :eqlabel:`equ:ch-deploy/conv-bn-equation-3`
 
 Here, $\bm{A}$ and $\bm{B}$ are two matrices. It can be noticed that
@@ -186,8 +183,7 @@ principle of operator replacement. After decomposing Equation
 folding the constants, Batchnorm is defined as Equation
 :eqref:`ch-deploy/replace-scale`
 
-$$
-\bm{Y_{bn}}=scale\cdot\bm{X_{bn}}+offset$$
+$$\bm{Y_{bn}}=scale\cdot\bm{X_{bn}}+offset$$
 :eqlabel:`equ:ch-deploy/replace-scale`
 
 where **scale** and **offset** are scalars. This simplified formula can
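
Taken together, the four hunks in this file walk from the convolution and Batchnorm formulas to the folded $\bm{A}$, $\bm{B}$ and the scale/offset form. A minimal sketch of that folding, assuming (C_out, C_in, kH, kW) conv weights and per-channel Batchnorm statistics; the function and argument names are ours:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mu, var, eps=1e-5):
    """Return A and B such that Y_bn = A . X_conv + B.

    W: (C_out, C_in, kH, kW) conv weights, b: (C_out,) conv bias;
    gamma, beta, mu, var: per-channel Batchnorm parameters/statistics.
    """
    scale = gamma / np.sqrt(var + eps)   # the per-channel "scale"
    A = W * scale[:, None, None, None]   # folded weights
    B = scale * (b - mu) + beta          # the per-channel "offset"
    return A, B
```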

chapter_model_deployment/Model_Compression.md

Lines changed: 1 addition & 2 deletions
@@ -172,8 +172,7 @@ $||\hat{w_c}-E(\hat{w_c})||$, respectively. Equation
 :eqref:`ch-deploy/post-quantization` is the calibration of the
 weight:
 
-$$
-\begin{aligned}
+$$\begin{aligned}
 \hat{w_c}\leftarrow\zeta_c(\hat{w_c}+u_c) \\
 u_c=E(w_c)-E(\hat{w_c}) \\
 \zeta_c=\frac{||w_c-E(w_c)||}{||\hat{w_c}-E(\hat{w_c})||}
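
A per-channel NumPy sketch of this calibration, where `w` stands for the float weights and `w_hat` for their quantized counterpart (names are ours; because of the centering, it does not matter whether $\zeta_c$ is computed before or after adding $u_c$):

```python
import numpy as np

def calibrate(w, w_hat):
    """Apply w_hat_c <- zeta_c * (w_hat_c + u_c) channel by channel."""
    C, shape = w.shape[0], w.shape
    w, w_hat = w.reshape(C, -1), w_hat.reshape(C, -1)
    u = w.mean(axis=1, keepdims=True) - w_hat.mean(axis=1, keepdims=True)
    num = np.linalg.norm(w - w.mean(axis=1, keepdims=True), axis=1)
    den = np.linalg.norm(w_hat - w_hat.mean(axis=1, keepdims=True), axis=1)
    zeta = (num / den)[:, None]              # zeta_c
    return (zeta * (w_hat + u)).reshape(shape)
```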

chapter_model_deployment/Model_Inference.md

Lines changed: 8 additions & 16 deletions
@@ -258,8 +258,7 @@ be written using matrices as Equation
 :eqref:`ch-deploy/conv-matmul-one-dimension`, which contains six
 multiplications and four additions.
 
-$$
-\textit{\textbf{F}}(2, 3)=
+$$\textit{\textbf{F}}(2, 3)=
 \left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
 \left[ \begin{matrix} y_0 \\ y_1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/conv-matmul-one-dimension`
@@ -271,17 +270,15 @@ multiplication. The matrix multiplication result may be obtained by
 computing intermediate variables $m_0-m_3$, as shown in Equation
 :eqref:`ch-deploy/conv-2-winograd`:
 
-$$
-\textit{\textbf{F}}(2, 3)=
+$$\textit{\textbf{F}}(2, 3)=
 \left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
 \left[ \begin{matrix} m_0+m_1+m_2 \\ m_1-m_2-m_3 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/conv-2-winograd`
 
 where $m_0-m_3$ are computed as Equation
 :eqref:`ch-deploy/winograd-param`:
 
-$$
-\begin{aligned}
+$$\begin{aligned}
 m_0=(d_0-d_2) \times g_0 \\
 m_1=(d_1+d_2) \times (\frac{g_0+g_1+g_2}{2}) \\
 m_2=(d_2-d_1) \times (\frac{g_0-g_1+g_2}{2}) \\
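
A direct transcription of $F(2,3)$, checked against plain correlation; $m_3$ is truncated out of the hunk above, so we take it as $(d_1-d_3) \times g_2$, the value implied by the $\textit{\textbf{B}}^{\rm T}$ matrix shown further down:

```python
import numpy as np

def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap 1-D convolution in 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m0 = (d0 - d2) * g0
    m1 = (d1 + d2) * (g0 + g1 + g2) / 2  # filter terms can be precomputed
    m2 = (d2 - d1) * (g0 - g1 + g2) / 2
    m3 = (d1 - d3) * g2                  # truncated out of the hunk above
    return np.array([m0 + m1 + m2, m1 - m2 - m3])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])
print(winograd_f23(d, g))                     # [-1.  -0.5]
print(np.convolve(d, g[::-1], mode="valid"))  # same result, 6 multiplications
```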
@@ -310,22 +307,18 @@ matrix computation is performed based on the handwritten form, as
 provided in Equation
 :eqref:`ch-deploy/winograd-param`.
 
-$$
-\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g) \odot (\textit{\textbf{B}}^{\rm T}d)\right]$$
+$$\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g) \odot (\textit{\textbf{B}}^{\rm T}d)\right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix`
 
-$$
-\textit{\textbf{B}}^{\rm T}=
+$$\textit{\textbf{B}}^{\rm T}=
 \left[ \begin{matrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix-bt`
 
-$$
-\textit{\textbf{G}}=
+$$\textit{\textbf{G}}=
 \left[ \begin{matrix} 1 & 0 & 0 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & -0.5 & 0.5 \\ 0 & 0 & 1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix-g`
 
-$$
-\textit{\textbf{A}}^{\rm T}=
+$$\textit{\textbf{A}}^{\rm T}=
 \left[ \begin{matrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix-at`
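
Evaluated numerically, the matrix form reproduces the handwritten $m_0$-$m_3$ computation, with the element-wise product supplying the four multiplications (sketch of ours):

```python
import numpy as np

B_T = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
                [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
A_T = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])

# Y = A^T [ (G g) ⊙ (B^T d) ]
Y = A_T @ ((G @ g) * (B_T @ d))
print(Y)  # [-1.  -0.5], matching the m0..m3 form above
```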

@@ -337,8 +330,7 @@ shown in Equation
 Winograd uses 16 multiplications, reducing the computational complexity by
 a factor of 2.25 compared with the 36 multiplications of the original convolution.
 
-$$
-\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g\textit{\textbf{G}}^{\rm T}) \odot (\textit{\textbf{B}}^{\rm T}d\textit{\textbf{B}})\right]\textit{\textbf{A}}$$
+$$\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g\textit{\textbf{G}}^{\rm T}) \odot (\textit{\textbf{B}}^{\rm T}d\textit{\textbf{B}})\right]\textit{\textbf{A}}$$
 :eqlabel:`equ:ch-deploy/winograd-two-dimension-matrix`
 
 The logical process of Winograd can be divided into four steps, as shown
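
The same check for the two-dimensional nesting, against a direct 3x3 correlation over a 4x4 tile: 16 element-wise multiplications versus 36 (sketch of ours):

```python
import numpy as np

B_T = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
                [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
A_T = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))  # input tile
g = rng.standard_normal((3, 3))  # 3x3 filter

# Y = A^T [ (G g G^T) ⊙ (B^T d B) ] A
Y = A_T @ ((G @ g @ G.T) * (B_T @ d @ B_T.T)) @ A_T.T

# Direct 2-D correlation, for checking.
ref = np.array([[(d[i:i+3, j:j+3] * g).sum() for j in range(2)]
                for i in range(2)])
print(np.allclose(Y, ref))  # True
```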
