
Commit eb1b760

Refine equation representation
1 parent e407406 commit eb1b760

2 files changed: +9 −13 lines changed


chapter_model_deployment/Conversion_to_Inference_Model_and_Model_Optimization.md

Lines changed: 5 additions & 8 deletions
```diff
@@ -95,8 +95,7 @@ Convolution is expressed as Equation
 [\[equ:ch-deploy/conv-equation\]](#equ:ch-deploy/conv-equation){reference-type="ref"
 reference="equ:ch-deploy/conv-equation"}.
 
-$$[equ:ch-deploy/conv-equation]
-\mathbf{Y_{\rm conv}}=\mathbf{W_{\rm conv}}\cdot\mathbf{X_{\rm conv}}+\mathbf{B_{\rm conv}}$$
+$$\mathbf{Y_{\rm conv}}=\mathbf{W_{\rm conv}}\cdot\mathbf{X_{\rm conv}}+\mathbf{B_{\rm conv}}, \text{equ:ch-deploy/conv-equation}$$
 
 Here, we do not need to understand what each variable means. Instead, we
 only need to keep in mind that Equation
@@ -110,8 +109,8 @@ Equation
 reference="equ:ch-deploy/bn-equation"} is about the computation of
 Batchnorm:
 
-$$[equ:ch-deploy/bn-equation]:
-\mathbf{Y_{\rm bn}}=\gamma\frac{\mathbf{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
+**equ:ch-deploy/bn-equation:**\
+$$\mathbf{Y_{\rm bn}}=\gamma\frac{\mathbf{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
 
 Similarly, it is an equation for $\mathbf{Y_{\rm bn}}$ with respect to
 $\mathbf{X_{\rm bn}}$. Other symbols in the equation represent constants.
@@ -126,8 +125,7 @@ uniting and extracting the constants, we obtain Equation
 [\[equ:ch-deploy/conv-bn-equation-3\]](#equ:ch-deploy/conv-bn-equation-3){reference-type="ref"
 reference="equ:ch-deploy/conv-bn-equation-3"}.
 
-$$[equ:ch-deploy/conv-bn-equation-3]
-\mathbf{Y_{\rm bn}}=\mathbf{A}\cdot\mathbf{X_{\rm conv}}+\mathbf{B}$$
+$$\mathbf{Y_{\rm bn}}=\mathbf{A}\cdot\mathbf{X_{\rm conv}}+\mathbf{B}, \text{equ:ch-deploy/conv-bn-equation-3}$$
 
 Here, $\mathbf{A}$ and $\mathbf{B}$ are two matrices. It can be noticed that
 Equation
@@ -195,8 +193,7 @@ folding the constants, Batchnorm is defined as Equation
 [\[equ:ch-deploy/replace-scale\]](#equ:ch-deploy/replace-scale){reference-type="ref"
 reference="equ:ch-deploy/replace-scale"}
 
-$$[equ:ch-deploy/replace-scale]
-\mathbf{Y_{bn}}=scale\cdot\mathbf{X_{bn}}+offset$$
+$$\mathbf{Y_{bn}}=scale\cdot\mathbf{X_{bn}}+offset, \text{equ:ch-deploy/replace-scale}$$
 
 where **scale** and **offsets** are scalars. This simplified formula can
 be mapped to a Scale operator.
```
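The Conv-Batchnorm folding touched by this diff can be checked numerically. A minimal NumPy sketch, treating the convolution as a matrix product as the chapter does; all shapes, values, and variable names here are illustrative assumptions, not code from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                # "convolution" weight (illustrative)
B = rng.normal(size=(4, 1))                # convolution bias
X = rng.normal(size=(8, 3))                # input columns

gamma = rng.normal(size=(4, 1))            # Batchnorm scale
beta = rng.normal(size=(4, 1))             # Batchnorm shift
mu = rng.normal(size=(4, 1))               # running mean
var = rng.uniform(0.5, 2.0, size=(4, 1))   # running variance
eps = 1e-5

# Unfused: convolution followed by Batchnorm.
y_conv = W @ X + B
y_bn = gamma * (y_conv - mu) / np.sqrt(var + eps) + beta

# Folded: extract the constants A and B' so that Y_bn = A·X + B'.
s = gamma / np.sqrt(var + eps)
A = s * W                                  # per-row rescaled weight
B_fold = s * (B - mu) + beta
y_folded = A @ X + B_fold

assert np.allclose(y_bn, y_folded)         # the two operators fuse exactly
```

The fusion is exact because Batchnorm at inference time is an affine map, so composing it with the affine convolution yields another affine map.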

chapter_model_deployment/Model_Compression.md

Lines changed: 4 additions & 5 deletions
```diff
@@ -54,7 +54,7 @@ reference="equ:ch-deploy/quantization-q"}, assume that $r$ represents
 the floating-point number before quantization. We are then able to
 obtain the integer $q$ after quantization.
 
-$$[equ:ch-deploy/quantization-q]q=clip(round(\frac{r}{s}+z),q_{min},q_{max})$$
+$$q=clip(round(\frac{r}{s}+z),q_{min},q_{max}), \text{equ:ch-deploy/quantization-q}$$
 
 $clip(\cdot)$ and $round(\cdot)$ indicate the truncation and rounding
 operations, and $q_{min}$ and $q_{max}$ indicate the minimum and maximum
```
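The quantization formula in this hunk is a one-liner in practice. A hedged sketch, where the scale `s`, zero-point `z`, and the uint8 range are made-up example values:

```python
import numpy as np

# q = clip(round(r/s + z), q_min, q_max) — affine quantization of a float r.
def quantize(r, s, z, q_min=0, q_max=255):
    return np.clip(np.round(r / s + z), q_min, q_max).astype(np.int64)

r = np.array([-2.0, 0.0, 1.0, 100.0])  # floats before quantization
s, z = 0.05, 20                        # scale and zero-point (illustrative)
q = quantize(r, s, z)                  # values outside the range saturate at q_max
```

Note that `clip` is what makes the scheme robust to outliers: `100.0` maps far past the representable range and simply saturates.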
```diff
@@ -174,12 +174,12 @@ $||\hat{w_c}-E(\hat{w_c})||$, respectively. Equation
 reference="equ:ch-deploy/post-quantization"} is the calibration of the
 weight:
 
-$$[equ:ch-deploy/post-quantization]
+$$
 \begin{aligned}
 \hat{w_c}\leftarrow\zeta_c(\hat{w_c}+u_c) \\
 u_c=E(w_c)-E(\hat{w_c}) \\
 \zeta_c=\frac{||w_c-E(w_c)||}{||\hat{w_c}-E(\hat{w_c})||}
-\end{aligned}$$
+\end{aligned}, \text{equ:ch-deploy/post-quantization}$$
 
 As a general model compression method, quantization can significantly
 improve the memory and compression efficiency of neural networks, and
```
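The per-channel calibration in this hunk shifts the quantized weight by the mean error and rescales it so the deviation norm matches the original. A minimal sketch with assumed names (`w` for $w_c$, `w_hat` for $\hat{w_c}$) and a crude uniform quantizer standing in for the real one:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=256)                 # original channel weight w_c
w_hat = np.round(w / 0.1) * 0.1          # crudely quantized weight ŵ_c (illustrative)

# u_c = E(w_c) - E(ŵ_c): correct the mean (bias) introduced by quantization.
u = w.mean() - w_hat.mean()
w_hat = w_hat + u

# ζ_c = ||w_c - E(w_c)|| / ||ŵ_c - E(ŵ_c)||: match the deviation norm.
zeta = np.linalg.norm(w - w.mean()) / np.linalg.norm(w_hat - w_hat.mean())
w_cal = zeta * w_hat                     # calibrated weight ζ_c(ŵ_c + u_c)

# By construction, the calibrated deviation norm equals the original one.
assert np.isclose(np.linalg.norm(w_cal - w_cal.mean()),
                  np.linalg.norm(w - w.mean()))
```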
```diff
@@ -326,8 +326,7 @@ classification result of the teacher network, that is, Equation
 [\[c2Fcn:distill\]](#c2Fcn:distill){reference-type="ref"
 reference="c2Fcn:distill"}.
 
-$$\mathcal{L}_{KD}(\theta_S) = \mathcal{H}(o_S,\mathbf{y}) +\lambda\mathcal{H}(\tau(o_S),\tau(o_T)),
-[c2Fcn:distill]$$
+$$\mathcal{L}_{KD}(\theta_S) = \mathcal{H}(o_S,\mathbf{y}) +\lambda\mathcal{H}(\tau(o_S),\tau(o_T)), \text{c2Fcn:distill}$$
 
 where $\mathcal{H}(\cdot,\cdot)$ is the cross-entropy function, $o_S$
 and $o_T$ are outputs of the student network and the teacher network,
```
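The distillation loss in this hunk combines a hard cross-entropy against the label with a soft cross-entropy against the temperature-softened teacher output. A minimal sketch; the logits, the weighting `lam` ($\lambda$), and the temperature `T` (for $\tau$) are illustrative assumptions, and cross-entropy follows the usual H(target, prediction) convention:

```python
import numpy as np

def softmax(x, T=1.0):
    e = np.exp(x / T - np.max(x / T))    # shift for numerical stability
    return e / e.sum()

def cross_entropy(p, q):
    return -np.sum(p * np.log(q + 1e-12))

o_S = np.array([1.0, 2.0, 0.5])          # student logits (example values)
o_T = np.array([1.2, 2.5, 0.1])          # teacher logits (example values)
y = np.array([0.0, 1.0, 0.0])            # one-hot ground-truth label
lam, T = 0.5, 4.0                        # λ and temperature (assumed)

hard = cross_entropy(y, softmax(o_S))                  # H(o_S, y) term
soft = cross_entropy(softmax(o_T, T), softmax(o_S, T)) # H(τ(o_S), τ(o_T)) term
loss = hard + lam * soft                               # L_KD(θ_S)
```

A higher temperature flattens both distributions, exposing the teacher's relative confidence across wrong classes, which is the signal distillation transfers.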
