
Commit eb1b760

Refine equation representation
1 parent e407406 commit eb1b760

2 files changed: +9 −13 lines changed


chapter_model_deployment/Conversion_to_Inference_Model_and_Model_Optimization.md

Lines changed: 5 additions & 8 deletions
```diff
@@ -95,8 +95,7 @@ Convolution is expressed as Equation
 [\[equ:ch-deploy/conv-equation\]](#equ:ch-deploy/conv-equation){reference-type="ref"
 reference="equ:ch-deploy/conv-equation"}.
 
-$$[equ:ch-deploy/conv-equation]
-\mathbf{Y_{\rm conv}}=\mathbf{W_{\rm conv}}\cdot\mathbf{X_{\rm conv}}+\mathbf{B_{\rm conv}}$$
+$$\mathbf{Y_{\rm conv}}=\mathbf{W_{\rm conv}}\cdot\mathbf{X_{\rm conv}}+\mathbf{B_{\rm conv}}, \text{equ:ch-deploy/conv-equation}$$
 
 Here, we do not need to understand what each variable means. Instead, we
 only need to keep in mind that Equation
@@ -110,8 +109,8 @@ Equation
 reference="equ:ch-deploy/bn-equation"} is about the computation of
 Batchnorm:
 
-$$[equ:ch-deploy/bn-equation]:
-\mathbf{Y_{\rm bn}}=\gamma\frac{\mathbf{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
+**equ:ch-deploy/bn-equation:**\
+$$\mathbf{Y_{\rm bn}}=\gamma\frac{\mathbf{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
 
 Similarly, it is an equation for $\mathbf{Y_{\rm bn}}$ with respect to
 $\mathbf{X_{\rm bn}}$. Other symbols in the equation represent constants.
@@ -126,8 +125,7 @@ uniting and extracting the constants, we obtain Equation
 [\[equ:ch-deploy/conv-bn-equation-3\]](#equ:ch-deploy/conv-bn-equation-3){reference-type="ref"
 reference="equ:ch-deploy/conv-bn-equation-3"}.
 
-$$[equ:ch-deploy/conv-bn-equation-3]
-\mathbf{Y_{\rm bn}}=\mathbf{A}\cdot\mathbf{X_{\rm conv}}+\mathbf{B}$$
+$$\mathbf{Y_{\rm bn}}=\mathbf{A}\cdot\mathbf{X_{\rm conv}}+\mathbf{B}, \text{equ:ch-deploy/conv-bn-equation-3}$$
 
 Here, $\mathbf{A}$ and $\mathbf{B}$ are two matrices. It can be noticed that
 Equation
@@ -195,8 +193,7 @@ folding the constants, Batchnorm is defined as Equation
 [\[equ:ch-deploy/replace-scale\]](#equ:ch-deploy/replace-scale){reference-type="ref"
 reference="equ:ch-deploy/replace-scale"}
 
-$$[equ:ch-deploy/replace-scale]
-\mathbf{Y_{bn}}=scale\cdot\mathbf{X_{bn}}+offset$$
+$$\mathbf{Y_{bn}}=scale\cdot\mathbf{X_{bn}}+offset, \text{equ:ch-deploy/replace-scale}$$
 
 where **scale** and **offsets** are scalars. This simplified formula can
 be mapped to a Scale operator.
```
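The Conv-Batchnorm folding touched by this diff can be checked numerically. A minimal NumPy sketch, treating the convolution as a matrix product as the chapter does; all shapes, values, and variable names here are illustrative assumptions, not code from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                # "convolution" weight (illustrative)
B = rng.normal(size=(4, 1))                # convolution bias
X = rng.normal(size=(8, 3))                # input columns

gamma = rng.normal(size=(4, 1))            # Batchnorm scale
beta = rng.normal(size=(4, 1))             # Batchnorm shift
mu = rng.normal(size=(4, 1))               # running mean
var = rng.uniform(0.5, 2.0, size=(4, 1))   # running variance
eps = 1e-5

# Unfused: convolution followed by Batchnorm.
y_conv = W @ X + B
y_bn = gamma * (y_conv - mu) / np.sqrt(var + eps) + beta

# Folded: extract the constants A and B' so that Y_bn = A·X + B'.
s = gamma / np.sqrt(var + eps)
A = s * W                                  # per-row rescaled weight
B_fold = s * (B - mu) + beta
y_folded = A @ X + B_fold

assert np.allclose(y_bn, y_folded)         # the two operators fuse exactly
```

The fusion is exact because Batchnorm at inference time is an affine map, so composing it with the affine convolution yields another affine map.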

chapter_model_deployment/Model_Compression.md

Lines changed: 4 additions & 5 deletions
```diff
@@ -54,7 +54,7 @@ reference="equ:ch-deploy/quantization-q"}, assume that $r$ represents
 the floating-point number before quantization. We are then able to
 obtain the integer $q$ after quantization.
 
-$$[equ:ch-deploy/quantization-q]q=clip(round(\frac{r}{s}+z),q_{min},q_{max})$$
+$$q=clip(round(\frac{r}{s}+z),q_{min},q_{max}), \text{equ:ch-deploy/quantization-q}$$
 
 $clip(\cdot)$ and $round(\cdot)$ indicate the truncation and rounding
 operations, and $q_{min}$ and $q_{max}$ indicate the minimum and maximum
```
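The quantization formula in this hunk is a one-liner in practice. A hedged sketch, where the scale `s`, zero-point `z`, and the uint8 range are made-up example values:

```python
import numpy as np

# q = clip(round(r/s + z), q_min, q_max) — affine quantization of a float r.
def quantize(r, s, z, q_min=0, q_max=255):
    return np.clip(np.round(r / s + z), q_min, q_max).astype(np.int64)

r = np.array([-2.0, 0.0, 1.0, 100.0])  # floats before quantization
s, z = 0.05, 20                        # scale and zero-point (illustrative)
q = quantize(r, s, z)                  # values outside the range saturate at q_max
```

Note that `clip` is what makes the scheme robust to outliers: `100.0` maps far past the representable range and simply saturates.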
```diff
@@ -174,12 +174,12 @@ $||\hat{w_c}-E(\hat{w_c})||$, respectively. Equation
 reference="equ:ch-deploy/post-quantization"} is the calibration of the
 weight:
 
-$$[equ:ch-deploy/post-quantization]
+$$
 \begin{aligned}
 \hat{w_c}\leftarrow\zeta_c(\hat{w_c}+u_c) \\
 u_c=E(w_c)-E(\hat{w_c}) \\
 \zeta_c=\frac{||w_c-E(w_c)||}{||\hat{w_c}-E(\hat{w_c})||}
-\end{aligned}$$
+\end{aligned}, \text{equ:ch-deploy/post-quantization}$$
 
 As a general model compression method, quantization can significantly
 improve the memory and compression efficiency of neural networks, and
```
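The per-channel calibration in this hunk shifts the quantized weight by the mean error and rescales it so the deviation norm matches the original. A minimal sketch with assumed names (`w` for $w_c$, `w_hat` for $\hat{w_c}$) and a crude uniform quantizer standing in for the real one:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=256)                 # original channel weight w_c
w_hat = np.round(w / 0.1) * 0.1          # crudely quantized weight ŵ_c (illustrative)

# u_c = E(w_c) - E(ŵ_c): correct the mean (bias) introduced by quantization.
u = w.mean() - w_hat.mean()
w_hat = w_hat + u

# ζ_c = ||w_c - E(w_c)|| / ||ŵ_c - E(ŵ_c)||: match the deviation norm.
zeta = np.linalg.norm(w - w.mean()) / np.linalg.norm(w_hat - w_hat.mean())
w_cal = zeta * w_hat                     # calibrated weight ζ_c(ŵ_c + u_c)

# By construction, the calibrated deviation norm equals the original one.
assert np.isclose(np.linalg.norm(w_cal - w_cal.mean()),
                  np.linalg.norm(w - w.mean()))
```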
```diff
@@ -326,8 +326,7 @@ classification result of the teacher network, that is, Equation
 [\[c2Fcn:distill\]](#c2Fcn:distill){reference-type="ref"
 reference="c2Fcn:distill"}.
 
-$$\mathcal{L}_{KD}(\theta_S) = \mathcal{H}(o_S,\mathbf{y}) +\lambda\mathcal{H}(\tau(o_S),\tau(o_T)),
-[c2Fcn:distill]$$
+$$\mathcal{L}_{KD}(\theta_S) = \mathcal{H}(o_S,\mathbf{y}) +\lambda\mathcal{H}(\tau(o_S),\tau(o_T)), \text{c2Fcn:distill}$$
 
 where $\mathcal{H}(\cdot,\cdot)$ is the cross-entropy function, $o_S$
 and $o_T$ are outputs of the student network and the teacher network,
```
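The distillation loss in this hunk combines a hard cross-entropy against the label with a soft cross-entropy against the temperature-softened teacher output. A minimal sketch; the logits, the weighting `lam` ($\lambda$), and the temperature `T` (for $\tau$) are illustrative assumptions, and cross-entropy follows the usual H(target, prediction) convention:

```python
import numpy as np

def softmax(x, T=1.0):
    e = np.exp(x / T - np.max(x / T))    # shift for numerical stability
    return e / e.sum()

def cross_entropy(p, q):
    return -np.sum(p * np.log(q + 1e-12))

o_S = np.array([1.0, 2.0, 0.5])          # student logits (example values)
o_T = np.array([1.2, 2.5, 0.1])          # teacher logits (example values)
y = np.array([0.0, 1.0, 0.0])            # one-hot ground-truth label
lam, T = 0.5, 4.0                        # λ and temperature (assumed)

hard = cross_entropy(y, softmax(o_S))                  # H(o_S, y) term
soft = cross_entropy(softmax(o_T, T), softmax(o_S, T)) # H(τ(o_S), τ(o_T)) term
loss = hard + lam * soft                               # L_KD(θ_S)
```

A higher temperature flattens both distributions, exposing the teacher's relative confidence across wrong classes, which is the signal distillation transfers.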
