
Commit c578d7a

Refine equation labeling
1 parent ef4297a commit c578d7a

4 files changed: 16 additions, 32 deletions


chapter_model_deployment/Advanced_Efficient_Techniques.md

Lines changed: 3 additions & 6 deletions
@@ -52,8 +52,7 @@ $M_{\text{target}}(\text{prefix} + [x_1 + ... + x_{\gamma}])$. If the
 condition $q(x) < p(x)$ is met, the token is retained. In contrast, if
 not met, the token faces a rejection chance of $1 - \frac{p(x)}{q(x)}$,
 following which it is reselected from an adjusted distribution:
-$$
-p'(x) = norm(max(0, p(x) - q(x)))$$
+$$p'(x) = norm(max(0, p(x) - q(x)))$$
 :eqlabel:`equ:sd_adjusted` In the paper [@leviathan2023fast],
 Leviathan et al. have proved the correctness of this adjusted
 distribution for resampling.
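
A minimal NumPy sketch of the accept/reject step and the adjusted resampling described in this hunk (the function name and toy distributions are our own illustration, not from the chapter):

```python
import numpy as np

def accept_or_resample(p, q, token, rng):
    """One speculative-decoding verification step for a draft token.

    p, q: target- and draft-model distributions over the vocabulary.
    token: index sampled from q by the draft model.
    """
    # Keep the token with probability min(1, p(x)/q(x)); otherwise
    # it is rejected with chance 1 - p(x)/q(x) ...
    if rng.random() < min(1.0, p[token] / q[token]):
        return token
    # ... and reselected from p'(x) = norm(max(0, p(x) - q(x))).
    residual = np.maximum(0.0, p - q)
    return rng.choice(len(p), p=residual / residual.sum())

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # target model
q = np.array([0.2, 0.5, 0.3])  # draft model
print(accept_or_resample(p, q, token=1, rng=rng))
```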
@@ -160,14 +159,12 @@ reading and writing the large attention matrix to and from HBM. And
 perform computation in SRAM as much as possible.
 
 The standard Scaled Dot-Product Attention [@attention] formula is
-$$
-\textbf{A} = Softmax(\frac{\textbf{QK}^T}{\sqrt{d_k}})\textbf{V}$$
+$$\textbf{A} = Softmax(\frac{\textbf{QK}^T}{\sqrt{d_k}})\textbf{V}$$
 :eqlabel:`equ:std_attn`
 
 As $d_k$ is a scalar, we can simplify it into three parts:
 
-$$
-\begin{aligned}
+$$\begin{aligned}
 \textbf{S} = \textbf{QK}^T\\
 \textbf{P} = Softmax(\textbf{S})\\
 \textbf{O} = \textbf{PV}
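
The motivation in this hunk is easier to see against a naive implementation that materializes $\textbf{S}$ and $\textbf{P}$ in full, which is exactly the HBM traffic FlashAttention tiles away. A sketch of ours, for a single head with row-wise softmax:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Materializes S and P in full; FlashAttention tiles this instead."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])   # S = QK^T / sqrt(d_k)
    S -= S.max(axis=-1, keepdims=True)   # for numerical stability
    P = np.exp(S)
    P /= P.sum(axis=-1, keepdims=True)   # P = Softmax(S), row-wise
    return P @ V                         # O = PV

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(naive_attention(Q, K, V).shape)    # (4, 8)
```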

chapter_model_deployment/Conversion_to_Inference_Model_and_Model_Optimization.md

Lines changed: 4 additions & 8 deletions
@@ -90,8 +90,7 @@ understood as the simplification of an equation. The computation of
 Convolution is expressed as Equation
 :eqref:`ch-deploy/conv-equation`.
 
-$$
-\bm{Y_{\rm conv}}=\bm{W_{\rm conv}}\cdot\bm{X_{\rm conv}}+\bm{B_{\rm conv}}$$
+$$\bm{Y_{\rm conv}}=\bm{W_{\rm conv}}\cdot\bm{X_{\rm conv}}+\bm{B_{\rm conv}}$$
 :eqlabel:`equ:ch-deploy/conv-equation`
 
 Here, we do not need to understand what each variable means. Instead, we
@@ -104,8 +103,7 @@ Equation
 :eqref:`ch-deploy/bn-equation` is about the computation of
 Batchnorm:
 
-$$
-\bm{Y_{\rm bn}}=\gamma\frac{\bm{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
+$$\bm{Y_{\rm bn}}=\gamma\frac{\bm{X_{\rm bn}}-\mu_{\mathcal{B}}}{\sqrt{{\sigma_{\mathcal{B}}}^{2}+\epsilon}}+\beta$$
 :eqlabel:`equ:ch-deploy/bn-equation`
 
 Similarly, it is an equation for $\bm{Y_{\rm bn}}$ with respect to
@@ -119,8 +117,7 @@ After substituting $\bm{Y_{\rm conv}}$ into $\bm{X_{\rm bn}}$ and
 uniting and extracting the constants, we obtain Equation
 :eqref:`ch-deploy/conv-bn-equation-3`.
 
-$$
-\bm{Y_{\rm bn}}=\bm{A}\cdot\bm{X_{\rm conv}}+\bm{B}$$
+$$\bm{Y_{\rm bn}}=\bm{A}\cdot\bm{X_{\rm conv}}+\bm{B}$$
 :eqlabel:`equ:ch-deploy/conv-bn-equation-3`
 
 Here, $\bm{A}$ and $\bm{B}$ are two matrices. It can be noticed that
@@ -186,8 +183,7 @@ principle of operator replacement. After decomposing Equation
 folding the constants, Batchnorm is defined as Equation
 :eqref:`ch-deploy/replace-scale`
 
-$$
-\bm{Y_{bn}}=scale\cdot\bm{X_{bn}}+offset$$
+$$\bm{Y_{bn}}=scale\cdot\bm{X_{bn}}+offset$$
 :eqlabel:`equ:ch-deploy/replace-scale`
 
 where **scale** and **offset** are scalars. This simplified formula can
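
Taken together, the four hunks in this file walk from the convolution and Batchnorm formulas to the folded $\bm{A}$, $\bm{B}$ and the scale/offset form. A minimal sketch of that folding, assuming (C_out, C_in, kH, kW) conv weights and per-channel Batchnorm statistics; the function and argument names are ours:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mu, var, eps=1e-5):
    """Return A and B such that Y_bn = A . X_conv + B.

    W: (C_out, C_in, kH, kW) conv weights, b: (C_out,) conv bias;
    gamma, beta, mu, var: per-channel Batchnorm parameters/statistics.
    """
    scale = gamma / np.sqrt(var + eps)   # the per-channel "scale"
    A = W * scale[:, None, None, None]   # folded weights
    B = scale * (b - mu) + beta          # the per-channel "offset"
    return A, B
```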

chapter_model_deployment/Model_Compression.md

Lines changed: 1 addition & 2 deletions
@@ -172,8 +172,7 @@ $||\hat{w_c}-E(\hat{w_c})||$, respectively. Equation
 :eqref:`ch-deploy/post-quantization` is the calibration of the
 weight:
 
-$$
-\begin{aligned}
+$$\begin{aligned}
 \hat{w_c}\leftarrow\zeta_c(\hat{w_c}+u_c) \\
 u_c=E(w_c)-E(\hat{w_c}) \\
 \zeta_c=\frac{||w_c-E(w_c)||}{||\hat{w_c}-E(\hat{w_c})||}
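
A per-channel NumPy sketch of this calibration, where `w` stands for the float weights and `w_hat` for their quantized counterpart (names are ours; because of the centering, it does not matter whether $\zeta_c$ is computed before or after adding $u_c$):

```python
import numpy as np

def calibrate(w, w_hat):
    """Apply w_hat_c <- zeta_c * (w_hat_c + u_c) channel by channel."""
    C, shape = w.shape[0], w.shape
    w, w_hat = w.reshape(C, -1), w_hat.reshape(C, -1)
    u = w.mean(axis=1, keepdims=True) - w_hat.mean(axis=1, keepdims=True)
    num = np.linalg.norm(w - w.mean(axis=1, keepdims=True), axis=1)
    den = np.linalg.norm(w_hat - w_hat.mean(axis=1, keepdims=True), axis=1)
    zeta = (num / den)[:, None]              # zeta_c
    return (zeta * (w_hat + u)).reshape(shape)
```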

chapter_model_deployment/Model_Inference.md

Lines changed: 8 additions & 16 deletions
@@ -258,8 +258,7 @@ be written using matrices as Equation
 :eqref:`ch-deploy/conv-matmul-one-dimension`, which contains six
 multiplications and four additions.
 
-$$
-\textit{\textbf{F}}(2, 3)=
+$$\textit{\textbf{F}}(2, 3)=
 \left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
 \left[ \begin{matrix} y_0 \\ y_1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/conv-matmul-one-dimension`
@@ -271,17 +270,15 @@ multiplication. The matrix multiplication result may be obtained by
 computing intermediate variables $m_0-m_3$, as shown in Equation
 :eqref:`ch-deploy/conv-2-winograd`:
 
-$$
-\textit{\textbf{F}}(2, 3)=
+$$\textit{\textbf{F}}(2, 3)=
 \left[ \begin{matrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{matrix} \right] \times \left[ \begin{matrix} g_0 \\ g_1 \\ g_2 \end{matrix} \right]=
 \left[ \begin{matrix} m_0+m_1+m_2 \\ m_1-m_2-m_3 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/conv-2-winograd`
 
 where $m_0-m_3$ are computed as Equation
 :eqref:`ch-deploy/winograd-param`:
 
-$$
-\begin{aligned}
+$$\begin{aligned}
 m_0=(d_0-d_2) \times g_0 \\
 m_1=(d_1+d_2) \times (\frac{g_0+g_1+g_2}{2}) \\
 m_2=(d_2-d_1) \times (\frac{g_0-g_1+g_2}{2}) \\
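
A direct transcription of $F(2,3)$, checked against plain correlation; $m_3$ is truncated out of the hunk above, so we take it as $(d_1-d_3) \times g_2$, the value implied by the $\textit{\textbf{B}}^{\rm T}$ matrix shown further down:

```python
import numpy as np

def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap 1-D convolution in 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m0 = (d0 - d2) * g0
    m1 = (d1 + d2) * (g0 + g1 + g2) / 2  # filter terms can be precomputed
    m2 = (d2 - d1) * (g0 - g1 + g2) / 2
    m3 = (d1 - d3) * g2                  # truncated out of the hunk above
    return np.array([m0 + m1 + m2, m1 - m2 - m3])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])
print(winograd_f23(d, g))                     # [-1.  -0.5]
print(np.convolve(d, g[::-1], mode="valid"))  # same result, 6 multiplications
```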
@@ -310,22 +307,18 @@ matrix computation is performed based on the handwritten form, as
 provided in Equation
 :eqref:`ch-deploy/winograd-param`.
 
-$$
-\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g) \odot (\textit{\textbf{B}}^{\rm T}d)\right]$$
+$$\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g) \odot (\textit{\textbf{B}}^{\rm T}d)\right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix`
 
-$$
-\textit{\textbf{B}}^{\rm T}=
+$$\textit{\textbf{B}}^{\rm T}=
 \left[ \begin{matrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix-bt`
 
-$$
-\textit{\textbf{G}}=
+$$\textit{\textbf{G}}=
 \left[ \begin{matrix} 1 & 0 & 0 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & -0.5 & 0.5 \\ 0 & 0 & 1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix-g`
 
-$$
-\textit{\textbf{A}}^{\rm T}=
+$$\textit{\textbf{A}}^{\rm T}=
 \left[ \begin{matrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{matrix} \right]$$
 :eqlabel:`equ:ch-deploy/winograd-matrix-at`
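
Evaluated numerically, the matrix form reproduces the handwritten $m_0$-$m_3$ computation, with the element-wise product supplying the four multiplications (sketch of ours):

```python
import numpy as np

B_T = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
                [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
A_T = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])

# Y = A^T [ (G g) ⊙ (B^T d) ]
Y = A_T @ ((G @ g) * (B_T @ d))
print(Y)  # [-1.  -0.5], matching the m0..m3 form above
```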

@@ -337,8 +330,7 @@ shown in Equation
 Winograd uses 16 multiplications, reducing the computational complexity by
 a factor of 2.25 compared with the 36 multiplications of the original convolution.
 
-$$
-\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g\textit{\textbf{G}}^{\rm T}) \odot (\textit{\textbf{B}}^{\rm T}d\textit{\textbf{B}})\right]\textit{\textbf{A}}$$
+$$\textit{\textbf{Y}}=\textit{\textbf{A}}^{\rm T}\left[(\textit{\textbf{G}}g\textit{\textbf{G}}^{\rm T}) \odot (\textit{\textbf{B}}^{\rm T}d\textit{\textbf{B}})\right]\textit{\textbf{A}}$$
 :eqlabel:`equ:ch-deploy/winograd-two-dimension-matrix`
 
 The logical process of Winograd can be divided into four steps, as shown
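
The same check for the two-dimensional nesting, against a direct 3x3 correlation over a 4x4 tile: 16 element-wise multiplications versus 36 (sketch of ours):

```python
import numpy as np

B_T = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
                [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
A_T = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))  # input tile
g = rng.standard_normal((3, 3))  # 3x3 filter

# Y = A^T [ (G g G^T) ⊙ (B^T d B) ] A
Y = A_T @ ((G @ g @ G.T) * (B_T @ d @ B_T.T)) @ A_T.T

# Direct 2-D correlation, for checking.
ref = np.array([[(d[i:i+3, j:j+3] * g).sum() for j in range(2)]
                for i in range(2)])
print(np.allclose(Y, ref))  # True
```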
