fix

pianfan · pianfan · commit f61ebd82fed8 · 2025-11-20T17:42:12.000+08:00
diff --git a/_posts/2025-11-19-d2l_recurrent-neural-networks.md b/_posts/2025-11-19-d2l_recurrent-neural-networks.md
@@ -20,7 +20,7 @@ author: Pianfan
 
 1. **Autoregressive models**: predict $x_t$ from the observations of the past τ time steps, $x_{t-1},...,x_{t-τ}$; the number of parameters is fixed
 
-2. **Latent autoregressive models**: keep a summary $h_t$ of past observations, updated via $\hat{x}_t = P(x_t \mid h_t)$ and $h_t = g(h_{t-1}, x_{t-1})$
+2. **Latent autoregressive models**: keep a summary $h_t$ of past observations, updated via $\hat{x}_t = P(x_t \mid h_t) \text{ and } h_t = g(h_{t-1}, x_{t-1})$
 
 3. **Markov model**: satisfies the Markov condition (only recent history is needed); the first-order Markov model satisfies $P(x_1,...,x_T) = \prod_{t=1}^T P(x_t \mid x_{t-1})$ (with $P(x_1 \mid x_0) = P(x_1)$)
 
@@ -30,7 +30,7 @@ author: Pianfan
 
 1. **Data generation**: generate a noisy sine sequence as example data
 
-2. **Feature-label construction**: based on τ, build features $\mathbf{x}_t = [x_{t-\tau}, \ldots, x_{t-1}]$ and labels $y_t = x_t$
+2. **Feature-label construction**: based on τ, build features and labels $\mathbf{x}_t = [x_{t-\tau}, \ldots, x_{t-1}], y_t = x_t$
 
 3. **Model architecture**: a simple multilayer perceptron (2 fully connected layers + ReLU activation)
 
@@ -49,7 +49,7 @@ author: Pianfan
 
 1. **One-step-ahead prediction**: directly predict the value at the next time step
 
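-2. **$k$-step-ahead prediction**: feed the model's own predictions back in as inputs for subsequent predictions; $\hat{x}_{t+k}$ must be based on $\hat{x}_{t+1},...,\hat{x}_{t+k-1}$
+2. **$k$-step-ahead prediction**: feed the model's own predictions back in as inputs for subsequent predictions, $\hat{x}_{t+1},...,\hat{x}_{t+k-1} => \hat{x}_{t+k}$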
 
 3. Prediction error accumulates as the number of steps increases
 
@@ -144,7 +144,7 @@ author: Pianfan
 
 ### 8.3.1. Natural Language Statistics
 
-**Word-frequency property**: follows Zipf’s law
+**Word-frequency property**: follows Zipf's law
 
 - Formula: $n_i \propto \frac{1}{i^\alpha}$ or $\log n_i = -\alpha \log i + c$
 
@@ -220,11 +220,11 @@ def load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tok
 
 ### 8.4.2. Recurrent Neural Networks with Hidden States
 
-**Hidden state computation**: $\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh}  + \mathbf{b}_h)$, $\mathbf{X}_t \in \mathbb{R}^{n \times d}$, $\mathbf{H}_t \in \mathbb{R}^{n \times h}$, $\mathbf{W}_{hh} \in \mathbb{R}^{h \times h}$
+**Hidden state computation**: $\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh}  + \mathbf{b}_h), \mathbf{X}_t \in \mathbb{R}^{n \times d}, \mathbf{H}_t \in \mathbb{R}^{n \times h}, \mathbf{W}_{hh} \in \mathbb{R}^{h \times h}$
 
 **Output computation**: $\mathbf{O}_t = \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q$
 
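-**Parameter properties**: the parameters ($\mathbf{W}_{xh}, \mathbf{W}_{hh}, \mathbf{b}_h, \mathbf{W}_{hq}, \mathbf{b}_q$) are shared across time steps, so the number of parameters does not grow with the number of time steps
+**Parameter properties**: the parameters are shared across time steps, so the number of parameters does not grow with the number of time steps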
 
 ### 8.4.3. Character-Level Language Models Based on Recurrent Neural Networks