
Commit 05885f3

committed
feat: a8 finishes
1 parent 03d3ce9 commit 05885f3

File tree

1 file changed: +218 -9 lines changed


notebooks/theory_assignments/A8/P_Assignment8_yecanming.ipynb

Lines changed: 218 additions & 9 deletions
@@ -25,7 +25,7 @@
2525
"highlight-style: pygments\n",
2626
"date-format: full\n",
2727
"lang: zh\n",
28-
"bibliography: [../../references.bib]\n",
28+
"bibliography: [../../coding_projects/digital_processing_of_speech_signals/hmm.bib]\n",
2929
"format: \n",
3030
" html:\n",
3131
" code-fold: false\n",
@@ -924,10 +924,101 @@
924924
"cell_type": "markdown",
925925
"metadata": {},
926926
"source": [
927-
"- 前向概率和后向概率的定义是什么?\n",
927+
"\n",
928+
"- 我们需要理解题目给的式子中符号的含义。前向概率和后向概率的定义是什么?\n",
928929
"- 李航给出的这个式子的意义是什么?\n",
929930
"\n",
930-
"我们需要理解题目给的式子中符号的含义。\n"
931+
"首先我们来明确一下,这道题目提出的问题还是概率计算问题,也就是说给定隐马尔可夫模型(HMM)$\\lambda=(A,B,pi)$,观察序列 $O=(o_1,o_2,...,o_T)$,我们想要求模型参数生成了这个观察序列的概率 $P(O|\\lambda)$。\n",
932+
"\n",
933+
"我们首先来复习一下前向概率和后向概率的定义。\n",
934+
"\n",
935+
"如果我们没有对观察序列是一个可以拆开的序列的理解,只是把 O, I 当做一个整体去看,李航首先在书中给出了暴力的(概念上可行但是计算上不可行的)直接计算法 [@LiHang_2022] ,Rabiner论文叫做\"straightforward way\" [@Rabiner_1989]。而根据进一步的观察,可以导出这个问题的两个动态规划算法,分别是前向算法和后向算法。\n",
936+
"\n",
937+
"**而李航在这道题目似乎是给出了一个新的算法,又使用前向概率,又使用后向概率!**\n",
938+
"\n",
939+
"::: {.callout-note}\n",
940+
"注意,深度学习中反向传播求梯度算法与之不同,Hinton推广的反向传播算法确实也是节省计算量的算法,但是那个里面模型需要前向传播也需要反向传播,都要计算,而我们这里前向算法和后向算法是两个独立的、都可以解决问题的算法,尽管李航 [@LiHang_2022] 和 Rabiner [@Rabiner_1989] 都不恰当的合在一起命名为 \"前向-后向算法\"\"The Forward-Backward Procedure\",让人误解。\n",
941+
"\n",
942+
"特别提到这个问题时因为最近Hinton又提出了所谓的 \"Forward-Forward Algorithm\",这里面就是两个Forward都要做。所以我们命名方法的时候一定要精确。\n",
943+
"::: \n",
944+
"\n",
945+
"#### 前向算法的复习 \n",
946+
"\n",
947+
"前向算法是典型的动态规划思路 (不只是维特比算法是动态规划)。\n",
948+
"首先我求 $P(O|\\lambda)$ 可以有多种加和方法,不一定是从 $P(O, I|\\lambda)$ 边缘化得到的。我们可以考虑所谓的前向概率,即 $P(O_{1:t}, I_t = q_i | \\lambda)$ ,其中O_{1:t}的冒号是索引表达,而P(X=x)表示随机变量X取一个值x的概率密度。 这个概率记作 $a_{t}(i)$ ,是t和i的函数, 其中 $q_i$ 表示第i个状态。\n",
949+
"\n",
950+
"如果我求出了 $P(O_{1:T}, I_T = q_i | \\lambda)$ 所有i下的情况,那我就能求出 $P(O|\\lambda)$ 。\n",
951+
"\n",
952+
"而 $P(O_{1:T}, I_T = q_i | \\lambda)$ 本来也是不好求的。因为我们可以看着贝叶斯网络 @fig-hmm-demo, 可以发现我只知道 $P(O_t | I_t)$ 和 $P(I_t | I_{t-1})$ 就算你帮我减少到只有 $I_T = q_i$, $O_T$自然是和前面大家无关了, $O_{1:T}$ 这里还有 T-1个独立的rv需要遍历所有情况。\n",
953+
"\n",
954+
"我们首先通过贝叶斯网络的观察,化简了问题为 $P(O_{1:T}, I_T = q_i | \\lambda) = P(O_{1:T-1}, I_T = q_i | \\lambda) \\times P( O_T | I_T = q_i, \\lambda) $,但是接下来$P(O_{1:T-1}, I_T = q_i | \\lambda)$要怎么办呢?\n",
955+
"\n",
956+
"这个时候,我们将问题拆解为重叠的子问题,如果我知道 $P(O_{1:T-1}, I_{T-1} = q_j | \\lambda)$ 所有的j的情况呢?\n",
957+
"\n",
958+
"想到了这个,我们恍然大悟,只需要再来一个A矩阵,就可以用求和公式和贝叶斯公式得到 $P(O_{1:T-1}, I_T = q_i | \\lambda) = [\\sum_{j=1}^N P(O_{1:t-1}, I_{t-1} = q_j | \\lambda) \\cdot a_{ji}]$ 。\n",
959+
"<!-- TODO -->\n",
960+
"\n",
961+
"所以\n",
962+
"\n",
963+
"$P(O_{1:t}, I_T = q_i | \\lambda) = [\\sum_{j=1}^N P(O_{1:t-1}, I_{t-1} = q_j | \\lambda) \\cdot a_{ji}] \\times b_i(O_t)$\n",
964+
"\n",
965+
"至此,我们就推导出了前向算法,按照李航书上的符号总结为算法步骤如下:\n",
966+
"\n",
967+
"$$\n",
968+
"\\boxed{\n",
969+
"\\begin{aligned}\n",
970+
"&\\text{输入:隐马尔可夫模型 } \\lambda, \\text{ 观测序列 } O. \\\\\n",
971+
"&\\text{输出:观测序列概率 } P(O|\\lambda). \\\\\n",
972+
"&\\text{(1) 初值} \\\\\n",
973+
"&\\alpha_1(i) = \\pi_i b_i(o_1), \\quad i = 1, 2, \\ldots, N \\quad \\\\\n",
974+
"&\\text{(2) 递推} \\\\\n",
975+
"&\\text{对于 } t = 1, 2, \\ldots, T - 1: \\\\\n",
976+
"&\\alpha_{t+1}(i) = \\left[ \\sum_{j=1}^{N} \\alpha_t(j) a_{ji} \\right] b_i(o_{t+1}), \\quad i = 1, 2, \\ldots, N \\quad \\\\\n",
977+
"&\\text{(3) 终止} \\\\\n",
978+
"&P(O|\\lambda) = \\sum_{i=1}^{N} \\alpha_T(i) \\quad \n",
979+
"\\end{aligned}\n",
980+
"}\n",
981+
"$$\n",
982+
"\n",
983+
"前向算法运用广泛,Zhang在论文中就采用了前向算法来计算HMM的概率 [@Zhang_2024]。\n",
984+
"\n",
985+
"#### 后向算法的复习 \n",
986+
"\n",
987+
"后向算法的思路与前向算法基本一致,只是递推的方向反过来,最后会尝试从 i_1 去求和得到 $P(O|\\lambda)$ 。前向算法是从前到后计算$\\alpha_t(i)$,而后向算法是从后往前计算$\\beta_t(i)$。\n",
988+
"\n",
989+
"首先我们按照李航书上对后向概率 $\\beta$ 进行定义 [@LiHang_2022]:\n",
990+
"\n",
991+
"> $$\n",
992+
"> \\beta_t(i) = P(o_{t+1}, o_{t+2}, \\cdots, o_T \\mid i_t = q_i, \\lambda)\n",
993+
"> $$\n",
994+
"\n",
995+
"也就是时刻t位于某个状态$q_i$的条件下,观测序列的后续部分的概率。\n",
996+
"\n",
997+
"这里的关键还是递推。从时间$T-1$向前递推,计算每个时间$t$的$\\beta_t(i)$。在时间$t$,如果模型处于状态$i$,那么接下来观测到$o_{t+1}$的概率取决于从状态$i$转移到所有可能的下一个状态$j$的概率$a_{ij}$,以及在状态$j$下观测到$o_{t+1}$的概率$b_j(o_{t+1})$,再乘以下一个时间点的$\\beta_{t+1}(j)$。\n",
998+
"\n",
999+
"因此,递推公式为:\n",
1000+
"\n",
1001+
"$$\n",
1002+
"\\beta_t(i) = \\sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \\beta_{t+1}(j), \\quad \\forall i = 1, 2, \\ldots, N; \\ t = T-1, T-2, \\ldots, 1 \\quad \n",
1003+
"$$\n",
1004+
"\n",
1005+
"整体的算法如下\n",
1006+
"\n",
1007+
"$$\n",
1008+
"\\boxed{\n",
1009+
"\\begin{aligned}\n",
1010+
"&\\text{输入:隐马尔可夫模型 } \\lambda, \\text{ 观测序列 } O. \\\\\n",
1011+
"&\\text{输出:观测序列概率 } P(O|\\lambda). \\\\\n",
1012+
"&\\text{(1) 初始化} \\\\\n",
1013+
"&\\beta_T(i) = 1, \\quad \\forall i = 1, 2, \\ldots, N \\quad \\\\\n",
1014+
"&\\text{(2) 递推} \\\\\n",
1015+
"&\\text{对于 } t = T-1, T-2, \\ldots, 1: \\\\\n",
1016+
"&\\beta_t(i) = \\sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \\beta_{t+1}(j), \\quad \\forall i = 1, 2, \\ldots, N \\quad \\\\\n",
1017+
"&\\text{(3) 终止} \\\\\n",
1018+
"&P(O|\\lambda) = \\sum_{i=1}^{N} \\pi_i b_i(o_1) \\beta_1(i) \\quad \n",
1019+
"\\end{aligned}\n",
1020+
"}\n",
1021+
"$$\n"
9311022
]
9321023
},
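As a companion sketch for the boxed backward algorithm, assuming the `numpy` import and the toy `A`, `B`, `pi`, `obs` from the forward sketch above have already been defined.

```python
def backward(A, B, pi, obs):
    """Backward recursion: row t holds beta_{t+1}(i) = P(o_{t+2}..o_T | i_{t+1} = q_i, lambda) (0-based rows)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                              # (1) initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                      # (2) recursion, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  #     sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
    return beta

beta = backward(A, B, pi, obs)
print((pi * B[:, obs[0]] * beta[0]).sum())              # (3) termination: should match the forward result
```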
9331024
{
@@ -941,18 +1032,136 @@
9411032
"cell_type": "markdown",
9421033
"metadata": {},
9431034
"source": [
944-
"### 题目扩展问题\n",
1035+
"我们首先来看题目所给的式子\n",
9451036
"\n",
1037+
"> $P(O|\\lambda)=\\sum_{i=1}^{N}\\sum_{j=1}^{N}\\alpha_t(i)a_{ij}b_j(o_{t+1})\\beta_{t+1}(j), \\quad t=1,2,\\cdots,T-1$\n",
9461038
"\n",
947-
"\n"
1039+
"其中 \n",
1040+
"$\\beta_t(i) = P(o_{t+1}, o_{t+2}, \\cdots, o_T \\mid i_t = q_i, \\lambda)$, $\\alpha_t(i) = P(o_1, o_2, \\cdots, o_t, i_t = q_i | \\lambda)$"
9481041
]
9491042
},
9501043
{
951-
"cell_type": "code",
952-
"execution_count": null,
1044+
"cell_type": "markdown",
9531045
"metadata": {},
954-
"outputs": [],
955-
"source": []
1046+
"source": [
1047+
"题目式子是说,如果我们考虑任何一个时间点t和下一个时间点t+1, t时刻的状态设为$q_i$, t+1时刻的状态设为$q_j$。首先,发生这个转移的状态转移的概率是 $a_ij$,这很好理解。从t+1的状态导出的 $o_{t+1}$的概率是 $b_j(o_{t+1})$,这也很好理解。\n",
1048+
"\n",
1049+
"题目想问的是,如果我们这个时候把 t的前向概率和t+1的后向概率拼接在一起,能不能得到整一个O的概率?答案是肯定的。我们使用概率公式进行推导证明这个公式。\n",
1050+
"\n",
1051+
"--- \n"
1052+
]
1053+
},
1054+
{
1055+
"cell_type": "markdown",
1056+
"metadata": {},
1057+
"source": [
1058+
"所谓证明,我们要不是从左边的式子展开得到右边的式子,要不是从右边的式子合并得到左边的式子。\n",
1059+
"\n",
1060+
"这里我们选择从左边的式子展开得到右边的式子。我们可以将$P(O|\\lambda)$在t和t+1的位置展开。\n",
1061+
"\n",
1062+
"根据全概率公式,我们可以将$P(O|\\lambda)$表示为所有可能的状态序列的概率之和:\n",
1063+
"\n",
1064+
"$$\n",
1065+
"P(O|\\lambda) = \\sum_{i=1}^{N} \\sum_{j=1}^{N} P(o_1, \\ldots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, \\ldots, o_T | \\lambda)\n",
1066+
"$$\n",
1067+
"\n",
1068+
"这里,我们引入了中间状态变量$i_t = q_i$和$i_{t+1} = q_j$。\n",
1069+
"\n",
1070+
"注意李航的书上符号稍微有些混乱,不要把下标i和状态序列i搞混了。\n",
1071+
"\n",
1072+
"这里我们已经初步看到了右边式子求和的结构。\n",
1073+
"\n",
1074+
"---\n"
1075+
]
1076+
},
1077+
{
1078+
"cell_type": "markdown",
1079+
"metadata": {},
1080+
"source": [
1081+
"接下来,我们只需要证明这个联合概率可以分解成已知的变量:\n",
1082+
"\n",
1083+
"$$\n",
1084+
"P(o_1, \\ldots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, \\ldots, o_T | \\lambda) = \\alpha_t(i) \\cdot a_{ij} \\cdot b_j(o_{t+1}) \\cdot \\beta_{t+1}(j)\n",
1085+
"$$\n",
1086+
"\n",
1087+
"要想证明这个,我们补充一下贝叶斯网络的知识 [@Zhou_2016]。\n",
1088+
"\n",
1089+
"贝叶斯网络展示了这些变量之间的依赖关系。重要的是:\n",
1090+
"\n",
1091+
"1. 每个隐状态 $ i_{t} $ 只依赖于前一个隐状态 $ i_{t-1} $。\n",
1092+
"2. 每个观测 $ o_t $ 只依赖于对应的隐状态 $ i_t $.\n",
1093+
"\n",
1094+
"因此,贝叶斯网络的依赖结构如下:\n",
1095+
"\n",
1096+
"$$ i_1 \\rightarrow i_2 \\rightarrow \\ldots \\rightarrow i_T $$\n",
1097+
"$$ i_1 \\rightarrow o_1, \\quad i_2 \\rightarrow o_2, \\ldots, i_T \\rightarrow o_T $$\n"
1098+
]
1099+
},
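As a concrete worked instance of the "condition only on the parents" rule discussed next (added here purely for illustration, with $T = 2$):

$$
P(o_1, o_2, i_1, i_2 \mid \lambda)
= P(i_1 \mid \lambda)\, P(o_1 \mid i_1, \lambda)\, P(i_2 \mid i_1, \lambda)\, P(o_2 \mid i_2, \lambda)
= \pi_{i_1}\, b_{i_1}(o_1)\, a_{i_1 i_2}\, b_{i_2}(o_2)
$$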
1100+
{
1101+
"cell_type": "markdown",
1102+
"metadata": {},
1103+
"source": [
1104+
"根据贝叶斯网络的概率的链式法则,联合概率可以分解,每一个联合概率里面出现的rv,只需要考虑自己的父节点作为条件,而不用考虑其他节点的影响,最后直接乘在一起。因此,我们可以将联合概率分解为如下形式:\n",
1105+
"\n",
1106+
"$$\n",
1107+
"P(o_1, \\ldots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, \\ldots, o_T | \\lambda) = P(o_1, \\ldots, o_t, i_t = q_i | \\lambda) \n",
1108+
"\\\\ \\cdot P(i_{t+1} = q_j | i_t = q_i, \\lambda) \n",
1109+
"\\\\ \\cdot P(o_{t+1}, \\ldots, o_T | i_{t+1} = q_j, \\lambda)\n",
1110+
"$$\n",
1111+
"\n",
1112+
"很多同学推导到这里就懵了,这里我们只有三个式子,原本有四个式子呀,去哪了?\n",
1113+
"\n",
1114+
"原来,因为 $\\beta_t(i) = P(o_{t+1}, o_{t+2}, \\cdots, o_T \\mid i_t = q_i, \\lambda)$ \n",
1115+
"也就是说 $\\beta_{t+1}(i) = P(o_{t+2}, o_{t+3}, \\cdots, o_T \\mid i_{t+1} = q_i, \\lambda)$ \n",
1116+
"\n",
1117+
"而这里面的 第三项乘积项是 $P(o_{t+1}, \\ldots, o_T | i_{t+1} = q_j, \\lambda)$\n",
1118+
"\n",
1119+
"正是因为这个差别,所以多出了 $b_j(o_{t+1})$, 因为\n",
1120+
"\n",
1121+
"$$\n",
1122+
"P(o_{t+1}, \\ldots, o_T | i_{t+1} = q_j, \\lambda) = P(o_{t+2}, o_{t+3}, \\cdots, o_T \\mid i_{t+1} = q_j, \\lambda) \\times P(o_{t+1} \\mid i_{t+1} = q_j, \\lambda)\n",
1123+
"$$\n",
1124+
"\n",
1125+
"至此我们就证明清楚了\n",
1126+
"\n",
1127+
"\n",
1128+
"$$\n",
1129+
"P(o_1, \\ldots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, \\ldots, o_T | \\lambda) = \\alpha_t(i) \\cdot a_{ij} \\cdot b_j(o_{t+1}) \\cdot \\beta_{t+1}(j)\n",
1130+
"$$"
1131+
]
1132+
},
1133+
{
1134+
"cell_type": "markdown",
1135+
"metadata": {},
1136+
"source": [
1137+
"\n",
1138+
"\n",
1139+
"--- \n"
1140+
]
1141+
},
1142+
{
1143+
"cell_type": "markdown",
1144+
"metadata": {},
1145+
"source": [
1146+
"结合以上结果, 把 $P(o_1, \\ldots, o_t, i_t = q_i, i_{t+1} = q_j, o_{t+1}, \\ldots, o_T | \\lambda) = \\alpha_t(i) \\cdot a_{ij} \\cdot b_j(o_{t+1}) \\cdot \\beta_{t+1}(j)$ 代入到全概率公式里面,那么\n",
1147+
"\n",
1148+
"$$\n",
1149+
"P(O|\\lambda) = \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_t(i) \\cdot a_{ij} \\cdot b_j(o_{t+1}) \\cdot \\beta_{t+1}(j)\n",
1150+
"$$\n",
1151+
"\n",
1152+
"\n",
1153+
"这个公式表明,在每个时间点$t$,观测序列的概率可以通过前向变量$\\alpha_t(i)$、状态转移概率$a_{ij}$、观测概率$b_j(o_{t+1})$和后向变量$\\beta_{t+1}(j)$的结合来计算。\n"
1154+
]
1155+
},
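A quick numerical check of the identity just proved, reusing the `forward` and `backward` sketches from the earlier cells and comparing against the brute-force direct computation mentioned at the beginning; all names and toy parameters are illustrative assumptions.

```python
from itertools import product

def direct(A, B, pi, obs):
    """Brute-force direct computation: enumerate every state path I and sum P(O, I | lambda)."""
    T, N = len(obs), A.shape[0]
    total = 0.0
    for path in product(range(N), repeat=T):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total

alpha, beta = forward(A, B, pi, obs), backward(A, B, pi, obs)
for t in range(len(obs) - 1):
    # sum_i sum_j alpha_t(i) a_{ij} b_j(o_{t+1}) beta_{t+1}(j) -- the same value for every t
    print(t, np.einsum('i,ij,j,j->', alpha[t], A, B[:, obs[t + 1]], beta[t + 1]))
print(direct(A, B, pi, obs), alpha[-1].sum())            # both equal P(O|lambda)
```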
1156+
{
1157+
"cell_type": "markdown",
1158+
"metadata": {},
1159+
"source": [
1160+
"### 题目扩展问题\n",
1161+
"\n",
1162+
"\n",
1163+
"\n"
1164+
]
9561165
},
9571166
{
9581167
"cell_type": "code",
