Commit ce03544

feat: question 1.x
1 parent c545130 commit ce03544

4 files changed

+295
-48
lines changed

notebooks/coding_projects/digital_processing_of_speech_signals/P2_HMM/00hidden_markov_model.ipynb

Lines changed: 259 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -148,13 +148,21 @@
148148
},
149149
{
150150
"cell_type": "code",
151-
"execution_count": 2,
151+
"execution_count": 7,
152152
"metadata": {},
153153
"outputs": [],
154154
"source": [
155155
"from scholarly_infrastructure.logging.nucleus import logger, print\n",
156156
"from thu_big_data_ml.help import plt, pio\n",
157-
"import treescope\n",
157+
"import treescope"
158+
]
159+
},
160+
{
161+
"cell_type": "code",
162+
"execution_count": 6,
163+
"metadata": {},
164+
"outputs": [],
165+
"source": [
158166
"treescope.active_autovisualizer.set_globally(treescope.ArrayAutovisualizer())"
159167
]
160168
},
@@ -513,7 +521,9 @@
513521
"> \n",
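514522
"> A hidden Markov model is determined by the initial state probability vector $\\pi$, the state transition probability matrix $A$, and the observation probability matrix $B$. $\\pi$ and $A$ determine the state sequence, and $B$ determines the observation sequence. A hidden Markov model $\\lambda$ can therefore be written as a triple, $\\lambda = (\\pi, A, B)$; $A$, $B$, and $\\pi$ are called the three elements of the hidden Markov model.\n",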
515523
"\n",
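516-
"Li Hang emphasizes the use of HMMs for tagging problems in machine learning, where the hidden states correspond to the tags [@LiHang_2022]."
524+
"Li Hang emphasizes the use of HMMs for tagging problems in machine learning, where the hidden states correspond to the tags [@LiHang_2022].\n",
525+
"\n",
526+
"Let us visualize this with graphviz:"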
517527
]
518528
},
519529
{
@@ -522,78 +532,280 @@
522532
"metadata": {},
523533
"outputs": [],
524534
"source": [
525-
"from graphviz import Digraph\n",
526-
"\n",
527-
"def draw_hmm(states, observations, start_probability, transition_probability, emission_probability):\n",
528-
"    dot = Digraph(comment='The Round Table')\n",
535+
"import matplotlib.pyplot as plt\n",
536+
"import networkx as nx\n",
537+
"from graphviz import Digraph"
538+
]
539+
},
540+
{
541+
"cell_type": "code",
542+
"execution_count": 13,
543+
"metadata": {},
544+
"outputs": [],
545+
"source": [
546+
"def draw_markov_chain_bayesian_network(T=4):\n",
547+
"    # Create a new directed graph\n",
548+
"    dot = Digraph(comment='Markov Chain Bayesian Network')\n",
549+
"    \n",
550+
"    # Give each node type its own fill color so the two are easy to tell apart\n",
551+
"    state_attr = {'style': 'filled', 'fillcolor': 'lightblue'}\n",
552+
"    observation_attr = {'style': 'filled', 'fillcolor': 'yellow'}\n",
553+
"    \n",
554+
"    # Add the state nodes\n",
555+
"    for i in range(T):\n",
556+
"        dot.node(f'I{i}', f'I_{i}', **state_attr)\n",
557+
"    \n",
558+
"    # Add the observation nodes\n",
559+
"    for j in range(T):\n",
560+
"        dot.node(f'O{j}', f'O_{j}', **observation_attr)\n",
561+
"    \n",
562+
"    # Add edges for the state transitions\n",
563+
"    for i in range(T-1):\n",
564+
"        dot.edge(f'I{i}', f'I{i+1}', label='A')\n",
565+
"    \n",
566+
"    # Add edges for the emission from each state to its observation\n",
567+
"    for i in range(T):\n",
568+
"        dot.edge(f'I{i}', f'O{i}', label='B')\n",
569+
"    \n",
570+
"    # Return the graph; Jupyter renders it inline\n",
571+
"    return dot"
572+
]
573+
},
574+
{
575+
"cell_type": "code",
576+
"execution_count": 14,
577+
"metadata": {},
578+
"outputs": [
579+
{
580+
"data": {
581+
"text/html": [
582+
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
583+
],
584+
"text/plain": []
585+
},
586+
"metadata": {},
587+
"output_type": "display_data"
588+
},
589+
{
590+
"data": {
591+
"image/svg+xml": [
592+
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
593+
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
594+
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
595+
"<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
596+
" -->\n",
597+
"<!-- Title: %3 Pages: 1 -->\n",
598+
"<svg width=\"206pt\" height=\"392pt\"\n",
599+
" viewBox=\"0.00 0.00 206.00 392.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
600+
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 388)\">\n",
601+
"<title>%3</title>\n",
602+
"<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-388 202,-388 202,4 -4,4\"/>\n",
603+
"<!-- I0 -->\n",
604+
"<g id=\"node1\" class=\"node\">\n",
605+
"<title>I0</title>\n",
606+
"<ellipse fill=\"#add8e6\" stroke=\"#000000\" cx=\"153\" cy=\"-366\" rx=\"27\" ry=\"18\"/>\n",
607+
"<text text-anchor=\"middle\" x=\"153\" y=\"-362.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">I_0</text>\n",
608+
"</g>\n",
609+
"<!-- I1 -->\n",
610+
"<g id=\"node2\" class=\"node\">\n",
611+
"<title>I1</title>\n",
612+
"<ellipse fill=\"#add8e6\" stroke=\"#000000\" cx=\"99\" cy=\"-279\" rx=\"27\" ry=\"18\"/>\n",
613+
"<text text-anchor=\"middle\" x=\"99\" y=\"-275.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">I_1</text>\n",
614+
"</g>\n",
615+
"<!-- I0&#45;&gt;I1 -->\n",
616+
"<g id=\"edge1\" class=\"edge\">\n",
617+
"<title>I0&#45;&gt;I1</title>\n",
618+
"<path fill=\"none\" stroke=\"#000000\" d=\"M142.5883,-349.2255C134.6696,-336.4677 123.6429,-318.7024 114.6615,-304.2325\"/>\n",
619+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"117.6272,-302.3736 109.3798,-295.723 111.6797,-306.0652 117.6272,-302.3736\"/>\n",
620+
"<text text-anchor=\"middle\" x=\"135\" y=\"-318.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
621+
"</g>\n",
622+
"<!-- O0 -->\n",
623+
"<g id=\"node5\" class=\"node\">\n",
624+
"<title>O0</title>\n",
625+
"<ellipse fill=\"#ffff00\" stroke=\"#000000\" cx=\"171\" cy=\"-279\" rx=\"27\" ry=\"18\"/>\n",
626+
"<text text-anchor=\"middle\" x=\"171\" y=\"-275.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">O_0</text>\n",
627+
"</g>\n",
628+
"<!-- I0&#45;&gt;O0 -->\n",
629+
"<g id=\"edge4\" class=\"edge\">\n",
630+
"<title>I0&#45;&gt;O0</title>\n",
631+
"<path fill=\"none\" stroke=\"#000000\" d=\"M156.7296,-347.9735C159.1672,-336.1918 162.4012,-320.5607 165.1742,-307.1581\"/>\n",
632+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"168.6764,-307.5051 167.2752,-297.0034 161.8216,-306.0868 168.6764,-307.5051\"/>\n",
633+
"<text text-anchor=\"middle\" x=\"168\" y=\"-318.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
634+
"</g>\n",
635+
"<!-- I2 -->\n",
636+
"<g id=\"node3\" class=\"node\">\n",
637+
"<title>I2</title>\n",
638+
"<ellipse fill=\"#add8e6\" stroke=\"#000000\" cx=\"63\" cy=\"-192\" rx=\"27\" ry=\"18\"/>\n",
639+
"<text text-anchor=\"middle\" x=\"63\" y=\"-188.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">I_2</text>\n",
640+
"</g>\n",
641+
"<!-- I1&#45;&gt;I2 -->\n",
642+
"<g id=\"edge2\" class=\"edge\">\n",
643+
"<title>I1&#45;&gt;I2</title>\n",
644+
"<path fill=\"none\" stroke=\"#000000\" d=\"M91.7149,-261.3943C86.6889,-249.2482 79.9107,-232.8674 74.2018,-219.0711\"/>\n",
645+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"77.3319,-217.4815 70.2743,-209.5796 70.8638,-220.158 77.3319,-217.4815\"/>\n",
646+
"<text text-anchor=\"middle\" x=\"89\" y=\"-231.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
647+
"</g>\n",
648+
"<!-- O1 -->\n",
649+
"<g id=\"node6\" class=\"node\">\n",
650+
"<title>O1</title>\n",
651+
"<ellipse fill=\"#ffff00\" stroke=\"#000000\" cx=\"135\" cy=\"-192\" rx=\"27\" ry=\"18\"/>\n",
652+
"<text text-anchor=\"middle\" x=\"135\" y=\"-188.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">O_1</text>\n",
653+
"</g>\n",
654+
"<!-- I1&#45;&gt;O1 -->\n",
655+
"<g id=\"edge5\" class=\"edge\">\n",
656+
"<title>I1&#45;&gt;O1</title>\n",
657+
"<path fill=\"none\" stroke=\"#000000\" d=\"M106.2851,-261.3943C111.3111,-249.2482 118.0893,-232.8674 123.7982,-219.0711\"/>\n",
658+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"127.1362,-220.158 127.7257,-209.5796 120.6681,-217.4815 127.1362,-220.158\"/>\n",
659+
"<text text-anchor=\"middle\" x=\"125\" y=\"-231.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
660+
"</g>\n",
661+
"<!-- I3 -->\n",
662+
"<g id=\"node4\" class=\"node\">\n",
663+
"<title>I3</title>\n",
664+
"<ellipse fill=\"#add8e6\" stroke=\"#000000\" cx=\"27\" cy=\"-105\" rx=\"27\" ry=\"18\"/>\n",
665+
"<text text-anchor=\"middle\" x=\"27\" y=\"-101.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">I_3</text>\n",
666+
"</g>\n",
667+
"<!-- I2&#45;&gt;I3 -->\n",
668+
"<g id=\"edge3\" class=\"edge\">\n",
669+
"<title>I2&#45;&gt;I3</title>\n",
670+
"<path fill=\"none\" stroke=\"#000000\" d=\"M55.7149,-174.3943C50.6889,-162.2482 43.9107,-145.8674 38.2018,-132.0711\"/>\n",
671+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"41.3319,-130.4815 34.2743,-122.5796 34.8638,-133.158 41.3319,-130.4815\"/>\n",
672+
"<text text-anchor=\"middle\" x=\"53\" y=\"-144.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
673+
"</g>\n",
674+
"<!-- O2 -->\n",
675+
"<g id=\"node7\" class=\"node\">\n",
676+
"<title>O2</title>\n",
677+
"<ellipse fill=\"#ffff00\" stroke=\"#000000\" cx=\"99\" cy=\"-105\" rx=\"27\" ry=\"18\"/>\n",
678+
"<text text-anchor=\"middle\" x=\"99\" y=\"-101.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">O_2</text>\n",
679+
"</g>\n",
680+
"<!-- I2&#45;&gt;O2 -->\n",
681+
"<g id=\"edge6\" class=\"edge\">\n",
682+
"<title>I2&#45;&gt;O2</title>\n",
683+
"<path fill=\"none\" stroke=\"#000000\" d=\"M70.2851,-174.3943C75.3111,-162.2482 82.0893,-145.8674 87.7982,-132.0711\"/>\n",
684+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"91.1362,-133.158 91.7257,-122.5796 84.6681,-130.4815 91.1362,-133.158\"/>\n",
685+
"<text text-anchor=\"middle\" x=\"89\" y=\"-144.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
686+
"</g>\n",
687+
"<!-- O3 -->\n",
688+
"<g id=\"node8\" class=\"node\">\n",
689+
"<title>O3</title>\n",
690+
"<ellipse fill=\"#ffff00\" stroke=\"#000000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
691+
"<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">O_3</text>\n",
692+
"</g>\n",
693+
"<!-- I3&#45;&gt;O3 -->\n",
694+
"<g id=\"edge7\" class=\"edge\">\n",
695+
"<title>I3&#45;&gt;O3</title>\n",
696+
"<path fill=\"none\" stroke=\"#000000\" d=\"M27,-86.9735C27,-75.1918 27,-59.5607 27,-46.1581\"/>\n",
697+
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-46.0033 27,-36.0034 23.5001,-46.0034 30.5001,-46.0033\"/>\n",
698+
"<text text-anchor=\"middle\" x=\"32\" y=\"-57.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
699+
"</g>\n",
700+
"</g>\n",
701+
"</svg>\n"
702+
],
703+
"text/plain": [
704+
"\u001b[1m<\u001b[0m\u001b[1;95mgraphviz.graphs.Digraph\u001b[0m\u001b[39m object at \u001b[0m\u001b[1;36m0x7f1a346d7370\u001b[0m\u001b[1m>\u001b[0m"
705+
]
706+
},
707+
"execution_count": 14,
708+
"metadata": {},
709+
"output_type": "execute_result"
710+
}
711+
],
712+
"source": [
713+
"# Call the function to draw the diagram\n",
714+
"draw_markov_chain_bayesian_network()"
715+
]
716+
},
717+
{
718+
"cell_type": "markdown",
719+
"metadata": {},
720+
"source": [
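721+
"In the Bayesian network shown in the figure, once the current state $I_t$ is given, $O_t$ is conditionally independent of all the other random variables, by the common-parent and sequential structures of Bayesian networks [@Zhou_2016]. The figure thus gives a direct intuition for what Li Hang calls the **observation independence assumption**."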
722+
]
723+
},
724+
{
725+
"cell_type": "markdown",
726+
"metadata": {},
727+
"source": [
728+
"#### Give a formal description of the three canonical problems of hidden Markov models. (4 points)\n",
529729
"\n",
530-
"    # Add states as nodes\n",
531-
"    for state in states:\n",
532-
"        dot.node(state, state)\n",
533730
"\n",
534-
"    # Add start transitions\n",
535-
"    for state in states:\n",
536-
"        dot.edge('Start', state, label=f'{start_probability[state]:.2f}')\n",
731+
"Following Chapter 10 of Li Hang's book, the three canonical problems of hidden Markov models are formally stated as follows [@LiHang_2022]:\n",
537732
"\n",
538-
"    # Add state transitions\n",
539-
"    for from_state in states:\n",
540-
"        for to_state in states:\n",
541-
"            if transition_probability[from_state][to_state] > 0:\n",
542-
"                dot.edge(from_state, to_state, label=f'{transition_probability[from_state][to_state]:.2f}')\n",
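733+
"> (1) The probability computation problem. Given the model $\\lambda = (A, B, \\pi)$ and the observation sequence $O = (o_1, o_2, ..., o_T)$, compute the probability $P(O|\\lambda)$ that the observation sequence $O$ occurs under the model $\\lambda$.\n",
734+
">\n",
735+
"> (2) The learning problem. Given the observation sequence $O = (o_1, o_2, ..., o_T)$, estimate the parameters of the model $\\lambda = (A, B, \\pi)$ so that the probability $P(O|\\lambda)$ of the observation sequence under the model is maximized, that is, estimate the parameters by maximum likelihood.\n",
736+
">\n",
737+
"> (3) The prediction problem, also called the decoding problem. Given the model $\\lambda = (A, B, \\pi)$ and the observation sequence $O = (o_1, o_2, ..., o_T)$, find the state sequence $I = (i_1, i_2, ..., i_T)$ that maximizes the conditional probability $P(I|O)$. In other words, given the observation sequence, find the most likely corresponding state sequence.\n",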
543738
"\n",
544-
"    # Add emissions\n",
545-
"    for state in states:\n",
546-
"        for observation in observations:\n",
547-
"            dot.edge(state, f'{observation}', label=f'{emission_probability[state][observation]:.2f}', style='dashed')\n",
548739
"\n",
549-
"    return dot\n",
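740+
"In the terminology of the preceding chapter of Li Hang's book, on the EM algorithm, the probability in (1) is also called the incomplete-data likelihood function. With $T$ time steps, $N$ possible states, and $M$ possible observation symbols, computing this likelihood efficiently is a problem in itself, because a direct sum runs over all $N^T$ state sequences. The forward-backward algorithm is needed; see the answer in @sec:hmm_prob.\n",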
550741
"\n",
551-
"# Define HMM parameters\n",
552-
"states = ['Rainy', 'Sunny']\n",
553-
"observations = ['walk', 'shop', 'clean']\n",
554-
"start_probability = {'Rainy': 0.6, 'Sunny': 0.4}\n",
555-
"transition_probability = {\n",
556-
" 'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},\n",
557-
" 'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},\n",
558-
"}\n",
559-
"emission_probability = {\n",
560-
" 'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},\n",
561-
" 'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},\n",
562-
"}\n",
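742+
"Because maximum likelihood estimation is harder for the incomplete-data likelihood than for the complete-data likelihood, problem (2) is also an important problem. The Baum-Welch algorithm (the EM algorithm for HMMs) addresses it; see @sec:hmm_learn.\n",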
563743
"\n",
564-
"# Draw the HMM and save it to a file\n",
565-
"hmm_graph = draw_hmm(states, observations, start_probability, transition_probability, emission_probability)\n",
566-
"hmm_graph.render('hmm_model', format='png', view=True) # This will create hmm_model.png and open it in your default image viewer"
744+
"Problem (3) reflects one of the main purposes of an HMM: decoding the hidden state sequence $I$. It is solved with the Viterbi algorithm; see @sec:hmm_decode."
567745
]
568746
},
569747
{
570748
"cell_type": "markdown",
571749
"metadata": {},
572750
"source": [
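573-
"#### Give a formal description of the three canonical problems of hidden Markov models. (4 points)"
751+
"#### Given a hidden Markov model λ, how do we compute the probability that the model generates an observation sequence O? Give the mathematical formulas and the corresponding derivation. (5 points) {#sec:hmm_prob}"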
574752
]
575753
},
576754
{
577-
"cell_type": "code",
578-
"execution_count": null,
755+
"cell_type": "markdown",
579756
"metadata": {},
580-
"outputs": [],
581-
"source": []
757+
"source": [
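758+
"#### Given a hidden Markov model λ and a corresponding observation sequence O, how do we obtain the optimal hidden state sequence Q with which the model produces that observation sequence? Give the mathematical formulas and the corresponding derivation. (5 points) {#sec:hmm_decode}"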
759+
]
582760
},
583761
{
584-
"cell_type": "code",
585-
"execution_count": null,
762+
"cell_type": "markdown",
763+
"metadata": {},
764+
"source": [
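765+
"#### Given a set of observations O1, O2, …, ON that are assumed to satisfy the assumptions of a hidden Markov process, how do we estimate an optimal hidden Markov model λ, that is, the model that maximizes the probability of producing these observations? (5 points) {#sec:hmm_learn}"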
766+
]
767+
},
768+
{
769+
"cell_type": "markdown",
770+
"metadata": {},
771+
"source": [
772+
"#### What is a Gaussian mixture model? How can Gaussian mixture models be combined with hidden Markov models? (5 points)"
773+
]
774+
},
775+
{
776+
"cell_type": "markdown",
777+
"metadata": {},
778+
"source": [
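779+
"#### Drawing on the relevant literature, present a recent application of hidden Markov models in speech or natural language processing, including but not limited to speech recognition, speech synthesis, and text segmentation. Be sure to explain clearly how the model is applied to the actual task and whether the method introduces any improvements. (5 points)\n",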
780+
"\n"
781+
]
782+
},
783+
{
784+
"cell_type": "markdown",
586785
"metadata": {},
587-
"outputs": [],
588786
"source": []
589787
},
590788
{
591789
"cell_type": "markdown",
592790
"metadata": {},
593791
"source": [
594-
"## Task 2: HMM-based forced alignment experiment\n"
792+
"## Experiment Task 1: Implementing the Viterbi decoding algorithm (20 points)"
595793
]
596794
},
795+
{
796+
"cell_type": "markdown",
797+
"metadata": {},
798+
"source": [
799+
"## Task 2: HMM-based forced alignment experiment\n",
800+
"\n",
801+
"#### Successfully run the align command\n"
802+
]
803+
},
804+
{
805+
"cell_type": "markdown",
806+
"metadata": {},
807+
"source": []
808+
},
597809
{
598810
"cell_type": "code",
599811
"execution_count": null,
Lines changed: 34 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,34 @@
1+
// Markov Chain Bayesian Network
2+
digraph {
3+
S0 [label=S_0 fillcolor=lightblue style=filled]
4+
S1 [label=S_1 fillcolor=lightblue style=filled]
5+
S2 [label=S_2 fillcolor=lightblue style=filled]
6+
S3 [label=S_3 fillcolor=lightblue style=filled]
7+
O0 [label=O_0 fillcolor=yellow style=filled]
8+
O1 [label=O_1 fillcolor=yellow style=filled]
9+
O2 [label=O_2 fillcolor=yellow style=filled]
10+
S0 -> S1
11+
S0 -> S2
12+
S0 -> S3
13+
S1 -> S0
14+
S1 -> S2
15+
S1 -> S3
16+
S2 -> S0
17+
S2 -> S1
18+
S2 -> S3
19+
S3 -> S0
20+
S3 -> S1
21+
S3 -> S2
22+
S0 -> O0
23+
S0 -> O1
24+
S0 -> O2
25+
S1 -> O0
26+
S1 -> O1
27+
S1 -> O2
28+
S2 -> O0
29+
S2 -> O1
30+
S2 -> O2
31+
S3 -> O0
32+
S3 -> O1
33+
S3 -> O2
34+
}
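
A note on the DOT file above: it connects every S node to every other S node and to each of O_0 to O_2, which reads as a state-transition diagram, whereas `draw_markov_chain_bayesian_network` in the notebook draws a time-unrolled chain. As a rough sketch, the DOT text for that chain could be emitted without the graphviz package as follows. The function name `hmm_bayes_net_dot` is hypothetical and not part of this commit; the node names, colors, and edge labels are copied from the notebook function.

```python
def hmm_bayes_net_dot(T: int = 4) -> str:
    """Build DOT source for an HMM unrolled over T time steps.

    Hypothetical helper: mirrors the node and edge layout of the notebook's
    draw_markov_chain_bayesian_network, but returns plain text.
    """
    lines = ["// Markov Chain Bayesian Network", "digraph {"]
    # Hidden-state nodes I_0 .. I_{T-1}
    for t in range(T):
        lines.append(f"\tI{t} [label=I_{t} fillcolor=lightblue style=filled]")
    # Observation nodes O_0 .. O_{T-1}
    for t in range(T):
        lines.append(f"\tO{t} [label=O_{t} fillcolor=yellow style=filled]")
    # Transition edges I_t -> I_{t+1}, governed by A
    for t in range(T - 1):
        lines.append(f"\tI{t} -> I{t + 1} [label=A]")
    # Emission edges I_t -> O_t, governed by B
    for t in range(T):
        lines.append(f"\tI{t} -> O{t} [label=B]")
    lines.append("}")
    return "\n".join(lines)


print(hmm_bayes_net_dot(4))
```

With `T=4` this yields three transition edges and four emission edges, the same seven edges that appear in the SVG output recorded in the notebook diff.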

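
The notebook text on the three canonical problems points to the forward-backward algorithm for problem (1) and the Viterbi algorithm for problem (3), but the corresponding answer cells are still empty in this commit. The following is a minimal sketch of both recursions in plain Python, not the author's implementation. The function names and the toy parameters (three states, two observation symbols) are assumptions chosen for illustration and do not appear in the commit.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def forward(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> float:
    """Return P(O | lambda) via the forward recursion, in O(T * N^2) time."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


def viterbi(pi: Sequence[float], A: Matrix, B: Matrix,
            obs: Sequence[int]) -> Tuple[List[int], float]:
    """Return the most likely state path (zero-based) and its probability."""
    n = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        # psi_t(j): best predecessor state of j
        psi = [max(range(n), key=lambda i: delta[i] * A[i][j]) for j in range(n)]
        # delta_t(j) = max_i delta_{t-1}(i) * a_ij * b_j(o_t)
        delta = [delta[psi[j]] * A[psi[j]][j] * B[j][o] for j in range(n)]
        backpointers.append(psi)
    # Termination, then backtrack through the stored predecessors
    last = max(range(n), key=lambda i: delta[i])
    best_prob = delta[last]
    path = [last]
    for psi in reversed(backpointers):
        path.append(psi[path[-1]])
    path.reverse()
    return path, best_prob


# Toy parameters (illustrative assumption, not taken from the commit)
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]

likelihood = forward(pi, A, B, obs)
path, prob = viterbi(pi, A, B, obs)
print(likelihood, path, prob)
```

For these toy parameters the forward pass gives a likelihood of about 0.1302, and the decoded path is `[2, 2, 2]` with probability about 0.0147. Both functions multiply raw probabilities, so for long sequences a log-space or scaled variant would be needed to avoid underflow.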