Commit 0dee716

Updating gitignore
1 parent 9cc2d52 commit 0dee716

318 files changed, +38128 -2 lines changed

.gitignore

Lines changed: 0 additions & 2 deletions
@@ -14,8 +14,6 @@ __pycache__/
 .mypy*


-# documentation specific
-docs/

 # Distribution / packaging


docs/EM_sampler.rst

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@

Focus on EM sampler
===================

This method imputes missing values in multivariate data using a multivariate Gaussian model
fitted via the EM algorithm.

Basics of Gaussians
*******************

We assume the data :math:`\mathbf{X} \in \mathbb{R}^{n \times p}` follows a
multivariate Gaussian distribution :math:`\mathcal{N}(\mathbf{m}, \mathbf{\Sigma})`.
Hence, the density of :math:`\mathbf{x}` is given by

.. math::

    p(\mathbf{x}) = \frac{1}{\sqrt{\det (2 \pi \mathbf{\Sigma})}} \exp \left[-\frac{1}{2} (\mathbf{x} - \mathbf{m})^\top \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{m}) \right]
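
For illustration, this density can be evaluated numerically, for instance with scipy
(the parameters below are arbitrary example values):

.. code-block:: python

    import numpy as np
    from scipy.stats import multivariate_normal

    # Arbitrary parameters of a 2-dimensional Gaussian
    m = np.array([0.0, 1.0])
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])

    # Density of a point x under N(m, Sigma)
    x = np.array([0.5, 0.5])
    density = multivariate_normal(mean=m, cov=Sigma).pdf(x)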

We define :math:`\Omega := \{ (i,j) \, | \, X_{ij} \text{ is observed} \}`
and :math:`\Omega_i := \{ j \, | \, X_{ij} \text{ is observed} \}`.
The complements of these sets are :math:`\Omega^c := \{ (i,j) \, | \, X_{ij} \text{ is missing} \}`
and :math:`\Omega_i^c := \{ j \, | \, X_{ij} \text{ is missing} \}`.

Each row :math:`i` of the matrix represents a time, :math:`1 \leq i \leq n`,
and each column :math:`j` represents a variable, :math:`1 \leq j \leq p`.

Let :math:`\mathbf{x}_i \in \mathbb{R}^p` be an observation, i.e. :math:`\mathbf{x}_i \overset{iid}{\sim} \mathcal{N}_{\mathbf{x}_i}(\mathbf{\mu}, \mathbf{\Sigma})`.
We can rearrange the entries of :math:`\mathbf{x}_i` such that we can write

.. math::

    \mathbf{x}_i =
    \begin{bmatrix}
    \mathbf{x}_{i, \Omega_i} \\
    \mathbf{x}_{i, \Omega_i^c}
    \end{bmatrix}
    \sim
    \mathcal{N}_{\mathbf{x}_i}
    \left(
    \begin{bmatrix}
    \mathbf{\mu}_{\Omega_i} \\
    \mathbf{\mu}_{\Omega^c_i}
    \end{bmatrix},
    \begin{bmatrix}
    \mathbf{\Sigma}_{\Omega_i \Omega_i} & \mathbf{\Sigma}_{\Omega_i \Omega^c_i} \\
    \mathbf{\Sigma}_{\Omega^c_i \Omega_i} & \mathbf{\Sigma}_{\Omega^c_i \Omega^c_i}
    \end{bmatrix}
    \right)

Thus formulated, the conditional distributions can be expressed as

.. math::

    \begin{array}{l}
    p(\mathbf{x}_{i, \Omega^c_i} | \mathbf{x}_{i, \Omega_i})
    = \mathcal{N}_{\mathbf{x}_i}(\tilde{\mu}_i, \tilde{\mathbf{\Sigma}}_{i,\Omega_i^c}) \\
    \text{where } \tilde{\mu}_i =
    \mu_{\Omega^c_i} + \mathbf{\Sigma}_{\Omega^c_i \Omega_i} \mathbf{\Sigma}^{-1}_{\Omega_i \Omega_i} (\mathbf{x}_{i, \Omega_i} - \mathbf{\mu}_{\Omega_i}) \\
    \phantom{\text{where }} \tilde{\mathbf{\Sigma}}_{i,\Omega_i^c} =
    \mathbf{\Sigma}_{\Omega^c_i \Omega^c_i} - \mathbf{\Sigma}_{\Omega^c_i \Omega_i} \mathbf{\Sigma}^{-1}_{\Omega_i \Omega_i} \mathbf{\Sigma}_{\Omega_i \Omega^c_i}
    \end{array}

Note that the conditional covariance matrix is the Schur complement of the block :math:`\mathbf{\Sigma}_{\Omega_i \Omega_i}` in :math:`\mathbf{\Sigma}`.

Recall also the expectation of a quadratic form, i.e.

.. math::

    E \left[ (\mathbf{x} - \mathbf{m}')^\top \mathbf{A} (\mathbf{x} - \mathbf{m}') \right] = (\mathbf{m} - \mathbf{m}')^\top \mathbf{A} (\mathbf{m} - \mathbf{m}') + \text{Tr}(\mathbf{A}\mathbf{\Sigma}),

for all square matrices :math:`\mathbf{A}`, where :math:`\mathbf{x} \sim \mathcal{N}(\mathbf{m}, \mathbf{\Sigma})`.
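
As a minimal NumPy sketch (with arbitrary example values; an illustration, not the library's
implementation), the conditional mean and covariance of the missing block given the observed
block can be computed as follows:

.. code-block:: python

    import numpy as np

    # Arbitrary parameters for p = 3 variables
    mu = np.array([0.0, 1.0, -1.0])
    Sigma = np.array([[1.0, 0.5, 0.2],
                      [0.5, 2.0, 0.3],
                      [0.2, 0.3, 1.5]])

    x = np.array([0.7, np.nan, 0.1])   # one row with a missing entry
    obs = ~np.isnan(x)                 # indices in Omega_i
    mis = np.isnan(x)                  # indices in Omega_i^c

    # Conditional mean: mu_mis + Sigma_mis,obs @ Sigma_obs,obs^{-1} @ (x_obs - mu_obs)
    S_oo_inv = np.linalg.inv(Sigma[np.ix_(obs, obs)])
    S_mo = Sigma[np.ix_(mis, obs)]
    mu_tilde = mu[mis] + S_mo @ S_oo_inv @ (x[obs] - mu[obs])

    # Conditional covariance: the Schur complement of Sigma_obs,obs
    Sigma_tilde = Sigma[np.ix_(mis, mis)] - S_mo @ S_oo_inv @ S_mo.T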

EM algorithm
************

The EM algorithm is an iterative optimisation algorithm that maximises the expected
complete-data log-likelihood, where the expectation is taken under the conditional
distribution of the unobserved components.
In this way the statistics of interest can be estimated despite the missing entries.

How it works
------------

We start with a first estimate :math:`\mathbf{\hat{X}}` of :math:`\mathbf{X}`, obtained via a simple
imputation method, e.g. linear interpolation.
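
Such an initial estimate can be obtained, for instance, with pandas (an illustrative sketch;
the array and names below are arbitrary):

.. code-block:: python

    import numpy as np
    import pandas as pd

    # X is an (n, p) array with NaNs marking the missing entries
    X = np.array([[1.0, 2.0],
                  [np.nan, 2.5],
                  [3.0, np.nan],
                  [4.0, 4.0]])

    # Column-wise linear interpolation, then fill any remaining leading/trailing gaps
    X_hat = (pd.DataFrame(X)
               .interpolate(method="linear")
               .bfill()
               .ffill()
               .to_numpy())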

The expectation step (or E-step) at iteration *t* computes:

.. math::

    \begin{array}{ll}
    \mathcal{Q}(\theta \, | \, \theta^{(t)}) &:=& E \left[ \log L(\theta ; \mathbf{X}) \, | \, \mathbf{X}_{\Omega}, \theta^{(t)} \right] \\
    &=& \sum_{i=1}^n E \left[ \log L(\theta ; \mathbf{x}_i) \, | \, \mathbf{x}_{i, \Omega_i}, \theta^{(t)} \right].
    \end{array}

The maximization step (or M-step) at iteration *t* finds:

.. math::

    \theta^{(t+1)} := \underset{\theta}{\mathrm{argmax}} \left\{ \mathcal{Q} \left( \theta \, | \, \theta^{(t)} \right) \right\}.

These two steps are repeated until the parameter estimates converge.


Computation
-----------

At iteration :math:`t` with :math:`\theta^{(t)} = (\mu^{(t)}, \mathbf{\Sigma}^{(t)})`, let
:math:`\mathbf{x}_i \sim \mathcal{N}_p(\mu, \mathbf{\Sigma})`. The expected log-likelihood is equal to

.. math::

    \begin{array}{ll}
    \mathcal{Q}_i(\theta \, | \, \theta^{(t)})
    &=& E \left[ - \frac{1}{2} \log \det \mathbf{\Sigma} - \frac{1}{2} (\mathbf{x}_i - \mu)^\top \mathbf{\Sigma}^{-1} (\mathbf{x}_i - \mu) \, | \, \mathbf{x}_{i, \Omega_i}, \theta^{(t)} \right] \\
    &=& - \frac{1}{2} \log \det \mathbf{\Sigma} - \frac{1}{2} \Big(
    (\mathbf{x}_{i,\Omega_i} - \mu_{\Omega_i})^\top \mathbf{\Sigma}_{\Omega_i\Omega_i}^{-1} (\mathbf{x}_{i,\Omega_i} - \mu_{\Omega_i}) \\
    && \phantom{- \frac{1}{2}} +
    2 (\mathbf{x}_{i,\Omega_i} - \mu_{\Omega_i})^\top \mathbf{\Sigma}_{\Omega_i\Omega^c_i}^{-1} E \left[ \mathbf{x}_{i,\Omega^c_i} - \mu_{\Omega^c_i} \, | \, \mathbf{x}_{i, \Omega_i}, \theta^{(t)} \right] \\
    && \phantom{- \frac{1}{2}} +
    E \left[ (\mathbf{x}_{i,\Omega^c_i} - \mu_{\Omega^c_i})^\top \mathbf{\Sigma}_{\Omega^c_i\Omega^c_i}^{-1} (\mathbf{x}_{i,\Omega^c_i} - \mu_{\Omega^c_i}) \, | \, \mathbf{x}_{i, \Omega_i}, \theta^{(t)} \right]
    \Big) \\
    &=& - \frac{1}{2} \log \det \mathbf{\Sigma}
    - \frac{1}{2} \Big(
    (\mathbf{x}_{i,\Omega_i} - \mu_{\Omega_i})^\top \mathbf{\Sigma}_{\Omega_i\Omega_i}^{-1} (\mathbf{x}_{i,\Omega_i} - \mu_{\Omega_i}) \\
    && \phantom{- \frac{1}{2}} +
    2 (\mathbf{x}_{i,\Omega_i} - \mu_{\Omega_i})^\top \mathbf{\Sigma}_{\Omega_i\Omega^c_i}^{-1} (\tilde{\mu}_{i}^{(t)} - \mu_{\Omega^c_i}) \\
    && \phantom{- \frac{1}{2}} +
    (\tilde{\mu}_{i}^{(t)} - \mu_{\Omega^c_i})^\top \mathbf{\Sigma}^{-1}_{\Omega_i^c\Omega_i^c} (\tilde{\mu}_{i}^{(t)} - \mu_{\Omega^c_i}) \\
    && \phantom{- \frac{1}{2}} +
    \text{Tr} \left( \mathbf{\Sigma}^{-1}_{\Omega_i^c\Omega_i^c} \tilde{\mathbf{\Sigma}}_{i,\Omega_i^c}^{(t)} \right)
    \Big) \\
    &=& - \frac{1}{2} \log \det \mathbf{\Sigma}
    - \frac{1}{2} \left[
    (\hat{\mathbf{x}}_{i}^{(t)} - \mu)^\top \mathbf{\Sigma}^{-1} (\hat{\mathbf{x}}_{i}^{(t)} - \mu)
    + \text{Tr} \left( \mathbf{\Sigma}^{-1}_{\Omega_i^c\Omega_i^c} \tilde{\mathbf{\Sigma}}_{i,\Omega_i^c}^{(t)} \right)
    \right]
    \end{array}

where :math:`\hat{\mathbf{x}}_{i}^{(t)} = [\hat{x}_{i1}^{(t)}, ..., \hat{x}_{ip}^{(t)}]`
is the data point :math:`\mathbf{x}_i` in which the missing block :math:`\mathbf{x}_{i,\Omega_i^c}` has been replaced by :math:`\tilde{\mu}_{i}^{(t)}`.

Finally, one has

.. math::

    \mathcal{Q}(\theta \, | \, \theta^{(t)}) = \sum_{i=1}^n \mathcal{Q}_i(\theta \, | \, \theta^{(t)}).


For the M-step, one has to find :math:`\theta` that maximises the previous expression. Since the expression is
concave in :math:`\theta`, it suffices to set the derivatives to zero.
For the mean, one has

.. math::

    \begin{array}{ll}
    \nabla_{\mu} \mathcal{Q}(\theta \, | \, \theta^{(t)})
    &=& -\frac{1}{2} \sum_{i=1}^n \nabla_{\mu} (\hat{\mathbf{x}}_{i}^{(t)} - \mu)^\top \mathbf{\Sigma}^{-1} (\hat{\mathbf{x}}_{i}^{(t)} - \mu) \\
    &=& \mathbf{\Sigma}^{-1} \sum_{i=1}^n (\hat{\mathbf{x}}_{i}^{(t)} - \mu) \\
    &=& 0.
    \end{array}

Therefore

.. math::

    \mu^{(t+1)} = \frac{1}{n} \sum_{i=1}^n \hat{\mathbf{x}}_{i}^{(t)}.

For the covariance, one has

.. math::

    \begin{array}{ll}
    \nabla_{\Sigma^{-1}} \mathcal{Q}(\theta \, | \, \theta^{(t)})
    &=& \frac{1}{2} \sum_{i=1}^n \nabla_{\Sigma^{-1}} \log \det \mathbf{\Sigma}^{-1}
    - \nabla_{\Sigma^{-1}} \text{Tr} \left( \mathbf{\Sigma}^{-1}_{\Omega_i^c\Omega_i^c} \tilde{\mathbf{\Sigma}}_i^{(t)} \right) \\
    && - \frac{1}{2} \sum_{i=1}^n \nabla_{\Sigma^{-1}} (\hat{\mathbf{x}}_{i}^{(t)} - \mu)^\top \mathbf{\Sigma}^{-1} (\hat{\mathbf{x}}_{i}^{(t)} - \mu) \\
    &=& \frac{1}{2} \left[ n \mathbf{\Sigma} - \sum_{i=1}^n \tilde{\mathbf{\Sigma}}_i^{(t)} \right]
    - \frac{1}{2} \sum_{i=1}^n (\hat{\mathbf{x}}_{i}^{(t)} - \mu) (\hat{\mathbf{x}}_{i}^{(t)} - \mu)^\top \\
    &=& 0
    \end{array}

where :math:`\tilde{\mathbf{\Sigma}}_i^{(t)}` is the :math:`p \times p` matrix that is zero everywhere
except for the :math:`\Omega_i^c \Omega_i^c` block, which equals :math:`\tilde{\mathbf{\Sigma}}_{i,\Omega_i^c}^{(t)}`.

Therefore

.. math::

    \mathbf{\Sigma}^{(t+1)} = \frac{1}{n} \sum_{i=1}^n \left[ (\hat{\mathbf{x}}_{i}^{(t)} - \mu^{(t+1)}) (\hat{\mathbf{x}}_{i}^{(t)} - \mu^{(t+1)})^\top + \tilde{\mathbf{\Sigma}}_i^{(t)} \right].
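
In code, given the imputed data points :math:`\hat{\mathbf{x}}_i^{(t)}` stacked in an array
``X_hat`` of shape ``(n, p)`` and the matrices :math:`\tilde{\mathbf{\Sigma}}_i^{(t)}` stacked in
``Sigma_tilde`` of shape ``(n, p, p)`` (illustrative names, with random placeholders standing in
for the E-step outputs), the M-step updates reduce to:

.. code-block:: python

    import numpy as np

    # Placeholders standing in for the E-step outputs
    n, p = 200, 3
    rng = np.random.default_rng(0)
    X_hat = rng.normal(size=(n, p))        # imputed rows \hat{x}_i^{(t)}
    Sigma_tilde = np.zeros((n, p, p))      # \tilde{\Sigma}_i^{(t)}, zero for fully observed rows

    # M-step updates
    mu_new = X_hat.mean(axis=0)
    centred = X_hat - mu_new
    Sigma_new = (centred.T @ centred + Sigma_tilde.sum(axis=0)) / n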

The convergence of the algorithm can be monitored by checking a metric (e.g. a norm of the difference) between
two consecutive estimates of the mean or of the covariance
(convergence is declared once these sequences are Cauchy, i.e. once consecutive estimates stop changing appreciably).

Thus, at each iteration, the missing values are replaced either by their conditional mean or by sampling from
a multivariate normal distribution with the fitted mean and covariance.
The resulting imputed data is the final imputed array, obtained at convergence.
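
Putting the E-step and M-step together, a minimal self-contained sketch of the whole procedure
could look as follows (illustrative only; names such as ``em_impute``, ``n_iter`` and ``sample``
are not the package API, and the initialisation uses column means as a crude stand-in for
linear interpolation):

.. code-block:: python

    import numpy as np

    def em_impute(X, n_iter=50, tol=1e-6, sample=False, random_state=0):
        """Impute missing values of X (NaNs) with a Gaussian model fitted by EM (sketch)."""
        rng = np.random.default_rng(random_state)
        X_hat = np.array(X, dtype=float)
        n, p = X_hat.shape
        mask = np.isnan(X_hat)

        # Initial estimate: fill each missing entry with its column mean
        col_means = np.nanmean(X_hat, axis=0)
        X_hat[mask] = np.take(col_means, np.where(mask)[1])
        mu = X_hat.mean(axis=0)
        Sigma = np.cov(X_hat, rowvar=False)

        for _ in range(n_iter):
            Sigma_tilde_sum = np.zeros((p, p))

            # E-step: impute each row with its conditional mean (or a conditional draw)
            for i in range(n):
                mis = mask[i]
                if not mis.any():
                    continue
                obs = ~mis
                S_oo_inv = np.linalg.inv(Sigma[np.ix_(obs, obs)])
                S_mo = Sigma[np.ix_(mis, obs)]
                mu_tilde = mu[mis] + S_mo @ S_oo_inv @ (X_hat[i, obs] - mu[obs])
                S_tilde = Sigma[np.ix_(mis, mis)] - S_mo @ S_oo_inv @ S_mo.T
                X_hat[i, mis] = rng.multivariate_normal(mu_tilde, S_tilde) if sample else mu_tilde
                Sigma_tilde_sum[np.ix_(mis, mis)] += S_tilde

            # M-step: update the mean and the covariance
            mu_new = X_hat.mean(axis=0)
            centred = X_hat - mu_new
            Sigma_new = (centred.T @ centred + Sigma_tilde_sum) / n

            # Convergence check on two consecutive parameter estimates
            converged = (np.linalg.norm(mu_new - mu) < tol
                         and np.linalg.norm(Sigma_new - Sigma) < tol)
            mu, Sigma = mu_new, Sigma_new
            if converged:
                break

        return X_hat

For the sampling variant described above, passing ``sample=True`` draws the missing entries from
the fitted conditional Gaussian instead of using its mean.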


Multivariate time series
************************

To explicitly take into account the temporal aspect of the data
(temporal correlations), we construct an extended matrix :math:`\mathbf{X}^{ext}`
by considering the shifted columns, i.e.
:math:`\mathbf{X}^{ext} := [\mathbf{X}, \mathbf{X}^{s+1}, \mathbf{X}^{s-1}]` where
:math:`\mathbf{X}^{s+1}` (resp. :math:`\mathbf{X}^{s-1}`) is the matrix :math:`\mathbf{X}`
with all columns shifted by +1, i.e. one step backward in time (resp. by -1, one step forward in time).
The covariance matrix :math:`\mathbf{\Sigma}^{ext}` is therefore richer in information because of the additional
(temporal) correlations it contains.
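
A sketch of how such an extended matrix can be built with NumPy (illustrative; here the boundary
rows simply repeat the first/last observation, which is one possible convention):

.. code-block:: python

    import numpy as np

    def make_extended(X):
        """Stack X with its columns shifted one step backward and forward in time (sketch)."""
        X_lag = np.vstack([X[:1], X[:-1]])    # row t holds the values observed at t - 1
        X_lead = np.vstack([X[1:], X[-1:]])   # row t holds the values observed at t + 1
        return np.hstack([X, X_lag, X_lead])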

.. image:: images/extended_matrix.png


References
**********

[1] Borman, Sean. "The expectation maximization algorithm: a short tutorial." Submitted for publication 41 (2004).
(`pdf <https://www.lri.fr/~sebag/COURS/EM_algorithm.pdf>`__)

[2] https://joon3216.github.io/research_materials.html

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS  ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR   = .
BUILDDIR    = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
