Skip to content

Commit b18582f

Browse files
authored
Merge pull request #162 from markovmodel/sol_rev
Simon revision
2 parents f83b31c + 8cfd3f3 commit b18582f

File tree

2 files changed

+54
-18
lines changed

2 files changed

+54
-18
lines changed

.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,3 +102,8 @@ ENV/
102102

103103
# mypy
104104
.mypy_cache/
105+
106+
#OSX stuff
107+
*.DS_Store
108+
manuscript/manuscript.suppinfo
109+
manuscript/manuscript.pdf

manuscript/manuscript.tex

Lines changed: 49 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -125,25 +125,39 @@ \subsection{Essential theory}
125125
\noindent{}where the left-hand side of the equation corresponds to an MSM estimated at lag time $k\tau$, where $k$ is an integer larger than~1, whereas the right-hand side of the equation is our estimated MSM transition probability matrix to the $k^\textrm{th}$ power.
126126
By assessing how well the approximated transition probability matrix adheres to the CK property, we can validate the appropriateness of the Markovian assumption for the model.
127127

128-
Once validated, the transition matrix can be decomposed into eigenvectors and eigenvalues,
128+
Once validated, the transition matrix can be decomposed into left eigenvectors and eigenvalues,
129129

130130
\begin{equation}
131-
\label{eq:transmat}
132-
T(\tau) \circ \phi_i = \lambda_i \phi_i,
131+
\label{eq:spectral_left}
132+
\phi_i^\top T(\tau) = \lambda_i(\tau) \phi_i^\top,
133133
\end{equation}
134134

135-
\noindent{}where the eigenvalues are indexed in decreasing order. The highest eigenvalue, $\lambda_1$, is unique and is equal to $1$, and its corresponding left eigenvector $\phi_1$ corresponds to the stationary distribution of the system.
136-
The right eigenvector $\psi_1$ is a vector consisting of~$1$'s.
137-
The subsequent eigenvalues $\lambda_{i>1}$ are real with absolute values less than~$1$ and correspond to dynamical processes within the system.
138-
The right eigenvectors $\psi_i$ each represent a dynamical process (for $i>1$), and the coefficients of the eigenvectors represent the flux into and out of the MSM states that characterizes that process.
139-
The corresponding left eigenvectors $\phi_i$ contain the same information weighted by the stationary distribution.
135+
\noindent{}or equivalently into their right eigenvectors and eigenvalues,
140136

141-
The timescale of a given dynamical process is a function of the relevant eigenvalue and the lag time at which the model was defined,
137+
\begin{equation}
138+
\label{eq:spectral_right}
139+
T(\tau)\psi_i = \lambda_i(\tau) \psi_i,
140+
\end{equation}
141+
142+
143+
\noindent{}where the eigenvalue-eigenvector pairs are indexed in decreasing order.
144+
The eigenvalues are the same in both cases, however, the left and right eigenvectors are related to each other as
142145

143146
\begin{equation}
144-
\label{eq:timescales}
145-
t_i \equiv -\frac{\tau}{\log(|\lambda_i|)}.
147+
\phi_i = \pi \circ \psi_i,
148+
\label{eq:left-right-eigenvalue-relation}
146149
\end{equation}
150+
\noindent{}where $\pi$ is the \emph{stationary distribution} of the MSM, and $\circ$ corresponds to an element-wise vector product.
151+
152+
The highest eigenvalue, $\lambda_1(\tau)$, is unique and is equal to $1$, and its corresponding left eigenvector $\phi_1$ corresponds to the stationary distribution, $\pi$.
153+
From the relationship between the left and right eigenvectors (eq.~\ref{eq:left-right-eigenvalue-relation}) we see that the right eigenvector $\psi_1$ is a vector consisting of~$1$'s.
154+
155+
The subsequent eigenvalues $\lambda_{i>1}(\tau)$ are real with absolute values less than~$1$ and are related to the \emph{characteristic} or \emph{implied} timescales of dynamical processes within the system (eq.~\ref{eq:its}).
156+
157+
The right eigenvectors $\psi_i$ each encode a dynamical process (for $i>1$), corresponding to the characteristic time-scale, $t_i$.
158+
The coefficients of the eigenvectors represent the flux into and out of the Markov states that characterizes that process.
159+
Again, as can be seen from (eq.~\ref{eq:left-right-eigenvalue-relation}) the corresponding left eigenvectors $\phi_i$ contain the same information weighted by the stationary distribution.
160+
147161

148162
\subsection{MSM construction the variational approach}
149163
\label{sec:construction}
@@ -264,7 +278,7 @@ \subsection{Feature selection}
264278
\includegraphics{figure_3}
265279
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
266280
(a)~The convergence behavior of the implied timescales associated with the four slowest processes.
267-
(b)~Chapman-Kolmogorov test computed using an MSM estimated with lag time $\tau=0.5$~ns assuming~5 meta-stable states..
281+
(b)~Chapman-Kolmogorov test computed using an MSM estimated with lag time $\tau=0.5$~ns assuming~5 metastable states.
268282
The solid lines in (a) refer to the maximum likelihood result while the dashed lines show the ensemble mean computed with a Bayesian sampling procedure~\cite{ben-rev-msm}.
269283
The black line indicates where implied timescales are equal to the lag time, whereas the grey area indicates all implied timescales faster than the lag time.
270284
In both panels, the (non-grey) shaded areas indicate~$95\%$ confidence intervals computed with the aforementioned Bayesian sampling procedure.}
@@ -278,7 +292,11 @@ \subsection{Feature selection}
278292

279293
Here, we utilize the VAMP-2 score, which maximizes the kinetic variance contained in the features~\cite{kinetic-maps}.
280294
We should always evaluate the score in a cross-validated manner to ensure that we neither include too few features (under-fitting) or too many features (over-fitting)~\cite{gmrq,vamp-preprint}.
281-
To choose among three different molecular features relevant to protein structure, we compute the (cross-validated) VAMP-2 score at a lag time of~$0.5$~ns.
295+
To choose among three different molecular features reflecting protein structure, we compute the (cross-validated) VAMP-2 score (Notebook 00).
296+
Although we cannot MSM optimize lag times with a variational score\cite{husic2017note}, such as VAMP-2, it is important to ensure that properties that we optimize are robust as a function of lag time.
297+
Consequently, we compute the VAMP-2 score at several lag times (Notebook 00).
298+
We find that the relative rankings of the different molecular features are highly robust as a function of lag time.
299+
We show one example of this ranking and the absolute VAMP-2 scores for lag time~$0.5$~ns in Fig.~\ref{fig:io-to-tica}b.
282300
We find that backbone torsions contain more kinetic variance than the backbone heavy atom positions or the distances between them (Fig.~\ref{fig:io-to-tica}b).
283301
This suggests that backbone torsions are the best of the options evaluated for MSM construction.
284302

@@ -351,10 +369,11 @@ \subsection{Analyzing the MSM}
351369
\end{equation}
352370
where $\pi_j$ denotes the MSM stationary weight of the $j^\textrm{th}$ microstate.
353371

354-
In order to interpret the slowest relaxation timescales, we refer to the (right) eigenvectors of the MSM as they contain information about what configurational changes are happening and their timescales.
372+
In order to interpret the slowest relaxation timescales, we refer to the (right) eigenvectors, as they are independent of the stationary distribution.
373+
This enables us to specifically study what conformational changes are happening on a particular time scale independently of the equilbrium distribution.
355374
The first right eigenvector corresponds to the stationary process and its eigenvalue is the Perron eigenvalue~$1$.
356-
The second right eigenvector, however, corresponds to the slowest process
357-
(the eigenvector components are real because of the detailed balance constraint enforced during MSM estimation).
375+
The second right eigenvector, on the other hand, corresponds to the slowest process in the system.
376+
Note that the eigenvectors are real as detailed balance has been enforced during MSM estimation.
358377
The minimal and maximal components of the second right eigenvector indicate the microstates between which the process shifts probability density.
359378
The relaxation timescale of this exchange process is exactly the corresponding implied timescale, which can be computed from its corresponding eigenvalue using~\eqref{eq:its}.
360379
In the projection onto the first two TICA components, we identify the slowest MSM process as a probability shift between macrostate $\mathcal{S}_1$ and the rest of the system, with macrostates $\mathcal{S}_4$ and $\mathcal{S}_5$ in particular (Fig.~\ref{fig:msm-analysis}c).
@@ -435,8 +454,20 @@ \subsection{Modeling large systems}
435454

436455
\subsection{Advanced Methods}
437456

438-
While the present tutorial is intended to cover Markov State Modeling 101, we encourage the user to explore other, more recent extensions of the methodology.
439-
Multi-ensemble Markov models (MEMMs)~\cite{dtram,tram} can be used to combine unbiased and biased simulations so as to probe kinetics of very rare events~\cite{trammbar}; MEMMs are implemented in PyEMMA.
457+
The present tutorial presents the basics of modern Markov state modeling with PyEMMA.
458+
However, recent years have seen many extensions of the methodology --- many of which are available within PyEMMA.
459+
We encourage interested readers to look into these methods in the software documentation and to make use of the specific Jupyter notebooks distributed with PyEMMA.
460+
461+
Conventional Markov state modeling often relies on large simulation datasets to ensure proper convergence of thermodynamic and kinetic properties.
462+
In one extension, Multi-ensemble Markov models (MEMMs)~\cite{dtram,tram}, we can integrate unbiased and biased simulations in a systematic manner to speed up the convergence.
463+
MEMMs consequently enable users to combine enhanced sampling methods such as umbrella sampling or replica exchange with conventional molecular dynamics simulations to more efficiently study rare event kinetics~\cite{trammbar}.
464+
MEMMs are implemented in PyEMMA.
465+
466+
Another issue often faced during Markov state modeling is a lack of quantitative agreement with complementary experimental data.
467+
This issue is not intrinsic to the Markov state modeling approach as such, but rather associated with systematic errors in the force field model used to conduct the simulation.
468+
Nevertheless, using Augmented Markov models (AMM) it is possible to build an integrative MSM which balances experimental and simulation data, taking into account their respective uncertainties~\cite{simon-amm}.
469+
AMMs are implemented in PyEMMA.
470+
440471
Recently, there have been steps towards replacing the traditional user-directed pipeline (involving featurizing, reducing dimension, discretizing, MSM estimation and coarse-graining) by a single end-to-end deep learning method such as VAMPnets~\cite{vampnet}.
441472
Other deep learning methods for performing the dimension reduction~\cite{tae}, finding reaction coordinates for enhanced sampling~\cite{hernandez-vde,Sultan2018-vde-enhanced-sampling,Ribeiro2018-rave}, and generative MSMs~\cite{deep-gen-msm-preprint} have been put forward and are likely to spawn an active field of research on its own right.
442473
Implementations of some of these methods are available or are under development in the deeptime package \url{github.com/markovmodel/deeptime}.

0 commit comments

Comments
 (0)