manuscript/manuscript.tex
12 additions & 14 deletions
@@ -86,18 +86,18 @@ \section{Introduction}
86
86
87
87
\subsection{Scope}
88
88
89
-
In this tutorial, we assume that the reader is familiar with MD simulation and standard analysis of MD simulations of peptides and proteins, such as computation of torsion angles and distances (see~\cite{dror2012biomolecular} for a review).
89
+
In this tutorial, we assume that the reader is familiar with MD simulation and standard analysis of MD simulations of peptides and proteins, such as computation of torsion angles and distances (see Ref.~\cite{dror2012biomolecular} for a review on the MD simulation of biomolecules, and Ref.~\cite{mdtutorial} for a tutorial on MD simulations).
90
90
91
91
We further assume that the reader is familiar with the basic ideas and theory underlying Markov modeling and will only give a brief reminder of the basic concepts in Section 2.
92
92
93
-
For those seeking further resources, ``\emph{Markov State Models: From an Art to a Science}''~\cite{msm-brooke} provides a recent overview,
94
-
while ``\emph{Markov models of molecular kinetics: Generation and validation}''~\cite{msm-jhp} describes the basic MSM theory and methodology in detail.
93
+
For those seeking further resources, the recent perspective ``\emph{Markov State Models: From an Art to a Science}''~\cite{msm-brooke} provides a timeline of methodological advances with relevant citations,
94
+
while ``\emph{Markov models of molecular kinetics: Generation and validation}''~\cite{msm-jhp} describes the basic MSM theory and methodology and provides the underlying mathematics in detail.
95
95
Additionally, two textbooks have been published that focus on computational methods and applications~\cite{msm-book} and mathematical theory~\cite{schuette-sarich-book}.
96
96
97
97
In addition to publications on the theory and application of Markov state modeling~\cite{schuette-msm,buchete-msm-2008,noe-tmat-sampling,bowman-msm-2009,noe-folding-pathways,sarich-msm-quality,noe-fingerprints,noe-dy-neut-scatt,Chodera2014,ben-rev-msm,simon-mech-mod-nmr,oom-feliks,simon-amm},
98
-
we also recommend the literature on TICA~\cite{tica,tica3,tica2,kinetic-maps},
98
+
we also recommend the literature on TICA~\cite{tica,tica3,kinetic-maps,tica2},
99
99
transition path theory (TPT)~\cite{weinan-tpt,metzner-msm-tpt},
100
-
hidden Markov state models (HMMs)~\cite{noe-proj-hid-msm,hmm-baum-welch-alg,hmm-tutorial,jhp-spectral-rate-theory,bhmm-preprint},
100
+
hidden Markov state models (HMMs)~\cite{noe-proj-hid-msm,jhp-spectral-rate-theory,bhmm-preprint},
101
101
and variational techniques~\cite{noe-vac,vamp-preprint,gmrq},
102
102
as these topics play important roles within the standard MSM workflow.
103
103
@@ -183,7 +183,7 @@ \subsection{Variational approach and TICA}
183
183
\begin{itemize}
184
184
\item Featurization -- The Cartesian coordinates characterizing each frame of the MD trajectory are transformed into an intuitive basis such as the protein's dihedral angles or contact distance pairs.
185
185
\item Dimensionality reduction -- Optionally, a basis set transformation can be performed that produces a linear (or nonlinear) combination of the features in the previous step.
186
-
Frequently, time-lagged independent component analysis (TICA)~\cite{tica,tica3,tica2,kinetic-maps} is used to transform the features into a set of slow coordinates.
186
+
Frequently, time-lagged independent component analysis (TICA)~\cite{tica,tica3,kinetic-maps} is used to transform the features into a set of slow coordinates.
187
187
\item Clustering -- This is the step at which the state decomposition occurs.
188
188
The features or TICs are grouped into a set of states using a clustering algorithm such as $k$-means.
189
189
\item Transition matrix approximation -- At this stage, transitions are counted at a pre-specified lag time, and the estimation and validation described in the previous section are performed.
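The transition matrix approximation step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the estimator used in the tutorial notebooks: the function names `count_matrix` and `transition_matrix` and the toy discrete trajectory are hypothetical, transitions are counted with a sliding window, and the estimate is the plain row-normalized (non-reversible) maximum-likelihood one.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    counts = np.zeros((n_states, n_states), dtype=float)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def transition_matrix(counts):
    """Row-normalize the counts (non-reversible maximum-likelihood estimate)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums


# Hypothetical discrete trajectory, e.g. cluster labels after k-means.
dtraj = np.array([0, 0, 1, 1, 1, 0, 0, 2, 2, 1, 0, 0])
C = count_matrix(dtraj, n_states=3, lag=1)
T = transition_matrix(C)
```

Each row of `T` sums to one, and entry `T[i, j]` approximates the probability of finding the system in state `j` one lag time after it was in state `i`. A reversible or Bayesian estimator would replace the simple row normalization.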
@@ -259,7 +259,7 @@ \subsection{Hidden Markov state models}
259
259
We illustrate this point in notebook~07.
260
260
261
261
An alternative, which is much less sensitive to poor discretization,
262
-
is to estimate a hidden Markov model (HMM)~\cite{hmm-baum-welch-alg,hmm-tutorial,jhp-spectral-rate-theory,noe-proj-hid-msm,bhmm-preprint}.
262
+
is to estimate a hidden Markov model (HMM)~\cite{hmm-baum-welch-alg,jhp-spectral-rate-theory,noe-proj-hid-msm,bhmm-preprint}.
263
263
HMMs are less sensitive to the discretization error as they sidestep the assumption of Markovian dynamics in the discretized space (illustrated in Fig.~\ref{fig:hmm-scheme}).
264
264
Instead, HMMs assume that there is an underlying (hidden) dynamic process which is Markovian
265
265
and gives rise to our observed data, e.g., the ($n$~states) discretized trajectories $s(t)$.
@@ -272,17 +272,15 @@ \subsection{Hidden Markov state models}
272
272
and a row-stochastic matrix ($\bm{\chi}$) of probabilities $\chi\left( s \middle| \tilde{s} \right)$
273
273
to emit the discrete state $s$ conditional on being in the hidden state $\tilde{s}$.
274
274
275
-
We can further compute a reversal of the emission matrix $\bm{\chi}\in\mathbb{R}^{m \times n}$:
276
-
the membership matrix $\mathbf{M}\in\mathbb{R}^{n \times m}$ which encodes
277
-
a fuzzy assignment of each of the $n$ observable microstates $s$ to the $m$ hidden states $\tilde{s}$ and,
278
-
thus, defines the \emph{coarse graining} of microstate.
279
-
280
275
An HMM estimation always yields a model with a small number of (hidden) states
281
276
where each state is considered to be metastable and,
282
277
thus, the number of hidden states is a new hyper-parameter which needs to be chosen carefully (see notebook~07).
283
278
As the HMMs---like MSMs---approximate the full phase-space dynamics,
284
279
we can similarly compute the metastable kinetics, apply TPT, visualize the network, and obtain physical observables.
285
280
281
+
For a detailed, general discussion of HMM properties and the estimation algorithm, we suggest Ref.~\cite{hmm-tutorial}.
282
+
For the specific application of HMMs to the discretization of MSMs, we suggest Ref.~\cite{noe-proj-hid-msm}. A generalized extension for estimating this type of low-dimensional projection from data is given in Ref.~\cite{wu2015projected}.
283
+
286
284
\subsection{Software and installation}
287
285
288
286
We utilize Jupyter~\cite{jupyter} notebooks to show code examples along with figures and interactive widgets to display molecules.
Subsequently, we perform TICA~\cite{tica,kinetic-maps} in order to reduce the dimension from the feature space,
390
+
Subsequently, we perform TICA~\cite{tica,tica3,kinetic-maps} in order to reduce the dimension from the feature space,
393
391
which typically contains many degrees of freedom,
394
392
to a lower dimensional space that can be discretized with higher resolution and better statistical efficiency.
395
393
TICA is a special case of the variational principle~\cite{noe-vac,nueske-vamk} and is designed to find a projection preserving the long-timescale dynamics in the dataset.
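The idea behind TICA can be illustrated with a short NumPy sketch. It is a simplified illustration under stated assumptions, not the implementation referenced in the text: the function name `tica` and the synthetic two-feature data are hypothetical, the global mean is removed rather than the mean of the time-lagged pairs, the covariance matrices are symmetrized, and the instantaneous covariance is assumed to have full rank.

```python
import numpy as np


def tica(X, lag, dim):
    """Return the `dim` largest TICA eigenvalues and their projection vectors."""
    X = X - X.mean(axis=0)          # remove the (global) mean
    X0, Xt = X[:-lag], X[lag:]      # instantaneous and time-lagged frames
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    # Whiten with C0 (assumed full rank), then diagonalize the whitened Ct.
    s, U = np.linalg.eigh(C0)
    W = U / np.sqrt(s)              # W.T @ C0 @ W is the identity
    evals, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(evals)[::-1][:dim]
    return evals[order], W @ V[:, order]


# Synthetic data: one slow AR(1) process mixed with fast white noise.
rng = np.random.default_rng(0)
n_frames = 20000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = rng.normal(size=n_frames)
X = np.column_stack([slow + fast, slow - fast])

evals, vecs = tica(X, lag=10, dim=2)
tic1 = (X - X.mean(axis=0)) @ vecs[:, 0]   # projection onto the slowest TIC
```

The eigenvalues are autocorrelations of the projected coordinates at the chosen lag time, so the component with the largest eigenvalue is the slowest one; in this toy example `tic1` should closely follow the hidden `slow` process.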
@@ -595,7 +593,6 @@ \subsection{Modeling large systems}
595
593
596
594
Additional technical challenges for large systems include high demands on memory and computation time;
597
595
we explain how to deal with those in the tutorials (notebook~01).
598
-
599
596
More details on how to model complex systems with the techniques presented here are given, e.g., in Refs.~\cite{plattner_protein_2015,plattner_complete_2017}.
600
597
We further examine some symptoms that may indicate problematic or difficult datasets, and demonstrate how to deal with them in notebook~08.
601
598
@@ -612,6 +609,7 @@ \subsection{Advanced Methods}
612
609
MEMMs consequently enable users to combine enhanced sampling methods such as umbrella sampling or replica exchange
613
610
with conventional molecular dynamics simulations to more efficiently study rare event kinetics~\cite{trammbar}.
614
611
MEMMs are implemented in PyEMMA.
612
+
Since the many publications associated with the development of these methods are beyond the scope of this tutorial, we refer the reader to Sec.~8.3 of Ref.~\cite{msm-brooke} and the references therein.
615
613
616
614
Another issue often faced during Markov state modeling is a lack of quantitative agreement with complementary experimental data.
617
615
This issue is not intrinsic to the Markov state modeling approach as such,