Commit c495445

fixes #142
1 parent 298af56 commit c495445

1 file changed

+21
-12
lines changed

manuscript/manuscript.tex

Lines changed: 21 additions & 12 deletions
@@ -236,7 +236,7 @@ \subsection{The PyEMMA workflow}
236236
\item coarse-graining the MSM using a hidden Markov model approach (07).
237237
\end{itemize}
238238

239-
For the remainder of this manuscript we will walk through the first notebook (00). In notebook 00 we analyze a dataset of the Trp-Leu-Ala-Leu-Leu pentapeptide (Fig.~\ref{fig:io-to-ck}a), consisting of $25$ independent MD trajectories conducted in implicit solvent with frames saved at an interval of $0.1$~ns. We present the results obtained in the notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
239+
For the remainder of this manuscript we will walk through the first notebook (00). In notebook 00 we analyze a dataset of the Trp-Leu-Ala-Leu-Leu pentapeptide (Fig.~\ref{fig:io-to-tica}a), consisting of $25$ independent MD trajectories conducted in implicit solvent with frames saved at an interval of $0.1$~ns. We present the results obtained in the notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
240240
The figures shown in the following are created in the showcase notebook (00) and can be easily reproduced.
241241

242242
\subsection{Feature selection}
@@ -246,11 +246,19 @@ \subsection{Feature selection}
246246
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone: (a)~The Trp-Leu-Ala-Leu-Leu pentapeptide in licorice representation~\cite{vmd}.
247247
(b)~The VAMP-2 score indicates which of the tested featurizations contains the highest kinetic variance.
248248
(c)~The sample density projected onto the first two time-lagged independent components (ICs) at lag time $\tau=0.5$ ns shows multiple density maxima and
249-
(d)~the time series of the first two ICs show rare transition events.
250-
(e)~The convergence behavior of the first four implied timescales indicates that a lag time of $\tau=0.5$ ns is suitable for MSM estimation.
251-
(f) A Chapman-Kolmogorov test shows that an MSM estimated at lag time $\tau=0.5$ ns under the assumption of five metastable states accurately predicts the kinetic behavior on longer timescales.
252-
In~(e) and~(f), the shaded areas indicate $95\%$ confidence intervals computed with a Bayesian sampling procedure.}
253-
\label{fig:io-to-ck}
249+
(d)~the time series of the first two ICs show rare transition events.}
250+
\label{fig:io-to-tica}
251+
\end{figure}
252+
253+
\begin{figure}
254+
\includegraphics{figure_3}
255+
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
256+
(a)~The convergence behavior of the first four implied timescales indicates that a lag time of $\tau=0.5$ ns is suitable for MSM estimation.
257+
(b) A Chapman-Kolmogorov test shows that an MSM estimated at lag time $\tau=0.5$ ns under the assumption of five metastable states accurately predicts the kinetic behavior on longer timescales.
258+
The solid lines in (a) refer to the maximum likelihood result while the dashed lines show the ensemble mean computed with a Bayesian sampling scheme.
259+
The black line indicates the timescale cutoff and, thus, all ITS within the grey-shaded area are not necessarily resolved.
260+
In both panels, the (non-grey) shaded areas indicate $95\%$ confidence intervals computed with a Bayesian sampling procedure.}
261+
\label{fig:its-and-ck}
254262
\end{figure}
255263

256264
In Markov state modeling our objective is to model the slow dynamics of a molecular process. In order to approximate the slow dynamics in a statistically efficient manner, a lower dimensional representation of our simulation data is necessary.
@@ -260,15 +268,15 @@ \subsection{Feature selection}
260268

261269
Here, we utilize the VAMP-2 score, which quantifies the kinetic variance contained in the features~\cite{kinetic-maps}.
262270
We should always evaluate the score in a cross-validated manner to ensure that we include neither too few features (under-fitting) nor too many features (over-fitting)~\cite{gmrq,vamp-preprint}.
263-
To choose among three different molecular features relevant to protein structure, we compute the (cross-validated) VAMP-2 score at a lag time of $0.5$~ns and find that backbone torsions contain more kinetic variance than the backbone's heavy atom positions or the distances between them (Fig.~\ref{fig:io-to-ck}b).
271+
To choose among three different molecular features relevant to protein structure, we compute the (cross-validated) VAMP-2 score at a lag time of $0.5$~ns and find that backbone torsions contain more kinetic variance than the backbone's heavy atom positions or the distances between them (Fig.~\ref{fig:io-to-tica}b).
264272

265273
We note that deep learning approaches for feature selection have recently been developed that may eventually replace the feature selection step~\cite{vampnet,tae,hernandez-vde}.
266274

267275
\subsection{Dimensionality reduction}
268276

269277
Subsequently, we perform TICA~\cite{tica,kinetic-maps} in order to reduce the dimension from the feature space, which typically contains many degrees of freedom, to a lower-dimensional space that can be discretized with higher resolution and better statistical efficiency. TICA is a special case of the variational principle~\cite{noe-vac,nueske-vamk} and is designed to find a projection preserving the long-timescale dynamics in the dataset. Here, performing TICA on the backbone torsions at lag time $0.5$~ns yields a four-dimensional subspace using a $95\%$ kinetic variance cutoff (note that we perform a $\cos/\sin$-transformation of the torsions before TICA in order to preserve their periodicity).
270-
The sample density projected onto the first two independent components (ICs) exhibits several maxima (Fig.~\ref{fig:io-to-ck}c).
271-
Discrete jumps between the maxima can be observed by visualizing the transformation of the first trajectory into these ICs (Fig.~\ref{fig:io-to-ck}d).
278+
The sample density projected onto the first two independent components (ICs) exhibits several maxima (Fig.~\ref{fig:io-to-tica}c).
279+
Discrete jumps between the maxima can be observed by visualizing the transformation of the first trajectory into these ICs (Fig.~\ref{fig:io-to-tica}d).
272280
We thus assume that our TICA-transformed backbone torsion features describe one or more metastable processes.
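The effect of the $\cos/\sin$-transformation mentioned above can be illustrated with a minimal sketch in plain Python. It does not use PyEMMA; the function names and the two torsion values are hypothetical and chosen only to show why the transformation preserves periodicity: two angles on opposite sides of the periodic boundary are far apart as raw numbers but close in the transformed space.

```python
import math


def cos_sin_features(torsions_rad):
    """Map each torsion angle (in radians) to the pair (cos, sin)."""
    features = []
    for angle in torsions_rad:
        features.extend((math.cos(angle), math.sin(angle)))
    return features


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical torsion values that differ by only 2 degrees on the circle
# but lie on opposite sides of the +/-180 degree boundary.
ANGLE_A = math.radians(179.0)
ANGLE_B = math.radians(-179.0)

# The raw angle values are almost a full turn apart ...
raw_gap = abs(ANGLE_A - ANGLE_B)

# ... while the (cos, sin) features of the two angles are close to each other.
feature_gap = euclidean(cos_sin_features([ANGLE_A]),
                        cos_sin_features([ANGLE_B]))
```

Each torsion contributes two features after the transformation, so the feature space handed to TICA has twice as many dimensions as there are torsions.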
273281

274282
\subsection{Discretization}
@@ -282,7 +290,7 @@ \subsection{MSM estimation and validation}
282290
% t_i = \frac{-\tau}{\ln\left|\lambda_i(\tau)\right|}.
283291
% \end{equation}
284292
A necessary condition for Markovian dynamics in our reduced space is that the ITS are approximately constant as a function of $\tau$; accordingly, we chose the smallest possible $\tau$ which fulfills this condition within the model uncertainty. The uncertainty bounds are computed using a Bayesian scheme~\cite{ben-rev-msm,noe-tmat-sampling} with $100$ samples.
285-
In our example, we find that the four slowest ITS converge quickly and are constant within a $95\%$ confidence interval for lag times above $0.5$~ns (Fig.~\ref{fig:io-to-ck}e). Using this lag time we can now estimate a (Bayesian) MSM with $\tau=0.5$~ns.
293+
In our example, we find that the four slowest ITS converge quickly and are constant within a $95\%$ confidence interval for lag times above $0.5$~ns (Fig.~\ref{fig:its-and-ck}a). Using this lag time we can now estimate a (Bayesian) MSM with $\tau=0.5$~ns.
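The relation between an MSM eigenvalue and its implied timescale, $t_i = -\tau / \ln|\lambda_i(\tau)|$, can be sketched in a few lines of plain Python. This is an illustration only and not the PyEMMA implementation; the eigenvalues below are hypothetical. The last two lines show the constancy argument: for Markovian dynamics an eigenvalue $\lambda$ at lag time $\tau$ becomes $\lambda^2$ at $2\tau$, which leaves the implied timescale unchanged.

```python
import math


def implied_timescale(eigenvalue, lag_time):
    """Return t = -lag_time / ln|eigenvalue| for a non-stationary eigenvalue."""
    magnitude = abs(eigenvalue)
    if not 0.0 < magnitude < 1.0:
        # The stationary eigenvalue (1.0) has no finite implied timescale.
        raise ValueError("eigenvalue magnitude must lie strictly between 0 and 1")
    return -lag_time / math.log(magnitude)


TAU_NS = 0.5               # lag time used in the text
EIGENVALUES = [0.9, 0.6]   # hypothetical non-stationary eigenvalues at TAU_NS

# One implied timescale (in ns) per eigenvalue; slower processes come first.
timescales_ns = [implied_timescale(ev, TAU_NS) for ev in EIGENVALUES]

# Markovian case: lambda(2 * tau) = lambda(tau) ** 2 gives the same timescale.
its_at_tau = implied_timescale(0.9, TAU_NS)
its_at_2tau = implied_timescale(0.9 ** 2, 2 * TAU_NS)
```

In practice the eigenvalues come from MSMs estimated at each lag time, so the implied timescales are only approximately constant and must be judged against their uncertainty bounds.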
286294

287295
To test the validity of our MSM we perform a Chapman-Kolmogorov (CK) test.
288296
% The CK test compares the right and the left side of the Chapman-Kolmogorov equation
@@ -293,8 +301,9 @@ \subsection{MSM estimation and validation}
293301
% where $T$ is the MSM transition matrix. The left-hand side of the equation corresponds to an MSM estimated at lag time $k\tau$, where $k$ is an integer larger than 1, whereas the right-hand side of the equation is our estimated MSM to the $k^\textrm{th}$ power.
294302
Visualizing the full transition probability matrix $T$ is difficult; we therefore coarse-grain $T$ into a smaller number of metastable states before performing the test.
295303
An appropriate number of metastable states can be chosen by identifying a relatively large gap in the ITS plot.
296-
For this analysis, we chose 5 metastable states.
297-
The CK test confirms that this is an appropriate choice and shows that the MSM we have estimated at lag time $\tau=0.5$~ns indeed predicts the long-timescale behavior of our system within error (Fig.~\ref{fig:io-to-ck}f).
304+
For this analysis, we chose five metastable states.
305+
The CK test (Fig.~\ref{fig:its-and-ck}b) shows good agreement between models re-estimated at longer lag times (black/solid lines) and higher powers of the original transition matrix (blue/dashed lines).
306+
Thus, it confirms that five metastable states is an appropriate choice and shows that the MSM we have estimated at lag time $\tau=0.5$~ns indeed predicts the long-timescale behavior of our system within error (blue/shaded area).
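The comparison underlying the CK test can be sketched with a toy example in plain Python. This is not the PyEMMA routine and it omits the coarse-graining into metastable states and the error estimation; the two-state transition matrices and all function names are hypothetical. The sketch compares the $k^\textrm{th}$ power of a transition matrix estimated at lag time $\tau$ with a matrix that stands in for a model re-estimated at $k\tau$.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(t, k):
    """Return t**k for an integer k >= 1 by repeated multiplication."""
    result = t
    for _ in range(k - 1):
        result = matmul(result, t)
    return result


def ck_max_deviation(t_tau, t_ktau, k):
    """Largest absolute entry-wise difference between T(tau)**k and T(k*tau)."""
    predicted = matrix_power(t_tau, k)
    n = len(t_tau)
    return max(abs(predicted[i][j] - t_ktau[i][j])
               for i in range(n) for j in range(n))


# Hypothetical two-state transition matrix at lag time tau (rows sum to 1).
T_TAU = [[0.9, 0.1],
         [0.2, 0.8]]

# Stand-in for a model re-estimated at 2 * tau. Here it equals T_TAU squared,
# so the deviation vanishes up to floating-point error.
T_2TAU = [[0.83, 0.17],
          [0.34, 0.66]]

deviation = ck_max_deviation(T_TAU, T_2TAU, k=2)
```

With real data the re-estimated matrix differs from the prediction, and the test asks whether the difference stays within the confidence intervals.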
298307

299308
\subsection{Analyzing the MSM}
300309
