Skip to content

Commit 57644f4

Browse files
committed
addressed reviewers' comments and added line breaks
1 parent c495445 commit 57644f4

File tree

1 file changed

+41
-27
lines changed

1 file changed

+41
-27
lines changed

manuscript/manuscript.tex

Lines changed: 41 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -106,7 +106,7 @@ \subsection{Essential theory}
106106
We additionally require that the system is ergodic, which means that every state is accessible from every other state.
107107

108108
When estimating an MSM it is critical to choose a lag time, $\tau$, which is long enough to ensure Markovian dynamics in our state space, but short enough to resolve the dynamics in which we are interested.
109-
Plotting the implied timescales (ITS) as a function of $\tau$ can be a helpful diagnostic when selecting the MSM lag time~\cite{swope-its}.
109+
Plotting the implied timescales (ITS) as a function of~$\tau$ can be a helpful diagnostic when selecting the MSM lag time~\cite{swope-its}.
110110
The ITS $t_i$ approximates the decorrelation time of the $i^\textrm{th}$ process and is computed from the eigenvalues $\lambda_i$ of the MSM transition matrix via,
111111
\begin{equation}
112112
\label{eq:its}
@@ -122,7 +122,7 @@ \subsection{Essential theory}
122122
T(k \tau) = T^k(\tau),
123123
\end{equation}
124124

125-
\noindent{}where the left-hand side of the equation corresponds to an MSM estimated at lag time $k\tau$, where $k$ is an integer larger than 1, whereas the right-hand side of the equation is our estimated MSM transition probability matrix to the $k^\textrm{th}$ power.
125+
\noindent{}where the left-hand side of the equation corresponds to an MSM estimated at lag time $k\tau$, where $k$ is an integer larger than~1, whereas the right-hand side of the equation is our estimated MSM transition probability matrix to the $k^\textrm{th}$ power.
126126
By assessing how well the approximated transition probability matrix adheres to the CK property, we can validate the appropriateness of the Markovian assumption for the model.
127127

128128
Once validated, the transition matrix can be decomposed into eigenvectors and eigenvalues,
@@ -133,8 +133,8 @@ \subsection{Essential theory}
133133
\end{equation}
134134

135135
\noindent{}where the eigenvalues are indexed in decreasing order. The highest eigenvalue, $\lambda_1$, is unique and is equal to $1$, and its corresponding left eigenvector $\phi_1$ corresponds to the stationary distribution of the system.
136-
The right eigenvector $\psi_1$ is a vector consisting of $1$'s.
137-
The subsequent eigenvalues $\lambda_{i>1}$ are real with absolute values less than $1$ and correspond to dynamical processes within the system.
136+
The right eigenvector $\psi_1$ is a vector consisting of~$1$'s.
137+
The subsequent eigenvalues $\lambda_{i>1}$ are real with absolute values less than~$1$ and correspond to dynamical processes within the system.
138138
The right eigenvectors $\psi_i$ each represent a dynamical process (for $i>1$), and the coefficients of the eigenvectors represent the flux into and out of the MSM states that characterizes that process.
139139
The corresponding left eigenvectors $\phi_i$ contain the same information weighted by the stationary distribution.
140140

@@ -210,7 +210,7 @@ \subsection{Software/system requirements}
210210
\item nglview -- Widget for active viewing of molecular structures in Jupyter environments~\cite{nglview}
211211
\end{itemize}
212212

213-
The tutorial software is currently supported for Python versions $3.5$ and $3.6$ on the operating systems Linux, OSX, and Windows.
213+
The tutorial software is currently supported for Python versions~$3.5$ and~$3.6$ on the operating systems Linux, OSX, and Windows.
214214

215215
Should the user prefer not to use Anaconda, a manual installation via the pip installer is possible.
216216
Alternatively, one can use the Binder service (\url{https://mybinder.org}) to view and run the tutorials online in any browser.
@@ -236,7 +236,9 @@ \subsection{The PyEMMA workflow}
236236
\item coarse-graining the MSM using a hidden Markov model approach (07).
237237
\end{itemize}
238238

239-
For the remainder of this manuscript we will walk through the first notebook (00). In notebook 00 we analyze a dataset of the Trp-Leu-Ala-Leu-Leu pentapeptide (Fig.~\ref{fig:io-to-tica}a), consisting of $25$ independent MD trajectories conducted in implicit solvent with frames saved at an interval of $0.1$~ns. We present the results obtained in the notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
239+
For the remainder of this manuscript we will walk through the first notebook (00).
240+
In notebook~00 we analyze a dataset of the Trp-Leu-Ala-Leu-Leu pentapeptide (Fig.~\ref{fig:io-to-tica}a), consisting of~$25$ independent MD trajectories conducted in implicit solvent with frames saved at an interval of~$0.1$~ns.
241+
We present the results obtained in this notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
240242
The figures that will be displayed in the following are created in the showcase notebook (00) and can be easily reproduced.
241243

242244
\subsection{Feature selection}
@@ -245,19 +247,19 @@ \subsection{Feature selection}
245247
\includegraphics{figure_2}
246248
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone: (a)~The Trp-Leu-Ala-Leu-Leu pentapeptide in licorice representation~\cite{vmd}.
247249
(b)~The VAMP-2 score indicates which of the tested featurizations contains the highest kinetic variance.
248-
(c)~The sample density projected onto the first two time-lagged independent components (ICs) at lag time $\tau=0.5$ ns shows multiple density maxima and
249-
(d)~the time series of the first two ICs show rare transition events.}
250+
(c)~The sample density projected onto the first two time-lagged independent components (ICs) at lag time $\tau=0.5$~ns shows multiple density maxima and
251+
(d)~the time series of the first two ICs of the first trajectory.}
250252
\label{fig:io-to-tica}
251253
\end{figure}
252254

253255
\begin{figure}
254256
\includegraphics{figure_3}
255257
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
256-
(a)~The convergence behavior of the first four implied timescales indicates that a lag time of $\tau=0.5$ ns is suitable for MSM estimation.
257-
(b) A Chapman-Kolmogorov test shows that an MSM estimated at lag time $\tau=0.5$ ns under the assumption of five metastable states accurately predicts the kinetic behavior on longer timescales.
258-
The solid lines in (a) refer to the maximum likelihood result while the dashed lines show the ensemble mean computed with a Bayesian sampling scheme.
259-
The black line indicates the timescale cutoff and, thus, all ITS withion the grey-shaded area are not necessarily resolved.
260-
In both panels, the (non-grey) shaded areas indicate $95\%$ confidence intervals computed with a Bayesian sampling procedure.}
258+
(a)~The convergence behavior of the implied timescales associated with the four slowest processes.
259+
(b)~Chapman-Kolmogorov test computed using an MSM estimated with lag time $\tau=0.5$~ns assuming~5 meta-stable states..
260+
The solid lines in (a) refer to the maximum likelihood result while the dashed lines show the ensemble mean computed with a Bayesian sampling procedure~\cite{ben-rev-msm}.
261+
The black line indicates where implied timescales are equal to the lag time, whereas the grey area indicates all implied timescales faster than the lag time.
262+
In both panels, the (non-grey) shaded areas indicate~$95\%$ confidence intervals computed with the aforementioned Bayesian sampling procedure.}
261263
\label{fig:its-and-ck}
262264
\end{figure}
263265

@@ -268,13 +270,17 @@ \subsection{Feature selection}
268270

269271
Here, we utilize the VAMP-2 score, which maximizes the kinetic variance contained in the features~\cite{kinetic-maps}.
270272
We should always evaluate the score in a cross-validated manner to ensure that we neither include too few features (under-fitting) or too many features (over-fitting)~\cite{gmrq,vamp-preprint}.
271-
To choose among three different molecular features relevant to protein structure, we compute the (cross-validated) VAMP-2 score at a lag time of $0.5$~ns and find that backbone torsions contain more kinetic variance than the backbone's heavy atom positions or the distances between them (Fig.~\ref{fig:io-to-tica}b).
273+
To choose among three different molecular features relevant to protein structure, we compute the (cross-validated) VAMP-2 score at a lag time of~$0.5$~ns.
274+
We find that backbone torsions contain more kinetic variance than the backbone heavy atom positions or the distances between them (Fig.~\ref{fig:io-to-tica}b).
275+
This suggests that backbone torsions are the best of the options evaluated for MSM construction.
272276

273277
We note that deep learning approaches for feature selection have recently been developed that may eventually replace the feature selection step~\cite{vampnet,tae,hernandez-vde}.
274278

275279
\subsection{Dimensionality reduction}
276280

277-
Subsequently, we perform TICA~\cite{tica,kinetic-maps} in order to reduce the dimension from the feature space, which typically contains many degrees of freedom, to a lower dimensional space that can be discretized with higher resolution and better statistical efficiency. TICA is a special case of the variational principle~\cite{noe-vac,nueske-vamk} and is designed to find a projection preserving the long-timescale dynamics in the dataset. Here, performing TICA on the backbone torsions at lag time $0.5$ ns yields a four dimensional subspace using a $95\%$ kinetic variance cutoff (note that we perform a $\cos/\sin$-transformation of the torsions before TICA in order to preserve their periodicity).
281+
Subsequently, we perform TICA~\cite{tica,kinetic-maps} in order to reduce the dimension from the feature space, which typically contains many degrees of freedom, to a lower dimensional space that can be discretized with higher resolution and better statistical efficiency.
282+
TICA is a special case of the variational principle~\cite{noe-vac,nueske-vamk} and is designed to find a projection preserving the long-timescale dynamics in the dataset.
283+
Here, performing TICA on the backbone torsions at lag time~$0.5$~ns yields a four dimensional subspace using a~$95\%$ kinetic variance cutoff (note that we perform a $\cos/\sin$-transformation of the torsions before TICA in order to preserve their periodicity).
278284
The sample density projected onto the first two independent components (ICs) exhibits several maxima (Fig.~\ref{fig:io-to-tica}c).
279285
Discrete jumps between the maxima can be observed by visualizing the transformation of the first trajectory into these ICs (Fig.~\ref{fig:io-to-tica}d).
280286
We thus assume that our TICA-transformed backbone torsion features describe one or more metastable processes.
@@ -284,13 +290,13 @@ \subsection{Discretization}
284290

285291
\subsection{MSM estimation and validation}
286292
% When estimating an MSM it is critical to choose a lag time, $\tau$, which is long enough to ensure Markovian dynamics in our reduced space, but short enough to resolve the dynamics in which we are interested.
287-
% Plotting the implied timescales (ITS) as a function of $\tau$ can be a helpful diagnostic when selecting the MSM lag time~\cite{swope-its}. The ITS $t_i$ approximates the decorrelation time of the $i^\textrm{th}$ process and is computed from the eigenvalues $\lambda_i$ of the MSM transition matrix via
293+
% Plotting the implied timescales (ITS) as a function of~$\tau$ can be a helpful diagnostic when selecting the MSM lag time~\cite{swope-its}. The ITS $t_i$ approximates the decorrelation time of the $i^\textrm{th}$ process and is computed from the eigenvalues $\lambda_i$ of the MSM transition matrix via
288294
% \begin{equation}
289295
% \label{eq:its}
290296
% t_i = \frac{-\tau}{\ln\left|\lambda_i(\tau)\right|}.
291297
% \end{equation}
292-
A necessary condition for Markovian dynamics in our reduced space is that the ITS are approximately constant as a function of $\tau$; accordingly, we chose the smallest possible $\tau$ which fulfills this condition within the model uncertainty. The uncertainty bounds are computed using a Bayesian scheme~\cite{ben-rev-msm,noe-tmat-sampling} with $100$ samples.
293-
In our example, we find that the four slowest ITS converge quickly and are constant within a $95\%$ confidence interval for lag times above $0.5$~ns (Fig.~\ref{fig:its-and-ck}a). Using this lag time we can now estimate a (Bayesian) MSM with $\tau=0.5$~ns.
298+
A necessary condition for Markovian dynamics in our reduced space is that the ITS are approximately constant as a function of $\tau$; accordingly, we chose the smallest possible $\tau$ which fulfills this condition within the model uncertainty. The uncertainty bounds are computed using a Bayesian scheme~\cite{ben-rev-msm,noe-tmat-sampling} with~$100$ samples.
299+
In our example, we find that the four slowest ITS converge quickly and are constant within a $95\%$ confidence interval for lag times above~$0.5$~ns (Fig.~\ref{fig:its-and-ck}a). Using this lag time we can now estimate a (Bayesian) MSM with $\tau=0.5$~ns.
294300

295301
To test the validity of our MSM we perform a Chapman-Kolmogorov (CK) test.
296302
% The CK test compares the right and the left side of the Chapman-Kolmogorov equation
@@ -302,8 +308,8 @@ \subsection{MSM estimation and validation}
302308
Visualizing the full transition probability matrix $T$ is difficult; we therefore coarse-grain $T$ into a smaller number of metastable states before performing the test.
303309
An appropriate number of metastable states can be chosen by identifying a relatively large gap in the ITS plot.
304310
For this analysis, we chose five metastable states.
305-
The CK test (Fig.~\ref{fig:its-and-ck}b) shows a good agreement between reestimation at higher lagtimes (black/solid lines) and higher powers of the original transition matrix (blue/dashed lines).
306-
Thus, it confirms that five metastable states is an appropriate choice and shows that the MSM we have estimated at lag time $\tau=0.5$~ns indeed predicts the long-timescale behavior of our system within error (blue/shaded area).
311+
The CK test (Fig.~\ref{fig:its-and-ck}b) shows that predictions from our MSM (blue-dashed lines) agrees well with MSMs estimated with longer lag times (black-solid lines)
312+
Thus, the CK test confirms that five metastable states is an appropriate choice and shows that the MSM we have estimated at lag time $\tau=0.5$~ns indeed predicts the long-timescale behavior of our system within error (blue/shaded area).
307313

308314
\subsection{Analyzing the MSM}
309315

@@ -338,7 +344,7 @@ \subsection{Analyzing the MSM}
338344
where $\pi_j$ denotes the MSM stationary weight of the $j^\textrm{th}$ microstate.
339345

340346
In order to interpret the slowest relaxation timescales, we refer to the (right) eigenvectors of the MSM as they contain information about what configurational changes are happening and their timescales.
341-
The first right eigenvector corresponds to the stationary process and its eigenvalue is the Perron eigenvalue $1$.
347+
The first right eigenvector corresponds to the stationary process and its eigenvalue is the Perron eigenvalue~$1$.
342348
The second right eigenvector, however, corresponds to the slowest process
343349
(the eigenvector components are real because of the detailed balance constraint enforced during MSM estimation).
344350
The minimal and maximal components of the second right eigenvector indicate the microstates between which the process shifts probability density.
@@ -354,11 +360,14 @@ \subsection{Analyzing the MSM}
354360
\end{array}\]
355361
using the Bayesian MSM.
356362

357-
TPT~\cite{weinan-tpt,metzner-msm-tpt} is a method used to analyze the statistics of transition pathways. The TPT version of~\cite{noe-folding-pathways} can be conveniently applied to the estimated MSM. Here, we compute the TPT flux between macrostates $\mathcal{S}_2$ and $\mathcal{S}_4$ (Fig.~\ref{fig:msm-analysis}d).
363+
TPT~\cite{weinan-tpt,metzner-msm-tpt} is a method used to analyze the statistics of transition pathways.
364+
The TPT version of~\cite{noe-folding-pathways} can be conveniently applied to the estimated MSM.
365+
Here, we compute the TPT flux between macrostates $\mathcal{S}_2$ and $\mathcal{S}_4$ (Fig.~\ref{fig:msm-analysis}d).
358366
The committor projection onto the first two TICA components shows that it is constant within the metastable states defined above.
359367
Transition regions (macrostates $\mathcal{S}_{(1,3,5)}$) can be identified by committor values $\approx \frac{1}{2}$.
360368

361-
The transition network can be additionally visualized by plotting representative structures of the five metastable states $\mathcal{S}_{(1-5)}$ according to their committor probability (Fig.~\ref{fig:tpt-network}). It is easy to see from this depiction that the dominant pathway from $\mathcal{S}_2$ to $\mathcal{S}_4$ proceeds through $\mathcal{S}_5$.
369+
The transition network can be additionally visualized by plotting representative structures of the five metastable states $\mathcal{S}_{(1-5)}$ according to their committor probability (Fig.~\ref{fig:tpt-network}).
370+
It is easy to see from this depiction that the dominant pathway from $\mathcal{S}_2$ to $\mathcal{S}_4$ proceeds through $\mathcal{S}_5$.
362371

363372
\begin{figure}
364373
\includegraphics{figure_5}
@@ -372,22 +381,27 @@ \subsection{Connecting the MSM with experimental data}
372381

373382
\begin{figure}
374383
\includegraphics{figure_6}
375-
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone: (a)~the Trp-1 SASA autocorrelation function yields a weak signal which, however, (b)~can be enhanced if the system is prepared in the nonequilibrium condition $\mathcal{S}_1$. The solid/orange lines denote the maximum likelihood MSM result; the dashed/blue lines and the the shaded areas indicate sample means and $95\%$ confidence intervals computed with a Bayesian sampling procedure.}
384+
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
385+
(a)~the Trp-1 SASA autocorrelation function yields a weak signal which, however,
386+
(b)~can be enhanced if the system is prepared in the nonequilibrium condition $\mathcal{S}_1$.
387+
The solid/orange lines denote the maximum likelihood MSM result;
388+
the dashed/blue lines and the the shaded areas indicate sample means and~$95\%$ confidence intervals computed with a Bayesian sampling procedure~\cite{ben-rev-msm}.}
376389
\label{fig:msm-exp-obs}
377390
\end{figure}
378391

379-
MSMs can also be analyzed in the context of experimental observables. Connecting MSM analysis to experimental data can both serve as an accuracy test of our MSM as well as provide a mechanistic interpretation of observed experimental signals.
392+
MSMs can also be analyzed in the context of experimental observables.
393+
Connecting MSM analysis to experimental data can both serve as an accuracy test of our MSM as well as provide a mechanistic interpretation of observed experimental signals.
380394
Since we have both the stationary and dynamic properties of the molecular system encoded in the MSM transition probability matrix, we can compute observables that involve both stationary ensemble averages as well as correlation functions.
381395

382396
As an example, here we look at the fluorescence correlation of Trp-1, since this terminal tryptophan is a realistic experimental observable for our pentapeptide system.
383-
In order to compute the fluorescence correlation functions we require a microscopic, instantaneous value of the tryptophan fluorescence for each of the original $75$ MSM microstates.
397+
In order to compute the fluorescence correlation functions we require a microscopic, instantaneous value of the tryptophan fluorescence for each of the original~$75$ MSM microstates.
384398
To approximate the fluorescence signal in our pentapeptide system, we use the mdtraj library~\cite{mdtraj} to compute the solvent accessible surface area (SASA) of Trp-1.
385399
Now that we have an approximation of the fluorescence in each of our MSM states, we can use PyEMMA to compute the fluorescence autocorrelation function (ACF) from our MSM (\ref{fig:msm-exp-obs}a).
386400
Note how the computed ACF has a very small response (i.e., signal amplitude).
387401

388402
Using PyEMMA, we can simulate the relaxation of an observable if we had prepared our molecular system in a nonequilibrium initial condition.
389403
The experimental counterpart of such a prediction could be a temperature or pressure jump experiment or a stopped flow assay.
390-
To illustrate such an experiment, we initialize our molecular ensemble as the metastable distribution of $\mathcal{S}_1$ and follow the predicted fluorescence signal as it relaxes to equilibrium (\ref{fig:msm-exp-obs}b).
404+
To illustrate such an experiment, we initialize our molecular ensemble as the metastable distribution of~$\mathcal{S}_1$ and follow the predicted fluorescence signal as it relaxes to equilibrium (\ref{fig:msm-exp-obs}b).
391405
We see that the predicted relaxation signal has a much larger amplitude for the nonequilibrium initialization, making it more likely to be experimentally measurable.
392406

393407
\subsection{Summary}

0 commit comments

Comments
 (0)