Commit 769c9b0

addressing @franknoe's comments [ci skip]
1 parent 7f662de commit 769c9b0

2 files changed: +62 -44 lines changed

manuscript/literature.bib

Lines changed: 12 additions & 0 deletions

@@ -577,6 +577,18 @@ @article{oom-feliks
   URL = {https://doi.org/10.1063/1.4976518},
   DOI = {10.1063/1.4976518}
 }
+@article{hao-variational-koopman-models,
+  Author = {Hao Wu and Feliks N\"{u}ske and Fabian Paul and Stefan Klus and P{\'{e}}ter Koltai and Frank No{\'{e}}},
+  Title = {Variational Koopman models: Slow collective variables and molecular kinetics from short off-equilibrium simulations},
+  Journal = {J. Chem. Phys.},
+  Year = {2017},
+  Volume = {146},
+  Number = {15},
+  Pages = {154104},
+  Month = {apr},
+  URL = {https://doi.org/10.1063/1.4979344},
+  DOI = {10.1063/1.4979344}
+}
 @article{NoeClementiReview,
   Author = {Frank No{\'{e}} and Cecilia Clementi},
   Title = {Collective variables for the study of long-time kinetics from molecular trajectories: theory and methods},

manuscript/manuscript.tex

Lines changed: 50 additions & 44 deletions

@@ -81,7 +81,7 @@
 
 \section{Introduction}
 
-PyEMMA~\cite{pyemma} (\url{http://emma-project.org}) is a software for the analysis of molecular dynamics (MD) simulations using Markov state models~\cite{schuette-msm,singhal-msm-naming} (MSMs).
+PyEMMA~\cite{pyemma} (\url{http://emma-project.org}) is a software package for the analysis of molecular dynamics (MD) simulations using Markov state models~\cite{schuette-msm,singhal-msm-naming,noe2007jcp,chodera2007jcp,buchete-msm-2008} (MSMs).
 The package is written in Python (\url{http://python.org}), relies heavily on NumPy/SciPy~\cite{numpy,scipy}, and is compatible with the scikit-learn~\cite{sklearn} framework for machine learning.
 
 \subsection{Scope}
@@ -202,17 +202,24 @@ \subsection{Variational approach and TICA}
 A commonly used method for dimensionality reduction, TICA, is a particular implementation of the VAC.
 To apply TICA, we need to compute instantaneous ($\mathbf{C}(0)$) and time-lagged ($\mathbf{C}(\tau)$) covariance matrices with elements
 \begin{eqnarray}
-c_{ij}(0) & = & \left\langle \tilde{x}_i(t) \; \tilde{x}_j(t) \right\rangle_t \\
-c_{ij}(\tau) & = & \left\langle \tilde{x}_i(t) \; \tilde{x}_j(t + \tau) \right\rangle_t,
+c_{ij}(0) & = & \left\langle \tilde{x}_i(t) \; \tilde{x}_j(t) \right\rangle_t \label{eq:c0}\\
+c_{ij}(\tau) & = & \left\langle \tilde{x}_i(t) \; \tilde{x}_j(t + \tau) \right\rangle_t, \label{eq:ct}
 \end{eqnarray}
 where $\tilde{x}_i(t)$ denotes the $i^\textrm{th}$ feature at time $t$ after the mean has been removed.
-Then, we can solve the generalized eigenvalue problem
+By default, PyEMMA estimates (\ref{eq:c0},\ref{eq:ct}) using symmetrization~\cite{tica}.
+This symmetrization induces a significant bias when using non-equilibrium data from short trajectories~\cite{hao-variational-koopman-models}.
+As an alternative, the so-called Koopman reweighting estimator is available, which avoids this bias
+but comes at the cost of a larger variance~\cite{hao-variational-koopman-models}.
+
+After estimating the covariance matrices, TICA solves the generalized eigenvalue problem
 \begin{equation}
 \mathbf{C}(\tau) \, \mathbf{u}_i = \mathbf{C}(0) \, \lambda_i(\tau) \, \mathbf{u}_i \,, \quad i=1,\dots,n,
 \end{equation}
 to obtain independent component directions $\mathbf{u}_i$ which approximate the reaction coordinates of the system,
-where the pairs of eigenvalues and independent components are sorted in descending order,
-and we define a cumulative kinetic variance fraction
+where the pairs of eigenvalues and independent components are sorted in descending order.
+The contribution of each independent component to the kinetics can be measured
+via the kinetic distance~\cite{kinetic-maps},
+which assigns a cumulative variance fraction to the first $d$ independent components:
 \begin{equation}
 c_d = \frac{\sum_{i=2}^d \lambda_i^2(\tau)}{\textrm{TKV}},
 \end{equation}
@@ -223,12 +230,12 @@ \subsection{Variational approach and TICA}
 is the total kinetic variance explained by all $n$ features.
 
 If we further scale the independent components $\mathbf{u}_i$ by the corresponding eigenvalues $\lambda_i(\tau)$,
-we obtain a \emph{kinetic map} which is the default behavior in PyEMMA.
+we obtain a \emph{kinetic map}~\cite{kinetic-maps}, which is the default behavior in PyEMMA.
 
-Note, though, that TICA requires the data to be in equilibrium.
-To use TICA with nonequilibrium data, we can either symmetrize the time-lagged covariance matrix $\mathbf{C}(\tau)$
-or apply a Koopman reweighting~\cite{vamp-preprint}.
-For short trajectories and nonequilibrium data we generally recommend to use VAMP~\cite{vamp-preprint}.
+Note, though, that TICA requires the dynamics to be simulated under equilibrium conditions.
+To use TICA with nonequilibrium MD, e.g., subject to external forces,
+or simply to perform dimension reduction on short trajectory data without worrying about reweighting,
+we recommend using VAMP~\cite{vamp-preprint}.
 
 For all these approaches,
 dimensionality reduction is performed by projecting the (mean free) features $\tilde{\mathbf{x}}(t)$
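The symmetrized covariance estimation and generalized eigenvalue problem discussed in this hunk can be sketched in plain NumPy/SciPy. This is an illustrative toy on synthetic data, not PyEMMA's actual implementation; the AR(1) test signal, lag time, and all parameter values here are assumptions for demonstration only.

```python
import numpy as np
from scipy.linalg import eigh

def tica(X, lag):
    """Minimal TICA sketch: symmetrized covariances + generalized eigenproblem."""
    X = X - X.mean(axis=0)                     # remove the mean, as in the manuscript
    A, B = X[:-lag], X[lag:]
    C0 = 0.5 * (A.T @ A + B.T @ B) / len(A)    # instantaneous covariance C(0)
    Ct = 0.5 * (A.T @ B + B.T @ A) / len(A)    # symmetrized time-lagged covariance C(tau)
    evals, evecs = eigh(Ct, C0)                # solves C(tau) u = lambda C(0) u
    order = np.argsort(evals)[::-1]            # sort eigenpairs in descending order
    return evals[order], evecs[:, order]

rng = np.random.default_rng(0)
n, d = 50000, 3
X = rng.standard_normal((n, d))
for t in range(1, n):                          # coordinate 0 carries a slow AR(1) mode
    X[t, 0] = 0.99 * X[t - 1, 0] + 0.1 * rng.standard_normal()

evals, ics = tica(X, lag=10)
ckv = np.cumsum(evals**2) / np.sum(evals**2)   # cumulative kinetic variance fraction
Y = (X - X.mean(axis=0)) @ ics                 # projection onto the independent components
```

With this toy data, the slow AR(1) coordinate dominates the first independent component while the white-noise dimensions yield near-zero eigenvalues.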
@@ -242,7 +249,7 @@ \subsection{Hidden Markov state models}
 
 \begin{figure}
 \includegraphics[width=0.48\textwidth]{figure_1}
-\caption{The HMM transition matrix $\tilde{\mathbf{P}}(\tau)$ propagates the hidden state trajectory $\tilde{s}(t)$ (orange circles) and, at each time step $t$, the emission into the observable state $s(t)$ is governed by the emission probabilities $\bm{\chi}\left( s(t) \middle| \tilde{s}(t) \right)$.}
+\caption{The HMM transition matrix $\tilde{\mathbf{P}}(\tau)$ propagates the hidden state trajectory $\tilde{s}(t)$ (orange circles) and, at each time step $t$, the emission into the observable state $s(t)$ (cyan circles) is governed by the emission probabilities $\bm{\chi}\left( s(t) \middle| \tilde{s}(t) \right)$.}
 \label{fig:hmm-scheme}
 \end{figure}
 
@@ -252,24 +259,23 @@ \subsection{Hidden Markov state models}
 We illustrate this point in Notebook~07.
 
 An alternative, which is much less sensitive to poor discretization,
-is to estimate a hidden Markov model (HMM)~\cite{hmm-baum-welch-alg,hmm-tutorial,jhp-spectral-rate-theory,bhmm-preprint}.
-HMMs are less sensitive to the discretization error as they sidestep the assumption of Markovian dynamics in the discretized space.
+is to estimate a hidden Markov model (HMM)~\cite{hmm-baum-welch-alg,hmm-tutorial,jhp-spectral-rate-theory,noe-proj-hid-msm,bhmm-preprint}.
+HMMs are less sensitive to the discretization error as they sidestep the assumption of Markovian dynamics in the discretized space (illustrated in Fig.~\ref{fig:hmm-scheme}).
 Instead, HMMs assume that there is an underlying (hidden) dynamic process which is Markovian
-and gives rise to our observed data, e.g., the discretized trajectories (see Fig.~\ref{fig:hmm-scheme}).
+and gives rise to our observed data, e.g., the ($n$~states) discretized trajectories $s(t)$.
 This is a powerful principle as we know that there is indeed an underlying process which is Markovian:
 our molecular dynamics trajectories.
 
-To estimate an HMM, we need a spectral gap after the $m^\textrm{th}$ eigenvalue;
+To estimate an HMM, we need a spectral gap after the $m^\textrm{th}$ timescale;
 in practice, a timescale separation of $t_m \geq 2t_{m+1}$ is sufficient~\cite{pyemma}.
-Then, we can approximate the dynamics in the observed microstates ($\mathbf{P}$) at any lag time $k\tau$ via
-\begin{equation}
-\mathbf{P}(k\tau) \approx \bm{\Pi}^{-1} \bm{\chi}^\top \tilde{\bm{\Pi}} \tilde{\mathbf{P}}^k(\tau) \, \bm{\chi}.
-\end{equation}
-Here, the $\bm{\Pi}=\left[ \pi_1,\dots,\pi_n \right]$ is a diagonal matrix of the $n$ microstates' stationary probabilities,
-$\tilde{\bm{\Pi}}=\left[ \tilde{\pi}_1,\dots,\tilde{\pi}_m \right]$ is a diagonal matrix of the $m<n$ hidden states' stationary probabilities,
-$\tilde{\mathbf{P}}(\tau)$ is a transition matrix between the $m<n$ hidden states at lag time $\tau$,
-and $\bm{\chi}$ is an $m\times n$-dimensional row-stochastic matrix
-where each row encodes the emission probabilities into the $n$ microstates conditioned on being in the corresponding hidden state~\cite{noe-proj-hid-msm}.
+The HMM then consists of a transition matrix $\tilde{\mathbf{P}}(\tau)$ between $m<n$ hidden states
+and a row-stochastic matrix ($\bm{\chi}$) of probabilities $\chi\left( s \middle| \tilde{s} \right)$
+to emit the discrete state $s$ conditional on being in the hidden state $\tilde{s}$.
+
+We can further compute a reversal of the emission matrix $\bm{\chi}\in\mathbb{R}^{m \times n}$:
+the membership matrix $\mathbf{M}\in\mathbb{R}^{n \times m}$, which encodes
+a fuzzy assignment of each of the $n$ observable microstates $s$ to the $m$ hidden states $\tilde{s}$ and,
+thus, defines the \emph{coarse graining} of microstates.
 
 An HMM estimation always yields a model with a small number of (hidden) states
 where each state is considered to be metastable and,
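The hidden-state coarse graining described in this hunk (emission matrix versus membership matrix) can be sketched with a toy example. All numbers below are made up for illustration; only the Bayes-rule reversal of the emissions follows the text above.

```python
import numpy as np

# Toy coarse-graining sketch: m = 2 hidden states, n = 4 observed microstates.
P_hid = np.array([[0.95, 0.05],
                  [0.10, 0.90]])               # hidden transition matrix P~(tau)
chi = np.array([[0.70, 0.25, 0.04, 0.01],      # emission matrix chi (m x n),
                [0.02, 0.08, 0.50, 0.40]])     # each row sums to 1 (row-stochastic)

# stationary distribution of the hidden chain: left eigenvector of P~ for eigenvalue 1
w, V = np.linalg.eig(P_hid.T)
pi_hid = np.real(V[:, np.argmax(np.real(w))])
pi_hid /= pi_hid.sum()

# membership matrix M (n x m), the Bayesian "reversal" of the emissions:
# M[s, i] = chi[i, s] * pi_hid[i] / sum_j chi[j, s] * pi_hid[j]
M = (chi * pi_hid[:, None]).T
M /= M.sum(axis=1, keepdims=True)
```

Each row of `M` fuzzily assigns one observable microstate to the hidden states; microstate 0 belongs almost entirely to hidden state 0 here, microstate 3 to hidden state 1.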
@@ -311,15 +317,6 @@ \subsection{Software and installation}
 
 \section{PyEMMA tutorials}
 
-\begin{figure}[bt]
-\includegraphics[width=0.48\textwidth]{figure_2}
-\caption{The PyEMMA workflow: MD trajectories are processed and discretized (first row).
-A Markov state model is estimated from the resulting discrete trajectories and validated (middle row).
-By iterating between data processing and MSM estimation/validation,
-a dynamical model is obtained that can be analyzed (last row).}
-\label{fig:workflowchart}
-\end{figure}
-
 This tutorial consists of nine Jupyter notebooks which introduce the basic features of PyEMMA.
 The first notebook (00), which we will summarize in the following, showcases the entire estimation,
 validation, and analysis workflow for a small example system.
@@ -355,6 +352,15 @@ \subsection{The PyEMMA workflow}
 we chose to adopt a sequential approach where only the hyper-parameters of the current stage are optimized.
 This approach is not only computationally cheaper but also allows us to discuss the significance of the necessary modeling choices.
 
+\begin{figure}[bt]
+\includegraphics[width=0.48\textwidth]{figure_2}
+\caption{The PyEMMA workflow: MD trajectories are processed and discretized (first row).
+A Markov state model is estimated from the resulting discrete trajectories and validated (middle row).
+By iterating between data processing and MSM estimation/validation,
+a dynamical model is obtained that can be analyzed (last row).}
+\label{fig:workflowchart}
+\end{figure}
+
 \subsection{Feature selection}
 
 In Markov state modeling, our objective is to model the slow dynamics of a molecular process.
@@ -366,16 +372,6 @@ \subsection{Feature selection}
 provide a systematic means to quantitatively compare multiple representations of the simulation data.
 In particular, we can use a scalar score obtained using VAMP to directly compare the ability of certain features to capture slow dynamical modes in a particular molecular system.
 
-\begin{figure}
-\includegraphics{figure_3}
-\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
-(a)~The Trp-Leu-Ala-Leu-Leu pentapeptide in licorice representation~\cite{vmd}.
-(b)~The VAMP-2 score indicates which of the tested featurizations contains the highest kinetic variance.
-(c)~The sample free energy projected onto the first two time-lagged independent components (ICs) at lag time $\tau=0.5$~ns shows multiple minima and
-(d)~the time series of the first two ICs of the first trajectory show rare jumps.}
-\label{fig:io-to-tica}
-\end{figure}
-
 Here, we utilize the VAMP-2 score, which maximizes the kinetic variance contained in the features~\cite{kinetic-maps}.
 We should always evaluate the score in a cross-validated manner to ensure that we neither include too few features (under-fitting) nor too many features (over-fitting)~\cite{gmrq,vamp-preprint}.
 To choose among three different molecular features reflecting protein structure,
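The idea of ranking featurizations by a VAMP-2 score can be sketched in plain NumPy. This is a toy under stated assumptions, not PyEMMA's implementation: the synthetic features are made up, and the cross-validated evaluation recommended above is deliberately omitted for brevity.

```python
import numpy as np

def vamp2_score(X, lag, eps=1e-10):
    """Toy VAMP-2 score: 1 + squared singular values of the whitened
    time-lagged covariance K = C00^{-1/2} C0t Ctt^{-1/2}."""
    A = X[:-lag] - X[:-lag].mean(axis=0)
    B = X[lag:] - X[lag:].mean(axis=0)
    C00, C0t, Ctt = A.T @ A / len(A), A.T @ B / len(A), B.T @ B / len(B)
    def inv_sqrt(C):                           # symmetric inverse square root
        w, V = np.linalg.eigh(C)
        return V @ np.diag(np.maximum(w, eps) ** -0.5) @ V.T
    K = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)
    s = np.linalg.svd(K, compute_uv=False)
    return 1.0 + np.sum(s ** 2)                # the "+1" counts the constant function

rng = np.random.default_rng(1)
n = 50000
slow = np.zeros(n)
for t in range(1, n):                          # one slow AR(1) mode
    slow[t] = 0.99 * slow[t - 1] + 0.1 * rng.standard_normal()

feat_good = np.column_stack([slow, rng.standard_normal(n)])  # contains the slow mode
feat_bad = rng.standard_normal((n, 2))                       # pure white noise

score_good = vamp2_score(feat_good, lag=10)
score_bad = vamp2_score(feat_bad, lag=10)
```

The featurization that contains the slow mode scores markedly higher than the pure-noise one, which is the comparison the manuscript uses to select among candidate features.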
@@ -402,6 +398,16 @@ \subsection{Dimensionality reduction}
 Discrete jumps between the minima can be observed by visualizing the transformation of the first trajectory into these ICs (Fig.~\ref{fig:io-to-tica}d).
 We thus assume that our TICA-transformed backbone torsion features describe one or more metastable processes.
 
+\begin{figure}
+\includegraphics{figure_3}
+\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
+(a)~The Trp-Leu-Ala-Leu-Leu pentapeptide in licorice representation~\cite{vmd}.
+(b)~The VAMP-2 score indicates which of the tested featurizations contains the highest kinetic variance.
+(c)~The sample free energy projected onto the first two time-lagged independent components (ICs) at lag time $\tau=0.5$~ns shows multiple minima and
+(d)~the time series of the first two ICs of the first trajectory show rare jumps.}
+\label{fig:io-to-tica}
+\end{figure}
+
 \subsection{Discretization}
 
 TICA yields a representation of our molecular simulation data with a reduced dimensionality,
