Skip to content

Commit 99e6e65

Browse files
authored
Merge pull request #149 from markovmodel/revision-cw
Revision cw
2 parents 93829f2 + 991b80c commit 99e6e65

File tree

7 files changed

+717
-1028
lines changed

7 files changed

+717
-1028
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ We keep the details minimal throughout the showcase but point to the more specia
2121
In detail, the remaining eight notebooks revisit all aspects shown in the showcase, provide additional details and variants, and contain exercises (and solutions) to self-check your learning progress:
2222

2323
1. Data-I/O and featurization [➜ 📓](notebooks/01-data-io-and-featurization.ipynb)
24-
2. Dimension reduction and discretization [➜ 📓](notebooks/02-dimension-reduction-and-discretisation.ipynb)
24+
2. Dimension reduction and discretization [➜ 📓](notebooks/02-dimension-reduction-and-discretization.ipynb)
2525
3. MSM estimation and validation [➜ 📓](notebooks/03-msm-estimation-and-validation.ipynb)
2626
4. MSM analysis [➜ 📓](notebooks/04-msm-analysis.ipynb)
2727
5. PCCA and TPT analysis [➜ 📓](notebooks/05-pcca-tpt.ipynb)

manuscript/figures/figure_2.pdf

647 Bytes
Binary file not shown.

manuscript/figures/figure_3.pdf

-12.7 KB
Binary file not shown.

manuscript/figures/figure_5.pdf

22.7 KB
Binary file not shown.

manuscript/literature.bib

Lines changed: 607 additions & 952 deletions
Large diffs are not rendered by default.

manuscript/manuscript.tex

Lines changed: 20 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -110,14 +110,15 @@ \subsection{Software/system requirements}
110110
\begin{itemize}
111111
\item \textbf{PyEMMA} -- MSM/HMM estimation, validation, analysis, and visualization, and its dependencies~\cite{pyemma}
112112
\item mdshare -- A downloader for MD data from a public server
113-
\item notebook -- The Jupyter notebook tool used for running the tutorials~\cite{jupyter}, along with extension packages jupyter\_contrib\_nbextensions and nbexamples
113+
\item notebook -- The Jupyter~\cite{jupyter} notebook tool used for running the tutorials, along with extension packages jupyter\_contrib\_nbextensions and nbexamples
114+
\item matplotlib -- A plotting library~\cite{matplotlib}
114115
\item nglview -- Widget for active viewing of molecular structures in Jupyter environments~\cite{nglview}
115116
\end{itemize}
116117

117118
The tutorial software is currently supported for Python versions $3.5$ and $3.6$ on the operating systems Linux, OSX, and Windows.
118119

119120
Should the user prefer not to use Anaconda, a manual installation via the pip installer is possible.
120-
Alternatively, one can use the Binder service to view and run the tutorials online in any browser.
121+
Alternatively, one can use the Binder service (\url{https://mybinder.org}) to view and run the tutorials online in any browser.
121122

122123
\section{Content and links}
123124

@@ -128,7 +129,7 @@ \section{Content and links}
128129

129130
\subsection{The PyEMMA workflow}
130131

131-
In short, the workflow for a full analysis of an MD dataset might consist of,
132+
In short, the workflow (Fig.~\ref{fig:workflowchart}) for a full analysis of an MD dataset might consist of,
132133
\begin{itemize}
133134
\item extracting molecular features from the raw data (01),
134135
\item transforming those features into a suitable, low dimensional subspace (02),
@@ -147,7 +148,7 @@ \subsection{Feature selection}
147148

148149
\begin{figure}
149150
\includegraphics{figure_2}
150-
\caption{Exemplary analysis of the conformational dynamics of a pentapeptide backbone: (a)~The Trp-Leu-Ala-Leu-Leu pentapeptide in licorice representation~\cite{vmd}.
151+
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone: (a)~The Trp-Leu-Ala-Leu-Leu pentapeptide in licorice representation~\cite{vmd}.
151152
(b)~The VAMP-2 score indicates which of the tested featurizations contains the highest kinetic variance.
152153
(c)~The sample density projected onto the first two time-lagged independent components (ICs) at lag time $\tau=0.5$ ns shows multiple density maxima and
153154
(d)~the time series of the first two ICs show rare transition events.
@@ -203,12 +204,11 @@ \subsection{Analyzing the MSM}
203204

204205
\begin{figure}
205206
\includegraphics{figure_3}
206-
\caption{Exemplary analysis of the conformational dynamics of a pentapeptide backbone:
207-
(a) The reweighted free energy surface projected onto the first two independent components exhibits five minima which
208-
(b) PCCA++ identifies as five metastable states.
209-
(c) The second right eigenvector shows that the slowest process shifts probability between the least probable state ($\mathcal{S}_1$) and the other states, in particular states ($\mathcal{S}_4$, $\mathcal{S}_5$), whereas
210-
(d) the committor $\mathcal{S}_2\to\mathcal{S}_4$ indicates that states $\mathcal{S}_{(1,3,5)}$ act as a transition region between states $\mathcal{S}_2$ and $\mathcal{S}_4$.
211-
(e) The Trp-1 SASA autocorrelation function yields a weak signal (top) which, however, can be enhanced if the system is prepared in the nonequilibrium condition $\mathcal{S}_1$ (bottom).}
207+
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
208+
(a)~The reweighted free energy surface projected onto the first two independent components exhibits five minima which
209+
(b)~PCCA++ identifies as five metastable states.
210+
(c)~The second right eigenvector shows that the slowest process shifts probability between the least probable state ($\mathcal{S}_1$) and the other states, in particular states ($\mathcal{S}_4$, $\mathcal{S}_5$), whereas
211+
(d)~the committor $\mathcal{S}_2\to\mathcal{S}_4$ indicates that states $\mathcal{S}_{(1,3,5)}$ act as a transition region between states $\mathcal{S}_2$ and $\mathcal{S}_4$.}
212212
\label{fig:msm-analysis}
213213
\end{figure}
214214

@@ -257,25 +257,32 @@ \subsection{Analyzing the MSM}
257257

258258
\begin{figure}
259259
\includegraphics{figure_4}
260-
\caption{Visualization of the transition paths from $\mathcal{S}_2$ to $\mathcal{S}_4$:
260+
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone:
261+
visualization of the transition paths from $\mathcal{S}_2$ to $\mathcal{S}_4$.
261262
Metastable states $\mathcal{S}_{(1-5)}$ are represented by an ensemble of representative structures and are arranged along the horizonal axis according to their committor probabilities. The three main transition pathways starting from $\mathcal{S}_2$ and ending in $\mathcal{S}_4$ are depicted by gray arrows with thickness proportional to the transition flux. The dominant pathway proceeds through $\mathcal{S}_5$.}
262263
\label{fig:tpt-network}
263264
\end{figure}
264265

265266
\subsection{Connecting the MSM with experimental data}
266267

268+
\begin{figure}
269+
\includegraphics{figure_5}
270+
\caption{Example analysis of the conformational dynamics of a pentapeptide backbone: (a)~the Trp-1 SASA autocorrelation function yields a weak signal which, however, (b)~can be enhanced if the system is prepared in the nonequilibrium condition $\mathcal{S}_1$. The solid/orange lines denote the maximum likelihood MSM result; the dashed/blue lines and the the shaded areas indicate sample means and $95\%$ confidence intervals computed with a Bayesian sampling procedure.}
271+
\label{fig:msm-exp-obs}
272+
\end{figure}
273+
267274
MSMs can also be analyzed in the context of experimental observables. Connecting MSM analysis to experimental data can both serve as an accuracy test of our MSM as well as provide a mechanistic interpretation of observed experimental signals.
268275
Since we have both the stationary and dynamic properties of the molecular system encoded in the MSM transition probability matrix, we can compute observables that involve both stationary ensemble averages as well as correlation functions.
269276

270277
As an example, here we look at the fluorescence correlation of Trp-1, since this terminal tryptophan is a realistic experimental observable for our pentapeptide system.
271278
In order to compute the fluorescence correlation functions we require a microscopic, instantaneous value of the tryptophan fluorescence for each of the original $75$ MSM microstates.
272279
To approximate the fluorescence signal in our pentapeptide system, we use the mdtraj library~\cite{mdtraj} to compute the solvent accessible surface area (SASA) of Trp-1.
273-
Now that we have an approximation of the fluorescence in each of our MSM states, we can use PyEMMA to compute the fluorescence autocorrelation function (ACF) from our MSM (\ref{fig:msm-analysis}e, upper).
280+
Now that we have an approximation of the fluorescence in each of our MSM states, we can use PyEMMA to compute the fluorescence autocorrelation function (ACF) from our MSM (\ref{fig:msm-exp-obs}a).
274281
Note how the computed ACF has a very small response (i.e., signal amplitude).
275282

276283
Using PyEMMA, we can simulate the relaxation of an observable if we had prepared our molecular system in a nonequilibrium initial condition.
277284
The experimental counterpart of such a prediction could be a temperature or pressure jump experiment or a stopped flow assay.
278-
To illustrate such an experiment, we initialize our molecular ensemble as the metastable distribution of $\mathcal{S}_1$ and follow the predicted fluorescence signal as it relaxes to equilibrium (\ref{fig:msm-analysis}e, lower).
285+
To illustrate such an experiment, we initialize our molecular ensemble as the metastable distribution of $\mathcal{S}_1$ and follow the predicted fluorescence signal as it relaxes to equilibrium (\ref{fig:msm-exp-obs}b).
279286
We see that the predicted relaxation signal has a much larger amplitude for the nonequilibrium initialization, making it more likely to be experimentally measurable.
280287

281288
\subsection{Summary}

notebooks/00-pentapeptide-showcase.ipynb

Lines changed: 89 additions & 62 deletions
Original file line numberDiff line numberDiff line change
@@ -1226,15 +1226,27 @@
12261226
"ax_mol.set_axis_off()\n",
12271227
"ax_mol.imshow(plt.imread('static/pentapeptide-structure.png'))\n",
12281228
"\n",
1229-
"ax_feat = fig.add_subplot(gs[2000:3150, 400:1800])\n",
1230-
"ax_feat.bar(\n",
1231-
" vamp_bars_plot['labels'],\n",
1232-
" vamp_bars_plot['scores'],\n",
1233-
" yerr=vamp_bars_plot['errors'],\n",
1234-
" color=['C0', 'C1', 'C2'])\n",
1235-
"ax_feat.tick_params(axis='x', labelrotation=20)\n",
1236-
"ax_feat.set_ylabel('VAMP2 score')\n",
1237-
"ax_feat.set_title(r'lag time $\\tau$ = {:.1f} ns'.format(vamp_bars_plot['lag'] * 0.1))\n",
1229+
"ax_feat_label = fig.add_subplot(gs[2000:3150, 180:1800])\n",
1230+
"ax_feat_upper = fig.add_subplot(gs[2000:2720, 400:1800])\n",
1231+
"ax_feat_lower = fig.add_subplot(gs[2790:3150, 400:1800])\n",
1232+
"for ax in (ax_feat_upper, ax_feat_lower):\n",
1233+
" ax.bar(\n",
1234+
" vamp_bars_plot['labels'],\n",
1235+
" vamp_bars_plot['scores'],\n",
1236+
" yerr=vamp_bars_plot['errors'],\n",
1237+
" color=['C0', 'C1', 'C2'])\n",
1238+
"for key in ('top', 'bottom', 'left', 'right'):\n",
1239+
" ax_feat_label.spines[key].set_visible(False)\n",
1240+
"ax_feat_label.set_ylabel('VAMP2 score')\n",
1241+
"ax_feat_label.set_xticks([])\n",
1242+
"ax_feat_label.set_yticks([])\n",
1243+
"ax_feat_upper.set_title(r'lag time $\\tau$ = {:.1f} ns'.format(vamp_bars_plot['lag'] * 0.1))\n",
1244+
"ax_feat_upper.spines['bottom'].set_visible(False)\n",
1245+
"ax_feat_upper.set_xticks([])\n",
1246+
"ax_feat_upper.set_ylim(2.8, 3.6)\n",
1247+
"ax_feat_lower.spines['top'].set_visible(False)\n",
1248+
"ax_feat_lower.tick_params(axis='x', labelrotation=20)\n",
1249+
"ax_feat_lower.set_ylim(0.0, 0.4)\n",
12381250
"\n",
12391251
"ax_density = fig.add_subplot(gs[2000:3150, 2200:3350])\n",
12401252
"_, _, misc = pyemma.plots.plot_density(\n",
@@ -1300,7 +1312,7 @@
13001312
"metadata": {},
13011313
"outputs": [],
13021314
"source": [
1303-
"fig = plt.figure(figsize=(3.47, 5.85))\n",
1315+
"fig = plt.figure(figsize=(3.47, 3.95))\n",
13041316
"gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))\n",
13051317
"gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))\n",
13061318
"gs = plt.GridSpec(gh, gw)\n",
@@ -1309,10 +1321,9 @@
13091321
"ax_box = fig.add_subplot(gs[:, :])\n",
13101322
"ax_box.set_axis_off()\n",
13111323
"ax_box.text(0.03, 0.97, '(a)', size=10)\n",
1312-
"ax_box.text(0.55, 0.97, '(b)', size=10)\n",
1313-
"ax_box.text(0.03, 0.66, '(c)', size=10)\n",
1314-
"ax_box.text(0.55, 0.66, '(d)', size=10)\n",
1315-
"ax_box.text(0.03, 0.30, '(e)', size=10)\n",
1324+
"ax_box.text(0.53, 0.97, '(b)', size=10)\n",
1325+
"ax_box.text(0.03, 0.50, '(c)', size=10)\n",
1326+
"ax_box.text(0.53, 0.50, '(d)', size=10)\n",
13161327
"\n",
13171328
"ax_fe = fig.add_subplot(gs[400:1750, 400:1750])\n",
13181329
"_, _, misc = pyemma.plots.plot_free_energy(\n",
@@ -1343,6 +1354,11 @@
13431354
" for i in range(nstates)])\n",
13441355
"ax_state.set_xticklabels([])\n",
13451356
"ax_state.set_yticklabels([])\n",
1357+
"ax_state.text(0.70, 6.30, '$\\mathcal{S}_1$', size=10)\n",
1358+
"ax_state.text(3.00, 4.00, '$\\mathcal{S}_2$', size=10)\n",
1359+
"ax_state.text(2.50, 2.00, '$\\mathcal{S}_3$', size=10)\n",
1360+
"ax_state.text(5.20, -0.50, '$\\mathcal{S}_4$', size=10)\n",
1361+
"ax_state.text(-0.20, 0.00, '$\\mathcal{S}_5$', size=10)\n",
13461362
"\n",
13471363
"evec_idx = 1\n",
13481364
"ax_eig = fig.add_subplot(gs[2300:3650, 400:1750])\n",
@@ -1358,7 +1374,6 @@
13581374
"misc['cbar'].set_ticks(np.linspace(*misc['cbar'].get_clim(), 3))\n",
13591375
"misc['cbar'].ax.xaxis.set_ticks_position('top')\n",
13601376
"misc['cbar'].ax.xaxis.set_label_position('top')\n",
1361-
"ax_eig.set_xticklabels([])\n",
13621377
"ax_eig.set_xlabel('IC 1')\n",
13631378
"ax_eig.set_ylabel('IC 2')\n",
13641379
"\n",
@@ -1380,46 +1395,6 @@
13801395
"ax_flux.set_xlabel('IC 1')\n",
13811396
"ax_flux.set_yticklabels([])\n",
13821397
"\n",
1383-
"ax_acf = fig.add_subplot(gs[4100:4825, 400:3350])\n",
1384-
"ax_acf.set_title('Trp-1 SASA')\n",
1385-
"ax_acf.plot(eq_time_ml, eq_acf_ml, '-', color='C1', label='ML MSM')\n",
1386-
"ax_acf.plot(\n",
1387-
" eq_time_bayes,\n",
1388-
" eq_acf_bayes,\n",
1389-
" '--',\n",
1390-
" color='C0',\n",
1391-
" label='Bayesian MSM')\n",
1392-
"ax_acf.fill_between(\n",
1393-
" eq_time_bayes,\n",
1394-
" eq_acf_bayes_ci_l[1],\n",
1395-
" eq_acf_bayes_ci_u[1],\n",
1396-
" facecolor='C0',\n",
1397-
" alpha=0.3)\n",
1398-
"ax_acf.semilogx()\n",
1399-
"ax_acf.set_xlim((eq_time_ml[1], eq_time_ml[-1]))\n",
1400-
"ax_acf.set_xticks([])\n",
1401-
"ax_acf.set_ylabel(r'ACF / nm$^4$')\n",
1402-
"ax_acf.legend()\n",
1403-
"\n",
1404-
"ax_rlx = fig.add_subplot(gs[4825:5550, 400:3350])\n",
1405-
"ax_rlx.plot(eq_time_ml, eq_relax_ml, '-', color='C1', label='ML MSM')\n",
1406-
"ax_rlx.plot(\n",
1407-
" eq_time_bayes,\n",
1408-
" eq_relax_bayes,\n",
1409-
" '--',\n",
1410-
" color='C0',\n",
1411-
" label='Bayesian MSM')\n",
1412-
"ax_rlx.fill_between(\n",
1413-
" eq_time_bayes,\n",
1414-
" eq_relax_bayes_CI_l[1],\n",
1415-
" eq_relax_bayes_CI_u[1],\n",
1416-
" facecolor='C0',\n",
1417-
" alpha=0.3)\n",
1418-
"ax_rlx.semilogx()\n",
1419-
"ax_rlx.set_ylabel(r'Average / nm$^2$')\n",
1420-
"ax_rlx.set_xlim((eq_time_ml[1], eq_time_ml[-1]))\n",
1421-
"ax_rlx.set_xlabel(r'time / ns')\n",
1422-
"\n",
14231398
"fig.savefig('data/figure_3.pdf', dpi=300)"
14241399
]
14251400
},
@@ -1469,6 +1444,65 @@
14691444
"fig.savefig('data/figure_4.pdf', dpi=300)"
14701445
]
14711446
},
1447+
{
1448+
"cell_type": "code",
1449+
"execution_count": null,
1450+
"metadata": {},
1451+
"outputs": [],
1452+
"source": [
1453+
"fig = plt.figure(figsize=(3.47, 2.5))\n",
1454+
"gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))\n",
1455+
"gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))\n",
1456+
"gs = plt.GridSpec(gh, gw)\n",
1457+
"gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)\n",
1458+
"\n",
1459+
"ax_box = fig.add_subplot(gs[:, :])\n",
1460+
"ax_box.set_axis_off()\n",
1461+
"ax_box.text(0.01, 0.95, '(a)', size=10)\n",
1462+
"ax_box.text(0.01, 0.50, '(b)', size=10)\n",
1463+
"\n",
1464+
"ax_acf = fig.add_subplot(gs[50:1075, 500:3400])\n",
1465+
"ax_acf.plot(eq_time_ml, eq_acf_ml, '-', color='C1', label='ML MSM')\n",
1466+
"ax_acf.plot(\n",
1467+
" eq_time_bayes,\n",
1468+
" eq_acf_bayes,\n",
1469+
" '--',\n",
1470+
" color='C0',\n",
1471+
" label='Bayesian MSM')\n",
1472+
"ax_acf.fill_between(\n",
1473+
" eq_time_bayes,\n",
1474+
" eq_acf_bayes_ci_l[1],\n",
1475+
" eq_acf_bayes_ci_u[1],\n",
1476+
" facecolor='C0',\n",
1477+
" alpha=0.3)\n",
1478+
"ax_acf.semilogx()\n",
1479+
"ax_acf.set_xlim((eq_time_ml[1], eq_time_ml[-1]))\n",
1480+
"ax_acf.set_xticks([])\n",
1481+
"ax_acf.set_ylabel(r'ACF / nm$^4$')\n",
1482+
"ax_acf.legend()\n",
1483+
"\n",
1484+
"ax_rlx = fig.add_subplot(gs[1125:2150, 500:3400])\n",
1485+
"ax_rlx.plot(eq_time_ml, eq_relax_ml, '-', color='C1', label='ML MSM')\n",
1486+
"ax_rlx.plot(\n",
1487+
" eq_time_bayes,\n",
1488+
" eq_relax_bayes,\n",
1489+
" '--',\n",
1490+
" color='C0',\n",
1491+
" label='Bayesian MSM')\n",
1492+
"ax_rlx.fill_between(\n",
1493+
" eq_time_bayes,\n",
1494+
" eq_relax_bayes_CI_l[1],\n",
1495+
" eq_relax_bayes_CI_u[1],\n",
1496+
" facecolor='C0',\n",
1497+
" alpha=0.3)\n",
1498+
"ax_rlx.semilogx()\n",
1499+
"ax_rlx.set_ylabel(r'Average / nm$^2$')\n",
1500+
"ax_rlx.set_xlim((eq_time_ml[1], eq_time_ml[-1]))\n",
1501+
"ax_rlx.set_xlabel(r'time / ns')\n",
1502+
"\n",
1503+
"fig.savefig('data/figure_5.pdf', dpi=300)"
1504+
]
1505+
},
14721506
{
14731507
"cell_type": "markdown",
14741508
"metadata": {},
@@ -1521,13 +1555,6 @@
15211555
"<a id=\"cite-hmm-baum-welch-alg\"/><sup><a href=#ref-14>[^]</a></sup>Leonard E. Baum and Ted Petrie and George Soules and Norman Weiss. 1970. _A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains_. [URL](http://www.jstor.org/stable/2239727)\n",
15221556
"\n"
15231557
]
1524-
},
1525-
{
1526-
"cell_type": "code",
1527-
"execution_count": null,
1528-
"metadata": {},
1529-
"outputs": [],
1530-
"source": []
15311558
}
15321559
],
15331560
"metadata": {

0 commit comments

Comments
 (0)