
Commit f83b31c

Merge pull request #156 from marscher/review
Review [ci skip]
2 parents d1111af + 0bb4a7a commit f83b31c


5 files changed (+35 / -9 lines)


manuscript/literature.bib

Lines changed: 11 additions & 0 deletions
@@ -677,3 +677,14 @@ @article{banushkina_nonparametric_2015
 year = {2015},
 pages = {184108}
 }
+
+@article{husic-optimized,
+title={Optimized parameter selection reveals trends in Markov state models for protein folding},
+author={Husic, Brooke E and McGibbon, Robert T and Sultan, Mohammad M and Pande, Vijay S},
+journal={J. Chem. Phys.},
+volume={145},
+number={19},
+pages={194103},
+year={2016},
+publisher={AIP Publishing}
+}

manuscript/manuscript.tex

Lines changed: 8 additions & 0 deletions
@@ -241,6 +241,14 @@ \subsection{The PyEMMA workflow}
 We present the results obtained in this notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
 The figures that will be displayed in the following are created in the showcase notebook (00) and can be easily reproduced.
 
+Note that the modeler has to select hyper-parameters at most stages throughout the workflow.
+This selection must be done carefully, as poor choices make it hard, or even impossible, to build a good MSM.
+
+While there exist automated schemes~\cite{husic-optimized} for cross-validated optimization in the full hyper-parameter
+space, we chose to adopt a sequential approach where only the hyper-parameters of the current stage are optimized. This
+approach is not only computationally cheaper but also allows us to discuss the significance of the necessary modeling
+choices.
+
 \subsection{Feature selection}
 
 \begin{figure}
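The cost argument in the new manuscript paragraph can be made concrete with a small Python sketch. The stage names and candidate grids below are hypothetical, and `sequential_search` is an illustrative greedy scheme, not the procedure used in the paper or in the cited optimization work; it only shows why optimizing one stage at a time needs far fewer model fits than a joint search over all stages.

```python
from math import prod

# Hypothetical candidate values per workflow stage (illustrative only).
GRIDS = {
    "tica_lag": [1, 2, 5, 10],
    "tica_dim": [2, 3, 4],
    "n_clusters": [25, 50, 75, 100, 200],
    "msm_lag": [1, 2, 5, 10, 20],
}


def joint_search_cost(grids):
    """Number of model fits needed to score every combination of all stages."""
    return prod(len(values) for values in grids.values())


def sequential_search_cost(grids):
    """Number of model fits when stages are optimized one after another."""
    return sum(len(values) for values in grids.values())


def sequential_search(grids, score):
    """Greedy stage-wise selection.

    `score` is a user-supplied callable (e.g. a cross-validated score) that
    receives the choices fixed so far plus one candidate for the current stage.
    """
    chosen = {}
    for stage, candidates in grids.items():
        chosen[stage] = max(candidates, key=lambda c: score({**chosen, stage: c}))
    return chosen


print(joint_search_cost(GRIDS), sequential_search_cost(GRIDS))
```

With these illustrative grids, a joint search needs 4 x 3 x 5 x 5 = 300 model fits, whereas the sequential scheme needs 4 + 3 + 5 + 5 = 17. The price is that a choice made at an early stage is never revisited, so the sequential result need not coincide with the joint optimum.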

notebooks/00-pentapeptide-showcase.ipynb

Lines changed: 7 additions & 3 deletions
@@ -231,9 +231,13 @@
 "\n",
 "### TICA\n",
 "\n",
-"The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score. \n",
+"The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower-dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
 "\n",
-"Please note that the general `PyEMMA` API is consistant for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend to pass the `tica` object itself into the subsequent stages, e.g., clustering."
+"With the default parameters of `tica()`, we keep as many dimensions as necessary to preserve $95\\%$ of the kinetic variance. By default, `tica()` also applies kinetic map scaling.\n",
+"This scaling ensures that Euclidean distances in the projected space approximate kinetic distances,\n",
+"which is beneficial during the subsequent discretization.\n",
+"\n",
+"Please note that the general `PyEMMA` API is consistent for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is performed and an estimator instance (`tica`) is returned; this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend passing the `tica` object itself to the subsequent stages, e.g., clustering, in order to avoid loading all transformed data into memory."
 ]
 },
 {
@@ -1636,7 +1640,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.3"
+"version": "3.6.6"
 },
 "toc": {
 "base_numbering": 1,
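To make the two TICA defaults described in the new cell concrete (a 95% kinetic-variance cutoff and kinetic map scaling), here is a minimal NumPy sketch. It is not PyEMMA's implementation: the symmetrized covariance estimate, the use of squared eigenvalues as "kinetic variance", and the function name `tica_kinetic_map` are assumptions made for illustration.

```python
import numpy as np


def tica_kinetic_map(X, lag, var_cutoff=0.95, eps=1e-10):
    """Minimal TICA sketch with kinetic-map scaling (not PyEMMA's implementation).

    X: array of shape (n_frames, n_features); lag: lag time in frames.
    Returns the projected data and all TICA eigenvalues (sorted, largest first).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # work with mean-free features
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    C00 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    C0t = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    # Whitening transform W with W.T @ C00 @ W = I (near-singular directions dropped).
    s, V = np.linalg.eigh(C00)
    keep = s > eps
    W = V[:, keep] / np.sqrt(s[keep])
    # Generalized eigenproblem C0t v = lambda C00 v, solved in whitened coordinates.
    lam, U = np.linalg.eigh(W.T @ C0t @ W)
    order = np.argsort(lam)[::-1]               # slowest components first
    lam, vecs = lam[order], W @ U[:, order]
    # Kinetic variance taken as the normalized cumulative sum of squared eigenvalues.
    cumvar = np.cumsum(lam ** 2)
    cumvar /= cumvar[-1]
    dim = int(np.searchsorted(cumvar, var_cutoff)) + 1
    # Kinetic map: scale each retained component by its eigenvalue.
    Y = (X @ vecs[:, :dim]) * lam[:dim]
    return Y, lam
```

For input that mixes a slowly relaxing signal with white noise, the sketch should keep a single component whose eigenvalue is close to the signal's autocorrelation at the chosen lag, because that one component already carries almost all of the kinetic variance.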

notebooks/01-data-io-and-featurization.ipynb

Lines changed: 5 additions & 4 deletions
@@ -279,7 +279,7 @@
 " data[:-1], dim=2).score(\n",
 " test_data=data[-1:],\n",
 " score_method='VAMP2')\n",
-"print('VAMP2-score: {:f}'.format(score_phi_psi))"
+"print('VAMP2-score backbone torsions: {:f}'.format(score_phi_psi))"
 ]
 },
 {
@@ -369,14 +369,15 @@
 " data[:-1], dim=2).score(\n",
 " test_data=data[-1:],\n",
 " score_method='VAMP2')\n",
-"print('VAMP2-score: {:f}'.format(score_heavy_atoms))"
+"print('VAMP2-score backbone torsions: {:f}'.format(score_phi_psi))\n",
+"print('VAMP2-score xyz: {:f}'.format(score_heavy_atoms))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As we see, the score for the heavy atom positions is much higher as the one for the $\\phi/\\psi$ torsion angles. We will learn later what this means.\n",
+"As we see, the score for the heavy atom positions is much higher than the one for the $\\phi/\\psi$ torsion angles. The feature with the higher score should be favored for further analysis because a higher score indicates that the feature contains more information about slow processes. If you already have insight into your system of interest, you can of course restrict the analysis to a set of features that you know describes your process of interest, regardless of their VAMP scores.\n",
 "\n",
 "Another featurization that is interesting especially for proteins is heavy atom distances:"
 ]
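The interpretation in the revised markdown cell (a higher VAMP-2 score means the feature captures more of the slow dynamics) can be illustrated with a NumPy sketch. It computes the score as the sum of the largest squared singular values of the whitened time-lagged covariance matrix. Unlike the notebook, it scores on the same data used for estimation rather than on a held-out trajectory, and it omits any constant offset that PyEMMA's reported score may include; `vamp2_score` and `_whitening` are hypothetical names.

```python
import numpy as np


def _whitening(C, eps=1e-10):
    """Return W with W.T @ C @ W = I, dropping directions with tiny variance."""
    s, V = np.linalg.eigh(C)
    keep = s > eps
    return V[:, keep] / np.sqrt(s[keep])


def vamp2_score(X, lag, dim=None):
    """Sketch of a VAMP-2 score: sum of the `dim` largest squared singular
    values of the whitened time-lagged covariance (Koopman) matrix.

    Estimated and scored on the same data; no constant-function term is added.
    """
    X = np.asarray(X, dtype=float)
    X0, Xt = X[:-lag], X[lag:]
    X0 = X0 - X0.mean(axis=0)
    Xt = Xt - Xt.mean(axis=0)
    n = len(X0)
    C00, Ctt, C0t = X0.T @ X0 / n, Xt.T @ Xt / n, X0.T @ Xt / n
    K = _whitening(C00).T @ C0t @ _whitening(Ctt)
    sigma = np.linalg.svd(K, compute_uv=False)  # singular values, descending
    return float(np.sum(sigma[:dim] ** 2))
```

The singular values of the whitened matrix are canonical correlations and therefore at most 1, so with `dim=2` a score computed this way lies between 0 and 2. A feature set containing a slowly decorrelating coordinate scores roughly the square of that coordinate's lag-time autocorrelation, whereas pure white noise scores near zero.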
@@ -456,7 +457,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This function allows to stream the data and work on chunks instead of the full set. Most of the functions in the `coordinates` sub-package accept data in memory as well as streamed feature readers. However, some plotting functions require the data to be in memory. To load a (strided) subset into memory, we can use the `get_output()` method with a stride parameter:"
+"This function allows us to stream the data and work on chunks instead of the full set. Most of the functions in the `coordinates` sub-package accept data in memory as well as streamed feature readers. However, some plotting functions require the data to be in memory. To load a sub-sample of the data into memory, we can use the `get_output()` method with a stride parameter:"
 ]
 },
 {
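The two ideas in this cell, processing data chunk by chunk and loading only every n-th frame via a stride, can be sketched with plain NumPy. The helpers `iter_chunks`, `streamed_mean` and `strided_load` are hypothetical stand-ins that mimic the described behavior on a single in-memory array; they are not part of PyEMMA's API.

```python
import numpy as np


def iter_chunks(data, chunksize):
    """Yield consecutive blocks of frames, imitating a streamed reader (hypothetical)."""
    for start in range(0, len(data), chunksize):
        yield data[start:start + chunksize]


def streamed_mean(chunks):
    """Per-feature mean accumulated chunk by chunk, so only one chunk is in memory."""
    total, count = 0.0, 0
    for chunk in chunks:
        total = total + chunk.sum(axis=0)
        count += len(chunk)
    return total / count


def strided_load(chunks, stride):
    """Keep every `stride`-th frame (global frame index), across chunk boundaries."""
    kept, offset = [], 0
    for chunk in chunks:
        first = (-offset) % stride  # local index of the first frame to keep
        kept.append(chunk[first::stride])
        offset += len(chunk)
    return np.concatenate(kept)
```

A stride of 3 keeps roughly a third of the frames, so the memory needed for the loaded subset shrinks accordingly, while statistics such as the mean can still be accumulated over all frames in a streamed pass.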

pyemma_tutorials/cli.py

Lines changed: 4 additions & 2 deletions
@@ -20,8 +20,10 @@ def main():
 
     _nglview_pip_installed_workaround()
 
-    argv = ['--config=%s' % notebook_cfg, '--config=%s' % notebook_cfg_json]
-    print('arguments:', argv)
+    # extend passed arguments with our config files
+    import sys
+    argv = sys.argv[1:] + ['--config=%s' % notebook_cfg, '--config=%s' % notebook_cfg_json]
+    print('invoking notebook server with arguments:', argv)
     main_(argv=argv)
 
 
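The new behavior (forwarding the user's own command-line arguments and appending the tutorial's config files) can be isolated in a small pure function, which makes it easy to test. The function name and the config file names in this Python sketch are hypothetical; the actual entry point builds `argv` inline as shown in the diff.

```python
import sys


def build_notebook_argv(user_args, config_files):
    """Forward user-supplied arguments and append one --config flag per file.

    Hypothetical helper mirroring the argument handling in cli.py.
    """
    return list(user_args) + ['--config=%s' % path for path in config_files]


if __name__ == '__main__':
    # sys.argv[0] is the program name, so only the remaining arguments are forwarded.
    argv = build_notebook_argv(sys.argv[1:], ['notebook_config.py', 'notebook_config.json'])
    print('invoking notebook server with arguments:', argv)
```

Slicing with `sys.argv[1:]` drops the program name, and the user's arguments keep their original order ahead of the appended `--config` flags, matching the order used in the commit.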