Commit f6a82f7

req changes
1 parent 9c1cb2b commit f6a82f7

3 files changed, +11 -18 lines changed

manuscript/literature.bib

Lines changed: 1 addition & 1 deletion
@@ -681,7 +681,7 @@ @article{banushkina_nonparametric_2015
 @article{husic-optimized,
 title={Optimized parameter selection reveals trends in Markov state models for protein folding},
 author={Husic, Brooke E and McGibbon, Robert T and Sultan, Mohammad M and Pande, Vijay S},
-journal={The Journal of chemical physics},
+journal={J. Chem. Phys.},
 volume={145},
 number={19},
 pages={194103},

manuscript/manuscript.tex

Lines changed: 7 additions & 14 deletions
@@ -241,20 +241,13 @@ \subsection{The PyEMMA workflow}
 We present the results obtained in this notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
 The figures that will be displayed in the following are created in the showcase notebook (00) and can be easily reproduced.

-In the workflow there are multiple hyper parameters to be chosen by the modeler. In our approach we try to optimize a
-parameter at the current stage of the pipeline and continue to the next stage, once a good choice was found. This
-requires the researcher to understand the consequences of non optimal deciscions for the final result. For instance
-a non converged clustering could result in lumping states together which should be seperated from each other.
-
-There also exists automatized approaches to optimize all hyper parameters of the pipeline using a cross-validation
-scheme \cite{husic-optimized}. In these approaches the researcher is still required to understand modeling choices like
-sane ranges for parameters to avoid wasting computational time, which is spent to explore meaningless areas of the
-hyperparameter space.
-In the sequential approach, one can fall back to the previous step, if one finds a bad result at any following stage.
-This greatly reduces the computational effort and leads to a better understanding of the final model.
-
-%However one will not be able to find a good model based on partially bad modeling choices. E.g. a hidden Markov state
-%model could partially correct bad clusterings, but
+Note that the modeler has to select hyper-parameters at most stages throughout the workflow.
+This selection must be done carefully as poor choices make it hard, or even impossible, to build a good MSM.
+
+While there exist automated schemes~\cite{husic-optimized} for cross-validated optimization in the full hyper-parameter
+space, we chose to adopt a sequential approach where only the hyper-parameters of the current stage are optimized. This
+approach is not only computationally cheaper but allows us to discuss the significance of the necessary modeling
+choices.

 \subsection{Feature selection}

notebooks/00-pentapeptide-showcase.ipynb

Lines changed: 3 additions & 3 deletions
@@ -231,13 +231,13 @@
 "\n",
 "### TICA\n",
 "\n",
-"The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
+"The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower-dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
 "\n",
-"By using the tica() functions default parameters, we will use as many dimensions in order to preserve $95\\%$ of the kinetic variance. By default, tica also applies a kinetic map scaling.\n",
+"By using the tica() function's default parameters, we will use as many dimensions in order to preserve $95\\%$ of the kinetic variance. By default, tica() also applies a kinetic map scaling.\n",
 "This scaling ensures that Euclidean distances in the projected space approximate kinetic distances,\n",
 "which is beneficial during the subsequent discretization.\n",
 "\n",
-"Please note that the general `PyEMMA` API is consistant for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend to pass the `tica` object itself into the subsequent stages, e.g., clustering."
+"Please note that the general `PyEMMA` API is consistant for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend to pass the `tica` object itself into the subsequent stages, e.g., clustering, in order to avoid loading all transformed data into memory."
 ]
 },
 {
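
The notebook text in the hunk above says that TICA keeps as many dimensions as are needed to preserve 95% of the kinetic variance and that kinetic map scaling is applied. The following is a minimal, library-free sketch of those two ideas, not PyEMMA's implementation. It assumes that the kinetic variance is measured by the sum of squared TICA eigenvalues and that the eigenvalues are already sorted by decreasing magnitude; the function names and the example eigenvalues are hypothetical.

```python
def n_dims_for_kinetic_variance(eigenvalues, cutoff=0.95):
    """Return the smallest number of leading components whose squared
    eigenvalues add up to at least `cutoff` of the total.

    Assumption: `eigenvalues` is sorted by decreasing magnitude.
    """
    squared = [ev ** 2 for ev in eigenvalues]
    total = sum(squared)
    if total == 0:
        raise ValueError("all eigenvalues are zero; no kinetic variance")
    cumulative = 0.0
    for index, value in enumerate(squared):
        cumulative += value
        if cumulative / total >= cutoff:
            return index + 1
    # Guard against rounding when cutoff is 1.0: keep every component.
    return len(squared)


def kinetic_map_scale(projection, eigenvalues):
    """Scale each projected coordinate by its eigenvalue, so that slow
    components (large eigenvalues) dominate Euclidean distances."""
    return [coord * ev for coord, ev in zip(projection, eigenvalues)]


# Hypothetical eigenvalues, chosen only for illustration.
example_eigenvalues = [0.9, 0.5, 0.1]
n_dims = n_dims_for_kinetic_variance(example_eigenvalues)
scaled_point = kinetic_map_scale([1.0, 2.0], example_eigenvalues[:n_dims])
```

With these example values, the first two components carry about 99% of the summed squared eigenvalues, so two dimensions are kept. In PyEMMA itself the equivalent behaviour is controlled through the arguments of `pyemma.coordinates.tica`, which the notebook calls with its defaults.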
