req changes

marscher · marscher · commit f6a82f701921 · 2018-09-05T14:17:34.000+02:00
diff --git a/manuscript/literature.bib b/manuscript/literature.bib
@@ -681,7 +681,7 @@ @article{banushkina_nonparametric_2015
 @article{husic-optimized,
   title={Optimized parameter selection reveals trends in Markov state models for protein folding},
   author={Husic, Brooke E and McGibbon, Robert T and Sultan, Mohammad M and Pande, Vijay S},
-  journal={The Journal of chemical physics},
+  journal={J. Chem. Phys.},
   volume={145},
   number={19},
   pages={194103},
diff --git a/manuscript/manuscript.tex b/manuscript/manuscript.tex
@@ -241,20 +241,13 @@ \subsection{The PyEMMA workflow}
 We present the results obtained in this notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
 The figures that will be displayed in the following are created in the showcase notebook (00) and can be easily reproduced.
 
-In the workflow there are multiple hyper parameters to be chosen by the modeler. In our approach we try to optimize a 
-parameter at the current stage of the pipeline and continue to the next stage, once a good choice was found. This 
-requires the researcher to understand the consequences of non optimal deciscions for the final result. For instance
-a non converged clustering could result in lumping states together which should be seperated from each other.
-
-There also exists automatized approaches to optimize all hyper parameters of the pipeline using a cross-validation 
-scheme \cite{husic-optimized}. In these approaches the researcher is still required to understand modeling choices like 
-sane ranges for parameters to avoid wasting computational time, which is spent to explore meaningless areas of the 
-hyperparameter space.
-In the sequential approach, one can fall back to the previous step, if one finds a bad result at any following stage. 
-This greatly reduces the computational effort and leads to a better understanding of the final model.
-
-%However one will not be able to find a good model based on partially bad modeling choices. E.g. a hidden Markov state 
-%model could partially correct bad clusterings, but 
+Note that the modeler has to select hyper-parameters at most stages throughout the workflow.
+This selection must be done carefully as poor choices make it hard, or even impossible, to build a good MSM.
+
+While there exist automated schemes~\cite{husic-optimized} for cross-validated optimization in the full hyper-parameter 
+space, we chose to adopt a sequential approach where only the hyper-parameters of the current stage are optimized. This 
+approach is not only computationally cheaper but also allows us to discuss the significance of the necessary modeling 
+choices.
 
 \subsection{Feature selection}
 
diff --git a/notebooks/00-pentapeptide-showcase.ipynb b/notebooks/00-pentapeptide-showcase.ipynb
@@ -231,13 +231,13 @@
     "\n",
     "### TICA\n",
     "\n",
-    "The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
+    "The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower-dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
     "\n",
-    "By using the tica() functions default parameters, we will use as many dimensions in order to preserve $95\\%$ of the kinetic variance. By default, tica also applies a kinetic map scaling.\n",
+    "By using the tica() function's default parameters, we will use as many dimensions as are needed to preserve $95\\%$ of the kinetic variance. By default, tica() also applies a kinetic map scaling.\n",
     "This scaling ensures that Euclidean distances in the projected space approximate kinetic distances,\n",
     "which is beneficial during the subsequent discretization.\n",
     "\n",
-    "Please note that the general `PyEMMA` API is consistant for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend to pass the `tica` object itself into the subsequent stages, e.g., clustering."
+    "Please note that the general `PyEMMA` API is consistent for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance is returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend passing the `tica` object itself into the subsequent stages, e.g., clustering, in order to avoid loading all transformed data into memory."
    ]
   },
   {