 We present the results obtained in this notebook, thereby providing an example of how results generated using PyEMMA can be integrated into research publications.
 The figures that will be displayed in the following are created in the showcase notebook (00) and can be easily reproduced.
 
-In the workflow there are multiple hyper parameters to be chosen by the modeler. In our approach we try to optimize a
-parameter at the current stage of the pipeline and continue to the next stage, once a good choice was found. This
-requires the researcher to understand the consequences of non optimal deciscions for the final result. For instance
-a non converged clustering could result in lumping states together which should be seperated from each other.
-
-There also exists automatized approaches to optimize all hyper parameters of the pipeline using a cross-validation
-scheme \cite{husic-optimized}. In these approaches the researcher is still required to understand modeling choices like
-sane ranges for parameters to avoid wasting computational time, which is spent to explore meaningless areas of the
-hyperparameter space.
-In the sequential approach, one can fall back to the previous step, if one finds a bad result at any following stage.
-This greatly reduces the computational effort and leads to a better understanding of the final model.
-
-%However one will not be able to find a good model based on partially bad modeling choices. E.g. a hidden Markov state
-%model could partially correct bad clusterings, but
+Note that the modeler has to select hyper-parameters at most stages throughout the workflow.
+This selection must be done carefully as poor choices make it hard, or even impossible, to build a good MSM.
+
+While there exist automated schemes~\cite{husic-optimized} for cross-validated optimization in the full hyper-parameter
+space, we chose to adopt a sequential approach where only the hyper-parameters of the current stage are optimized. This
+approach is not only computationally cheaper but allows us to discuss the significance of the necessary modeling
notebooks/00-pentapeptide-showcase.ipynb: 3 additions & 3 deletions
@@ -231,13 +231,13 @@
 "\n",
 "### TICA\n",
 "\n",
-"The goal of the next step is to find a function that maps the usually high-dimensional input space into some lowerdimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
+"The goal of the next step is to find a function that maps the usually high-dimensional input space into some lower-dimensional space that captures the important dynamics. The recommended way of doing so is a time-lagged independent component analysis (TICA), <a id=\"ref-4\" href=\"#cite-tica2\">molgedey-94</a>, <a id=\"ref-5\" href=\"#cite-tica\">perez-hernandez-13</a>. We perform TICA (with kinetic map scaling) using the lag time obtained from the VAMP-2 score.\n",
 "\n",
-"By using the tica() functions default parameters, we will use as many dimensions in order to preserve $95\\%$ of the kinetic variance. By default, tica also applies a kinetic map scaling.\n",
+"By using the tica() function's default parameters, we will use as many dimensions in order to preserve $95\\%$ of the kinetic variance. By default, tica() also applies a kinetic map scaling.\n",
 "This scaling ensures that Euclidean distances in the projected space approximate kinetic distances,\n",
 "which is beneficial during the subsequent discretization.\n",
 "\n",
-"Please note that the general `PyEMMA` API is consistant for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend to pass the `tica` object itself into the subsequent stages, e.g., clustering."
+"Please note that the general `PyEMMA` API is consistant for all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance returned (`tica`); this object contains all the information about the specific transformation. For small systems, we can access the transformed data by calling `tica.get_output()`. For large systems, we recommend to pass the `tica` object itself into the subsequent stages, e.g., clustering, in order to avoid loading all transformed data into memory."
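Note on the changed TICA cell: the text says that the default settings keep as many dimensions as are needed to preserve 95% of the kinetic variance. A minimal pure-Python sketch of that selection rule follows. It assumes that, under kinetic map scaling, the kinetic variance carried by a component equals its squared eigenvalue. The function name `dimensions_for_variance` and the example eigenvalues are hypothetical and are not part of PyEMMA; the library's own implementation may differ in detail.

```python
from itertools import accumulate


def dimensions_for_variance(eigenvalues, var_cutoff=0.95):
    """Return the smallest number of leading components whose cumulative
    kinetic variance reaches `var_cutoff`.

    Assumption: the kinetic variance of a component is its squared
    eigenvalue (kinetic map scaling).
    """
    if not 0.0 < var_cutoff <= 1.0:
        raise ValueError("var_cutoff must lie in (0, 1]")

    # Squared eigenvalues, largest first, so leading components come first.
    squared = sorted((ev * ev for ev in eigenvalues), reverse=True)
    total = sum(squared)
    if total == 0.0:
        raise ValueError("at least one eigenvalue must be non-zero")

    # Walk through the running sum until the requested fraction is reached.
    for n_dims, partial in enumerate(accumulate(squared), start=1):
        if partial / total >= var_cutoff:
            return n_dims

    # Guard against floating-point round-off when var_cutoff is 1.0.
    return len(squared)


# Illustrative eigenvalues only; these are not results from the notebook.
example_eigenvalues = [0.9, 0.8, 0.5, 0.2, 0.1]
n_dims = dimensions_for_variance(example_eigenvalues)  # → 3
```

With these illustrative eigenvalues, the first three components carry about 97% of the summed squared eigenvalues, so three dimensions would be kept at the 95% threshold.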