|
37 | 37 | "metadata": {}, |
38 | 38 | "source": [ |
39 | 39 | "## Case 1: preprocessed, two-dimensional data (toy model)\n", |
40 | | - "We load the two-dimensional trajectory from an archive using numpy, directly discretize the full space using $k$-means clustering, visualize the marginal and joint distributions of both components as well as the cluster centers, and show the implied timescale (ITS) convergence:" |
| 40 | + "We load the two-dimensional trajectory from an archive using numpy,\n", |
| 41 | + "directly discretize the full space using $k$-means clustering,\n", |
| 42 | + "visualize the marginal and joint distributions of both components as well as the cluster centers,\n", |
| 43 | + "and show the implied timescale (ITS) convergence:" |
41 | 44 | ] |
42 | 45 | }, |
43 | 46 | { |
|
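The implied timescales shown in an ITS plot follow from the eigenvalues $\lambda_i(\tau)$ of the estimated transition matrix via $t_i(\tau) = -\tau/\ln|\lambda_i(\tau)|$. A minimal numpy sketch, using a hypothetical 2x2 transition matrix rather than one estimated from the toy data:

```python
import numpy as np

# hypothetical row-stochastic transition matrix at lag time tau = 1 step
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
tau = 1

# eigenvalues sorted by decreasing value; the largest (lambda = 1)
# belongs to the stationary process
eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]

# implied timescales of the dynamical processes: t_i = -tau / ln|lambda_i|
its = -tau / np.log(np.abs(eigvals[1:]))
```

An ITS convergence plot repeats this computation for a range of lag times and checks where the resulting timescales become constant.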
71 | 74 | "cell_type": "markdown", |
72 | 75 | "metadata": {}, |
73 | 76 | "source": [ |
74 | | - "The plots show us the marginal (left panel) and joint distributions along with the cluster centers (middle panel). The implied timescales are converged (right panel). \n", |
| 77 | + "The plots show us the marginal (left panel) and joint distributions along with the cluster centers (middle panel).\n", |
| 78 | + "The implied timescales are converged (right panel). \n", |
75 | 79 | "\n", |
76 | | - "Before we proceed, let's have a look at the implied timescales error bars. They were computed from a Bayesian MSM, as requested by the `errors='bayes'` argument of the `pyemma.msm.its()` function. As mentioned before, Bayesian MSMs incorporate a sample of transition matrices. Target properties such as implied timescales can now simply be computed from the individual matrices. Thereby, the posterior distributions of these properties can be estimated. The ITS plot shows a confidence interval that contains $95\\%$ of the Bayesian samples." |
| 80 | + "Before we proceed, let's have a look at the implied timescales error bars.\n", |
| 81 | + "They were computed from a Bayesian MSM, as requested by the `errors='bayes'` argument of the `pyemma.msm.its()` function.\n", |
| 82 | + "As mentioned before, Bayesian MSMs incorporate a sample of transition matrices.\n", |
| 83 | + "Target properties such as implied timescales can now simply be computed from the individual matrices.\n", |
| 84 | + "Thereby, the posterior distributions of these properties can be estimated.\n", |
| 85 | + "The ITS plot shows a confidence interval that contains $95\\%$ of the Bayesian samples." |
77 | 86 | ] |
78 | 87 | }, |
79 | 88 | { |
|
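What the shaded area represents can be sketched with plain numpy: given a sample of values for one timescale (synthetic here, standing in for the Bayesian posterior sample), the central $95\%$ confidence interval cuts $2.5\%$ off each tail:

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic stand-in for a Bayesian sample of one implied timescale
timescale_samples = rng.lognormal(mean=np.log(100.0), sigma=0.1, size=1000)

# central 95% confidence interval: 2.5th and 97.5th percentiles
lower, upper = np.percentile(timescale_samples, [2.5, 97.5])
sample_mean = timescale_samples.mean()
```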
89 | 98 | "cell_type": "markdown", |
90 | 99 | "metadata": {}, |
91 | 100 | "source": [ |
92 | | - "For any PyEMMA method that derives target properties from MSMs, sample mean and confidence intervals (as defined by the function argument above) are directly accessible with `sample_mean()` and `sample_conf()`. Further, `sample_std()` is available for computing the standard deviation. In the more general case, it might be interesting to extract the full sample of a function evaluation with `sample_f()`. The syntax is equivalent for all those functions." |
| 101 | + "For any PyEMMA method that derives target properties from MSMs, sample mean and confidence intervals (as defined by the function argument above) are directly accessible with `sample_mean()` and `sample_conf()`.\n", |
| 102 | + "Further, `sample_std()` is available for computing the standard deviation.\n", |
| 103 | + "In the more general case, it might be interesting to extract the full sample of a function evaluation with `sample_f()`.\n", |
| 104 | + "The syntax is equivalent for all those functions." |
93 | 105 | ] |
94 | 106 | }, |
95 | 107 | { |
|
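What these accessors do can be sketched without PyEMMA: evaluate a target property on every member of a sample of transition matrices, then summarize the resulting values. This is a toy stand-in with a hypothetical sample built by perturbing one matrix; a Bayesian MSM draws its sample from the posterior instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_f(samples, f):
    """Apply a property function f to every member of the sample."""
    return np.array([f(s) for s in samples])

# hypothetical sample: slightly perturbed copies of a 2x2 transition matrix
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
samples = [T + rng.normal(scale=1e-3, size=T.shape) for _ in range(200)]

# target property: slowest implied timescale at lag tau = 1
prop = sample_f(samples, lambda m: -1.0 / np.log(np.abs(np.linalg.eigvals(m)).min()))

mean, std = prop.mean(), prop.std()       # analogous to sample_mean / sample_std
conf = np.percentile(prop, [2.5, 97.5])   # analogous to sample_conf (95% interval)
```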
111 | 123 | "source": [ |
112 | 124 | "Please note that sample mean and maximum likelihood estimates are not identical and generally do not provide numerically identical results.\n", |
113 | 125 | "\n", |
114 | | - "Now, for the sake of simplicity we proceed with the analysis of a maximum likelihood MSM. We estimate it at lag time $1$ step." |
| 126 | + "Now, for the sake of simplicity we proceed with the analysis of a maximum likelihood MSM.\n", |
| 127 | + "We estimate it at lag time $1$ step..." |
115 | 128 | ] |
116 | 129 | }, |
117 | 130 | { |
|
127 | 140 | "cell_type": "markdown", |
128 | 141 | "metadata": {}, |
129 | 142 | "source": [ |
130 | | - "and check for disconnectivity. The MSM is constructed on the largest set of discrete states that are (reversibly) connected. The `active_state_fraction` and `active_count_fraction` show us the fraction of discrete states and transition counts from our data which are part of this largest set and, thus, used for the model:" |
| 143 | + "... and check for disconnectivity.\n", |
| 144 | + "The MSM is constructed on the largest set of discrete states that are (reversibly) connected.\n", |
| 145 | + "The `active_state_fraction` and `active_count_fraction` show us the fraction of discrete states and transition counts from our data which are part of this largest set and, thus, used for the model:" |
131 | 146 | ] |
132 | 147 | }, |
133 | 148 | { |
|
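Conceptually, the active set is the largest strongly connected component of the transition count graph. A sketch with scipy on a hypothetical discrete trajectory (not PyEMMA's estimator); state $3$ is entered once but never left, so it drops out of the active set:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# hypothetical discrete trajectory: states 0-2 interconvert, state 3 is a dead end
dtraj = np.array([0, 1, 2, 1, 0, 2, 1, 0, 3])
n = dtraj.max() + 1

# count matrix at lag 1: C[i, j] = number of observed i -> j transitions
C = np.zeros((n, n))
np.add.at(C, (dtraj[:-1], dtraj[1:]), 1)

# largest strongly connected component = active set
ncomp, labels = connected_components(csr_matrix(C), connection='strong')
active_set = np.flatnonzero(labels == np.argmax(np.bincount(labels)))

active_state_fraction = active_set.size / n
active_count_fraction = C[np.ix_(active_set, active_set)].sum() / C.sum()
```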
146 | 161 | "source": [ |
147 | 162 | "The fraction is, in both cases, $1$ and, thus, we have no disconnected states (which we would have to exclude from our analysis).\n", |
148 | 163 | "\n", |
149 | | - "If there were any disconnectivities in our data (fractions $<1$), we could access the indices of the **active states** (members of the largest connected set) via the `active_set` attribute:" |
| 164 | + "If there were any disconnectivities in our data (fractions $<1$),\n", |
| 165 | + "we could access the indices of the **active states** (members of the largest connected set) via the `active_set` attribute:" |
150 | 166 | ] |
151 | 167 | }, |
152 | 168 | { |
|
162 | 178 | "cell_type": "markdown", |
163 | 179 | "metadata": {}, |
164 | 180 | "source": [ |
165 | | - "With this potential issue out of the way, we can extract our first (stationary/thermodynamic) property, the `stationary_distribution` or, as a shortcut, `pi`:" |
| 181 | + "With this potential issue out of the way, we can extract our first (stationary/thermodynamic) property,\n", |
| 182 | + "the `stationary_distribution` or, as a shortcut, `pi`:" |
166 | 183 | ] |
167 | 184 | }, |
168 | 185 | { |
|
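The stationary distribution is the left eigenvector of the transition matrix with eigenvalue $1$, normalized to sum to one. A numpy sketch on a hypothetical 2x2 transition matrix, not the MSM estimated above:

```python
import numpy as np

# hypothetical row-stochastic transition matrix
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# left eigenvectors of T are right eigenvectors of T.T; pick eigenvalue 1
eigvals, eigvecs = np.linalg.eig(T.T)
i = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, i])
pi /= pi.sum()   # normalize (and fix the sign) to get a probability distribution
```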
216 | 233 | "cell_type": "markdown", |
217 | 234 | "metadata": {}, |
218 | 235 | "source": [ |
219 | | - "The stationary distribution can also be used to correct the `pyemma.plots.plot_free_energy()` function. This might be necessary if the data points are not sampled from global equilibrium.\n", |
| 236 | + "The stationary distribution can also be used to reweight the free energy estimate computed by the `pyemma.plots.plot_free_energy()` function.\n",
| 237 | + "This might be necessary if the data points are not sampled from global equilibrium.\n", |
220 | 238 | "\n", |
221 | 239 | "In this case, we assign the weight of the corresponding discrete state to each data point and pass this information to the plotting function via its `weights` parameter:" |
222 | 240 | ] |
|
229 | 247 | "source": [ |
230 | 248 | "fig, ax, misc = pyemma.plots.plot_free_energy(\n", |
231 | 249 | " *data.T,\n", |
232 | | - " weights=msm.pi[cluster.dtrajs[0]],\n", |
| 250 | + " weights=np.concatenate(msm.trajectory_weights()),\n", |
233 | 251 | " legacy=False)\n", |
234 | 252 | "ax.set_xlabel('$x$')\n", |
235 | 253 | "ax.set_ylabel('$y$')\n", |
|
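The reweighting idea can be sketched with plain numpy: each frame receives the stationary weight of its discrete state, shared evenly among that state's frames so that all weights sum to one. This is a toy stand-in with a hypothetical 3-state trajectory, not necessarily PyEMMA's exact normalization:

```python
import numpy as np

# hypothetical stationary distribution and discrete trajectory (3 states)
pi = np.array([0.5, 0.3, 0.2])
dtraj = np.array([0, 0, 1, 2, 1, 0])

# number of frames observed in each state
counts = np.bincount(dtraj, minlength=pi.size)

# per-frame weight: the state's stationary weight split evenly over its frames
weights = pi[dtraj] / counts[dtraj]
```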
336 | 354 | "metadata": {}, |
337 | 355 | "source": [ |
338 | 356 | "## Case 2: low-dimensional molecular dynamics data (alanine dipeptide)\n", |
339 | | - "We fetch the alanine dipeptide data set, load the backbone torsions into memory, directly discretize the full space using $k$-means clustering, visualize the margial and joint distributions of both components as well as the cluster centers, and show the ITS convergence to help selecting a suitable lag time:" |
| 357 | + "\n", |
| 358 | + "We fetch the alanine dipeptide data set, load the backbone torsions into memory,\n", |
| 359 | + "directly discretize the full space using $k$-means clustering,\n", |
| 360 | + "visualize the marginal and joint distributions of both components as well as the cluster centers,\n",
| 361 | + "and show the ITS convergence to help select a suitable lag time:"
340 | 362 | ] |
341 | 363 | }, |
342 | 364 | { |
|
376 | 398 | "cell_type": "markdown", |
377 | 399 | "metadata": {}, |
378 | 400 | "source": [ |
379 | | - "The plots show us the marginal (left panel) and joint distributions along with the cluster centers (middle panel). The implied timescales are converged (right panel). \n", |
| 401 | + "The plots show us the marginal (left panel) and joint distributions along with the cluster centers (middle panel).\n", |
| 402 | + "The implied timescales are converged (right panel). \n", |
380 | 403 | "\n", |
381 | 404 | "We then estimate an MSM at lag time $10$ ps and visualize the stationary distribution by coloring all data points according to the stationary weight of the discrete state they belong to:" |
382 | 405 | ] |
|
464 | 487 | "source": [ |
465 | 488 | "We note that four metastable states are a reasonable choice for our MSM.\n", |
466 | 489 | "\n", |
467 | | - "In general, the number of metastable states is a modeler's choice; it is adjusted to map the kinetics to be modeled. In the current example, increasing the resolution with a higher number of metastable states or resolving only the slowest process between $2$ states would be possible. However, the number of states is not arbitrary as the observed processes in metastable state space need not be Markovian in general. A failed Chapman-Kolmogorov test can thus also hint to a bad choice of the metastable state number.\n", |
| 490 | + "In general, the number of metastable states is a modeler's choice; it is adjusted to map the kinetics to be modeled.\n", |
| 491 | + "In the current example, increasing the resolution with a higher number of metastable states or resolving only the slowest process between $2$ states would be possible.\n", |
| 492 | + "However, the number of states is not arbitrary as the observed processes in metastable state space need not be Markovian in general.\n", |
| 493 | + "A failed Chapman-Kolmogorov test can thus also hint at a bad choice of the metastable state number.\n",
468 | 494 | "\n", |
469 | 495 | "In order to perform further analysis, we save the model to disk:" |
470 | 496 | ] |
|
738 | 764 | "metadata": {}, |
739 | 765 | "source": [ |
740 | 766 | "#### Exercise 5\n", |
741 | | - "Save the MSM, Bayesian MSM and Cluster objects to the same file as before. Use the model names `ala2tica_msm`, `ala2tica_bayesian_msm` and `ala2tica_cluster`, respectively. Further, include the TICA object with model name `ala2tica_tica`." |
| 767 | + "Save the MSM, Bayesian MSM and Cluster objects to the same file as before.\n", |
| 768 | + "Use the model names `ala2tica_msm`, `ala2tica_bayesian_msm` and `ala2tica_cluster`, respectively.\n", |
| 769 | + "Further, include the TICA object with model name `ala2tica_tica`." |
742 | 770 | ] |
743 | 771 | }, |
744 | 772 | { |
|