|
1 | 1 | """ |
2 | | -================================================== |
3 | | -Using histograms to plot a cumulative distribution |
4 | | -================================================== |
5 | | -
|
6 | | -This shows how to plot a cumulative, normalized histogram as a |
7 | | -step function in order to visualize the empirical cumulative |
8 | | -distribution function (CDF) of a sample. We also show the theoretical CDF. |
9 | | -
|
10 | | -A couple of other options to the ``hist`` function are demonstrated. Namely, we |
11 | | -use the *density* parameter to normalize the histogram and a couple of different |
12 | | -options to the *cumulative* parameter. The *density* parameter takes a boolean |
13 | | -value. When ``True``, the bin heights are scaled such that the total area of |
14 | | -the histogram is 1. The *cumulative* keyword argument is a little more nuanced. |
15 | | -Like *density*, you can pass it True or False, but you can also pass it -1 to |
16 | | -reverse the distribution. |
17 | | -
|
18 | | -Since we're showing a normalized and cumulative histogram, these curves |
19 | | -are effectively the cumulative distribution functions (CDFs) of the |
20 | | -samples. In engineering, empirical CDFs are sometimes called |
21 | | -"non-exceedance" curves. In other words, you can look at the |
22 | | -y-value for a given-x-value to get the probability of and observation |
23 | | -from the sample not exceeding that x-value. For example, the value of |
24 | | -225 on the x-axis corresponds to about 0.85 on the y-axis, so there's an |
25 | | -85% chance that an observation in the sample does not exceed 225. |
26 | | -Conversely, setting, ``cumulative`` to -1 as is done in the |
27 | | -last series for this example, creates an "exceedance" curve. |
28 | | -
|
29 | | -Selecting different bin counts and sizes can significantly affect the |
30 | | -shape of a histogram. The Astropy docs have a great section on how to |
31 | | -select these parameters: |
32 | | -http://docs.astropy.org/en/stable/visualization/histogram.html |
33 | | -
|
| 2 | +================================= |
| 3 | +Plotting cumulative distributions |
| 4 | +================================= |
| 5 | +
|
| 6 | +This example shows how to plot the empirical cumulative distribution function |
| 7 | +(ECDF) of a sample. We also show the theoretical CDF. |
| 8 | +
|
| 9 | +In engineering, ECDFs are sometimes called "non-exceedance" curves: the y-value |
| 10 | +for a given x-value gives the probability that an observation from the sample |
| 11 | +is below that x-value. For example, the value of 220 on the x-axis corresponds |
| 12 | +to about 0.80 on the y-axis, so there is an 80% chance that an observation in |
| 13 | +the sample does not exceed 220. Conversely, the empirical *complementary* |
| 14 | +cumulative distribution function (the ECCDF, or "exceedance" curve) shows the |
| 15 | +probability y that an observation from the sample is above a value x. |
| 16 | +
|
| 17 | +A direct method to plot ECDFs is `.Axes.ecdf`. Passing ``complementary=True`` |
| 18 | +results in an ECCDF instead. |
| 19 | +
|
| 20 | +Alternatively, one can use ``ax.hist(data, density=True, cumulative=True)`` to |
| 21 | +first bin the data, as if plotting a histogram, and then compute and plot the |
| 22 | +cumulative sums of the frequencies of entries in each bin. Here, to plot the |
| 23 | +ECCDF, pass ``cumulative=-1``. Note that this approach results in an |
| 24 | +approximation of the E(C)CDF, whereas `.Axes.ecdf` is exact. |
34 | 25 | """ |
35 | 26 |
|
36 | 27 | import matplotlib.pyplot as plt |
|
40 | 31 |
|
41 | 32 | mu = 200 |
42 | 33 | sigma = 25 |
43 | | -n_bins = 50 |
44 | | -x = np.random.normal(mu, sigma, size=100) |
| 34 | +n_bins = 25 |
| 35 | +data = np.random.normal(mu, sigma, size=100) |
45 | 36 |
|
46 | | -fig, ax = plt.subplots(figsize=(8, 4)) |
| 37 | +fig = plt.figure(figsize=(9, 4), layout="constrained") |
| 38 | +axs = fig.subplots(1, 2, sharex=True, sharey=True) |
47 | 39 |
|
48 | | -# plot the cumulative histogram |
49 | | -n, bins, patches = ax.hist(x, n_bins, density=True, histtype='step', |
50 | | - cumulative=True, label='Empirical') |
51 | | - |
52 | | -# Add a line showing the expected distribution. |
| 40 | +# Cumulative distributions. |
| 41 | +axs[0].ecdf(data, label="CDF") |
| 42 | +n, bins, patches = axs[0].hist(data, n_bins, density=True, histtype="step", |
| 43 | + cumulative=True, label="Cumulative histogram") |
| 44 | +x = np.linspace(data.min(), data.max()) |
53 | 45 | y = ((1 / (np.sqrt(2 * np.pi) * sigma)) * |
54 | | - np.exp(-0.5 * (1 / sigma * (bins - mu))**2)) |
| 46 | + np.exp(-0.5 * (1 / sigma * (x - mu))**2)) |
55 | 47 | y = y.cumsum() |
56 | 48 | y /= y[-1] |
57 | | - |
58 | | -ax.plot(bins, y, 'k--', linewidth=1.5, label='Theoretical') |
59 | | - |
60 | | -# Overlay a reversed cumulative histogram. |
61 | | -ax.hist(x, bins=bins, density=True, histtype='step', cumulative=-1, |
62 | | - label='Reversed emp.') |
63 | | - |
64 | | -# tidy up the figure |
65 | | -ax.grid(True) |
66 | | -ax.legend(loc='right') |
67 | | -ax.set_title('Cumulative step histograms') |
68 | | -ax.set_xlabel('Annual rainfall (mm)') |
69 | | -ax.set_ylabel('Likelihood of occurrence') |
| 49 | +axs[0].plot(x, y, "k--", linewidth=1.5, label="Theory") |
| 50 | + |
| 51 | +# Complementary cumulative distributions. |
| 52 | +axs[1].ecdf(data, complementary=True, label="CCDF") |
| 53 | +axs[1].hist(data, bins=bins, density=True, histtype="step", cumulative=-1, |
| 54 | + label="Reversed cumulative histogram") |
| 55 | +axs[1].plot(x, 1 - y, "k--", linewidth=1.5, label="Theory") |
| 56 | + |
| 57 | +# Label the figure. |
| 58 | +fig.suptitle("Cumulative distributions") |
| 59 | +for ax in axs: |
| 60 | + ax.grid(True) |
| 61 | + ax.legend() |
| 62 | + ax.set_xlabel("Annual rainfall (mm)") |
| 63 | + ax.set_ylabel("Probability of occurrence") |
| 64 | + ax.label_outer() |
70 | 65 |
|
71 | 66 | plt.show() |
72 | 67 |
|
|
78 | 73 | # in this example: |
79 | 74 | # |
80 | 75 | # - `matplotlib.axes.Axes.hist` / `matplotlib.pyplot.hist` |
| 76 | +# - `matplotlib.axes.Axes.ecdf` / `matplotlib.pyplot.ecdf` |
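
The new docstring says the cumulative histogram only approximates the E(C)CDF, whereas `.Axes.ecdf` is exact. A minimal NumPy-only sketch of that difference follows. It is not how Matplotlib implements either method, and the helper names `ecdf_values` and `cumulative_hist_values` are hypothetical, introduced only for illustration. The sample parameters (mean 200, standard deviation 25, 100 points, 25 bins) mirror the example; the fixed seed is an assumption added for reproducibility.

```python
import numpy as np


def ecdf_values(data):
    """Return the sorted sample and the ECDF evaluated at each sorted value.

    Hypothetical helper, not part of Matplotlib.
    """
    x = np.sort(np.asarray(data, dtype=float))
    # The i-th smallest value (1-based) has a fraction i/n of the sample
    # at or below it.
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def cumulative_hist_values(data, n_bins):
    """Return bin edges and the cumulative fraction of the sample per bin.

    Hypothetical helper; this is the quantity that a normalized cumulative
    histogram draws as the height of each bin.
    """
    counts, edges = np.histogram(data, bins=n_bins)
    cum = np.cumsum(counts) / counts.sum()
    return edges, cum


# Assumption: a seeded generator, so that the sketch is reproducible.
rng = np.random.default_rng(0)
data = rng.normal(200, 25, size=100)

x, y = ecdf_values(data)
edges, cum = cumulative_hist_values(data, n_bins=25)

# Height of the cumulative histogram at each data point: look up the bin
# that contains the point (the last bin includes its right edge).
idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, cum.size - 1)
hist_at_x = cum[idx]

# Inside a bin the histogram already counts every point of that bin, so it
# can only sit at or above the exact ECDF.
gap = hist_at_x - y
print(f"largest overshoot of the histogram over the ECDF: {gap.max():.3f}")

# Complementary (exceedance) curve: one minus the ECDF.
eccdf = 1 - y
```

The overshoot shrinks as the number of bins grows, which is why the binned curve is described as an approximation rather than as the ECDF itself.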