@@ -13,8 +13,8 @@ keypoints:
 ---
 
 ## Visualizing data
-The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers," and
-the best way to develop insight is often to visualize data. Visualization deserves an entire
+The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers,"
+and the best way to develop insight is often to visualize data. Visualization deserves an entire
 lecture of its own, but we can explore a few features of Python's `matplotlib` library here. While
 there is no official plotting library, `matplotlib` is the _de facto_ standard. First, we will
 import the `pyplot` module from `matplotlib` and use two of its functions to create and display a
@@ -30,9 +30,19 @@ matplotlib.pyplot.show()
 ![Heat map representing the `data` variable. Each cell is colored by value along a color gradient
 from blue to yellow.](../fig/inflammation-01-imshow.svg)
 
-Blue pixels in this heat map represent low values, while yellow pixels represent high values. As we
-can see, inflammation rises and falls over a 40-day period. Let's take a look at the average
-inflammation over time:
+Each row in the heat map corresponds to a patient in the clinical trial dataset, and each column
+corresponds to a day in the dataset. Blue pixels in this heat map represent low values, while
+yellow pixels represent high values. As we can see, the general number of inflammation flare-ups
+for the patients rises and falls over a 40-day period.
+
+So far so good as this is in line with our knowledge of the clinical trial and Dr. Maverick's
+claims:
+
+* the patients take their medication once their inflammation flare-ups begin
+* it takes around 3 weeks for the medication to take effect and begin reducing flare-ups
+* and flare-ups appear to drop to zero by the end of the clinical trial.
+
+Now let's take a look at the average inflammation over time:
 
 ~~~
 ave_inflammation = numpy.mean(data, axis=0)
@@ -45,8 +55,9 @@ matplotlib.pyplot.show()
 
 Here, we have put the average inflammation per day across all patients in the variable
 `ave_inflammation`, then asked `matplotlib.pyplot` to create and display a line graph of those
-values. The result is a roughly linear rise and fall, which is suspicious: we might instead expect
-a sharper rise and slower fall. Let's have a look at two other statistics:
+values. The result is a reasonably linear rise and fall, in line with Dr. Maverick's claim that
+the medication takes 3 weeks to take effect. But a good data scientist doesn't just consider the
+average of a dataset, so let's have a look at two other statistics:
 
 ~~~
 max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
@@ -64,18 +75,18 @@ matplotlib.pyplot.show()
 
 ![A line graph showing the minimum inflammation across all patients over a 40-day period.](../fig/inflammation-01-minimum.svg)
 
-The maximum value rises and falls smoothly, while the minimum seems to be a step function. Neither
-trend seems particularly likely, so either there's a mistake in our calculations or something is
-wrong with our data. This insight would have been difficult to reach by examining the numbers
-themselves without visualization tools.
+The maximum value rises and falls linearly, while the minimum seems to be a step function.
+Neither trend seems particularly likely, so either there's a mistake in our calculations or
+something is wrong with our data. This insight would have been difficult to reach by examining
+the numbers themselves without visualization tools.
 
 ### Grouping plots
 You can group similar plots in a single figure using subplots.
 This script below uses a number of new commands. The function `matplotlib.pyplot.figure()`
 creates a space into which we will place all of our plots. The parameter `figsize`
 tells Python how big to make this space. Each subplot is placed into the figure using
-its `add_subplot` [method]({{ page.root }}/reference.html#method). The `add_subplot` method takes 3
-parameters. The first denotes how many total rows of subplots there are, the second parameter
+its `add_subplot` [method]({{ page.root }}/reference.html#method). The `add_subplot` method takes
+3 parameters. The first denotes how many total rows of subplots there are, the second parameter
 refers to the total number of subplot columns, and the final parameter denotes which subplot
 your variable is referencing (left-to-right, top-to-bottom). Each subplot is stored in a
 different variable (`axes1`, `axes2`, `axes3`). Once a subplot is created, the axes can
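
As a rough illustration of the `add_subplot` parameters described in the last hunk, here is a minimal sketch; it is not the lesson's own script. It uses a small made-up array in place of the inflammation dataset. The array values, the `Agg` backend, and the axis labels are assumptions added so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot
import numpy

# Hypothetical stand-in for the inflammation data: 3 patients (rows) x 4 days (columns).
data = numpy.array([[0.0, 1.0, 2.0, 1.0],
                    [0.0, 2.0, 4.0, 2.0],
                    [0.0, 3.0, 3.0, 0.0]])

# figsize is (width, height) in inches.
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

# add_subplot(rows, columns, index): one row, three columns, positions 1 to 3,
# counted left-to-right, top-to-bottom.
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

# axis=0 collapses the rows (patients), leaving one value per column (day).
axes1.set_ylabel("average")
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel("max")
axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel("min")
axes3.plot(numpy.min(data, axis=0))

fig.tight_layout()
```

With an interactive backend, calling `matplotlib.pyplot.show()` afterwards would display the three panels side by side; `fig.savefig("some_name.png")` (a hypothetical file name) would write them to disk instead.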