Commit 2002391

Minor changes to Narrative text
1 parent f09c4b8 commit 2002391

5 files changed

+28
-23
lines changed

_episodes/03-matplotlib.md

Lines changed: 12 additions & 9 deletions
@@ -35,10 +35,13 @@ corresponds to a day in the dataset. Blue pixels in this heat map represent low
 pixels represent high values. As we can see, the general number of inflammation flare-ups for the patients
 rises and falls over a 40-day period.
 
-So far so good, this is in line with our knowledge of the clinical trial and Dr. Maverick's claims:
-the patients take their medication once their inflammation flare-ups begin; it takes around 3 weeks
-for the medication to take effect and begin reducing flare-ups; and flare-ups appear to drop to zero
-by the end of the clinical trial. Now let's take a look at the average inflammation over time:
+So far so good as this is in line with our knowledge of the clinical trial and Dr. Maverick's claims:
+
+* the patients take their medication once their inflammation flare-ups begin
+* it takes around 3 weeks for the medication to take effect and begin reducing flare-ups
+* and flare-ups appear to drop to zero by the end of the clinical trial.
+
+Now let's take a look at the average inflammation over time:
 
 ~~~
 ave_inflammation = numpy.mean(data, axis=0)
@@ -51,7 +54,7 @@ matplotlib.pyplot.show()
 
 Here, we have put the average inflammation per day across all patients in the variable
 `ave_inflammation`, then asked `matplotlib.pyplot` to create and display a line graph of those
-values. The result is a roughly linear rise and fall, in line with Dr. Maverick's claim that the
+values. The result is a reasonably linear rise and fall, in line with Dr. Maverick's claim that the
 medication takes 3 weeks to take effect. But a good data scientist doesn't just consider the
 average of a dataset, so let's have a look at two other statistics:

@@ -71,10 +74,10 @@ matplotlib.pyplot.show()
 
 ![A line graph showing the minimum inflammation across all patients over a 40-day period.](../fig/inflammation-01-minimum.svg)
 
-The maximum value rises and falls linearly, while the minimum seems to be a step function.
-Suspicious... neither trend seems particularly likely, so either there's a mistake in our
-calculations or something is wrong with our data. This insight would have been difficult
-to reach by examining the numbers themselves without visualization tools.
+The maximum value rises and falls linearly, while the minimum seems to be a step function.
+Neither trend seems particularly likely, so either there's a mistake in our calculations or
+something is wrong with our data. This insight would have been difficult to reach by examining
+the numbers themselves without visualization tools.
 
 ### Grouping plots
 You can group similar plots in a single figure using subplots.

_episodes/04-lists.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ list[2:9]), in the same way as strings and arrays."
 ---
 
 In the previous episode, we analyzed a single file of clinical trial inflammation data. However,
-after finding some peculiar, and potentially suspicious, trends in the trial data we ask
+after finding some peculiar and potentially suspicious trends in the trial data we ask
 Dr. Maverick if they have performed any other clinical trials. Surprisingly, they say that they
 have and provide us with 11 more CSV files for a further 11 clinical trials they have undertaken
 since the initial trial.

_episodes/05-loop.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ inflammation dataset (`inflammation-01.csv`), which revealed some suspicious fea
 
 ![Line graphs showing average, maximum and minimum inflammation across all patients over a 40-day period.](../fig/03-loop_2_0.png)
 
-We have a dozen data sets right now, though, and potentially more on the way if Dr. Maverick
+We have a dozen data sets right now and potentially more on the way if Dr. Maverick
 can keep up their surprisingly fast clinical trial rate. We want to create plots for all of
 our data sets with a single statement. To do that, we'll have to teach the computer how to
 repeat things.

_episodes/06-files.md

Lines changed: 9 additions & 8 deletions
@@ -45,6 +45,7 @@ This means we can loop over it
 to do something with each filename in turn.
 In our case,
 the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
+
 If we want to start by analyzing just the first three files in alphabetical order, we can use the
 `sorted` built-in function to generate a new sorted list from the `glob.glob` output:

@@ -108,13 +109,13 @@ maximum and minimum inflammation over a 40-day period for all patients in the th
 dataset.](../fig/03-loop_49_5.png)
 
 
-Hmmm. The plots generated for the second clinical trial file look very similar to the plots for
+The plots generated for the second clinical trial file look very similar to the plots for
 the first file: their average plots show similar "noisy" rises and falls; their maxima plots
 show exactly the same linear rise and fall; and their minima plots show similar staircase
 structures.
 
 The third dataset shows much noisier average and maxima plots that are far less suspicious than
-the first two datasets, however the minima plot shows that the third dataset minima are
+the first two datasets, however the minima plot shows that the third dataset minima is
 consistently zero across every day of the trial. If we produce a heat map for the third data file
 we see the following:

@@ -124,7 +125,7 @@ the entire dataset, and the last patient only has zero values over the 40 day st
 We can see that there are zero values sporadically distributed across all patients and days of the
 clinical trial, suggesting that there were potential issues with data collection throughout the
 trial. In addition, we can see that the last patient in the study didn't have any inflammation
-flare-ups at all throughout the trial, suggesting that they may not even suffer from arthritis.
+flare-ups at all throughout the trial, suggesting that they may not even suffer from arthritis!
 
 
 > ## Plotting Differences
@@ -213,7 +214,7 @@ flare-ups at all throughout the trial, suggesting that they may not even suffer
 
 After spending some time investigating the heat map and statistical plots, as well as
 doing the above exercises to plot differences between datasets and to generate composite
-patient statistics, we gain some insight into the twelve clinical trial datasets:
+patient statistics, we gain some insight into the twelve clinical trial datasets.
 
 The datasets appear to fall into two categories:
 
@@ -231,11 +232,11 @@ duplicated files.
 Dr. Maverick confesses that they fabricated the clinical data after they found out
 that the initial trial suffered from a number of issues, including unreliable data-recording and
 poor participant selection. They created fake data to prove their drug worked, and when we asked
-for more trials they tried to generate more fake sets, as well as throwing in the original
-poor-quality dataset a few times to try and make all the trials seem more "realistic".
+for more data they tried to generate more fake datasets, as well as throwing in the original
+poor-quality dataset a few times to try and make all the trials seem a bit more "realistic".
 
-Congratulations! We've cracked the case and proven that the inflammation datasets have been
-synthetically generated (in python no less!).
+Congratulations! We've investigated the inflammation data and proven that the datasets have been
+synthetically generated.
 
 But it would be a shame to throw away the synthetic datasets that have taught us so much
 already, so we'll forgive the imaginary Dr. Maverick and continue to use the data to learn

index.md

Lines changed: 5 additions & 4 deletions
@@ -10,10 +10,11 @@ so this introduction to Python is built around a common scientific task:
 
 ### Scenario: A Miracle Arthritis Inflammation Cure
 
-Our colleague Dr. Maverick has invented a new "miracle drug" that promises to cure arthritis
-inflammation flare-ups after only 3 weeks since initially taking the medication! Naturally,
-we wish to see the clinical trial data and after months of asking them they have finally
-provided us with a CSV spreadsheet containing the clinical trial data.
+Our imaginary colleague "Dr. Maverick" has invented a new miracle drug that promises to
+cure arthritis inflammation flare-ups after only 3 weeks since initially taking the
+medication! Naturally, we wish to see the clinical trial data, and after months of asking
+for the data they have finally provided us with a CSV spreadsheet containing the clinical
+trial data.
 
 The CSV file contains the number of inflammation flare-ups per day for the 60 patients
 in the initial clinical trial, with the trial lasting 40 days. Each row corresponds to a
