@@ -45,6 +45,7 @@ This means we can loop over it
  to do something with each filename in turn.
  In our case,
  the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
+
  If we want to start by analyzing just the first three files in alphabetical order, we can use the
  `sorted` built-in function to generate a new sorted list from the `glob.glob` output:

@@ -108,13 +109,13 @@ maximum and minimum inflammation over a 40-day period for all patients in the th
  dataset.](../fig/03-loop_49_5.png)


- Hmmm. The plots generated for the second clinical trial file look very similar to the plots for
+ The plots generated for the second clinical trial file look very similar to the plots for
  the first file: their average plots show similar "noisy" rises and falls; their maxima plots
  show exactly the same linear rise and fall; and their minima plots show similar staircase
  structures.

  The third dataset shows much noisier average and maxima plots that are far less suspicious than
- the first two datasets, however the minima plot shows that the third dataset minima are
+ the first two datasets, however the minima plot shows that the third dataset minima is
  consistently zero across every day of the trial. If we produce a heat map for the third data file
  we see the following:

@@ -124,7 +125,7 @@ the entire dataset, and the last patient only has zero values over the 40 day st
  We can see that there are zero values sporadically distributed across all patients and days of the
  clinical trial, suggesting that there were potential issues with data collection throughout the
  trial. In addition, we can see that the last patient in the study didn't have any inflammation
- flare-ups at all throughout the trial, suggesting that they may not even suffer from arthritis.
+ flare-ups at all throughout the trial, suggesting that they may not even suffer from arthritis!


  > ## Plotting Differences
@@ -213,7 +214,7 @@ flare-ups at all throughout the trial, suggesting that they may not even suffer

  After spending some time investigating the heat map and statistical plots, as well as
  doing the above exercises to plot differences between datasets and to generate composite
- patient statistics, we gain some insight into the twelve clinical trial datasets:
+ patient statistics, we gain some insight into the twelve clinical trial datasets.

  The datasets appear to fall into two categories:

@@ -231,11 +232,11 @@ duplicated files.
  Dr. Maverick confesses that they fabricated the clinical data after they found out
  that the initial trial suffered from a number of issues, including unreliable data-recording and
  poor participant selection. They created fake data to prove their drug worked, and when we asked
- for more trials they tried to generate more fake sets, as well as throwing in the original
- poor-quality dataset a few times to try and make all the trials seem more "realistic".
+ for more data they tried to generate more fake datasets, as well as throwing in the original
+ poor-quality dataset a few times to try and make all the trials seem a bit more "realistic".

- Congratulations! We've cracked the case and proven that the inflammation datasets have been
- synthetically generated (in python no less!).
+ Congratulations! We've investigated the inflammation data and proven that the datasets have been
+ synthetically generated.

  But it would be a shame to throw away the synthetic datasets that have taught us so much
  already, so we'll forgive the imaginary Dr. Maverick and continue to use the data to learn