Commit 1c55b6c

Author: Stephen Childs
Add a longitudinal tabular data exercise.
This commit adds an exercise to the first episode that deals with handling tabular data longitudinally by taking the differences between inflammation readings.
1 parent 4f56886 commit 1c55b6c

1 file changed: +70 -0 lines changed

_episodes/01-numpy.md

Lines changed: 70 additions & 0 deletions
@@ -1135,3 +1135,73 @@ the graphs will actually be squeezed together more closely.)
 > > {: .output}
 > {: .solution}
 {: .challenge}
+
+>## Change In Inflammation
+>
+>This patient data is _longitudinal_ in the sense that each row represents a
+>series of observations relating to one individual. This means that the change
+>in inflammation over time is a meaningful concept.
+>
+>The `numpy.diff()` function takes a NumPy array and returns the
+>differences between successive values along a specified axis.
+>
+>Which axis would it make sense to use this function along?
+> > ## Solution
+> > Since the row axis (0) is patients, it does not make sense to get the
+> > difference between two arbitrary patients. The column axis (1) is in
+> > days, so the difference is the change in inflammation -- a meaningful
+> > concept.
+> >
+> > ~~~
+> > numpy.diff(data, axis=1)
+> > ~~~
+> > {: .python}
+> {: .solution}
+>
+>If the shape of an individual data file is `(60, 40)` (60 rows and 40
+>columns), what would the shape of the array be after you run the `diff()`
+>function, and why?
+> > ## Solution
+> > The shape will be `(60, 39)` because there is one fewer difference between
+> > columns than there are columns in the data.
+> {: .solution}
+>
+>How would you find the largest change in inflammation for each patient? Does
+>it matter if the change in inflammation is an increase or a decrease?
+> > ## Solution
+> > By using the `max()` function after you apply the `diff()` function, you
+> > will get the largest difference between days.
+> > ~~~
+> > numpy.diff(data, axis=1).max(axis=1)
+> > ~~~
+> > {: .python}
+> > ~~~
+> > array([ 7., 12., 11., 10., 11., 13., 10., 8., 10., 10., 7.,
+> > 7., 13., 7., 10., 10., 8., 10., 9., 10., 13., 7.,
+> > 12., 9., 12., 11., 10., 10., 7., 10., 11., 10., 8.,
+> > 11., 12., 10., 9., 10., 13., 10., 7., 7., 10., 13.,
+> > 12., 8., 8., 10., 10., 9., 8., 13., 10., 7., 10.,
+> > 8., 12., 10., 7., 12.])
+> > ~~~
+> > {: .output}
+> > If a difference is a *decrease*, then the difference will be negative. If
+> > you are interested in the **magnitude** of the change and not just the
+> > direction, the `numpy.absolute()` function will provide that.
+> >
+> > Notice the difference if you get the largest _absolute_ difference
+> > between readings.
+> > ~~~
+> > numpy.absolute(numpy.diff(data, axis=1)).max(axis=1)
+> > ~~~
+> > {: .python}
+> > ~~~
+> > array([ 12., 14., 11., 13., 11., 13., 10., 12., 10., 10., 10.,
+> > 12., 13., 10., 11., 10., 12., 13., 9., 10., 13., 9.,
+> > 12., 9., 12., 11., 10., 13., 9., 13., 11., 11., 8.,
+> > 11., 12., 13., 9., 10., 13., 11., 11., 13., 11., 13.,
+> > 13., 10., 9., 10., 10., 9., 9., 13., 10., 9., 10.,
+> > 11., 13., 10., 10., 12.])
+> > ~~~
+> > {: .output}
+> {: .solution}
+{: .challenge}
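
The exercise added in this commit can be tried without the lesson's data files. The following is a minimal sketch, assuming a small made-up array in place of the real inflammation data; the values and the names `changes`, `largest_increase`, and `largest_change` are illustrative and do not appear in the commit. It shows the shape after `numpy.diff()` and how the signed and absolute maxima can differ.

```python
import numpy

# Hypothetical toy data: 3 patients (rows) by 5 days (columns).
# These values are invented for illustration, not taken from the lesson's files.
data = numpy.array([
    [0.0, 1.0, 3.0, 2.0, 0.0],
    [0.0, 2.0, 2.0, 5.0, 1.0],
    [1.0, 1.0, 4.0, 3.0, 3.0],
])

# Differences along axis 1 give the day-to-day change for each patient.
changes = numpy.diff(data, axis=1)

# There is one fewer difference than there are columns.
print(data.shape, "->", changes.shape)

# Largest signed change per patient (only increases can win here).
largest_increase = changes.max(axis=1)

# Largest change per patient regardless of direction.
largest_change = numpy.absolute(changes).max(axis=1)

print(largest_increase)
print(largest_change)
```

In this toy array the second patient's largest increase is 3, but the largest change in either direction is 4, because the drop from 5 to 1 on the last day is larger than any rise.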
