source/classification1.Rmd
41 additions & 31 deletions
@@ -1296,17 +1296,23 @@ upsampled_plot

 ### Missing data

-One of the most common issues in real data sets in the wild is *missing data*, i.e., observations
-where the values of some of the variables were not recorded.
-Unfortunately, as common as it is, handling missing data properly is very challenging and generally
-relies on expert knowledge about the data, setting, and how the data were collected. One typical challenge with missing data
-is that missing entries can be *informative*: the very fact that an entries were missing is related to the values of other variables.
-For example, survey participants from a marginalized group of people may be less likely to respond to certain kinds of questions if they
-fear that answering honestly will come with negative consequences. In that case, if we were to simply throw away data with missing entries,
-we would bias the conclusions of the survey by inadvertently removing many members of that group of respondents.
-So ignoring this issue in real problems can easily lead to misleading analyses, with detrimental impacts.
-In this book, we will cover only those techniques for dealing with missing entries in situations
-where missing entries are just "randomly missing", i.e., where the fact that certain entries are missing *isn't related to anything else* about the observation.
+One of the most common issues in real data sets in the wild is *missing data*,
+i.e., observations where the values of some of the variables were not recorded.
+Unfortunately, as common as it is, handling missing data properly is very
+challenging and generally relies on expert knowledge about the data, setting,
+and how the data were collected. One typical challenge with missing data is
+that missing entries can be *informative*: the very fact that an entry is
+missing is related to the values of other variables. For example, survey
+participants from a marginalized group of people may be less likely to respond
+to certain kinds of questions if they fear that answering honestly will come
+with negative consequences. In that case, if we were to simply throw away data
+with missing entries, we would bias the conclusions of the survey by
+inadvertently removing many members of that group of respondents. So ignoring
+this issue in real problems can easily lead to misleading analyses, with
+detrimental impacts. In this book, we will cover only those techniques for
+dealing with missing entries in situations where missing entries are just
+"randomly missing", i.e., where the fact that certain entries are missing
+*isn't related to anything else* about the observation.

 Let's load and examine a modified subset of the tumor image data