| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Working with the Household Grid |
| 4 | +nav_order: 4 |
| 5 | +parent: MCS |
| 6 | +format: docusaurus-md |
| 7 | +--- |
| 8 | + |
| 9 | + |
| 10 | + |
| 11 | + |
| 12 | +# Introduction |
| 13 | + |
| 14 | +In this tutorial, we will learn the basics of using the household grid. |
| 15 | +Specifically, we will see how to identify particular family members, how |
| 16 | +to use the household grid to create family-member specific variables, |
| 17 | +and how to determine the relationships between family members. We will |
| 18 | +use the example of finding natural mothers’ smoking status at the first |
| 19 | +sweep. |
| 20 | + |
| 21 | +```r |
| 22 | +# Load Packages |
| 23 | +library(tidyverse) # For data manipulation |
| 24 | +library(haven) # For importing .dta files |
| 25 | +``` |
| 26 | + |
| 27 | +# Finding Mother of Cohort Members |
| 28 | + |
| 29 | +We will load just four variables from the household grid: `MCSID` and |
| 30 | +`APNUM00`, which uniquely identify an individual, and `AHPSEX00` and |
| 31 | +`AHCREL00`, which contain information on the individual’s sex and their |
| 32 | +relationship to the household’s cohort member(s). `AHCREL00 == 7` |
| 33 | +identifies natural parents and `AHPSEX00 == 2` identifies females. |
| 34 | +Combining the two identifies natural mothers. Below, we use `count()` to |
| 35 | +show the different (observed) values for the sex and relationship |
| 36 | +variables. We also use the `filter()` function (which retains |
| 37 | +observations where the conditions are `TRUE`) to create a dataset |
| 38 | +containing the identifiers (`MCSID` and `APNUM00`) of natural mothers |
| 39 | +only; we will merge this with smoking information shortly. |
| 40 | +`add_count(MCSID) %>% filter(n == 1)` is included as an interim step so |
| 41 | +that we keep only families with exactly one identified natural mother. |
| 42 | + |
| 43 | +```r |
| 44 | +df_0y_hhgrid <- read_dta("0y/mcs1_hhgrid.dta") %>% |
| 45 | + select(MCSID, APNUM00, AHPSEX00, AHCREL00) |
| 46 | + |
| 47 | +df_0y_hhgrid %>% |
| 48 | + count(AHPSEX00) |
| 49 | +``` |
| 50 | + |
| 51 | +``` text |
| 52 | +# A tibble: 4 × 2 |
| 53 | + AHPSEX00 n |
| 54 | + <dbl+lbl> <int> |
| 55 | +1 -2 [Unknown] 55 |
| 56 | +2 -1 [Not applicable] 18734 |
| 57 | +3 1 [Male] 26438 |
| 58 | +4 2 [Female] 29567 |
| 59 | +``` |
| 60 | + |
| 61 | +```r |
| 62 | +df_0y_hhgrid %>% |
| 63 | + count(AHCREL00) |
| 64 | +``` |
| 65 | + |
| 66 | +``` text |
| 67 | +# A tibble: 16 × 2 |
| 68 | + AHCREL00 n |
| 69 | + <dbl+lbl> <int> |
| 70 | + 1 -9 [Refusal] 5 |
| 71 | + 2 -8 [Dont Know] 1 |
| 72 | + 3 7 [Natural parent] 33812 |
| 73 | + 4 8 [Adoptive parent] 2 |
| 74 | + 5 9 [Foster parent] 3 |
| 75 | + 6 10 [Step-parent/partner of parent] 50 |
| 76 | + 7 11 [Natural brother/Natural sister] 13873 |
| 77 | + 8 12 [Half-brother/Half-sister] 3486 |
| 78 | + 9 13 [Step-brother/Step-sister] 16 |
| 79 | +10 14 [Adopted brother/Adopted sister] 8 |
| 80 | +11 15 [Foster brother/Foster sister] 9 |
| 81 | +12 17 [Grandparent] 2164 |
| 82 | +13 18 [Nanny/au pair] 20 |
| 83 | +14 19 [Other relative] 2326 |
| 84 | +15 20 [Other non-relative] 233 |
| 85 | +16 96 [Self] 18786 |
| 86 | +``` |
| 87 | + |
| 88 | +```r |
| 89 | +df_0y_mothers <- df_0y_hhgrid %>% |
| 90 | + filter(AHCREL00 == 7, |
| 91 | + AHPSEX00 == 2) %>% |
| 92 | + add_count(MCSID) %>% |
| 93 | + filter(n == 1) %>% |
| 94 | + select(MCSID, APNUM00) |
| 95 | +``` |
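| | + |
| | +As a minimal sketch of what the two `filter()` steps do, the toy household |
| | +grid below (hypothetical families `F1` to `F3` with invented values, not MCS |
| | +data) retains only the family with exactly one female natural parent. |
| | + |
| | +```r |
| | +library(tidyverse) |
| | + |
| | +# Hypothetical toy household grid; IDs and values are invented for illustration |
| | +toy_hhgrid <- tribble( |
| | +  ~MCSID, ~APNUM00, ~AHPSEX00, ~AHCREL00, |
| | +  "F1",   1,        2,         7,   # female natural parent |
| | +  "F1",   2,        1,         7,   # male natural parent |
| | +  "F2",   1,        2,         7,   # female natural parent |
| | +  "F2",   2,        2,         7,   # second female natural parent in the same family |
| | +  "F3",   1,        2,         17   # grandmother, not a natural parent |
| | +) |
| | + |
| | +toy_mothers <- toy_hhgrid %>% |
| | +  filter(AHCREL00 == 7, |
| | +         AHPSEX00 == 2) %>% |
| | +  add_count(MCSID) %>%  # n = number of candidate mothers in each family |
| | +  filter(n == 1) %>%    # drop families with more than one candidate |
| | +  select(MCSID, APNUM00) |
| | + |
| | +toy_mothers |
| | +``` |
| | + |
| | +Family `F2` is dropped because two rows satisfy the conditions, and `F3` is |
| | +dropped because no row does. |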
| 96 | + |
| 97 | +Note that where a family (`MCSID`) contains two or more cohort members, |
| 98 | +those cohort members will have been a multiple birth (i.e., twins or |
| 99 | +triplets), so familial relationships apply to all cohort members in |
| 100 | +the family. This is why there is just one relationship |
| 101 | +(`[A-G]HCREL00`) variable per household grid file. This |
| 102 | +will change as the cohort members age, move into separate residences and |
| 103 | +start their own families. |
| 104 | + |
| 105 | +# Creating a Mother’s Smoking Variable |
| 106 | + |
| 107 | +Now that we have a dataset containing the IDs of natural mothers, we can load |
| 108 | +the smoking information from the Sweep 1 parent interview file. The |
| 109 | +smoking variable used is `APSMUS0A`, which contains information on |
| 110 | +the tobacco products a parent uses. We classify a parent as a smoker if |
| 111 | +they use any tobacco product (`mutate(smoker = case_when(...))`). |
| 112 | + |
| 113 | +```r |
| 114 | +df_0y_parent <- read_dta("0y/mcs1_parent_interview.dta") %>% |
| 115 | + select(MCSID, APNUM00, APSMUS0A) |
| 116 | + |
| 117 | +df_0y_parent %>% |
| 118 | + count(APSMUS0A) |
| 119 | +``` |
| 120 | + |
| 121 | +``` text |
| 122 | +# A tibble: 9 × 2 |
| 123 | + APSMUS0A n |
| 124 | + <dbl+lbl> <int> |
| 125 | +1 -9 [Refusal] 4 |
| 126 | +2 -8 [Don't Know] 3 |
| 127 | +3 -1 [Not applicable] 10 |
| 128 | +4 1 [No, does not smoke] 21229 |
| 129 | +5 2 [Yes, cigarettes] 9003 |
| 130 | +6 3 [Yes, roll-ups] 1246 |
| 131 | +7 4 [Yes, cigars] 217 |
| 132 | +8 5 [Yes, a pipe] 6 |
| 133 | +9 95 [Yes, other tobacco product] 16 |
| 134 | +``` |
| 135 | + |
| 136 | +```r |
| 137 | +df_0y_smoking <- df_0y_parent %>% |
| 138 | + mutate(smoker = case_when(APSMUS0A %in% 2:95 ~ 1, |
| 139 | + APSMUS0A == 1 ~ 0)) %>% |
| 140 | + select(MCSID, APNUM00, smoker) |
| 141 | +``` |
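| | + |
| | +To see how `case_when()` treats each code, here is a small sketch on a |
| | +hypothetical vector of plain numeric values drawn from the categories |
| | +tabulated above (the real column is a labelled `haven` vector). Codes that |
| | +match neither condition, here the negative missing-value codes, fall through |
| | +to `NA`. |
| | + |
| | +```r |
| | +library(tidyverse) |
| | + |
| | +# Hypothetical codes for illustration; not rows from the MCS data |
| | +toy_codes <- tibble(APSMUS0A = c(-9, -1, 1, 2, 5, 95)) |
| | + |
| | +toy_smoking <- toy_codes %>% |
| | +  mutate(smoker = case_when(APSMUS0A %in% 2:95 ~ 1,  # any tobacco product |
| | +                            APSMUS0A == 1 ~ 0))      # does not smoke |
| | + |
| | +toy_smoking |
| | +``` |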
| 142 | + |
| 143 | +Now we can merge the two datasets, keeping only the rows |
| 144 | +in `df_0y_smoking` that appear in `df_0y_mothers`. We use `left_join()` |
| 145 | +to do this, with `df_0y_mothers` as the dataset determining the |
| 146 | +output rows. The result is a dataset with one row per family with an |
| 147 | +identified mother; a mother with no matching row in `df_0y_smoking` is kept, with `smoker` set to `NA`. |
| 148 | +We rename the `smoker` variable to `mother_smoker` to clarify that it |
| 149 | +refers to the mother’s smoking status. |
| 150 | + |
| 151 | +Below we also pipe this dataset into the `tabyl()` function (from |
| 152 | +`janitor`) to tabulate the number and proportions of mothers who smoke |
| 153 | +and those who do not. |
| 154 | + |
| 155 | +```r |
| 156 | +# install.packages("janitor") # Uncomment if you need to install |
| 157 | +library(janitor) |
| 158 | +``` |
| 159 | + |
| 160 | +``` text |
| 161 | + |
| 162 | +Attaching package: 'janitor' |
| 163 | +``` |
| 164 | + |
| 165 | +``` text |
| 166 | +The following objects are masked from 'package:stats': |
| 167 | + |
| 168 | + chisq.test, fisher.test |
| 169 | +``` |
| 170 | + |
| 171 | +```r |
| 172 | +df_0y_mothers %>% |
| 173 | + left_join(df_0y_smoking, by = c("MCSID", "APNUM00")) %>% |
| 174 | + select(MCSID, mother_smoker = smoker) %>% |
| 175 | + tabyl(mother_smoker) |
| 176 | +``` |
| 177 | + |
| 178 | +``` text |
| 179 | + mother_smoker n percent valid_percent |
| 180 | + 0 12883 0.695814205 0.6968304 |
| 181 | + 1 5605 0.302727518 0.3031696 |
| 182 | + NA 27 0.001458277 NA |
| 183 | +``` |
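| | + |
| | +The behaviour of `left_join()` here can be sketched with two hypothetical toy |
| | +tables (invented IDs and values, not MCS data): every row of the left-hand |
| | +table is kept, and a mother with no match in the smoking table receives `NA`. |
| | + |
| | +```r |
| | +library(tidyverse) |
| | + |
| | +# Hypothetical mothers and smoking tables for illustration |
| | +toy_mothers <- tibble(MCSID   = c("F1", "F2", "F3"), |
| | +                      APNUM00 = c(1, 1, 2)) |
| | + |
| | +toy_smoking <- tibble(MCSID   = c("F1", "F1", "F2"), |
| | +                      APNUM00 = c(1, 2, 1), |
| | +                      smoker  = c(0, 1, 1)) |
| | + |
| | +toy_result <- toy_mothers %>% |
| | +  left_join(toy_smoking, by = c("MCSID", "APNUM00")) %>%  # keeps all three mothers |
| | +  select(MCSID, mother_smoker = smoker) |
| | + |
| | +toy_result |
| | +``` |
| | + |
| | +The father in `F1` (`APNUM00 == 2`) does not appear in the result, and the |
| | +mother in `F3` has `mother_smoker` set to `NA`. |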
| 184 | + |
| 185 | +# Determining Relationships between Non-Cohort Members |
| 186 | + |
| 187 | +The household grids include another set of relationship variables |
| 188 | +(`[A-G]HPREL[A-Z]0`). These can be used to identify the relationships |
| 189 | +between family members. Each variable records the relationship of the person in the row |
| 190 | +(ego) to the person denoted by the column (alt); the |
| 191 | +penultimate letter `[A-Z]` in `[A-G]HPREL[A-Z]0` corresponds to the |
| 192 | +alt’s `PNUM00` (A = 1, B = 2, and so on). For instance, the variable `AHPRELB0` denotes |
| 193 | +the relationship of the person in the row to the person with |
| 194 | +`APNUM00 == 2`. We will extract a small set of data from the Sweep 1 |
| 195 | +household grid to show this in action. |
| 196 | + |
| 197 | +```r |
| 198 | +df_0y_hhgrid_prel <- read_dta("0y/mcs1_hhgrid.dta") %>% |
| 199 | + select(MCSID, APNUM00, matches("AHPREL[A-Z]0")) |
| 200 | + |
| 201 | +df_0y_hhgrid_prel %>% |
| 202 | + select(MCSID, APNUM00, AHPRELA0, AHPRELB0, AHPRELC0, AHPRELD0) %>% |
| 203 | + filter(MCSID == "M10001N") # To look at just one family |
| 204 | +``` |
| 205 | + |
| 206 | +``` text |
| 207 | +# A tibble: 7 × 6 |
| 208 | + MCSID APNUM00 AHPRELA0 AHPRELB0 AHPRELC0 AHPRELD0 |
| 209 | + <chr> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lb> <dbl+lb> |
| 210 | +1 M10001N 1 96 [Self] 1 [Husband/Wife] 7 [Nat… 7 [Nat… |
| 211 | +2 M10001N 2 1 [Husband/Wife] 96 [Self] 7 [Nat… 7 [Nat… |
| 212 | +3 M10001N 3 3 [Natural son/daughter] 3 [Natural son/d… 96 [Sel… 11 [Nat… |
| 213 | +4 M10001N 4 3 [Natural son/daughter] 3 [Natural son/d… 11 [Nat… 96 [Sel… |
| 214 | +5 M10001N 5 3 [Natural son/daughter] 3 [Natural son/d… 11 [Nat… 11 [Nat… |
| 215 | +6 M10001N 6 3 [Natural son/daughter] 3 [Natural son/d… 11 [Nat… 11 [Nat… |
| 216 | +7 M10001N 100 3 [Natural son/daughter] 3 [Natural son/d… 11 [Nat… 11 [Nat… |
| 217 | +``` |
| 218 | + |
| 219 | +There are seven members in this family, one of whom is a cohort member |
| 220 | +(`APNUM00 == 100`). Persons 1 and 2 (by `APNUM00`) are the (natural) parents, and |
| 221 | +persons 3-6 and 100 are the (natural) children. The relationship |
| 222 | +variables show that persons 1 and 2 are married, and that persons 3-6 |
| 223 | +and 100 are siblings. Note the symmetry in the relationships: where |
| 224 | +`APNUM00 == 1`, `AHPRELC0 == 7 [Natural Parent]`, and where |
| 225 | +`APNUM00 == 3`, `AHPRELA0 == 3 [Natural son/daughter]`. |
| 226 | + |
| 227 | +If we want to find the particular person occupying a particular |
| 228 | +relationship for an individual (e.g., we want to know the `PNUM00` of |
| 229 | +the person’s partner), we need to reshape the data into long format with |
| 230 | +one row per ego-alt relationship within a family. For instance, if we |
| 231 | +want to find each person’s spouse (conditional on one being present), we |
| 232 | +can do the following: |
| 233 | + |
| 234 | +```r |
| 235 | +df_0y_hhgrid_prel %>% |
| 236 | + pivot_longer(cols = matches("AHPREL[A-Z]0"), |
| 237 | + names_to = "alt", |
| 238 | + values_to = "relationship") %>% |
| 239 | + mutate(APNUM00_alt = match(str_sub(alt, 7, 7), LETTERS)) %>% |
| 240 | + filter(relationship == 1) %>% |
| 241 | + select(MCSID, APNUM00, partner_pnum = APNUM00_alt) |
| 242 | +``` |
| 243 | + |
| 244 | +``` text |
| 245 | +# A tibble: 23,616 × 3 |
| 246 | + MCSID APNUM00 partner_pnum |
| 247 | + <chr> <dbl> <int> |
| 248 | + 1 M10001N 1 2 |
| 249 | + 2 M10001N 2 1 |
| 250 | + 3 M10002P 1 2 |
| 251 | + 4 M10002P 2 1 |
| 252 | + 5 M10007U 1 2 |
| 253 | + 6 M10007U 2 1 |
| 254 | + 7 M10011Q 1 2 |
| 255 | + 8 M10011Q 2 1 |
| 256 | + 9 M10015U 1 2 |
| 257 | +10 M10015U 2 1 |
| 258 | +# ℹ 23,606 more rows |
| 259 | +``` |
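| | + |
| | +Because the relationship variables are symmetric, one possible sanity check is |
| | +that every spouse pair appears in both directions. The sketch below runs that |
| | +check on a hypothetical three-person household (invented values, not MCS |
| | +data); the names `toy_prel`, `toy_spouses`, `partner_pnum` and |
| | +`unreciprocated` are illustrative only. |
| | + |
| | +```r |
| | +library(tidyverse) |
| | + |
| | +# Hypothetical household: persons 1 and 2 are spouses, person 3 is their child |
| | +toy_prel <- tribble( |
| | +  ~MCSID, ~APNUM00, ~AHPRELA0, ~AHPRELB0, ~AHPRELC0, |
| | +  "F1",   1,        96,        1,         7, |
| | +  "F1",   2,        1,         96,        7, |
| | +  "F1",   3,        3,         3,         96 |
| | +) |
| | + |
| | +toy_spouses <- toy_prel %>% |
| | +  pivot_longer(cols = matches("AHPREL[A-Z]0"), |
| | +               names_to = "alt", |
| | +               values_to = "relationship") %>% |
| | +  # The 7th character (A, B, C, ...) gives the alt's person number (1, 2, 3, ...) |
| | +  mutate(partner_pnum = as.numeric(match(str_sub(alt, 7, 7), LETTERS))) %>% |
| | +  filter(relationship == 1) %>% |
| | +  select(MCSID, APNUM00, partner_pnum) |
| | + |
| | +# Pairs (ego, partner) with no mirror-image (partner, ego) row |
| | +unreciprocated <- toy_spouses %>% |
| | +  anti_join(toy_spouses, |
| | +            by = c("MCSID", |
| | +                   "APNUM00" = "partner_pnum", |
| | +                   "partner_pnum" = "APNUM00")) |
| | + |
| | +nrow(unreciprocated) |
| | +``` |
| | + |
| | +A non-zero count would flag households where a spouse relationship is recorded |
| | +in one direction only. |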
| 260 | + |
| 261 | +# Coda |
| 262 | + |
| 263 | +This only scratches the surface of what can be achieved with the |
| 264 | +household grid. The `mcs[1-7]_hhgrid.dta` files also contain information on |
| 265 | +cohort members’ and family members’ dates of birth, which can be used to, |
| 266 | +for example, identify the number of resident younger siblings, determine |
| 267 | +maternal and paternal age at birth, and so on. |