### Please add alt text to your posts

Please add alt text (alternative text) to all of your posted graphics
for `#TidyTuesday`.

Twitter provides
[guidelines](https://help.twitter.com/en/using-twitter/picture-descriptions)
for how to add alt text to your images.

The DataViz Society/Nightingale by way of Amy Cesal has an
[article](https://medium.com/nightingale/writing-alt-text-for-data-visualization-2a218ef43f81)
on writing *good* alt text for plots/graphs.

> Here's a simple formula for writing alt text for data visualization:
>
> **Chart type.** It's helpful for people with partial sight to know
> what chart type it is and gives context for understanding the rest of
> the visual. Example: Line graph
>
> **Type of data.** What data is included in the chart? The x and y
> axis labels may help you figure this out. Example: number of bananas
> sold per day in the last year
>
> **Reason for including the chart.** Think about why you're including
> this visual. What does it show that's meaningful? There should be a
> point to every visual and you should tell people what to look for.
> Example: the winter months have more banana sales
>
> **Link to data or source.** Don't include this in your alt text, but
> it should be included somewhere in the surrounding text. People should
> be able to click on a link to view the source data or dig further into
> the visual. This provides transparency about your source and lets
> people explore the data. Example: Data from the USDA

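If you make your plots with `{ggplot2}` (version 3.3.4 or later), you can
store alt text in the plot object itself via `labs(alt = )` and retrieve it
with `get_alt_text()`; in R Markdown, the `fig.alt` chunk option serves a
similar purpose. Here's a minimal sketch following the formula above, with
invented banana data used purely for illustration:

``` r
library(ggplot2)

# Invented example data, for illustration only.
bananas <- data.frame(
  month = 1:12,
  sold = c(320, 300, 260, 230, 210, 200, 205, 220, 245, 270, 300, 330)
)

p <- ggplot(bananas, aes(month, sold)) +
  geom_line() +
  labs(
    x = "Month",
    y = "Bananas sold per day",
    # Chart type + type of data + reason for including the chart:
    alt = paste(
      "Line graph of the number of bananas sold per day in the last year,",
      "showing that the winter months have more banana sales."
    )
  )

# Retrieve the alt text from the plot object.
get_alt_text(p)
```
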
Penn State has an
[article](https://accessibility.psu.edu/images/charts/) on writing alt
text descriptions for charts and tables.

> Charts, graphs and maps use visuals to convey complex images to users.
> But since they are images, these media provide serious accessibility
> issues to colorblind users and users of screen readers. See the
> [examples on this page](https://accessibility.psu.edu/images/charts/)
> for details on how to make charts more accessible.

The `{rtweet}` package includes the [ability to post
tweets](https://docs.ropensci.org/rtweet/reference/post_tweet.html) with
alt text programmatically.

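For instance, here's a minimal sketch of posting a chart with alt text via
`{rtweet}` (this assumes you have already authenticated, e.g. with
`auth_setup_default()`, and that your API access allows posting; the file
path and the alt text are placeholders):

``` r
library(rtweet)

# Assumes an authenticated token is already configured for this session.
post_tweet(
  status = "My #TidyTuesday submission for the tornados dataset! #RStats",
  media = "tornado_plot.png", # placeholder path to a saved plot
  media_alt_text = paste(
    "Line graph of the number of recorded US tornadoes per year from 1950",
    "to 2022, showing higher counts in recent decades. Data from NOAA."
  )
)
```
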
Need a **reminder**? There are
[browser extensions](https://chrome.google.com/webstore/detail/twitter-required-alt-text/fpjlpckbikddocimpfcgaldjghimjiik/related)
that force you to add alt text before you can tweet media.

# Tornados

The data this week comes from NOAA's National Weather Service Storm Prediction Center [Severe Weather Maps, Graphics, and Data Page](https://www.spc.noaa.gov/wcm/#data).
Thank you to [Evan Gower](https://github.com/rfordatascience/tidytuesday/issues/549) for the suggestion!

Evan [investigated](https://www.kaggle.com/code/evangower/diving-into-us-tornado-data) a version of this dataset on Kaggle.

### Get the data here

```{r}
# Get the Data

# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2023-05-16')
tuesdata <- tidytuesdayR::tt_load(2023, week = 20)

tornados <- tuesdata$tornados

# Or read in the data manually

tornados <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-05-16/tornados.csv')
```

### Data Dictionary

# `tornados.csv`

|variable |class |description |
|:------------|:---------|:------------|
|om |integer |Tornado number. Effectively an ID for this tornado in this year.|
|yr |integer |Year, 1950-2022. |
|mo |integer |Month, 1-12.|
|dy |integer |Day of the month, 1-31. |
|date |date |Date. |
|time |time |Time. |
|tz |character |[Canonical tz database timezone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).|
|datetime_utc |datetime |Date and time normalized to UTC. |
|st |character |Two-letter postal abbreviation for the state (DC = Washington, DC; PR = Puerto Rico; VI = Virgin Islands). |
|stf |integer |State FIPS (Federal Information Processing Standards) number. |
|mag |integer |Magnitude on the F scale (EF beginning in 2007). Some of these values are estimated (see fc). |
|inj |integer |Number of injuries. When summing for state totals, use sn == 1 (see below). |
|fat |integer |Number of fatalities. When summing for state totals, use sn == 1 (see below). |
|loss |double |Estimated property loss information in dollars. Prior to 1996, values were grouped into ranges. The reported number for such years is the maximum of its range. |
|slat |double |Starting latitude in decimal degrees. |
|slon |double |Starting longitude in decimal degrees. |
|elat |double |Ending latitude in decimal degrees. |
|elon |double |Ending longitude in decimal degrees. |
|len |double |Length in miles. |
|wid |double |Width in yards. |
|ns |integer |Number of states affected by this tornado. 1, 2, or 3. |
|sn |integer |State number for this row. 1 means the row contains the entire track information for this state, 0 means there is at least one more entry for this state for this tornado (om + yr). |
|f1 |integer |FIPS code for the 1st county. |
|f2 |integer |FIPS code for the 2nd county. |
|f3 |integer |FIPS code for the 3rd county. |
|f4 |integer |FIPS code for the 4th county. |
|fc |logical |Was the mag column estimated? |

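The `sn` caveat above matters in practice: tornadoes that cross state lines
appear in multiple rows, so naive sums can double-count. A minimal sketch of
per-state totals using only the full-track rows (assuming the data is loaded
as `tornados` as shown earlier):

``` r
library(dplyr)

# Keep sn == 1 so each state's full track is counted exactly once.
tornados |>
  filter(sn == 1) |>
  group_by(st) |>
  summarize(
    tornado_count = n(),
    injuries = sum(inj),
    fatalities = sum(fat)
  ) |>
  arrange(desc(fatalities))
```
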
### Cleaning Script

``` r
# All packages used in this script:
library(tidyverse)
library(here)

url <- "https://www.spc.noaa.gov/wcm/data/1950-2022_actual_tornadoes.csv"

# Some of the automatic column types are imperfect. Get that spec and then
# update it.
tornados <- read_csv(url)
spec(tornados) # Copy/pasted into col_types below then edited.
tornados <- read_csv(
  url,
  col_types = cols(
    om = col_integer(),
    yr = col_integer(),
    mo = col_integer(),
    dy = col_integer(),
    date = col_date(format = ""),
    time = col_time(format = ""),
    tz = col_integer(),
    st = col_factor(),
    stf = col_integer(),
    stn = col_integer(),
    mag = col_integer(),
    inj = col_integer(),
    fat = col_integer(),
    loss = col_double(),
    closs = col_double(),
    slat = col_double(),
    slon = col_double(),
    elat = col_double(),
    elon = col_double(),
    len = col_double(),
    wid = col_integer(),
    ns = col_integer(),
    sn = col_integer(),
    sg = col_integer(),
    f1 = col_integer(),
    f2 = col_integer(),
    f3 = col_integer(),
    f4 = col_integer(),
    fc = col_integer()
  )
)

glimpse(tornados)

# This table only contains one segment per tornado, so we can drop the sg
# column.
tornados$sg <- NULL

# The tz column is confusing in the provided dictionary
# (https://www.spc.noaa.gov/wcm/data/SPC_severe_database_description.pdf).
# Investigate it to make sense of the various values.
tornados |>
  count(tz)

# The doc says 3 == CST, and 9 == GMT. 0 appears to be NA. What is 6?
tornados |>
  filter(tz == 6) |>
  count(st)

# All tornados with tz == 6 are in Mountain Time states, so we'll make that
# assumption. Update time encoding.

tornados <- tornados |>
  # We can't really judge even what day the recording was on for unknown tz, so
  # drop those values.
  filter(tz != 0) |>
  mutate(
    # Make the remaining tz's more meaningful. We'll assume they meant Central
    # (daylight or standard) for "CST", and likewise that they meant what we now
    # call UTC for "GMT". "GMT" sometimes includes BST so we'll avoid using that
    # name.
    tz = case_match(
      tz,
      3 ~ "America/Chicago",
      6 ~ "America/Denver",
      9 ~ "UTC"
    ),
    # Add a datetime_utc column to normalize the times. ymd_hms only wants a
    # single timezone (not a vector of them), so break it up with a case_match.
    datetime_utc = case_match(
      tz,
      "America/Chicago" ~ lubridate::ymd_hms(
        paste(date, time),
        tz = "America/Chicago"
      ),
      "America/Denver" ~ lubridate::ymd_hms(
        paste(date, time),
        tz = "America/Denver"
      ),
      "UTC" ~ lubridate::ymd_hms(
        paste(date, time),
        tz = "UTC"
      )
    ) |>
      lubridate::with_tz("UTC"),
    .after = tz
  ) |>
  # Drop stn because it was discontinued and was inconsistent before being
  # discontinued. closs (crop loss) has an unexplained discontinuity in 2016 and
  # it isn't entirely clear what changed.
  select(-"stn", -"closs") |>
  # Recode some more weird columns.
  mutate(
    # The mag column uses -9 for NA.
    mag = na_if(mag, -9),
    # The loss column is confusingly coded. Let's attempt to make it make sense.
    # The documentation (last updated in 2010) explains that the coding changed in
    # 1996. Observationally, it's clear that it changed again in 2016.
    loss = case_when(
      loss == 0 ~ NA,
      yr < 1996 & loss == 1 ~ 50,
      yr < 1996 & loss == 2 ~ 500,
      yr < 1996 & loss == 3 ~ 5000,
      yr < 1996 & loss == 4 ~ 50000,
      yr < 1996 & loss == 5 ~ 500000,
      yr < 1996 & loss == 6 ~ 5000000,
      yr < 1996 & loss == 7 ~ 50000000,
      yr < 1996 & loss == 8 ~ 500000000,
      yr < 1996 & loss == 9 ~ 5000000000,
      yr >= 1996 & yr < 2016 ~ loss * 1e6,
      TRUE ~ loss
    ),
    # The fc column is really a "was mag estimated" column
    fc = as.logical(fc)
  )

# Some of the remaining columns are confusing, but we'll explain them in the
# dictionary and see what people find!

write_csv(
  tornados,
  here(
    "data",
    "2023",
    "2023-05-16",
    "tornados.csv"
  )
)
```
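
As a quick sanity check on the `loss` recoding (a sketch, not part of the
official cleaning script), you can compare typical values across the three
coding eras; after the recode, all three should be on a plain dollar scale:

``` r
library(dplyr) # already attached above via library(tidyverse)

# Median property loss by coding era described in the comments above.
tornados |>
  mutate(
    era = case_when(
      yr < 1996 ~ "1950-1995 (range codes)",
      yr < 2016 ~ "1996-2015 (millions)",
      TRUE ~ "2016-2022 (dollars)"
    )
  ) |>
  group_by(era) |>
  summarize(median_loss = median(loss, na.rm = TRUE))
```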