| 1 | +--- |
| 2 | +title: Using spreadsheet programs for data organisation |
| 3 | +teaching: 10 |
| 4 | +exercises: 5 |
| 5 | +authors: |
| 6 | +- Jez Cope |
| 7 | +- Christie Bahlai |
| 8 | +- Aleksandra Pawlik |
| 9 | +contributors: |
| 10 | +- Jennifer Bryan |
| 11 | +- Alexander Duryee |
| 12 | +- Jeffrey Hollister |
| 13 | +- Daisie Huang |
| 14 | +- Owen Jones |
| 15 | +- Clare Sloggett |
| 16 | +- Harriet Dashnow |
| 17 | +- Ben Marwick |
| 18 | +- Sherry Lake |
| 19 | +--- |
| 20 | + |
| 21 | +::::::::::::::::::::::::::::::::::::::: objectives |
| 22 | + |
| 23 | +- Understand some drawbacks and advantages of using spreadsheet programs |
| 24 | +- Distinguish machine-readable tidy data from data that is easy for humans to read |
| 25 | + |
| 26 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 27 | + |
| 28 | +:::::::::::::::::::::::::::::::::::::::: questions |
| 29 | + |
| 30 | +- What are good data practices for using spreadsheets for organizing data? |
| 31 | + |
| 32 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 33 | + |
| 34 | +:::::::::::::::::::::::::::::::::::::::: instructor |
| 35 | + |
| 36 | +### Narrative Guidance |
| 37 | + |
| 38 | +- Explain that we're teaching data organisation, and that we're using |
| 39 | + spreadsheets, because most people do data entry in spreadsheets or |
| 40 | + have data in spreadsheets. |
| 41 | +- Emphasize that we are teaching good practice in data organisation and that |
| 42 | + this is the foundation of their research practice. Without organised and clean |
| 43 | + data, it will be difficult for them to apply the things we're teaching in the |
| 44 | + rest of the workshop to their data. |
| 45 | +- Much of their lives as researchers will be spent on this 'data wrangling' stage, but |
| 46 | + some of it can be prevented with good strategies for data collection up front. |
| 47 | +- Tell learners that we're not teaching data analysis or plotting in spreadsheets, because it's |
| 48 | + very manual and not reproducible. That's why we're teaching SQL, R, Python! |
| 49 | +- Now talk about spreadsheets. When we say spreadsheets, we mean any spreadsheet |
| 50 | + program, such as Excel, LibreOffice, or OpenOffice. Most learners are probably using Excel. |
| 51 | +- Ask the audience about things they've accidentally done in spreadsheets. Talk about an example of your own, such as accidentally sorting only a single column and not the rest |
| 52 | + of the data in the spreadsheet. What are the pain points? |
| 53 | +- As people answer, highlight some of these issues with spreadsheets. |
| 54 | + |
| 55 | + |
| 56 | +::::::::::::::::::::::::::::::::::::::::::::::::::: |
| 57 | + |
| 58 | +Good **data organisation** is the foundation of much of our day-to-day |
| 59 | +work in libraries. Most **librarians** have data or do data entry in |
| 60 | +spreadsheets. Spreadsheet programs are very **useful graphical |
| 61 | +interfaces** for designing data tables and handling very basic data |
| 62 | +quality control functions. |
| 63 | + |
| 64 | +Spreadsheets encompass a lot of the things we need |
| 65 | +to be able to do as librarians. We can use them for: |
| 66 | + |
| 67 | +- Data entry |
| 68 | +- Organizing data |
| 69 | +- Subsetting and sorting data |
| 70 | +- Statistics |
| 71 | +- Plotting |
| 72 | + |
| 73 | +::::::::::::::::::::::::::::::::::::::::: callout |
| 74 | + |
| 75 | +## Jargon busting (Optional, not included in timing) |
| 76 | +The [Jargon Busting exercise](jargon_busting.md) is a helpful way to begin to explore terms, phrases, and ideas related to code and software development. |
| 77 | + |
| 78 | +:::::::::::::::::::::::::::::::::::::::: instructor |
| 79 | +This exercise can be useful when you teach Tidy Data as the introduction to a full LC workshop, especially if you want learners to have an opportunity to meet each other and interact. It can take anywhere from 10 to 45 minutes, depending on your approach. |
| 80 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 81 | + |
| 82 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 83 | + |
| 84 | +### Spreadsheet outline |
| 85 | + |
| 86 | +In this lesson, we will look at: |
| 87 | + |
| 88 | +- Good data entry practices - formatting data tables in spreadsheets |
| 89 | +- How to avoid common formatting mistakes |
| 90 | +- Dates as data - beware! |
| 91 | +- Basic quality control and data manipulation in spreadsheets |
| 92 | +- Exporting data from spreadsheets |
| 93 | + |
| 94 | +**Much of your time when you're producing a report will be spent in |
| 95 | +this 'data wrangling' stage.** It's not the most fun, but it's |
| 96 | +necessary. We'll teach you how to think about data organisation and |
| 97 | +some practices for more effective data wrangling. |
| 98 | + |
| 99 | +*** |
| 100 | + |
| 101 | +### What this lesson will not teach you |
| 102 | + |
| 103 | +- How to do *statistics* in a spreadsheet |
| 104 | +- How to do *plotting* in a spreadsheet |
| 105 | +- How to *write code* in spreadsheet programs |
| 106 | + |
| 107 | +If you're looking to do this, a good reference is |
| 108 | +[Microsoft Excel 365 Bible](https://search.worldcat.org/en/title/1263023438). |
| 109 | + |
| 110 | +*** |
| 111 | + |
| 112 | +### Why aren't we teaching data analysis in spreadsheets? |
| 113 | + |
| 114 | +- Data analysis in spreadsheets usually requires **a lot of manual |
| 115 | + work**. If you want to change a parameter or run an analysis with a |
| 116 | + new dataset, you usually have to redo everything by hand. (We do |
| 117 | + know that you can create macros, but see the next point.) |
| 118 | + |
| 119 | +- It is also difficult to **track or reproduce statistical or plotting |
| 120 | + analyses** done in spreadsheet programs when you want to go back to |
| 121 | + your work or someone asks for details of your analysis. |
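To illustrate the contrast with manual spreadsheet work, here is a minimal sketch of a scripted analysis in Python. The datasets, the `loans` column, and the `mean_loans` function are all hypothetical and exist only for this illustration; the point is that re-running the analysis on a new dataset, or with a changed parameter, is a one-line change that leaves a written record of what was done.

```python
import csv
import io
import statistics

# Hypothetical data: two small loan exports, inlined as CSV text for this sketch.
LOANS_2023 = """branch,loans
Central,120
North,80
South,45
"""

LOANS_2024 = """branch,loans
Central,150
North,60
South,90
"""


def mean_loans(csv_text, min_loans=0):
    """Return the mean of the 'loans' column for rows with at least min_loans."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [int(row["loans"]) for row in rows if int(row["loans"]) >= min_loans]
    return statistics.mean(values)


# The same analysis, re-run on a new dataset and then with a changed parameter.
print(mean_loans(LOANS_2023))
print(mean_loans(LOANS_2024))
print(mean_loans(LOANS_2024, min_loans=70))
```

Because every step lives in the script, anyone who asks for details of the analysis can read or re-run it, rather than relying on a memory of which cells were clicked.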
| 122 | + |
| 123 | +### Spreadsheet programs |
| 124 | + |
| 125 | +There are a number of spreadsheet programs available for use on a desktop or web browser: |
| 126 | + |
| 127 | +- LibreOffice Calc |
| 128 | +- Microsoft Excel |
| 129 | +- Apple Numbers |
| 130 | +- Google Sheets |
| 131 | +- Gnumeric |
| 132 | +- Apache OpenOffice Calc |
| 133 | + |
| 134 | +Commands may differ a bit between programs, but the general idea |
| 135 | +is the same. In this lesson, we will assume that you are using Excel as |
| 136 | +your primary spreadsheet program, because it is the package you're most likely to have available on your work computer. The other programs listed above offer similar functionality. |
| 137 | + |
| 138 | +*** |
| 139 | + |
| 140 | +::::::::::::::::::::::::::::::::::::::: challenge |
| 141 | + |
| 142 | +## Questions: |
| 143 | + |
| 144 | +- How many people have used spreadsheets in their work? |
| 145 | +- What kind of operations do you do in spreadsheets? |
| 146 | +- Which ones do you think spreadsheets are good for? |
| 147 | + |
| 148 | + |
| 149 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 150 | + |
| 151 | +*** |
| 152 | + |
| 153 | +::::::::::::::::::::::::::::::::::::::: challenge |
| 154 | + |
| 155 | +## Question |
| 156 | + |
| 157 | +- Spreadsheets can be very useful, but they can also be frustrating and even sometimes give us incorrect results. What are some things that you've accidentally done in a spreadsheet, or have been frustrated that you can't do easily? |
| 158 | + |
| 159 | + |
| 160 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 161 | + |
| 162 | +*** |
| 163 | + |
| 164 | +## Problems with Spreadsheets |
| 165 | + |
| 166 | +Spreadsheets are **good for data entry**, but in reality we **tend to |
| 167 | +use spreadsheet programs for much more** than data entry. We use them |
| 168 | +to create data tables for publications, to generate summary |
| 169 | +statistics, and make figures. |
| 170 | + |
| 171 | +Generating **tables for reports** in a spreadsheet is not optimal - |
| 172 | +often, when formatting a data table for publication, we're reporting |
| 173 | +key summary statistics in a way that is **not really meant to be read |
| 174 | +as data**, and often involves **special formatting** (merging cells, |
| 175 | +creating borders, making it pretty). We advise you to do this sort of |
| 176 | +operation within your document editing software. |
| 177 | + |
| 178 | +The latter two applications, **generating statistics and figures**, should |
| 179 | +be used with caution: because of the graphical, drag-and-drop nature of |
| 180 | +spreadsheet programs, it can be very difficult, if not impossible, to |
| 181 | +replicate your steps (much less retrace anyone else's), particularly if your |
| 182 | +stats or figures require you to do more complex calculations. Furthermore, |
| 183 | +in doing calculations in a spreadsheet, it's easy to accidentally apply a |
| 184 | +slightly different formula to multiple adjacent cells. When using a |
| 185 | +command-line-based statistics program like R or SAS, it's practically |
| 186 | +impossible to accidentally apply a calculation to one observation in your |
| 187 | +dataset but not another unless you're doing it on purpose. |
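The point about applying a calculation uniformly can be sketched in a few lines. The lesson names R and SAS; Python is used here purely for illustration, and the records, field names, and `checkouts_per_copy` function are hypothetical. The calculation is written once and applied to every observation, so there is no per-cell formula that could silently differ from its neighbours.

```python
# Hypothetical records: each dict is one observation (one spreadsheet row).
records = [
    {"title": "Book A", "checkouts": 30, "copies": 3},
    {"title": "Book B", "checkouts": 12, "copies": 2},
    {"title": "Book C", "checkouts": 5, "copies": 2},
]


def checkouts_per_copy(record):
    """The calculation is defined in exactly one place."""
    return record["checkouts"] / record["copies"]


# Apply the same function to every observation in the dataset.
for record in records:
    record["per_copy"] = checkouts_per_copy(record)

for record in records:
    print(record["title"], record["per_copy"])
```

Skipping or altering the calculation for a single row would require writing explicit code to do so, which makes the exception visible to anyone reading the script.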
| 188 | + |
| 189 | +### Using Spreadsheets for Data Entry and Cleaning |
| 190 | + |
| 191 | +**HOWEVER**, there are circumstances where you might want to use a |
| 192 | +spreadsheet program to produce "quick and dirty" calculations or |
| 193 | +figures, and some of these features can be used in **data cleaning**, |
| 194 | +prior to importation into a statistical analysis program. We will show |
| 195 | +you how to use some features of spreadsheet programs to check your |
| 196 | +data quality along the way and produce preliminary summary statistics. |
| 197 | + |
| 198 | +In this lesson, we're going to talk about: |
| 199 | + |
| 200 | +1. [Formatting data tables in spreadsheets](01-format-data.md) |
| 201 | +2. [Formatting problems](02-common-mistakes.md) |
| 202 | +3. [Dates as data](03-dates-as-data.md) |
| 203 | +4. [Basic quality control and data manipulation in spreadsheets](04-quality-control.md) |
| 204 | +5. [Exporting data from spreadsheets](05-exporting-data.md) |
| 205 | +6. [Data export formats caveats](06-data-formats-caveats.md) |
| 206 | + |
| 207 | + |
| 208 | +:::::::::::::::::::::::::::::::::::::::: keypoints |
| 209 | + |
| 210 | +- We will discuss good practices for data entry and formatting |
| 211 | +- We will not discuss analysis or visualisation |
| 212 | + |
| 213 | +:::::::::::::::::::::::::::::::::::::::::::::::::: |
| 214 | + |
| 215 | + |