Excel hell
Steve Harris edited this page Mar 3, 2016 · 21 revisions
Basic principles
- define your unit of analysis (the row)
- define characteristics of that unit (the column)
- if something seems repetitive, consider a separate table (the concept of relational data and keys); for simple projects, though, try hard to keep everything in a single, simple, square table
- simple formatting principles
- computers are colour blind (highlighting cells/rows helps only you, not the software that reads the file)
- variable names
- camel case vs snake case
- avoid spaces/hyphens/funny chars
- unique names
- a naming system (e.g. bps, bpd, bpm might be easier to search/organise than sysbp, diastolicbp, meanbp), especially for repeated measures such as bp1, bp2, bp3 ..., which are amenable to tools for automatic translation from wide to long
- row identifiers
- standardise where possible (don't mix y, YES, yes, true, 1, positive: they all mean the same thing, so choose one and stick with it)
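A minimal sketch of how the naming and standardisation points above might be applied once a sheet has been exported from Excel. The function names (`to_snake_case`, `standardise_yes_no`), the example headers, and the lists of accepted yes/no spellings are all assumptions made for illustration; adjust them to your own data.

```python
import re

# Hypothetical spellings to treat as yes/no; extend to match your own data.
YES_VALUES = {"y", "yes", "true", "1", "positive"}
NO_VALUES = {"n", "no", "false", "0", "negative"}


def to_snake_case(name):
    """Turn a messy column header into a snake_case variable name."""
    # Split camelCase boundaries, e.g. "sysBP" -> "sys_BP"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace runs of spaces, hyphens and other funny characters with "_"
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def standardise_yes_no(value):
    """Map the many spellings of yes/no onto 1/0; None for anything else."""
    text = str(value).strip().lower()
    if text in YES_VALUES:
        return 1
    if text in NO_VALUES:
        return 0
    return None  # unrecognised or missing: flag for manual review


# Hypothetical headers and cell values as they might arrive from a spreadsheet
headers = ["Patient ID", "sysBP", "Diabetes (y/n)"]
clean_headers = [to_snake_case(h) for h in headers]
clean_values = [standardise_yes_no(v) for v in ["y", "YES", "true", 1, "positive", "N", ""]]

print(clean_headers)
print(clean_values)
```

Returning `None` for unrecognised values, rather than guessing, keeps unexpected entries visible so they can be checked by hand.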
Need some examples of badly done data with specific lessons
- ragged wide data where long would be better: before/after BP in a single row is OK, but day 1-3 BP for one patient and day 1-7 BP for the next isn't
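As a rough illustration of the ragged case, the sketch below reshapes wide rows into long format using only the Python standard library. The column names (`patient_id`, `bp1`, `bp2`, ...), the BP values, and the `wide_to_long` helper are invented for this example; they rely on the numbered naming system suggested earlier.

```python
import re

# Hypothetical wide rows: one row per patient, with a ragged number of BP columns
wide_rows = [
    {"patient_id": "p01", "bp1": 120, "bp2": 118, "bp3": 115},
    {"patient_id": "p02", "bp1": 140, "bp2": 138, "bp3": 135, "bp4": 133,
     "bp5": 131, "bp6": 130, "bp7": 128},
]


def wide_to_long(rows, id_col="patient_id", prefix="bp"):
    """Reshape columns prefix1, prefix2, ... into one row per (id, day)."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    long_rows = []
    for row in rows:
        for name, value in row.items():
            match = pattern.match(name)
            # Keep only the numbered measurement columns that hold a value
            if match and value is not None:
                long_rows.append(
                    {id_col: row[id_col], "day": int(match.group(1)), prefix: value}
                )
    return long_rows


long_rows = wide_to_long(wide_rows)
for r in long_rows:
    print(r)
```

In long format each patient simply contributes as many rows as they have measurements, so the table stays rectangular however many days were recorded.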
Please contact Steve Harris if you have any questions.