
Excel hell

Steve Harris edited this page Mar 3, 2016 · 21 revisions

Basic principles

  • define your unit of analysis (the row)
  • define characteristics of that unit (the column)
  • if something seems repetitive, consider a separate table (the relational idea of linked tables that share a key); but for simple projects try hard to keep everything in a single rectangular table
  • simple formatting principles
    • computers are colour blind (highlighting cells/rows helps only you; anything the colour means belongs in its own column)
    • variable names
      • camel case (sysBp) vs snake case (sys_bp): pick one and use it throughout
      • avoid spaces, hyphens and special characters
      • unique names
      • a naming system (e.g. bps, bpd, bpm might be easier to search/organise than sysbp, diastolicbp, meanbp), especially for repeated measures such as bp1, bp2, bp3 ..., a pattern that tools for automatic translation from wide to long can work with
    • row identifiers
    • standardise where possible (y, YES, yes, true, 1 and positive all mean the same thing: choose one and stick with it)
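
The naming and standardising principles above can be applied mechanically once the spreadsheet has been exported. The following is a minimal sketch in Python using only the standard library. The function names `snake_case` and `standardise_flag` and the `YES`/`NO` sets are hypothetical, chosen for illustration; adjust the accepted spellings to match your own data.

```python
import re

# Hypothetical sets of spellings treated as yes/no; extend to suit your data.
YES = {"y", "yes", "true", "1", "positive"}
NO = {"n", "no", "false", "0", "negative"}


def snake_case(name):
    """Turn a free-text column header into a snake_case variable name."""
    # Mark camelCase boundaries: "sysBP" -> "sys_BP"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace each run of spaces, hyphens and other special characters with "_"
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def standardise_flag(value):
    """Map the many spellings of yes/no to True/False; None if unrecognised."""
    text = str(value).strip().lower()
    if text in YES:
        return True
    if text in NO:
        return False
    return None
```

Returning `None` for unrecognised values, rather than guessing, leaves unexpected entries visible so they can be checked by hand.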

Need some examples of badly done data with specific lessons

  • ragged wide data instead of long: a before/after BP pair in one row is OK, but day 1-3 BP for one patient and day 1-7 BP for the next is not; store one row per measurement instead
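
As a rough illustration of the ragged example above, this sketch reshapes numbered columns (bp1, bp2, ...) into one row per measurement using only the Python standard library. The function `wide_to_long`, the `id` and `day` column names, and the BP values are all invented for the example and are not taken from a real dataset.

```python
import re


def wide_to_long(rows, id_key="id", prefix="bp"):
    """Reshape numbered columns (bp1, bp2, ...) into one row per measurement."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    long_rows = []
    for row in rows:
        for column, value in row.items():
            match = pattern.match(column)
            # Skip columns that are not bp<number>, and skip empty cells
            if match is None or value in (None, ""):
                continue
            long_rows.append(
                {id_key: row[id_key], "day": int(match.group(1)), prefix: value}
            )
    return long_rows


# Made-up ragged data: patient p01 has 3 days, patient p02 has 7
wide = [
    {"id": "p01", "bp1": 120, "bp2": 118, "bp3": 121},
    {"id": "p02", "bp1": 135, "bp2": 130, "bp3": 128, "bp4": 126,
     "bp5": 125, "bp6": 124, "bp7": 122},
]
long_rows = wide_to_long(wide)
```

In the long form every row has the same three columns (id, day, bp), so patients with different numbers of measurements no longer leave ragged empty cells.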