Skip to content

Getting data into R

ngageorange edited this page Mar 30, 2016 · 12 revisions

Getting data into R

Using in-built function

We can import Comma Separated Values (CSV) files into R very easily. These files can be generated by Microsoft Excel, Apple Numbers and Google Sheets usually through a File -> Export process.

Once a sheet has been exported, it can be imported into R:

 data <- read.csv(LOCATION_OF_FILE, header=TRUE)

This will create a variable called data in the global namespace with your sheet. Columns can then be referenced by:

 data$column_name

Using Google Sheets

We will need to teach people to install the googlesheets library for importing data from Google Sheets.

install.packages("googlesheets")

Once the package has been installed, load it into your R script:

library('googlesheets')

V basic cleaning on the way in

  • to lower
  • ??data types

Testing the data is what was expected

  • head(), tail()
  • stem()
  • summary()

https://ramnathv.github.io/pycon2014-r/explore/README.html

Beginners

Types of data:

  • Numeric
  • Integers
  • Strings
  • Date/Time objects

Improvers

I'm going to use the above link to focus a 20 minute tutorial on 3 common mistakes:

  1. Column headers are values, not variable names
  2. Multiple variables are stored in one column
  3. Variables are stored in both rows and columns

There are 2 other instances of untidy data, but I won't delve too much into it.

I've already written a little powerpoint presentation that I have presented to the SHOs regarding tidy data so it'll be an extension of that.

Column headers are values, not variable names

  • Example with Income and Religion table

Multiple variables are stored in one column

Variables are stored in both rows and columns

Clone this wiki locally