Skip to content

Getting data into R

ngageorange edited this page Mar 30, 2016 · 12 revisions

Getting data into R

Using in-built function

We can import Comma Separated Values (CSV) files into R very easily. These files can be generated by Microsoft Excel, Apple Numbers and Google Sheets usually through a File -> Export process.

Once a sheet has been exported, it can be imported into R:

 data <- read.csv(LOCATION_OF_FILE, header=TRUE)

This will create a variable called data in the global namespace with your sheet. Columns can then be referenced by:

 data$column_name

Using Google Sheets

Using Google Sheets is a slightly more advanced way of getting data into R but it is more powerful. It allows the creation of seamless data analytics. Using Google Forms, one can input the data into a pre-validated form that then gets automatically inserted into a sheet that then gets pulled in automatically into R next time you run your script.

We will need to install the googlesheets library to import data from Google Sheets.

install.packages("googlesheets")

Once the package has been installed, load it into your R script:

library("googlesheets")

Now list all the sheets available (local and ones shared to your account as shown https://docs.google.com/spreadsheets/) using

gs_ls()

On the first run, it will ask you to authorise the package 'googlespreadsheet' to view your account by copying a link from the console and pasting it into a browser. Once you have authorised it, you will receive a code from your browser which you can then paste into the console. A list of all your sheets will then be printed in the console.

To access the sheet:

sheet <- gs_title(YOUR_TITLE)

Where YOUR_TITLE is the name of the sheet as shown in the gs_ls command. gs_title will authorise the use of that sheet.

To get access to the worksheet:

data <- gs_read(ss = sheet, ws = NAME_OF_WORKSHEET)

data variable now contains a dataframe much like importing from CSV. You can access columns by:

data$column_name

Looking at Data

Using R-studio, click on Environment, then click on 'data' (as the variable name). This will open up a new window that displays the data if everything has gone smoothly on either methods.

Simple Testing

The see the top rows of your sheet:

head(data)

Similarly, the last rows:

tail(data)

To get a summary of each column (min, max, averages...etc), type:

summary(data)
Clone this wiki locally