-
Notifications
You must be signed in to change notification settings - Fork 6
Getting data into R
We can import Comma Separated Values (CSV) files into R very easily. These files can be generated by Microsoft Excel, Apple Numbers and Google Sheets usually through a File -> Export process.
Once a sheet has been exported, it can be imported into R:
df <- read.csv(LOCATION_OF_FILE, header=TRUE)
This will create a variable called data in the global namespace with your sheet. Columns can then be referenced by:
df$column_name
Using Google Sheets is a slightly more advanced way of getting data into R but it is more powerful. It allows the creation of seamless data analytics. Using Google Forms, one can input the data into a pre-validated form that then gets automatically inserted into a sheet that then gets pulled in automatically into R next time you run your script.
We will need to install the googlesheets library to import data from Google Sheets.
install.packages("googlesheets")
Once the package has been installed, load it into your R script:
library(googlesheets)
Now list all the sheets available (local and ones shared to your account as shown https://docs.google.com/spreadsheets/) using
gs_ls()
On the first run, it will ask you to authorise the package 'googlespreadsheet' to view your account by copying a link from the console and pasting it into a browser. Once you have authorised it, you will receive a code from your browser which you can then paste into the console. A list of all your sheets will then be printed in the console.
To access the sheet:
sheet <- gs_title(YOUR_TITLE)
Where YOUR_TITLE is the name of the sheet as shown in the gs_ls command. gs_title will authorise the use of that sheet.
To get access to the worksheet:
df <- gs_read(ss = sheet, ws = NAME_OF_WORKSHEET)
data variable now contains a dataframe much like importing from CSV. You can access columns by:
df$column_name
Using R-studio, click on Environment, then click on 'data' (as the variable name). This will open up a new window that displays the data if everything has gone smoothly on either methods.
The see the top rows of your sheet:
head(df)
Similarly, the last rows:
tail(df)
To get a summary of each column (min, max, averages...etc), type:
summary(df)
- Discuss what a CSV file
- Show how to export CSV files from Excel in an OS independent way...
- Screenshots of what each bit looks like, like googlesheets console/authentication
- Google Forms screenshot/explaining more
- 2/3rd of the emphasis should be on CSV bit.
- Split googlesheet/CSV bits, CSV should be more important
- Explain what a Path and how to get it (Mac version and a PC version) with screenshot
- String as factors=False in the function (?using readr as it might also deal with line ending \r\n vs \n in CSV file)
- Remind people that gs_ls() is a function (also, () = no arguments)
Please contact Steve Harris if you have any questions.