# Introduction to Programming with R
We do not have enough in-class time to develop all of the R programming skills you need to independently analyze differential expression using RNAseq data. Therefore, most of the activities will be self-guided, using a combination of video tutorials and practical exercises. You are expected to have completed all of these before class, because the larger set of in-class activities assumes a basic level of comfort and familiarity with R.
Readings: Chapter 8 of Bioinformatics Data Skills presents a crash course in programming in R. We will not be using any of the functionality in ggplot2 quite yet, so you can skip those pages (i.e. pages 207-215 and 224-227). I suggest reading a bit and/or going through the relevant video tutorial and exercises, then moving on to the next component; this may work best.
A fairly recent version of R (3.2.1 or newer) installed on your local computer is required to complete the class activities. There are several options you might consider.

- You can download and install the version of `R` appropriate to your computer. For Mac OS X or Windows you can download it at the page above. For Linux, use `yum`, `apt` or whichever package management utility you like. The Mac OS X R GUI has a simple script editor that does syntax highlighting and displays argument flags for functions. I think the Windows R script editor is much more bare bones.
- Alternatively, you can use RStudio, which is a pretty nice IDE (integrated development environment) for `R`, including advanced syntax highlighting (including RMarkdown, which we will use) and integration with GitHub for version control. While I have a few pet peeves with it, most folks love it.

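
As a quick check of the version requirement mentioned above, the following sketch compares the running copy of R against 3.2.1. `getRversion()` is part of base R; the variable names are only for illustration.

```r
# Minimum version stated for the class activities.
required <- "3.2.1"

# getRversion() returns the running version as a "numeric_version" object,
# which can be compared directly against a version string.
current <- getRversion()

if (current >= required) {
  message("R ", as.character(current), " is recent enough.")
} else {
  message("R ", as.character(current), " is older than ", required, "; please update.")
}
```

Comparing version objects rather than raw strings avoids mistakes such as treating "3.10.0" as older than "3.2.1".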
The data set used for some of these activities can be found on the DRYAD digital repository right here. You can also read it directly from the repository (so you do not need a local copy of the data) by putting this command in your script, or by copying and pasting it into the R editor:

    dll.data <- read.csv("http://datadryad.org/bitstream/handle/10255/dryad.8377/dll.csv", header = TRUE)

Please note, the scripts may look a bit different from those shown in the screencasts, as I have edited them a bit after recording. All of the important parts are still there!
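
After reading a data set in with `read.csv`, it is usually worth confirming that it arrived as expected. The sketch below does this on a tiny stand-in data set held in a string, so that it runs without a network connection; the column names are hypothetical and are not those of `dll.csv`. The same checks could be applied to `dll.data`.

```r
# Stand-in data: a small CSV held in a string so the example runs offline.
# These column names are made up for illustration.
csv_text <- "id,group,length
1,a,0.52
2,b,0.61
3,a,NA"

# read.csv() accepts literal text via the `text` argument.
toy.data <- read.csv(text = csv_text, header = TRUE)

str(toy.data)             # column types and a preview of the values
head(toy.data)            # first few rows
dim(toy.data)             # number of rows and columns
colSums(is.na(toy.data))  # count of missing values in each column
```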
For each topic below, the first link is to the screencast itself (hosted on YouTube). The subsequent links are to the scripts and exercises.

- Why use `R` (and why learn to program): Motivating example of working with counts of expression data from RNAseq (not yet completed, so skip ahead!).
- Introduction to `R`: part 1. `R` as a calculator
- Introduction to `R`: part 2. Basic operations and operators in `R`
    - the second exercise is here
- Introduction to `R`: part 3. Element-by-element operations, booleans & basic functions
    - the third exercise is here
- Introduction to `R`: part 4. objects and classes in `R`
    - the fourth exercise is here
- Introduction to `R`: part 5. workspaces and getting help in `R`
    - the fifth exercise is here
- Introduction to `R`: part 6. writing your own functions in `R`
    - the sixth exercise is here
- Introduction to `R`: part 7. regular sequences and indexing
    - the seventh exercise is here
- Introduction to `R`: part 8. getting data into `R`
- Introduction to `R`: part 9. control flow in `R`
    - The script can be viewed here
- Introduction to `R`: part 10. using the apply family of functions in `R`
    - The script can be viewed here
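
To give a flavour of what several of the screencasts above cover, here is a minimal sketch touching on element-by-element operations, regular sequences and indexing, a user-written function, control flow and `sapply`. It is not taken from the screencast scripts; the object names and the `std_err` helper are made up for illustration.

```r
# Element-by-element operations and booleans
x <- c(2, 4, 6, 8)
doubled <- x * 2        # arithmetic is applied to every element
is_big  <- x > 4        # comparison returns a logical vector

# Regular sequences and indexing
s <- seq(from = 1, to = 10, by = 3)   # 1, 4, 7, 10
second <- s[2]                        # index by position
big_values <- x[is_big]               # index by a logical vector

# Writing your own function (std_err is a hypothetical helper name)
std_err <- function(v) {
  sd(v) / sqrt(length(v))
}
se_x <- std_err(x)

# Control flow: a for loop with if/else, and the vectorized ifelse()
for (value in x) {
  if (value > 4) {
    message(value, " is greater than 4")
  } else {
    message(value, " is not greater than 4")
  }
}
labels <- ifelse(x > 4, "big", "small")

# The apply family: sapply() calls a function on each list element
means <- sapply(list(a = 1:3, b = 4:6), mean)
```

Note that `ifelse` produces the same classification as the loop without iterating explicitly, which is often the more natural choice in `R`.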

Planned additions and changes to these tutorials:

- separate "getting data into R" into 3-4 shorter screencasts (with exercises)
- separate "control flow" into one on if & ifelse, and another on loops
- separate the apply screencasts (one for apply alone, an additional one for tapply, sapply, lapply)
- new screencast on lists as a class (heterogeneous collections) and working with them
- new screencast on data.frame as a list, but also its relationship to matrix
- new screencast on ordering data sets, and the use of the index in R
- introduce the tidyr, dplyr, reshape2, data.table and readr libraries
- introduction to base plotting
- ggplot2 plotting syntax (and the Hadleyverse syntax as a "macro language" within R)
- real programming things? (declarative syntax, S3 and S4 objects)
- things to make R work better with big data sets (pre-allocating space for incoming data, pre-allocating objects and filling them instead of growing objects, dynamic typing/checking at runtime)
- knitr (with and without RStudio)
- version control and GitHub (with and without RStudio)
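
The point above about pre-allocating objects and filling them, rather than growing them, can be sketched as follows. This is only an illustration under simple assumptions; the function names `grow` and `prealloc` are made up, and actual timings will depend on your machine and R version.

```r
n <- 10000

# Growing: c() builds a new, longer vector on every iteration.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Pre-allocating: create the full-length vector once, then fill it by index.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Both approaches give the same result.
same_result <- identical(grow(n), prealloc(n))

# Compare elapsed time for each approach.
system.time(grow(n))
system.time(prealloc(n))
```

For a calculation this simple, the vectorized form `seq_len(n)^2` avoids the loop altogether; the loop versions are here only to contrast the two allocation strategies.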