Commit b28cd50

Download package for learners.
0 parents  commit b28cd50

9 files changed

+2152
-0
lines changed

files/lit-prog/README.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# Files for the Literate Programming lesson
2+
3+
- `countryPick.Rmd`: Rmarkdown file demonstrating various features of
4+
literate programming with R.
5+
- `countryPick.pdf`: the PDF generated from the Rmarkdown file of the
6+
same base name.
7+
- `gapminderDataFiveYear.tsv`: the cleaned and subset version of the
8+
Gapminder dataset available from the [gapminder R package].
9+
10+
[gapminder R package]: http://github.com/jennybc/gapminder

files/lit-prog/countryPick4.Rmd

Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
1+
---
2+
title: "Pick four - comparing trends in population over time"
3+
output: pdf_document
4+
---
5+
6+
## Purpose
7+
8+
The purpose of this report is to compare the population trends for four countries of your choosing. In addition, this serves as an example of literate programming. Literate programming is a way to document how you performed your analysis. It serves as a guide for others (and your future self) on how to reproduce your work.
9+
10+
## Required Libraries
11+
```{r}
12+
library(ggplot2)
13+
```
14+
15+
## Data
16+
17+
Always add as many details as possible about your data, including where it came from, how it was processed, how it is licensed, and where it can be accessed.
18+
19+
- Gapminder data [available here](http://www.gapminder.org/data/). [Gapminder data is licensed CC-BY 3.0](https://docs.google.com/document/pub?id=1POd-pBMc5vDXAmxrpGjPLaCSDSWuxX6FLQgq5DhlUhM#h.ul2gu2-uwathz).
20+
21+
- Processed data via [@jennybc](https://github.com/jennybc), [R package available here](https://github.com/jennybc/gapminder). The [data-raw](https://github.com/jennybc/gapminder/tree/master/data-raw) sub-directory reveals the journey from Gapminder.org's Excel workbooks to increasingly clean and tidy data.
22+
23+
**Read in data**: To read in the data, make sure this file is in the same directory/folder as the `gapminderDataFiveYear.tsv` file. To set the proper working directory in RStudio, go to Session > Set Working Directory > To Source File Location.
24+
25+
```{r}
26+
gapMinder <- read.delim("gapminderDataFiveYear.tsv")
27+
28+
### Check data
29+
head(gapMinder) #First 6 lines of dataset (the default for head())
30+
dim(gapMinder) #number of rows and columns in data set
31+
```
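
Beyond `head()` and `dim()`, it can help to confirm that the columns the later plots rely on are actually present. The sketch below is only an illustration: `toyTsv`, `toyData`, and `expectedCols` are hypothetical names, and the two-row table stands in for the real file so the block runs on its own. For the real data, `expectedCols` would list the columns used in this report (`country`, `year`, `pop`, `lifeExp`, `gdpPercap`).

```r
# Hypothetical two-row stand-in for the TSV file (tab-separated, with a header)
toyTsv <- "country\tyear\nIndia\t2007\nNigeria\t2007"
toyData <- read.delim(text = toyTsv)

head(toyData)  # first rows of the data (at most six)
dim(toyData)   # number of rows, then number of columns

# Stop early if a column needed later is missing
expectedCols <- c("country", "year")
stopifnot(all(expectedCols %in% names(toyData)))
```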
32+
33+
You can see which countries are available by listing the unique values in the country column of the gapMinder dataset.
34+
35+
```{r, results='hide'}
36+
unique(gapMinder$country) # levels() returns NULL if country was read in as character (the default since R 4.0)
37+
```
38+
39+
### Pick Four Countries
40+
41+
Now pick four countries that you are interested in. Just replace the country names below with your own choices.
42+
43+
```{r}
44+
countryName1 <- "India"
45+
countryName2 <- "United States"
46+
countryName3 <- "Nigeria"
47+
countryName4 <- "Germany"
48+
```
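
A misspelled country name does not raise an error. It simply produces an empty subset and an empty plot. The sketch below shows one way to check the picks before plotting. The `toyGapMinder` data frame and the `picks` vector are hypothetical stand-ins for the real data and the four `countryName` variables, so that the block runs on its own.

```r
# Hypothetical stand-in for the gapMinder data (only the columns needed here)
toyGapMinder <- data.frame(
  country = c("India", "India", "Nigeria", "Germany"),
  year    = c(2002, 2007, 2007, 2007),
  stringsAsFactors = FALSE
)

# Hypothetical picks; "Germny" is deliberately misspelled
picks <- c("India", "Nigeria", "Germny")

# Names that do not appear in the country column
missingPicks <- setdiff(picks, unique(toyGapMinder$country))

if (length(missingPicks) > 0) {
  message("Not found in data: ", paste(missingPicks, collapse = ", "))
}
```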
49+
50+
## Individual countries
51+
52+
### Country One
53+
54+
We want to look at how population changes over time for the first country.
55+
56+
```{r}
57+
country1 <- subset(gapMinder, country == countryName1)
58+
59+
ggplot(country1, aes(year, pop)) +
60+
geom_path() +
61+
ggtitle(countryName1) +
62+
theme(plot.title = element_text(size = 15, face = "bold"))
63+
```
64+
65+
This second graph shows the relationship between GDP per person (gdpPercap) and life expectancy (lifeExp). The size of the circles on the plot represents total population.
66+
67+
```{r}
68+
ggplot(country1, aes(gdpPercap, lifeExp, size = pop)) +
69+
geom_point() +
70+
ggtitle(countryName1) +
71+
theme(plot.title = element_text(size = 15, face = "bold"))
72+
```
73+
74+
### Country Two
75+
76+
We will do this for each country. Since the code is very similar, we
77+
will hide it in the output below by setting the chunk option `echo=FALSE`
78+
(`TRUE` is the default):
79+
80+
```{r, echo=FALSE}
81+
country2 <- subset(gapMinder, country == countryName2)
82+
83+
ggplot(country2, aes(year, pop)) +
84+
geom_path() +
85+
ggtitle(countryName2) +
86+
theme(plot.title = element_text(size = 15, face = "bold"))
87+
```
88+
89+
**Notes**: In a real report you can add information about the results of the analysis you are performing. That way your code, analysis, questions, and results are all in one place.
90+
91+
```{r, echo = FALSE}
92+
ggplot(country2, aes(gdpPercap, lifeExp, size = pop)) +
93+
geom_point() +
94+
ggtitle(countryName2) +
95+
theme(plot.title = element_text(size = 15, face = "bold"))
96+
```
97+
98+
### Country Three
99+
100+
```{r, echo=FALSE}
101+
country3 <- subset(gapMinder, country == countryName3)
102+
103+
ggplot(country3, aes(year, pop)) +
104+
geom_path() +
105+
ggtitle(countryName3) +
106+
theme(plot.title = element_text(size = 15, face = "bold"))
107+
```
108+
109+
**Notes**: Maybe a country has an unusual distribution and we want to label each point with its year. We added `label = year` to the first line of the code below. To display the text, we also added a `geom_text(hjust = 1.3, vjust = 0, size = 3)` layer.
110+
111+
```{r}
112+
ggplot(country3, aes(gdpPercap, lifeExp, size = pop, label = year)) +
113+
geom_point() +
114+
geom_text(hjust = 1.3, vjust = 0, size = 3) +
115+
ggtitle(countryName3) +
116+
theme(plot.title = element_text(size = 15, face = "bold"))
117+
```
118+
119+
### Country Four
120+
121+
```{r, echo=FALSE}
122+
country4 <- subset(gapMinder, country == countryName4)
123+
124+
ggplot(country4, aes(year, pop)) +
125+
geom_path() +
126+
ggtitle(countryName4) +
127+
theme(plot.title = element_text(size=15, face = "bold"))
128+
```
129+
130+
**Notes**: Alternatively, we can show the year by mapping it to color.
131+
132+
```{r}
133+
ggplot(country4, aes(gdpPercap, lifeExp, size = pop, color = year)) +
134+
geom_point() +
135+
ggtitle(countryName4) +
136+
theme(plot.title = element_text(size=15, face = "bold"))
137+
```
138+
139+
## All four countries
140+
141+
Let's combine all four countries to see how they compare.
142+
143+
```{r}
144+
#Add subsetted data together
145+
allCountries <- rbind(country1, country2, country3, country4)
146+
147+
#Notice the code for this is similar to when we are just looking at one country
148+
#just with the added color option
149+
ggplot(allCountries, aes(year, pop, color=country)) +
150+
geom_path() +
151+
xlab("Year") + ylab("Population Size") +
152+
ggtitle("All four countries") +
153+
theme(plot.title = element_text(lineheight=.8, face = "bold"))
154+
```
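
The four separate subsets followed by `rbind()` can also be written as a single filter with `%in%`. The sketch below compares the two on a hypothetical toy data frame (`toyGapMinder` and `picks` are illustrative names, not part of this report). It is one possible alternative rather than the approach used above.

```r
# Hypothetical stand-in for the gapMinder data (only the columns needed here)
toyGapMinder <- data.frame(
  country = c("India", "Nigeria", "Germany", "Brazil", "India"),
  year    = c(2007, 2007, 2007, 2007, 2002),
  stringsAsFactors = FALSE
)
picks <- c("India", "Nigeria", "Germany")

# Same idea as the report: subset each country, then stack the pieces
stacked <- do.call(
  rbind,
  lapply(picks, function(nm) subset(toyGapMinder, country == nm))
)

# One-step alternative: keep every row whose country is among the picks
filtered <- subset(toyGapMinder, country %in% picks)

# Both contain the same rows; only the row order differs
nrow(stacked) == nrow(filtered)
```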
155+
156+
What about what is occurring in a particular year? You can change the year by changing the value of `yr` in the code below. To see which years are available, use `unique(allCountries$year)`.
157+
158+
```{r}
159+
yr <- 2007
160+
ggplot(subset(allCountries, year == yr),
161+
aes(x = gdpPercap, y = lifeExp, color = country, size = pop)) +
162+
scale_x_log10(limits = c(500, 90000)) +
163+
geom_point(alpha = 0.8) +
164+
scale_size_area(max_size = 14) +
165+
theme_bw() + # dark-on-light theme with a white background
166+
xlab("GDP per capita") + ylab("Life Expectancy") +
167+
ggtitle(paste("All 4 countries in", yr)) +
168+
theme(plot.title = element_text(size = 15, face = "bold"))
169+
```
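
If `yr` is set to a year that is not in the data, the subset is empty and the plot shows no points. As the file name suggests, the data are recorded at five-year intervals, so this is easy to do by accident. The sketch below lists the available years and falls back to the closest one. `toyCountries` and `availableYears` are hypothetical names, and the toy data frame stands in for `allCountries`.

```r
# Hypothetical stand-in for allCountries (only the columns needed here)
toyCountries <- data.frame(
  country = c("India", "India", "Nigeria", "Nigeria"),
  year    = c(2002, 2007, 2002, 2007)
)

# Years that actually occur in the data, in ascending order
availableYears <- sort(unique(toyCountries$year))
availableYears

yr <- 2005
if (!(yr %in% availableYears)) {
  # Fall back to the closest year that is present in the data
  yr <- availableYears[which.min(abs(availableYears - yr))]
}
yr
```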
170+
171+
You can also plot all the years at once!
172+
173+
```{r}
174+
ggplot(allCountries,
175+
aes(x = gdpPercap, y = lifeExp, color = country, size = pop)) +
176+
scale_x_log10(limits = c(500, 90000)) +
177+
ylim(c(30, 90)) +
178+
geom_point(alpha = 0.8) +
179+
scale_size_area(max_size = 14) +
180+
theme_bw() + # dark-on-light theme with a white background
181+
xlab("GDP per capita") + ylab("Life Expectancy") +
182+
ggtitle("All 4 countries") +
183+
theme(plot.title = element_text(size = 15, face = "bold"))
184+
```
185+
186+
187+
## Conclusions
188+
189+
In a real report you can add conclusions about your analysis or future plans for the project. The best part is that if you want to change something in your report, you don't have to redo every step. You can just make the change and re-knit the report.

files/lit-prog/countryPick4.pdf

242 KB
Binary file not shown.
