---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
renv::use(
"tor-gu/njmunicipalities",
"tor-gu/njelections",
"ggplot2",
"kableExtra",
"tidyr"
)
```
```{r echo=FALSE, message=FALSE}
library(njelections)
library(kableExtra)
kbl <- function(tbl, caption) {
knitr::kable(tbl,
caption = NULL,
format = "html",
table.attr = "class=\"kable\"")
}
```
# njelections
<!-- badges: start -->
[](https://github.com/tor-gu/njelections/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
This is a data package for R containing the results of statewide elections in NJ, from 2004 to 2025.
## Installation
You can install the development version of `njelections` like so:
``` r
# install.packages("devtools")
devtools::install_github("tor-gu/njelections")
```
## Dataset Overview
This package contains the results of statewide general elections for three offices:
* New Jersey Governor
* US Senate
* US President
All elections from 2004 to 2025 are included, at three levels of organization:
* Statewide (`election_statewide`)
* By county (`election_by_county`)
* By municipality (`election_by_municipality`)
#### Table `election_statewide`
Table `election_statewide` contains the following columns, which are common
to all tables in this dataset:
```{r echo=FALSE}
tibble::tribble(
~Field, ~Type, ~Description, ~Example,
"year", "int", "Election year", "2004",
"type", "chr", "Currently always 'General'", "General",
"office", "chr", "'President', 'Senate', or 'Governor'", "President",
"candidate", "chr", "Candidate name", "John F. Kerry",
"party", "chr", "Candidate party", "Democratic",
"vote", "int", "Number of votes received", "1911430",
) |> kbl("election_statewide")
```
There is one row in this table for every `year`, `office`, and `candidate` combination.
##### Example
```{r echo=FALSE}
head(election_statewide, 3) |> kbl("head(election_statewide, 3)")
```
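The one-row-per-combination property can be checked directly. The sketch below uses a hypothetical base R helper, `duplicated_keys()` (not part of `njelections`), and runs it on a small made-up table with the same columns as `election_statewide`. The commented line at the end shows how it might be applied to the real data.

``` r
# Hypothetical helper (not part of njelections): returns every row whose
# combination of `keys` occurs more than once. An empty result means the
# keys identify rows uniquely.
duplicated_keys <- function(df, keys) {
  key_df <- df[keys]
  dup <- duplicated(key_df) | duplicated(key_df, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

# Made-up toy rows with the same columns as `election_statewide`.
toy_statewide <- data.frame(
  year = 2004L, type = "General", office = "President",
  candidate = c("Candidate A", "Candidate B", "Candidate B"),
  party = c("Democratic", "Republican", "Republican"),
  vote = c(100L, 90L, 90L)
)

# Candidate B is listed twice, so both of its rows are returned.
duplicated_keys(toy_statewide, c("year", "office", "candidate"))

# Against the real table (requires njelections), this should return zero rows:
# duplicated_keys(njelections::election_statewide, c("year", "office", "candidate"))
```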
#### Table `election_by_county`
Table `election_by_county` contains all of the columns in `election_statewide`, plus
two more:
```{r echo=FALSE}
tibble::tribble(
~Field, ~Type, ~Description, ~Example,
"GEOID", "chr", "US Census GEOID for the county", "34001",
"county", "chr", "County name", "Atlantic County",
) |> kbl("Additional fields in election_by_county")
```
There is one row in this table for every `year`, `office`, `county` and `candidate` combination. In particular, for a given `year` and `office`, every `candidate` is
represented in every `county`.
##### Example
```{r echo=FALSE}
head(election_by_county, 3) |> kbl("head(election_by_county, 3)")
```
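The claim that every candidate appears in every county for a given race can also be checked. This is a sketch with a hypothetical base R helper, `is_complete_grid()` (not part of `njelections`), shown on made-up toy rows. Within each `year`/`office` race it checks that no (candidate, unit) pair repeats and that the row count equals candidates times units; together these imply that every pair is present.

``` r
# Hypothetical helper (not part of njelections): TRUE if, in every
# year/office race, each candidate appears exactly once in each unit.
is_complete_grid <- function(df, unit) {
  races <- split(df, list(df$year, df$office), drop = TRUE)
  all(vapply(races, function(race) {
    n_candidates <- length(unique(race$candidate))
    n_units <- length(unique(race[[unit]]))
    # No repeated (candidate, unit) pairs, and exactly candidates x units rows.
    anyDuplicated(race[c("candidate", unit)]) == 0 &&
      nrow(race) == n_candidates * n_units
  }, logical(1)))
}

# Made-up toy rows: two candidates in two counties.
toy_county <- data.frame(
  year = 2008L, type = "General", office = "Senate",
  county = rep(c("County X", "County Y"), each = 2),
  candidate = rep(c("Candidate A", "Candidate B"), times = 2),
  vote = c(10L, 8L, 15L, 12L)
)

is_complete_grid(toy_county, "county")        # TRUE: every pair present
is_complete_grid(toy_county[-4, ], "county")  # FALSE: Candidate B missing in County Y

# Against the real tables (requires njelections). For municipalities, GEOID is
# the safer unit, since municipality names repeat across counties:
# is_complete_grid(njelections::election_by_county, "county")
# is_complete_grid(njelections::election_by_municipality, "GEOID")
```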
#### Table `election_by_municipality`
Table `election_by_municipality` contains all of the columns in `election_statewide`, plus three more:
```{r echo=FALSE}
tibble::tribble(
~Field, ~Type, ~Description, ~Example,
"GEOID", "chr", "US Census GEOID for the municipality", "3400100100",
"county", "chr", "County name", "Atlantic County",
"municipality", "chr", "Municipality name", "Absecon city",
) |> kbl("Additional fields in election_by_municipality")
```
There is one row in this table for every `year`, `office`, `county`, `municipality` and `candidate` combination. In particular, for a given `year` and `office`, every `candidate` is represented in every `municipality`.
##### Example
```{r echo=FALSE}
head(election_by_municipality, 3) |> kbl("head(election_by_municipality, 3)")
```
## Notes
### Data source
The source for this data is the [New Jersey Division of Elections](https://nj.gov/state/elections/index.shtml). The data was derived by scraping the PDFs in the [election results archive](https://nj.gov/state/elections/election-information-results.shtml).
### NJ municipalities
New Jersey municipalities have not been stable over the course of 2004-2025:
* Several municipalities have changed names or been assigned new GEOIDs by the US Census.
* In 2013, Princeton borough and Princeton township merged.
* In 2022, Pine Valley was absorbed by Pine Hill.
The [`njmunicipalities`](https://github.com/tor-gu/njmunicipalities) package contains municipality names and GEOIDs across the period 2001-2025. The `election_by_municipality` table uses the names and GEOIDs from the `njmunicipalities` package for the year of the election, with the [exception of the Princetons](#princeton-and-the-2012-election) for the 2012 election. See [Accounting for changing municipal names](#accounting-for-changing-municipal-names) for a worked example dealing with these issues.
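One way to see these changes in the data is to count distinct GEOIDs per election year. The helper below, `municipalities_per_year()`, is hypothetical (not part of `njelections`) and is shown on made-up rows. With the real table, a merger should appear as a drop in the count, while a rename with a new GEOID leaves the count unchanged.

``` r
# Hypothetical helper (not part of njelections): number of distinct
# municipality GEOIDs in each election year, as a named integer vector.
municipalities_per_year <- function(by_muni) {
  sapply(split(by_muni$GEOID, by_muni$year), function(g) length(unique(g)))
}

# Made-up toy rows: municipality "C" is absorbed into "B" after 2008.
toy_muni <- data.frame(
  year = c(2008L, 2008L, 2008L, 2012L, 2012L),
  GEOID = c("A", "B", "C", "A", "B")
)
municipalities_per_year(toy_muni)  # 2008: 3, 2012: 2

# Against the real table (requires njelections):
# municipalities_per_year(njelections::election_by_municipality)
```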
#### Princeton and the 2012 election
At the time of the 2012 election, Princeton borough and Princeton township were still separate municipalities. However, the official results for Mercer County provide only the combined results for the merged Princeton municipalities.
As a result, the `election_by_municipality` table uses the 2013 municipality list from `njmunicipalities` for the 2012 election. The Princeton merger is the only difference between the 2012 and 2013 municipality lists.
### Candidate and party names
In general, an attempt was made to record candidate and party names exactly as they appear in the official results. However, when the same candidate or party appears
in multiple elections with slightly varying names, the most common form of the name was used.
For example, Jeff Boss has appeared in official results variously as 'Jeff Boss', 'Jeffrey Boss' and 'Jeffery "Jeff" Boss'. In this package, his name has been standardized to "Jeff Boss".
Similarly, the Green and Libertarian party names have been standardized to "Green Party" and "Libertarian Party".
When a candidate does not have a listed party, the party is recorded as "Independent".
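If you match these tables against other data that uses the raw official spellings, the variants need to be mapped to the standardized names first. A minimal sketch, assuming a hand-built lookup vector: `name_lookup` and `standardize_names()` are hypothetical (not shipped with `njelections`), and the lookup contains only the Jeff Boss variants quoted above.

``` r
# Hypothetical lookup (not part of njelections): raw variant -> standardized name.
# Only the Jeff Boss variants mentioned in this README are included.
name_lookup <- c(
  "Jeffrey Boss" = "Jeff Boss",
  "Jeffery \"Jeff\" Boss" = "Jeff Boss"
)

# Replace known variants; leave every other name unchanged.
standardize_names <- function(x, lookup = name_lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

standardize_names(c("Jeffrey Boss", "Jeffery \"Jeff\" Boss", "Jeff Boss", "Candidate A"))
# "Jeff Boss" "Jeff Boss" "Jeff Boss" "Candidate A"
```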
### Consistency across levels
#### State vs county
For every `year`, `office` and `candidate` combination, the vote total across counties exactly matches the vote total in the statewide results:
```{r message=FALSE}
library(dplyr)
# Statewide election matches sum of county votes for every year and every office
election_by_county |>
group_by(year, type, office, candidate) |>
summarize(county_vote = sum(vote), .groups = "drop") |>
left_join(election_statewide,
by = c("year", "type", "office", "candidate")) |>
filter(vote != county_vote) |>
nrow()
```
#### County vs municipality
The sum across municipalities does not always match the county total. In many -- but not all -- cases, the official county results account for the discrepancy. For example, the official [2020 Presidential results from Morris County](https://nj.gov/state/elections/assets/pdf/election-results/2020/2020-official-general-results-president-morris.pdf) include federal overseas votes in a separate row, not assigned to any municipality. These discrepancies, even when explicitly
included in the official results, are not recorded in this package.
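Although these discrepancies are not recorded, they can be measured by summing municipal votes up to the county level and comparing the result with `election_by_county`, much as the state vs county check above does. The sketch below uses base R and a hypothetical helper, `county_discrepancies()` (not part of `njelections`), demonstrated on made-up toy tables.

``` r
# Hypothetical helper (not part of njelections): rows where the county total
# differs from the sum of its municipalities, with the difference in `diff`.
county_discrepancies <- function(by_muni, by_county) {
  keys <- c("year", "type", "office", "county", "candidate")
  # Sum municipal votes up to the county level.
  muni_sum <- aggregate(vote ~ year + type + office + county + candidate,
                        data = by_muni, FUN = sum)
  names(muni_sum)[names(muni_sum) == "vote"] <- "muni_vote"
  # Line each county total up with its municipal sum.
  joined <- merge(by_county[c(keys, "vote")], muni_sum, by = keys)
  joined$diff <- joined$vote - joined$muni_vote
  # Keep only the combinations where the two disagree.
  joined[joined$diff != 0, , drop = FALSE]
}

# Made-up toy tables: Candidate A's county total (30) exceeds the
# municipal sum (10 + 15 = 25); Candidate B's totals agree (20).
toy_totals <- data.frame(
  year = 2020L, type = "General", office = "President", county = "County X",
  candidate = c("Candidate A", "Candidate B"), vote = c(30L, 20L)
)
toy_towns <- data.frame(
  year = 2020L, type = "General", office = "President", county = "County X",
  municipality = rep(c("Town 1", "Town 2"), each = 2),
  candidate = rep(c("Candidate A", "Candidate B"), times = 2),
  vote = c(10L, 8L, 15L, 12L)
)
county_discrepancies(toy_towns, toy_totals)  # one row: Candidate A, diff 5

# Against the real tables (requires njelections):
# county_discrepancies(njelections::election_by_municipality,
#                      njelections::election_by_county)
```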
## Examples
### Displaying in 'wide' format
```{r message=FALSE}
library(dplyr)
library(tidyr)
library(njelections)
hudson_senate_2012 <- election_by_municipality |>
filter(year == 2012,
office == "Senate",
county == "Hudson County") |>
select(GEOID, municipality, party, vote) |>
pivot_wider(names_from = party, values_from = vote) |>
select(GEOID, municipality, Democratic, Republican,
Libertarian = `Libertarian Party`, Green = `Green Party`)
```
```{r echo=FALSE}
hudson_senate_2012 |> kbl("Hudson County 2012 Senate results")
```
### Accounting for changing municipal names
Over the period 2004-2025, several municipalities changed names and GEOIDs, Princeton township was merged into Princeton borough, and Pine Valley was merged into Pine Hill. The package [`njmunicipalities`](https://github.com/tor-gu/njmunicipalities) is helpful here.
As an example, let's consider Mercer County, which includes the merged Princetons as well as Robbinsville township, previously known as Washington township. Let's plot the two-party share of the vote for each municipality in Mercer, using the current name for each municipality and combining the totals for the Princetons in the years prior to the merger.
First, generate a cross-reference table for the GEOIDs, using the 2025 GEOIDs and municipality names as the reference. We use `njmunicipalities::get_geoid_cross_references` and `njmunicipalities::get_municipalities` for this.
```{r}
library(njmunicipalities)
geoid_xref <- get_geoid_cross_references(2025, 2004:2025) |>
dplyr::filter(!is.na(GEOID_ref)) |>
dplyr::left_join(get_municipalities(2025), by = c("GEOID_ref" = "GEOID"))
```
```{r echo=FALSE}
geoid_xref |> head(5) |> kbl("geoid_xref")
```
Now, generate the two-party share of the vote, combining Princeton borough and township. The constants `PRINCETON_TWP_GEOID` and `PRINCETON_BORO_GEOID` come from `njmunicipalities`.
```{r}
tpsov <- njelections::election_by_municipality |>
dplyr::mutate(GEOID = dplyr::if_else(GEOID == PRINCETON_TWP_GEOID,
PRINCETON_BORO_GEOID,
GEOID)) |>
dplyr::group_by(year, office, GEOID, party) |>
dplyr::summarize(vote = sum(vote), .groups = "drop") |>
dplyr::filter(party %in% c("Democratic", "Republican")) |>
dplyr::group_by(year, office, GEOID) |>
dplyr::reframe(party = party, two_party_share_of_vote = vote/sum(vote))
```
```{r echo=FALSE}
tpsov |> head(5) |> kbl("tpsov")
```
Finally, combine the two tables and plot.
```{r}
library(ggplot2)
tpsov |>
dplyr::left_join(geoid_xref, by = c("year", "GEOID")) |>
dplyr::filter(county == "Mercer County") |>
ggplot(aes(x = year, y = two_party_share_of_vote, color = party)) +
scale_color_manual(values = c("Democratic" = "blue", "Republican" = "red")) +
geom_point() +
geom_smooth(se = FALSE, formula = y ~ x, method = "loess") +
facet_wrap("municipality", nrow=4) +
ylab("Two party share of vote") +
xlab("Election year") +
labs(title = "Mercer County, NJ, two party share of vote",
subtitle = "US Senate, President and Governor races, 2004-2025")
```