---
output: github_document
editor_options:
  chunk_output_type: console
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  cache = FALSE,
  collapse = FALSE,
  warning = FALSE,
  message = FALSE,
  tidy = FALSE,
  fig.align='center',
  comment = "#>",
  fig.path = "man/figures/README-",
  R.options = list(width = 200)
)
```

# netlit: Augment a literature review with network analysis statistics <img src="man/figures/logo.png" align="right" width="150"/>
<!-- badges: start -->
  [![CRAN status](https://www.r-pkg.org/badges/version/netlit)](https://CRAN.R-project.org/package=netlit)
  <!-- badges: end -->

------

Understanding the gaps and connections across existing theories and findings is a perennial challenge in scientific research. Systematically reviewing scholarship is especially challenging for researchers who may lack domain expertise, including junior scholars or those exploring new substantive territory. Conversely, senior scholars may rely on longstanding assumptions and social networks that exclude new research. In both cases, ad hoc literature reviews hinder the accumulation of knowledge. Scholars are rarely systematic in selecting relevant prior work or in identifying patterns across the work they select. To encourage systematic, replicable, and transparent methods for assessing literature, we propose an accessible network-based framework for reviewing scholarship. In our method, we consider a literature as a network of recurring concepts (nodes) and theorized relationships among them (edges).
Network statistics and visualization allow researchers to see patterns and offer reproducible characterizations of assertions about the major themes in existing literature.

`netlit` provides functions to generate network statistics from a literature review. Specifically, it processes a dataset where each row is a proposed relationship ("edge") between two concepts or variables ("nodes").
The aim is to offer easy tools to begin using the power of network analysis in R for literature reviews. Using `netlit` only requires researchers to enter the relationships they observe in prior studies into a spreadsheet.
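
As a rough illustration of that spreadsheet shape, the sketch below reads a small edge list from inline CSV text using base R only. The concept names and the contents of the `cites` column are hypothetical placeholders, not data shipped with the package.

```r
# Hypothetical spreadsheet contents, inlined as CSV text so the example is
# self-contained; in practice this would be a file read with read.csv().
csv_text <- "from,to,cites
concept A,concept B,Study 1
concept B,concept C,Study 2; Study 3
concept A,concept C,Study 3"

edges <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Each row is one proposed relationship (edge). The nodes are all the
# concepts that appear in either the `from` or the `to` column.
nodes <- sort(unique(c(edges$from, edges$to)))

print(edges)
print(nodes)
```

A data frame of this shape, with one row per relationship, is the kind of input the rest of this README works with.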

To install `netlit` from CRAN, run the following:

```{r, eval=FALSE}
install.packages("netlit")
```

## Basic Usage

The `review()` function takes in a dataframe, `data`, that includes `from` and `to` columns (a directed graph structure).

In the example below, we use example data from [this project on redistricting](https://github.com/judgelord/redistricting). These data are a set of related concepts (`from` and `to`) in the redistricting literature and citations for these relationships (`cites` and `cites_empirical`). See the main [`netlit` vignette](https://judgelord.github.io/netlit/articles/netlit.html) for more details on this example.

```{r}
library(netlit)

data("literature")

head(literature)
```

`netlit` offers four functions: `make_edgelist()`, `make_nodelist()`, `augment_nodelist()`, and `review()`.

`review()` is the primary function (and probably the only one you need). The others are helper functions that perform the individual steps that `review()` does all at once. `review()` takes in a dataframe with at least two columns representing linked concepts (e.g., a cause and an effect) and returns data augmented with network statistics. Users must either specify "from" nodes and "to" nodes with the `from` and `to` arguments or include columns named `from` and `to` in the supplied `data` object.
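
If your data use other column names, the two options described above might look like the following sketch. The data frame `effects` and its columns `cause` and `effect` are hypothetical, and the `review()` call is left commented out so the example runs without `netlit` installed; it assumes that `from` and `to` accept column names as strings.

```r
# Hypothetical data whose columns are not named `from` and `to`
effects <- data.frame(
  cause  = c("concept A", "concept B"),
  effect = c("concept B", "concept C"),
  stringsAsFactors = FALSE
)

# Option 1: point `from` and `to` at the relevant columns in the call.
# (Assumption: the arguments take column names as strings.)
# lit <- netlit::review(effects, from = "cause", to = "effect")

# Option 2: rename the columns so that they are called `from` and `to`.
edges <- effects
names(edges)[names(edges) == "cause"]  <- "from"
names(edges)[names(edges) == "effect"] <- "to"

print(names(edges))
```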

`review()` returns a list of three objects:

1. an augmented `edgelist` (a list of relationships with `edge_betweenness` calculated),
2. an augmented `nodelist` (a list of concepts with `degree` and `betweenness` calculated), and
3. a `graph` object suitable for use in other `igraph` functions or other network visualization packages.
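
To give a sense of what a node-level statistic such as degree measures, here is a minimal base R sketch that counts incoming and outgoing edges for a hypothetical toy edge list. This is not how `netlit` computes its statistics, and the output column names (`degree_in`, `degree_out`, `degree_total`) are illustrative assumptions rather than the package's actual column names.

```r
# Hypothetical toy edge list (not the package's example data)
toy <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "c", "d"),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(toy$from, toy$to)))

# Count how often each node is a source (out-degree) or a target
# (in-degree). Using factor() with fixed levels keeps zero counts.
out_degree <- as.integer(table(factor(toy$from, levels = nodes)))
in_degree  <- as.integer(table(factor(toy$to,   levels = nodes)))

toy_nodelist <- data.frame(
  node         = nodes,
  degree_in    = in_degree,
  degree_out   = out_degree,
  degree_total = in_degree + out_degree,
  stringsAsFactors = FALSE
)

print(toy_nodelist)
```

Betweenness statistics depend on shortest paths through the whole network, so they are better left to a network package than computed by hand.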

### Including edge and node attributes

Users may wish to include edge attributes (e.g., information about the relationship between the two concepts) or node attributes (information about each concept). We show how to do so below. But first, consider the basic use of `review()`:


```{r}
lit <- review(literature, from = "from", to = "to")

lit

head(lit$edgelist)

head(lit$nodelist)
```

Edge and node attributes can be added using the `edge_attributes` and `node_attributes` arguments.

`edge_attributes` is a vector that identifies columns in the supplied data frame that the user would like to retain. (To retain all variables from `literature`, use `edge_attributes = names(literature)`.)

`node_attributes` is a separate dataframe that contains attributes for each node in the primary data set. `node_attributes` must be a dataframe with a column `node` with values matching the `to` or `from` columns of the `data` argument.

The example `node_attributes` data include one column, `type`, indicating a type for each node/variable/concept.

```{r}
data("node_attributes")

head(node_attributes)

lit <- review(literature,
              edge_attributes = c("cites", "cites_empirical"),
              node_attributes = node_attributes)

lit

head(lit$edgelist)

head(lit$nodelist)
```
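
Because values in the `node` column need to match the concepts in `from` and `to`, it can help to compare the two before calling `review()`. The sketch below does this in base R with hypothetical data; `edges`, `node_info`, and the `type` values are made up for illustration.

```r
# Hypothetical edge list and node attribute table
edges <- data.frame(
  from = c("concept A", "concept B"),
  to   = c("concept B", "concept C"),
  stringsAsFactors = FALSE
)

node_info <- data.frame(
  node = c("concept A", "concept B"),
  type = c("type 1", "type 2"),
  stringsAsFactors = FALSE
)

# Every concept that appears as either end of an edge
all_nodes <- unique(c(edges$from, edges$to))

# Concepts in the edge list that have no row in the attribute table
missing_attrs <- setdiff(all_nodes, node_info$node)

# Attribute rows whose `node` value never appears in the edge list,
# which often signals a spelling mismatch
unused_attrs <- setdiff(node_info$node, all_nodes)

print(missing_attrs)
print(unused_attrs)
```

Here `missing_attrs` flags a concept that has no attribute row yet, so its attributes would be unavailable for that node.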


## Mapping literature networks

Below is a plot of the redistricting literature network from the main [`netlit` vignette](https://judgelord.github.io/netlit/articles/netlit.html). It uses the `graph` object returned by `netlit::review()` as the input to network graphing functions from packages like `ggnetwork`. The `nodelist` and `edgelist` objects also provide the required inputs for other network visualization packages, e.g., `ggraph` or `visNetwork` (vignettes on how to make similar plots in `ggraph` and `visNetwork` will be posted shortly).

Nodes represent theoretical concepts, shaded by total degree centrality. Arrows represent directional relationships between concepts theorized in the literature, colored by the number of works in which they appear. Solid edges indicate empirically studied connections; dashed edges indicate relationships that have been theorized but not studied empirically.

```{r, echo = FALSE}
knitr::include_graphics("man/figures/ggraph-1.png")
```