---
output: github_document
---
# `{Simulacron3}` <img src='man/figures/logo.png' style="float:right; height:200px;" align='right' />

The purpose of the `{Simulacron3}` package is to provide easy-to-use boilerplate
functionality for simple simulation studies. The archetypal use case is
comparing the performance of multiple estimators as the sample size of the
training data increases.

A fundamental thesis of this package is that many simulation studies (of the
statistical performance of estimators) share the same workflow: generate data,
apply one or more estimators to it, summarize the estimates, and repeat across
many replications.

## Main products

`{Simulacron3}` exports very little, just the following object and function:

* `Simulation`: use `Simulation$new()` to set up a new simulation. We take a
  simulation to be a repeated experiment of _the same configuration_. We refer
  to repetitions of the same experiment as `replications`.
* `run_simulation_study()`: a function for running a `Simulation` across
  varied sample sizes.

See `?Simulation` and `?run_simulation_study()`.

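
To make the vocabulary concrete, here is a minimal base-R sketch of what a set of replications of one configuration amounts to. It does not use `{Simulacron3}` and is not the package's implementation; `run_one_replication()`, `config`, and `replication_results` are hypothetical names chosen for illustration.

```{r}
# A hypothetical, package-free illustration of "replications":
# the same experiment (same sample size, same DGP, same estimator) repeated.
set.seed(1)

config <- list(replications = 200, sample_size = 50)  # illustrative values

# One replication: draw a fresh dataset and apply an estimator to it.
run_one_replication <- function(n) {
  data <- data.frame(x = rnorm(n))
  data.frame(mean_est = mean(data$x))
}

# Repeat the identical experiment and stack one row per replication.
replication_results <- do.call(
  rbind,
  lapply(seq_len(config$replications), function(i) {
    run_one_replication(config$sample_size)
  })
)

nrow(replication_results)  # one row per replication
```

Every row comes from the same configuration; only the random draw differs, which is what separates a replication from a change of experimental condition such as a new sample size.
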
## Demonstration
```{r}
library(Simulacron3)
# Simulacron3 exports the Simulation R6 class (used below) and
# run_simulation_study()

# Example Usage

# Define a data generating process
dgp <- function(n) data.frame(x = rnorm(n), y = rnorm(n))

# Define some estimators
estimators <- list(
  mean_estimator = function(data) mean(data$x),
  var_estimator = function(data) var(data$x)
)

# Define a summary statistics function
#
# An estimator can potentially return a lot more data than can be stored
# in one row of results, so the summary_stats functions are used to
# condense that information down. Here they're not doing very much, but
# in more advanced simulations we will see why they're crucial.
summary_func <- function(iter = NULL, est_results, data = NULL) {
  data.frame(
    mean_est = est_results$mean_estimator,
    var_est = est_results$var_estimator
  )
}

# Create a simulation object
sim <- Simulation$new()
# Set up the simulation
sim$set_dgp(dgp)
sim$set_estimators(estimators)
sim$set_config(list(replications = 5000, sample_size = 500))
sim$set_summary_stats(summary_func)
# Run the simulation
sim$run()
# Retrieve results
results <- sim$get_results()
head(results)
```
See <https://ctesta01.github.io/Simulacron3/articles/Comparing-Estimators.html> for
a slightly more involved example.
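
Once a simulation has run, the per-replication rows are typically condensed into performance measures such as bias and Monte Carlo standard error. The sketch below assumes a results table with `mean_est` and `var_est` columns, as returned by `summary_func`. To stay self-contained it builds a stand-in `demo_results` data frame with base R rather than calling `sim$get_results()`. `summarize_estimates()` is a hypothetical helper, not part of `{Simulacron3}`.

```{r}
set.seed(1)

# Stand-in for simulation output: 1000 replications of n = 500 standard
# normal draws, with the sample mean and variance recorded for each.
demo_results <- do.call(rbind, lapply(seq_len(1000), function(i) {
  x <- rnorm(500)
  data.frame(mean_est = mean(x), var_est = var(x))
}))

# Hypothetical helper: bias, empirical SD, and Monte Carlo standard error
# of the mean for a vector of estimates of a known true value.
summarize_estimates <- function(estimates, truth) {
  data.frame(
    bias         = mean(estimates) - truth,
    empirical_sd = sd(estimates),
    mc_se        = sd(estimates) / sqrt(length(estimates))
  )
}

# True mean is 0 and true variance is 1 under the standard normal DGP.
performance <- rbind(
  cbind(estimator = "mean_estimator",
        summarize_estimates(demo_results$mean_est, truth = 0)),
  cbind(estimator = "var_estimator",
        summarize_estimates(demo_results$var_est, truth = 1))
)
performance
```

Reporting the Monte Carlo standard error next to the bias shows whether an apparent bias is distinguishable from simulation noise at the chosen number of replications.
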
## Parallelization
Parallelization is supported: pass `parallel = TRUE` in the `config` for your
simulation and declare a `plan(multisession)` with the `{future}` package.
```{r}
#| cache: true
library(microbenchmark)
# let's benchmark the simulation we specified above
microbenchmark::microbenchmark(sim$run(), times = 10)
# just change the config to run in parallel
sim$set_config(list(parallel = TRUE))
future::plan(future::multisession) # setup an appropriate future::plan
microbenchmark::microbenchmark(sim$run(), times = 10)
```
Under the hood, this uses the `{future.apply}` package. See
<https://future.futureverse.org/> for a description of the
types of plans that can be specified.
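
As a rough picture of how replications can be spread across workers, the sketch below calls `future.apply::future_lapply()` directly. It illustrates the general pattern only: how `{Simulacron3}` itself calls `{future.apply}` is an assumption here, and `one_replication()` is a hypothetical helper. It requires the `{future}` and `{future.apply}` packages to be installed.

```{r}
library(future)
library(future.apply)

# Hypothetical single replication: simulate data, return one row of estimates.
one_replication <- function(i, n = 500) {
  x <- rnorm(n)
  data.frame(replication = i, mean_est = mean(x), var_est = var(x))
}

plan(multisession, workers = 2)  # start two background R sessions

# future.seed = TRUE gives each replication its own reproducible
# parallel-safe random number stream.
parallel_rows <- future_lapply(seq_len(100), one_replication, future.seed = TRUE)
parallel_results <- do.call(rbind, parallel_rows)

plan(sequential)  # return to sequential evaluation when done

dim(parallel_results)
```

Because the replications are independent, they can be distributed across workers with no coordination beyond the random number streams. Whether this is faster than the sequential run depends on whether each replication is expensive enough to outweigh the cost of starting and communicating with the workers.
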
## Package Internals
`{Simulacron3}` is meant to have easy-to-understand source code (and
not too much of it) so that users can easily reason about what to expect
from their simulations. Check out the source, especially the `$run()` method
of the `Simulation` class and the `run_simulation_study()` function:

- <https://github.com/ctesta01/Simulacron3/blob/main/R/Simulation.R>
- <https://github.com/ctesta01/Simulacron3/blob/main/R/run_simulation_study.R>
### Package Title Inspiration
To quote Wikipedia:
<img src="https://upload.wikimedia.org/wikipedia/en/7/70/DanielFGalouye-Simulacron-3.jpg" align='left' hspace='15' width='100px' />
> Simulacron-3 (1964), by Daniel F. Galouye, is an American science fiction novel featuring an early literary description of a simulated reality. <br><br>
> ... As time and events unwind, [Fuller] progressively grasps that his own world is
probably not "real" and might be only a computer-generated simulation.
<br><br>
## Other Related Works
`{Simulacron3}` is one of many attempts to help with the workflow of running
simulations. A lot of inspiration was taken from:
- [simChef](https://github.com/Yu-Group/simChef)
- [SimEngine](https://avi-kenny.github.io/SimEngine/)
- [simcausal](https://www.jstatsoft.org/article/view/v081i02)
- [simulator](https://github.com/jacobbien/simulator)