Commit aa853cc

Fixing documentation and vignettes
Author: Keith Goldfeld
1 parent eb4175b, commit aa853cc

3 files changed: +36 -19 lines changed

man/defData.Rd

Lines changed: 1 addition & 1 deletion
Generated file; its diff is not rendered by default.

vignettes/customdist.Rmd

Lines changed: 34 additions & 17 deletions
@@ -45,11 +45,13 @@ ggtheme <- function(panelback = "white") {
4545
4646
```
4747

48-
Custom distributions can be specified in `defData` and `defDataAdd` by using the specifying "custom" for the argument *dist*. The name of the user-defined function is specified as a string in the *formula* argument. The arguments of the custom function are specified in the *variance* argument, as a comma delimited string. One important feature of the custom function is that the parameter list used to define the function must include "**n = n**", but this will not be included in the data definition.
48+
Custom distributions can be specified in `defData` and `defDataAdd` by setting the argument *dist* to "custom". When defining a custom distribution, you provide the name of the user-defined function as a string in the *formula* argument. The arguments of the custom function are listed in the *variance* argument, separated by commas and formatted as "**arg_1 = val_form_1, arg_2 = val_form_2, $\dots$, arg_K = val_form_K**".
49+
50+
Here, the *arg_k's* represent the names of the arguments passed to the customized function, where $k$ ranges from $1$ to $K$. You can use values or formulas for each *val_form_k*. If formulas are used, ensure that the variables have been previously generated. Double dot notation is available in specifying *val_form_k*. One important requirement of the custom function is that the parameter list used to define the function must include an argument "**n = n**", but do not include $n$ in the definition as part of `defData` or `defDataAdd`.
4951

5052
### Example 1
5153

52-
Here is an example where we would like to generate data from a zero-inflated beta distribution. In this case, there is a user-defined function `zeroBeta` that takes on shape parameters $a$ and $b$, as well as $p_0$, the proportion of the sample that is zero:
54+
Here is an example where we would like to generate data from a zero-inflated beta distribution. In this case, there is a user-defined function `zeroBeta` that takes shape parameters $a$ and $b$, as well as $p_0$, the proportion of the sample that is zero. Note that the function also takes an argument $n$ that will not be specified in the data definition; $n$ represents the number of observations being generated:
5355

5456
```{r}
5557
zeroBeta <- function(n, a, b, p0) {
@@ -59,7 +61,7 @@ zeroBeta <- function(n, a, b, p0) {
5961
}
6062
```
6163

62-
The data definition specifies a new variable $zb$ that will sets $a$ and $b$ to 0.75, and $p_0 = 0.02$:
64+
The data definition specifies a new variable $zb$ that sets $a$ and $b$ to 0.75, and $p_0 = 0.02$:
6365

6466
```{r}
6567
def <- defData(
@@ -91,10 +93,10 @@ ggplot(data = dd, aes(x = zb)) +
9193

9294
### Example 2
9395

94-
In this case, we are generating a mixture of truncated distributions, where the limits of the truncation vary across three different groups. `rnormt` is a customized (user-defined) function that generates truncated data from a Gaussian distribution. The function requires up to four arguments (the left truncation value, the right truncation value, the distribution average and the standard deviation).
96+
In this second example, we are generating sets of truncated Gaussian distributions with means ranging from $-1$ to $1$. The limits of the truncation vary across three different groups. `rnormt` is a customized (user-defined) function that generates the truncated Gaussian data. The function requires four arguments (the left truncation value, the right truncation value, the distribution average and the standard deviation).
9597

9698
```{r}
97-
rnormt <- function(n, min, max, mu = 0, s = 1.5) {
99+
rnormt <- function(n, min, max, mu, s) {
98100
99101
F.a <- pnorm(min, mean = mu, sd = s)
100102
F.b <- pnorm(max, mean = mu, sd = s)
@@ -105,7 +107,8 @@ rnormt <- function(n, min, max, mu = 0, s = 1.5) {
105107
}
106108
```
107109

108-
In this example, the truncation limits vary by group membership. There are three groups. We only pass three parameters (the limits and the mean), using the default standard deviation.
110+
111+
In this example, truncation limits vary based on group membership. Initially, three groups are created, followed by the generation of truncated values. For Group 1, truncation occurs within the range of $-1$ to $1$, for Group 2, it's $-2$ to $2$ and for Group 3, it's $-3$ to $3$. We'll generate three data sets, each with a distinct mean denoted by M, using the double-dot notation to implement these different means.
109112

110113
```{r}
111114
def <-
@@ -117,26 +120,40 @@ def <-
117120
defData(
118121
varname = "tn",
119122
formula = "rnormt",
120-
variance = "min = -limit, max = limit, mu = 0.5",
123+
variance = "min = -limit, max = limit, mu = ..M, s = 1.5",
121124
dist = "custom"
122125
)
126+
```
123127

124-
dd <- genData(100000, def)
128+
The data generation requires three calls to `genData`. The output is a list of three data sets:
129+
130+
```{r}
131+
mus <- c(-1, 0, 1)
132+
dd <- lapply(mus, function(M) genData(100000, def))
125133
```
126134

135+
Here are the first six observations from each of the three data sets:
136+
127137
```{r, echo=FALSE}
128-
dd
138+
lapply(dd, function(D) head(D))
129139
```
130140

131141
A plot highlights the group differences.
132142

133-
```{r, fig.width = 8, fig.height = 3, echo = FALSE}
134-
ggplot(data = dd, aes(x = tn)) +
135-
geom_histogram(aes(fill = factor(limit)), binwidth = 0.05, boundary = 0, alpha = .8) +
136-
facet_grid( ~ limit) +
137-
theme(panel.grid = element_blank(),
138-
legend.position = "none") +
139-
scale_fill_manual(values = plotcolors) +
140-
scale_x_continuous(breaks = seq(-3, 3, by =1))
143+
```{r, fig.width = 8, fig.height = 6, echo = FALSE}
144+
pfunc <- function(dx, i) {
145+
ggplot(data = dx, aes(x = tn)) +
146+
geom_histogram(aes(fill = factor(limit)), binwidth = 0.05, boundary = 0, alpha = .8) +
147+
facet_grid( ~ limit) +
148+
theme(panel.grid = element_blank(),
149+
legend.position = "none") +
150+
scale_fill_manual(values = plotcolors) +
151+
scale_x_continuous(breaks = seq(-3, 3, by =1)) +
152+
scale_y_continuous(limits = c(0, 1000)) +
153+
ggtitle(paste("mu =", mus[i]))
154+
}
155+
156+
plist <- lapply(seq_along(dd), function(a) pfunc(dd[[a]], a))
157+
grid.arrange(grobs = plist, nrow = 3)
141158
```
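
The body of `rnormt` is elided between hunks in this diff; only the two `pnorm` calls are visible. The following is a minimal sketch of one common way to finish such a function, inverse-CDF sampling, and is an assumption rather than the committed code. The name `rnormt_sketch` is hypothetical and is used to keep it distinct from the vignette's `rnormt`.

```r
# Hypothetical stand-in for the elided body of `rnormt`; not the committed code.
rnormt_sketch <- function(n, min, max, mu, s) {
  # CDF of the untruncated Gaussian at the two truncation limits
  F.a <- pnorm(min, mean = mu, sd = s)
  F.b <- pnorm(max, mean = mu, sd = s)

  # Draw uniformly between the two CDF values, then map the draws
  # back through the quantile function to get truncated values
  u <- runif(n, min = F.a, max = F.b)
  qnorm(u, mean = mu, sd = s)
}

set.seed(123)
x <- rnormt_sketch(1000, min = -2, max = 2, mu = 1, s = 1.5)
range(x) # draws lie within the truncation limits, up to floating-point error
```

Because `runif`, `pnorm`, and `qnorm` are vectorized, the same sketch also works when `min` and `max` are vectors of length `n`, which is the situation in Example 2 where the limits depend on the group variable.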
142159

vignettes/simstudy.Rmd

Lines changed: 1 addition & 1 deletion
@@ -231,7 +231,7 @@ The *clusterSize* distribution allocates a total sample size *N* (specified in t
231231

232232
#### custom
233233

234-
The *custom* distribution facilitates data generation for a user-defined distribution. The name of the user-defined function is specified as a string in the *formula* argument. The arguments of the function are specified in the *variance* argument, as a comma delimited string, such as "**name_arg_1 = value/formula_1, name_arg_2 = value/formula_2, ..., name_arg_K = value/formula_K**". The *name_arg_k*, $x \in \{1,2,...,K\}$, are required to create the $K$ arguments that are passed to the customized function. The *val/form_k* represent either values or a formula that is used to generate the values for the argument; if formulas are used, variables in the formulas must have been generated previously. Double dot notation is available in specifying the *value/formula_k*. One important feature of the custom function is that the parameter list used to define the function must include "**n = n**", but this will not be included in the data definition.
234+
Custom distributions can be specified in `defData` and `defDataAdd` by setting the argument *dist* to "custom". When defining a custom distribution, provide the name of the user-defined function as a string in the *formula* argument. The arguments of the custom function are listed in the *variance* argument, separated by commas and formatted as "**arg_1 = val_form_1, arg_2 = val_form_2, $\dots$, arg_K = val_form_K**". The *arg_k's* represent the names of the arguments passed to the customized function, where $k$ ranges from $1$ to $K$. Values or formulas can be used for each *val_form_k*. If formulas are used, ensure that the variables have been previously generated. Double dot notation is available in specifying *val_form_k*. One important requirement of the custom function is that the parameter list used to define the function must include an argument "**n = n**", but do not include $n$ in the definition as part of `defData` or `defDataAdd`.
235235

236236
#### exponential
237237
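
The *custom* entry changed above can be illustrated with a short end-to-end sketch. It assumes a simstudy version that supports `dist = "custom"`. The generator `zeroBetaSketch` is a hypothetical function written for illustration, not the `zeroBeta` body from the vignette (which is elided in this diff); the parameter values mirror those in Example 1.

```r
library(simstudy)

# Hypothetical custom generator. The first argument must be `n`;
# simstudy supplies it, so it is not listed in the data definition.
zeroBetaSketch <- function(n, a, b, p0) {
  betas <- rbeta(n, a, b)        # beta draws with shape parameters a and b
  is.zero <- rbinom(n, 1, p0)    # 1 marks an observation forced to zero
  betas * (1 - is.zero)
}

def <- defData(
  varname = "zb",
  formula = "zeroBetaSketch",                   # function name as a string
  variance = "a = 0.75, b = 0.75, p0 = 0.02",   # arguments other than n
  dist = "custom"
)

set.seed(1234)
dd <- genData(1000, def)
head(dd)
```

Defining the generator at the top level before calling `genData` keeps it visible when simstudy looks the function up by name.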
