Description: Provides functions for evaluating, displaying, and interpreting statistical models. The goal is to abstract the operations on models from the particular architecture of the model: for instance, calculating effect sizes rather than looking at coefficients. The package includes interfaces to both regression and classification architectures, including lm(), glm(), MASS::rlm(), random forests and recursive partitioning, k-nearest neighbors, linear and quadratic discriminant analysis, and models produced by the caret package's `train()`. It's straightforward to add other model architectures.
The `mosaicModel` package provides a basic interface for interpreting and displaying models. From the early beginnings of R, methods such as `summary()`, `plot()`, and `predict()` provided a consistent vocabulary for generating model output and reports, but the format and contents of those reports depended strongly on the specifics of the model architecture. For example, for architectures such as `lm()` and `glm()`, the `summary()` method produces a regression table showing point estimates and standard errors on model coefficients. But other widely used architectures such as random forests or k-nearest neighbors do not generate coefficients and so need to be displayed and interpreted in other ways.
To provide a general interface for displaying and interpreting models, the `mosaicModel` package provides an alternative structure of operations that make sense for a wide range of model architectures, including those typically grouped under the term "machine learning."
The package implements operations that can be applied to a wide range of model architectures.
`mosaicModel` stays out of the business of training models. You do that using functions, e.g.
- the familiar `lm()` or `glm()` provided by the `stats` package
- `train()` from the `caret` package for machine learning
- `rpart()`, `randomForest()`, `rlm()`, and other functions provided by other packages
The package authors will try to expand the repertoire as demand requires. (See the section on [adding new model architectures](#new-architectures).)
```{r}
fuel_mod_1 <- lm(mpg ~ hp * transmission, data = mtcars)
```
The second model includes a nonlinear dependence on horsepower. You can think of `ns()` as standing for "not straight" with the integer describing the amount of "curviness" allowed.
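The model definition itself is elided in this excerpt. A hypothetical sketch of a fit in this spirit, using `splines::ns()` (the variable `am` from `mtcars` stands in for the transmission variable, and all names are illustrative, not the document's own):

```{r}
# Illustrative only: a "not straight" dependence on horsepower.
# ns(hp, 2) allows two degrees of curviness.
library(splines)
curvy_mod <- lm(mpg ~ ns(hp, 2) * factor(am), data = mtcars)
coef(curvy_mod)
```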
For models involving only a very few explanatory variables, a plot of the model can give immediate insight. The `mod_plot()` function reduces the work to make such a plot.
```{r out.width = "30%"}
mod_plot(fuel_mod_1)
```
Since the iris model is a classifier, the plot of the model function shows the *probability* of an output class, here *virginica*.
If your interest is in a class other than *virginica*, you can specify the class you want with an additional argument, e.g. `class_level = "setosa"`.
The second iris model has four explanatory variables. This is as many as `mod_plot()` will display:
The `mod_plot()` function creates a graphical display of the output of the model for a range of model inputs. The `mod_eval()` function (which `mod_plot()` uses internally) produces the output in tabular form, e.g.
```{r}
mod_eval(fuel_mod_1, transmission = "manual", hp = 200)
```
`mod_eval()` tries to do something sensible if you don't specify a value (or a range of values) for an explanatory variable.
It's often helpful in interpreting a model to know how the output changes with a change in one of the inputs. Traditionally, model coefficients have been used for this purpose. But not all model architectures produce coefficients (e.g. random forests), and even in those that do, interactions and nonlinear terms spread the information across multiple coefficients. As an alternative, `mod_effect()` evaluates the model at one set of input values, repeats the calculation after modifying a selected input, and combines the results into a rate of change (slope) or a finite difference.
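The finite-difference idea can be sketched without `mosaicModel`, in base R (the model formula, input values, and step size here are all illustrative):

```{r}
# Evaluate a model at one input, nudge a selected input, and form a slope.
mod <- lm(mpg ~ hp * factor(am), data = mtcars)
at     <- data.frame(hp = 150, am = 1)
nudged <- data.frame(hp = 151, am = 1)  # hp changed by 1
slope <- (predict(mod, newdata = nudged) - predict(mod, newdata = at)) / 1
unname(slope)  # change in predicted mpg per unit of hp, with am held fixed
```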
Here, `mod_effect()` is calculating the rate of change of fuel consumption (remember, the output of `fuel_mod_1` is in terms of `mpg`) with respect to `hp`:
```{r}
mod_effect(fuel_mod_2, ~ hp)
```
Since no specific inputs were specified, `mod_effect()` attempted to do something sensible.
You can, of course, specify the inputs you want, for instance:
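The original example is elided in this excerpt; a sketch of such a call, with input values chosen purely for illustration (following the pattern of the `mod_eval()` call earlier):

```{r}
# Illustrative inputs: the effect of hp, evaluated at hp = 120 with a
# manual transmission.
mod_effect(fuel_mod_2, ~ hp, hp = 120, transmission = "manual")
```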
The result suggests a lower bias but higher variance for the second fuel model compared to the first.
## Available model architectures
"Architecture" is used to refer to the class of model. For instance, a linear model, random forests, recursive partitioning. Use the model training functions, such as `lm`, `glm`, `rlm` in the `stats` package or in other packages such as `caret` or `zelig`.
259
+
"Architecture" is used to refer to the class of model. For instance, a linear model, random forests, recursive partitioning. Use the model training functions, such as `lm()`, `glm()`, `rlm()` in the `stats` package or in other packages such as `caret` or `zelig`.
You can find out which model architectures are available with the command
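The specific command is elided in this excerpt. Since support is implemented through S3 methods, one way to check (an assumption, not necessarily the document's own command) is to list the methods of one of the package's generics:

```{r}
# Each method of mod_eval_fun() corresponds to a supported architecture.
methods(mod_eval_fun)
```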
The package authors would like to have this package ready-to-run with commonly used model architectures.
R programmers can add their own model architectures by adding S3 methods for these functions:
- `formula_from_mod()`
- `data_from_mod()`
- `mod_eval_fun()` evaluates the model at specified values of the input variables. This is much like `predict()`, from which it is often built.
- `construct_fitting_call()`
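As a hypothetical skeleton (the `mymod` class and its internals are invented for illustration; only the generic names come from the list above):

```{r}
# S3 methods registering an invented "mymod" architecture with mosaicModel.
formula_from_mod.mymod <- function(model, ...) model$formula
data_from_mod.mymod    <- function(model, ...) model$training_data
mod_eval_fun.mymod <- function(model, data = NULL, ...) {
  # Must return a data frame of model output, one row per row of `data`.
  as.data.frame(predict(model, newdata = data))
}
```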
The code for the generics and some methods is in the source .R files of the same name. This may give you some idea of how to write your own methods.
It often happens that there is a sensible default method that covers lots of model architectures. You can try this out directly by running `mosaicModel:::data_from_mod.default()` (or a similar name) on the model architecture you want to support.
To illustrate, let's add a set of methods for the `MASS` package's `lda()` and `qda()` model architectures for classification.
Step 1 is to create a model of the architecture you're interested in. Remember that you will need to attach any packages needed for that kind of model.
```
mod_eval_fun(my_mod)
Error in mod_eval_fun.default(my_mod) : The modelMosaic package doesn't have access to an evaluation function for this kind of model object.
```
Now, of course, there is a `mod_eval_fun()` method for models of class `knn3`. How did we go about implementing it?
To start, let's see if there is a `predict()` method defined. This is a pretty common practice among those writing model-training functions. Regrettably, there is considerable variety in the programming interface to `predict()` methods, so it's quite common to have to implement a wrapper to interface any existing `predict()` method to `mosaicModel`.
```{r}
methods(class = "lda")
```
Refer to the help page for `predict.lda()` to see what the argument names are. `newdata =` is often the name of the argument for specifying the model inputs, but sometimes it's `x`, `data`, or something else.
Since `lda()`/`qda()` is a classifier, the form of output we would like to produce is a table of probabilities for each class level for each input case. This is the standard expected by `mosaicModel`. Let's look at the output of `predict()`:
```{r}
predict(my_mod) %>% str()
```
This is something of a detective story, but a person very familiar with `lda()` and with R will see that the `predict()` method produces a list of two items. The second, called `posterior`, is a matrix with 150 rows and 3 columns, corresponding to the size of the training data.
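The tidying step can be sketched directly with `MASS` (using a stand-in `lda_mod` fit on `iris`, which is what `my_mod` appears to be in this example):

```{r}
# Pull out the posterior matrix and coerce it to a row-name-free data frame,
# much as mod_eval_fun.lda() does internally.
library(MASS)
lda_mod <- lda(Species ~ ., data = iris)
post <- as.data.frame(predict(lda_mod)$posterior)
rownames(post) <- NULL
head(post, 3)  # one probability column per species
```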
Once located, do what you need in order to coerce the output to a data frame and remove row names (for consistency of output). Here's the `mod_eval_fun.lda()` function from `mosaicModel`.
```{r}
mosaicModel:::mod_eval_fun.lda
```
The arguments to the function are the same as for all the `mod_eval_fun()` methods. The body of the function pulls out the `posterior` component, coerces it to a data frame, and removes the row names. It isn't always this easy. But once the function is available in your session, you can test it out. (Make sure to give it a data set as inputs to the model.)
```{r error = TRUE}
mod_eval_fun(my_mod, data = iris[c(30, 80, 120), ])
```