This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit 5693ac8

[SPARK-18793][SPARK-18794][R] add spark.randomForest/spark.gbt to vignettes
## What changes were proposed in this pull request?

Mention `spark.randomForest` and `spark.gbt` in vignettes. Keep the content minimal since users can type `?spark.randomForest` to see the full doc.

cc: jkbradley

Author: Xiangrui Meng <[email protected]>

Closes apache#16264 from mengxr/SPARK-18793.

(cherry picked from commit 594b14f)
Signed-off-by: Xiangrui Meng <[email protected]>
1 parent 25b9758 commit 5693ac8

1 file changed: +32 -0 lines changed

R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 32 additions & 0 deletions
@@ -449,6 +449,10 @@ SparkR supports the following machine learning models and algorithms.

 * Generalized Linear Model (GLM)

+* Random Forest
+
+* Gradient-Boosted Trees (GBT)
+
 * Naive Bayes Model

 * $k$-means Clustering
@@ -526,6 +530,34 @@ gaussianFitted <- predict(gaussianGLM, carsDF)
 head(select(gaussianFitted, "model", "prediction", "mpg", "wt", "hp"))
 ```

+#### Random Forest
+
+`spark.randomForest` fits a [random forest](https://en.wikipedia.org/wiki/Random_forest) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+In the following example, we use the `longley` dataset to train a random forest and make predictions:
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", maxDepth = 2, numTrees = 2)
+summary(rfModel)
+predictions <- predict(rfModel, df)
+```
+
+#### Gradient-Boosted Trees
+
+`spark.gbt` fits a [gradient-boosted tree](https://en.wikipedia.org/wiki/Gradient_boosting) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+Similar to the random forest example above, we use the `longley` dataset to train a gradient-boosted tree and make predictions:
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+gbtModel <- spark.gbt(df, Employed ~ ., type = "regression", maxDepth = 2, maxIter = 2)
+summary(gbtModel)
+predictions <- predict(gbtModel, df)
+```
+
 #### Naive Bayes Model

 Naive Bayes model assumes independence among the features. `spark.naiveBayes` fits a [Bernoulli naive Bayes model](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Bernoulli_naive_Bayes) against a SparkDataFrame. The data should be all categorical. These models are often used for document classification.
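
The vignette text added in this commit mentions `write.ml`/`read.ml` for saving and loading fitted models, but both examples stop at `predict`. A minimal sketch of the save/load round trip for the random forest model might look like the following. It is not part of the commit; it assumes a local Spark installation reachable from SparkR, and the `local[1]` master, the temporary `modelPath`, and the `reloaded*` variable names are illustrative choices.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session(master = "local[1]")

# Same training call as the random forest example in the vignette.
df <- createDataFrame(longley)
rfModel <- spark.randomForest(df, Employed ~ ., type = "regression",
                              maxDepth = 2, numTrees = 2)

# Hypothetical location: a fresh temporary path that does not exist yet.
modelPath <- tempfile(pattern = "rf-model-")
write.ml(rfModel, modelPath)

# Load the saved model and use it the same way as the original.
reloadedModel <- read.ml(modelPath)
reloadedPredictions <- predict(reloadedModel, df)
head(select(reloadedPredictions, "Employed", "prediction"))
```

The same pattern should apply to a model fitted with `spark.gbt`, since the vignette describes `write.ml`/`read.ml` for both model types.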
