This repository was archived by the owner on Jan 9, 2020. It is now read-only.
R/pkg/vignettes/sparkr-vignettes.Rmd (+38 lines, −7 lines)
@@ -565,7 +565,7 @@ head(aftPredictions)
 #### Gaussian Mixture Model
 
-(Coming in 2.1.0)
+(Added in 2.1.0)
 
 `spark.gaussianMixture` fits a multivariate [Gaussian Mixture Model](https://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model) (GMM) against a `SparkDataFrame`. [Expectation-Maximization](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) (EM) is used to approximate the maximum likelihood estimator (MLE) of the model.
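As context for reviewers, a minimal usage sketch (not part of this commit; assumes an active SparkR session created with `sparkR.session()`; `iris` columns are renamed with underscores by `createDataFrame`):

```{r, warning=FALSE}
df <- createDataFrame(iris)
# Fit a two-component GMM on two numeric features
gmmModel <- spark.gaussianMixture(df, ~ Sepal_Length + Sepal_Width, k = 2)
summary(gmmModel)          # component weights, means, and covariance matrices
head(predict(gmmModel, df))
```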
 `spark.lda` fits a [Latent Dirichlet Allocation](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) model on a `SparkDataFrame`. It is often used in topic modeling in which topics are inferred from a collection of text documents. LDA can be thought of as a clustering algorithm as follows:
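A hedged sketch of `spark.lda` on a tiny hypothetical corpus (not part of this commit; the `text` column name is an assumption, and `spark.lda` is assumed to accept a character-typed features column, which it tokenizes internally):

```{r, warning=FALSE}
corpus <- createDataFrame(data.frame(
  text = c("spark mllib topic model", "sparkr vignette example text")))
ldaModel <- spark.lda(corpus, features = "text", k = 2, maxIter = 10)
summary(ldaModel)   # topic distributions and log-likelihood / log-perplexity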
@@ -657,7 +657,7 @@ perplexity
 #### Multilayer Perceptron
 
-(Coming in 2.1.0)
+(Added in 2.1.0)
 
 Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC consists of multiple layers of nodes. Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs by a linear combination of the inputs with the node’s weights $w$ and bias $b$ and applying an activation function. This can be written in matrix form for MLPC with $K+1$ layers as follows:
 
 $$
 \mathrm{y}(\mathrm{x}) = \mathrm{f_K}(\ldots \mathrm{f_2}(\mathrm{w_2}^T \mathrm{f_1}(\mathrm{w_1}^T \mathrm{x} + b_1) + b_2) \ldots + b_K)
 $$
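A minimal `spark.mlp` sketch (not part of this commit; assumes an active SparkR session; layer sizes are an assumption: 4 input features, one hidden layer of 5 nodes, 3 output classes for the three `iris` species):

```{r, warning=FALSE}
df <- createDataFrame(iris)
mlpModel <- spark.mlp(df, Species ~ ., layers = c(4, 5, 3), maxIter = 100)
summary(mlpModel)           # weights per layer
head(predict(mlpModel, df))
```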
@@ -694,7 +694,7 @@ MLPC employs backpropagation for learning the model. We use the logistic loss function
 #### Collaborative Filtering
 
-(Coming in 2.1.0)
+(Added in 2.1.0)
 
 `spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](http://dl.acm.org/citation.cfm?id=1608614).
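For context, a toy `spark.als` sketch on hypothetical (user, item, rating) triples (not part of this commit; assumes an active SparkR session and that `reg` is the regularization parameter name in SparkR):

```{r, warning=FALSE}
ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0),
                list(1, 2, 4.0), list(2, 1, 1.0))
df <- createDataFrame(ratings, c("userId", "itemId", "rating"))
alsModel <- spark.als(df, "rating", "userId", "itemId", rank = 10, reg = 0.1)
summary(alsModel)
head(predict(alsModel, df))  # predicted ratings for the observed pairs
```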
@@ -725,7 +725,7 @@ head(predicted)
 #### Isotonic Regression Model
 
-(Coming in 2.1.0)
+(Added in 2.1.0)
 
 `spark.isoreg` fits an [Isotonic Regression](https://en.wikipedia.org/wiki/Isotonic_regression) model against a `SparkDataFrame`. It solves a weighted univariate regression problem under a complete order constraint. Specifically, given a set of real observed responses $y_1, \ldots, y_n$, corresponding real features $x_1, \ldots, x_n$, and optionally positive weights $w_1, \ldots, w_n$, we want to find a monotone (piecewise linear) function $f$ to minimize
 We also expect Decision Tree, Random Forest, Kolmogorov-Smirnov Test coming in the next version 2.1.0.
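A small `spark.isoreg` sketch on hypothetical one-feature data (not part of this commit; assumes an active SparkR session):

```{r, warning=FALSE}
df <- createDataFrame(data.frame(
  label   = c(1.0, 3.0, 2.0, 5.0, 4.0),
  feature = c(0.0, 1.0, 2.0, 3.0, 4.0)))
isoregModel <- spark.isoreg(df, label ~ feature)
# Predictions are the fitted monotone piecewise-linear function at the inputs
head(predict(isoregModel, df))
```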
+### Logistic Regression Model
+
+(Added in 2.1.0)
+
+[Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model).
+We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters.
+It supports both binary and multiclass classification, with elastic-net regularization and feature standardization, similar to `glmnet`.
+
+We use a simple example to demonstrate `spark.logit` usage. In general, there are three steps to using `spark.logit`:
+1). Create a DataFrame from a proper data source; 2). Fit a logistic regression model using `spark.logit` with a proper parameter setting;
+and 3). Obtain the coefficient matrix of the fitted model using `summary` and use the model for prediction with `predict`.
+
+Binomial logistic regression
+```{r, warning=FALSE}
+df <- createDataFrame(iris)
+# Create a DataFrame containing two classes
+training <- df[df$Species %in% c("versicolor", "virginica"), ]
+model <- spark.logit(training, Species ~ ., regParam = 0.5)
+summary(model)
+```
+
+Predict values on training data
+```{r}
+fitted <- predict(model, training)
+```
+
+Multinomial logistic regression against three classes
+```{r, warning=FALSE}
+df <- createDataFrame(iris)
+# Note in this case, Spark infers it is multinomial logistic regression, so family = "multinomial" is optional.
+model <- spark.logit(df, Species ~ ., regParam = 0.5)
+summary(model)
+```
 ### Model Persistence
 
 The following example shows how to save/load an ML model with SparkR.
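The vignette's own example is not shown in this hunk; a hedged sketch of the save/load round trip (not part of this commit; assumes an active SparkR session, and the path is hypothetical):

```{r, warning=FALSE}
df <- createDataFrame(iris)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")

# Persist the fitted model, then reload it and reuse for prediction
modelPath <- tempfile(pattern = "glm-model")
write.ml(model, modelPath)
restored <- read.ml(modelPath)
head(predict(restored, df))
```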