This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit 2a8de2e

felixcheung authored; shivaram committed
[SPARK-18849][ML][SPARKR][DOC] vignettes final check update
## What changes were proposed in this pull request?

doc cleanup

## How was this patch tested?

~~vignettes is not building for me. I'm going to kick off a full clean build and try again and attach output here for review.~~

Output html here: https://felixcheung.github.io/sparkr-vignettes.html

Author: Felix Cheung <[email protected]>

Closes apache#16286 from felixcheung/rvignettespass.

(cherry picked from commit 7d858bc)
Signed-off-by: Shivaram Venkataraman <[email protected]>
1 parent d399a29 commit 2a8de2e

File tree

1 file changed: +12 additions, −26 deletions

R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 12 additions & 26 deletions
```diff
@@ -447,33 +447,31 @@ head(teenagers)
 
 SparkR supports the following machine learning models and algorithms.
 
-* Generalized Linear Model (GLM)
+* Accelerated Failure Time (AFT) Survival Model
 
-* Random Forest
+* Collaborative Filtering with Alternating Least Squares (ALS)
+
+* Gaussian Mixture Model (GMM)
+
+* Generalized Linear Model (GLM)
 
 * Gradient-Boosted Trees (GBT)
 
-* Naive Bayes Model
+* Isotonic Regression Model
 
 * $k$-means Clustering
 
-* Accelerated Failure Time (AFT) Survival Model
-
-* Gaussian Mixture Model (GMM)
+* Kolmogorov-Smirnov Test
 
 * Latent Dirichlet Allocation (LDA)
 
-* Multilayer Perceptron Model
-
-* Collaborative Filtering with Alternating Least Squares (ALS)
-
-* Isotonic Regression Model
-
 * Logistic Regression Model
 
-* Kolmogorov-Smirnov Test
+* Multilayer Perceptron Model
 
-More will be added in the future.
+* Naive Bayes Model
+
+* Random Forest
 
 ### R Formula
 
```

```diff
@@ -601,8 +599,6 @@ head(aftPredictions)
 
 #### Gaussian Mixture Model
 
-(Added in 2.1.0)
-
 `spark.gaussianMixture` fits multivariate [Gaussian Mixture Model](https://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model) (GMM) against a `SparkDataFrame`. [Expectation-Maximization](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) (EM) is used to approximate the maximum likelihood estimator (MLE) of the model.
 
 We use a simulated example to demonstrate the usage.
```
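A minimal sketch of the `spark.gaussianMixture` usage this hunk describes, assuming a running SparkR session; the simulated two-component data and column names `V1`/`V2` are illustrative:

```r
library(SparkR)
sparkR.session()

# Two simulated clusters of 2-D points (illustrative data).
X1 <- data.frame(V1 = rnorm(4), V2 = rnorm(4))
X2 <- data.frame(V1 = rnorm(6, 3), V2 = rnorm(6, 4))
df <- createDataFrame(rbind(X1, X2))

# Fit a 2-component GMM via EM; the formula selects the feature columns.
gmmModel <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
summary(gmmModel)

# Per-row cluster assignments.
gmmFitted <- predict(gmmModel, df)
head(select(gmmFitted, "V1", "V2", "prediction"))
```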
```diff
@@ -620,8 +616,6 @@ head(select(gmmFitted, "V1", "V2", "prediction"))
 
 #### Latent Dirichlet Allocation
 
-(Added in 2.1.0)
-
 `spark.lda` fits a [Latent Dirichlet Allocation](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) model on a `SparkDataFrame`. It is often used in topic modeling in which topics are inferred from a collection of text documents. LDA can be thought of as a clustering algorithm as follows:
 
 * Topics correspond to cluster centers, and documents correspond to examples (rows) in a dataset.
```
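A sketch of the `spark.lda` call, assuming a running SparkR session; the toy corpus below is made up, and the text column name `features` is SparkR's default for `spark.lda`:

```r
library(SparkR)
sparkR.session()

# A toy corpus of documents (illustrative text).
corpusDF <- createDataFrame(data.frame(features = c(
  "spark dataframe sql machine learning",
  "clustering topic model document text",
  "sql query dataframe spark")))

# Fit a 3-topic LDA model.
model <- spark.lda(corpusDF, k = 3, maxIter = 10)
summary(model)

# Topic posterior for each document, and perplexity on the corpus.
posterior <- spark.posterior(model, corpusDF)
perplexity <- spark.perplexity(model, corpusDF)
```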
```diff
@@ -676,8 +670,6 @@ perplexity
 
 #### Multilayer Perceptron
 
-(Added in 2.1.0)
-
 Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC consists of multiple layers of nodes. Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs by a linear combination of the inputs with the node's weights $w$ and bias $b$ and applying an activation function. This can be written in matrix form for MLPC with $K+1$ layers as follows:
 $$
 y(x)=f_K(\ldots f_2(w_2^T f_1(w_1^T x + b_1) + b_2) \ldots + b_K).
```
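The `spark.mlp` API behind this section can be sketched as follows, assuming a running SparkR session; using `iris` here (where `createDataFrame` replaces `.` in column names with `_`) is an illustrative choice, not the vignette's own example:

```r
library(SparkR)
sparkR.session()

# iris as a SparkDataFrame; "Sepal.Length" becomes "Sepal_Length", etc.
df <- createDataFrame(iris)

# layers = c(4, 5, 3): 4 input features, one hidden layer of 5 nodes,
# 3 output classes for Species.
model <- spark.mlp(df,
                   Species ~ Sepal_Length + Sepal_Width + Petal_Length + Petal_Width,
                   layers = c(4, 5, 3), maxIter = 100)
summary(model)

predictions <- predict(model, df)
head(select(predictions, predictions$prediction))
```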
```diff
@@ -726,8 +718,6 @@ head(select(predictions, predictions$prediction))
 
 #### Collaborative Filtering
 
-(Added in 2.1.0)
-
 `spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](http://dl.acm.org/citation.cfm?id=1608614).
 
 There are multiple options that can be configured in `spark.als`, including `rank`, `reg`, `nonnegative`. For a complete list, refer to the help file.
```
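A sketch of `spark.als` with the `rank`, `reg`, and `nonnegative` options the hunk mentions, assuming a running SparkR session; the (user, item, rating) triples below are made up:

```r
library(SparkR)
sparkR.session()

# Toy explicit ratings; column names match spark.als defaults.
ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0),
                list(1, 2, 4.0), list(2, 1, 1.0), list(2, 2, 5.0))
df <- createDataFrame(ratings, c("user", "item", "rating"))

# Fit ALS: `rank` is the latent-factor dimension, `reg` the regularization.
model <- spark.als(df, ratingCol = "rating", userCol = "user",
                   itemCol = "item", rank = 10, reg = 0.1,
                   nonnegative = TRUE)
summary(model)

# Predicted ratings for (user, item) pairs.
predicted <- predict(model, df)
head(predicted)
```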
```diff
@@ -757,8 +747,6 @@ head(predicted)
 
 #### Isotonic Regression Model
 
-(Added in 2.1.0)
-
 `spark.isoreg` fits an [Isotonic Regression](https://en.wikipedia.org/wiki/Isotonic_regression) model against a `SparkDataFrame`. It solves a weighted univariate regression problem under a complete order constraint. Specifically, given a set of real observed responses $y_1, \ldots, y_n$, corresponding real features $x_1, \ldots, x_n$, and optionally positive weights $w_1, \ldots, w_n$, we want to find a monotone (piecewise linear) function $f$ to minimize
 $$
 \ell(f) = \sum_{i=1}^n w_i (y_i - f(x_i))^2.
```
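The weighted monotone fit above maps to a `spark.isoreg` call roughly like this, assuming a running SparkR session; the noisy monotone data are illustrative:

```r
library(SparkR)
sparkR.session()

# Noisy data where y increases with x on average (illustrative).
df <- createDataFrame(data.frame(x = seq(1, 10),
                                 y = c(1, 2, 2, 3, 3, 5, 4, 6, 7, 8)))

# Fit an increasing piecewise-linear f minimizing weighted squared error.
isoregModel <- spark.isoreg(df, y ~ x, isotonic = TRUE)

# Predict at new feature values.
newDF <- createDataFrame(data.frame(x = c(1.5, 3.2, 9.0)))
head(predict(isoregModel, newDF))
```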
```diff
@@ -802,8 +790,6 @@ head(predict(isoregModel, newDF))
 
 #### Logistic Regression Model
 
-(Added in 2.1.0)
-
 [Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Predictive Model](https://en.wikipedia.org/wiki/Generalized_linear_model).
 We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters.
 It supports both binary and multiclass classification with elastic-net regularization and feature standardization, similar to `glmnet`.
```
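A sketch of the multiclass `spark.logit` usage with elastic-net regularization described above, assuming a running SparkR session; the `iris` dataset and the specific hyper-parameter values are illustrative:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(iris)

# Multiclass logistic regression with elastic-net regularization:
# regParam sets the overall penalty; elasticNetParam mixes L1 (1) and L2 (0).
model <- spark.logit(df, Species ~ ., regParam = 0.05, elasticNetParam = 0.5)
summary(model)

head(select(predict(model, df), "prediction"))
```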
