This repository was archived by the owner on Jan 9, 2020. It is now read-only.
[SPARK-18849][ML][SPARKR][DOC] vignettes final check update
## What changes were proposed in this pull request?
doc cleanup
## How was this patch tested?
~~vignettes is not building for me. I'm going to kick off a full clean build and try again and attach output here for review.~~
Output html here: https://felixcheung.github.io/sparkr-vignettes.html
Author: Felix Cheung <[email protected]>
Closes apache#16286 from felixcheung/rvignettespass.
(cherry picked from commit 7d858bc)
Signed-off-by: Shivaram Venkataraman <[email protected]>
**`R/pkg/vignettes/sparkr-vignettes.Rmd`** (12 additions, 26 deletions)
```diff
@@ -447,33 +447,31 @@ head(teenagers)
 
 SparkR supports the following machine learning models and algorithms.
 
-* Generalized Linear Model (GLM)
+* Accelerated Failure Time (AFT) Survival Model
 
-* Random Forest
+* Collaborative Filtering with Alternating Least Squares (ALS)
+
+* Gaussian Mixture Model (GMM)
+
+* Generalized Linear Model (GLM)
 
 * Gradient-Boosted Trees (GBT)
 
-* Naive Bayes Model
+* Isotonic Regression Model
 
 * $k$-means Clustering
 
-* Accelerated Failure Time (AFT) Survival Model
-
-* Gaussian Mixture Model (GMM)
+* Kolmogorov-Smirnov Test
 
 * Latent Dirichlet Allocation (LDA)
 
-* Multilayer Perceptron Model
-
-* Collaborative Filtering with Alternating Least Squares (ALS)
-
-* Isotonic Regression Model
-
 * Logistic Regression Model
 
-* Kolmogorov-Smirnov Test
+* Multilayer Perceptron Model
 
-More will be added in the future.
+* Naive Bayes Model
+
+* Random Forest
 
 ### R Formula
 
```
```diff
@@ -601,8 +599,6 @@ head(aftPredictions)
 
 #### Gaussian Mixture Model
 
-(Added in 2.1.0)
-
 `spark.gaussianMixture` fits multivariate [Gaussian Mixture Model](https://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model) (GMM) against a `SparkDataFrame`. [Expectation-Maximization](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) (EM) is used to approximate the maximum likelihood estimator (MLE) of the model.
 
 We use a simulated example to demonstrate the usage.
```
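For readers of the vignette, the EM procedure referenced above can be sketched for a one-dimensional, two-component mixture. This is an illustrative toy in plain Python, not SparkR code and not part of the PR; `spark.gaussianMixture` runs the distributed, multivariate version over a `SparkDataFrame`.

```python
import math
import random

def em_gmm_1d(xs, iters=50):
    """Fit a 2-component 1-D Gaussian mixture with EM (toy sketch)."""
    # Crude initialization: split the sorted data in half.
    xs = sorted(xs)
    half = len(xs) // 2
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return w, mu, var

# Simulated data from two well-separated Gaussians (hypothetical example).
random.seed(0)
data = [random.gauss(-4, 1) for _ in range(200)] + \
       [random.gauss(4, 1) for _ in range(200)]
w, mu, var = em_gmm_1d(data)
print(sorted(mu))  # component means, near the true -4 and 4
```

The E-step computes soft cluster assignments and the M-step re-fits each component to its weighted data, which is exactly the alternation that makes EM approximate the MLE.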
```diff
@@ … @@
 `spark.lda` fits a [Latent Dirichlet Allocation](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) model on a `SparkDataFrame`. It is often used in topic modeling in which topics are inferred from a collection of text documents. LDA can be thought of as a clustering algorithm as follows:
 
 * Topics correspond to cluster centers, and documents correspond to examples (rows) in a dataset.
```
```diff
@@ -676,8 +670,6 @@ perplexity
 
 #### Multilayer Perceptron
 
-(Added in 2.1.0)
-
 Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC consists of multiple layers of nodes. Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs by a linear combination of the inputs with the node’s weights $w$ and bias $b$ and applying an activation function. This can be written in matrix form for MLPC with $K+1$ layers as follows:
```
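The layer-by-layer rule the vignette describes (affine map with weights $w$ and bias $b$, then an activation) can be sketched as a plain-Python forward pass. The network shape and weight values below are hypothetical; `spark.mlp` trains the real thing.

```python
import math

def mlp_forward(x, layers):
    """Feedforward pass: each layer applies an affine map, then an activation.
    `layers` is a list of (W, b, activation), W given as a list of rows."""
    h = x
    for W, b, act in layers:
        h = [act(sum(wij * hj for wij, hj in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    return h

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
identity = lambda z: z

# Tiny 2-2-1 network with hand-picked (hypothetical) weights.
net = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], sigmoid),  # hidden layer
    ([[1.0, 1.0]], [0.0], identity),                    # output layer
]
out = mlp_forward([0.5, -0.5], net)
print(out)
```

Each iteration of the loop is one of the $K+1$ layers in the matrix form above: multiply by the layer's weight matrix, add the bias vector, apply the activation element-wise.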
```diff
@@ … @@
 `spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](http://dl.acm.org/citation.cfm?id=1608614).
 
 There are multiple options that can be configured in `spark.als`, including `rank`, `reg`, `nonnegative`. For a complete list, refer to the help file.
```
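The alternation behind ALS can be illustrated with a rank-1 toy: hold one factor fixed and solve a regularized least-squares problem for the other, then swap. This sketch is hypothetical illustration code, not `spark.als` (which factors sparse ratings at scale, with the `rank` and `reg` options mentioned above).

```python
def als_rank1(R, reg=0.1, iters=30):
    """Rank-1 ALS sketch: factor R ~ u v^T by alternately solving the
    ridge-regularized least-squares problem for u (v fixed) and v (u fixed)."""
    m, n = len(R), len(R[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # For each u_i: minimize sum_j (R_ij - u_i v_j)^2 + reg * u_i^2,
        # which has the closed-form solution below.
        for i in range(m):
            u[i] = sum(R[i][j] * v[j] for j in range(n)) / \
                   (sum(vj * vj for vj in v) + reg)
        for j in range(n):
            v[j] = sum(R[i][j] * u[i] for i in range(m)) / \
                   (sum(ui * ui for ui in u) + reg)
    return u, v

# A ratings matrix that is exactly rank 1 (hypothetical toy data).
R = [[1, 2], [2, 4], [3, 6]]
u, v = als_rank1(R)
approx = [[u[i] * v[j] for j in range(2)] for i in range(3)]
print(approx)  # close to R, slightly shrunk by the regularizer
```

With factors of rank `rank` instead of scalars, each inner step becomes a small ridge regression per row/column, which is what makes the subproblems embarrassingly parallel.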
```diff
@@ -757,8 +747,6 @@ head(predicted)
 
 #### Isotonic Regression Model
 
-(Added in 2.1.0)
-
 `spark.isoreg` fits an [Isotonic Regression](https://en.wikipedia.org/wiki/Isotonic_regression) model against a `SparkDataFrame`. It solves a weighted univariate regression problem under a complete order constraint. Specifically, given a set of real observed responses $y_1, \ldots, y_n$, corresponding real features $x_1, \ldots, x_n$, and optionally positive weights $w_1, \ldots, w_n$, we want to find a monotone (piecewise linear) function $f$ to minimize
```
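The classic algorithm for this order-constrained problem is pool-adjacent-violators (PAV): scan the responses in feature order and merge any adjacent block whose mean would break monotonicity, replacing it with the weighted mean. A minimal sketch (illustrative Python, not the `spark.isoreg` implementation):

```python
def isotonic_fit(y, w=None):
    """Pool-adjacent-violators sketch: the monotone non-decreasing fit
    minimizing weighted squared error to y (assumes y is already sorted
    by its feature values)."""
    w = w or [1.0] * len(y)
    # Each block: [weighted mean, total weight, count of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge while the newest block violates monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit

print(isotonic_fit([1, 3, 2, 4]))  # the 3, 2 violation pools to 2.5, 2.5
```

The pooled block means are exactly the values of the monotone piecewise-linear $f$ at the knots, which is why the fitted function is flat across each merged block.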
```diff
@@ … @@
 [Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Predictive Model](https://en.wikipedia.org/wiki/Generalized_linear_model).
 We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters.
 It supports both binary and multiclass classification with elastic-net regularization and feature standardization, similar to `glmnet`.
```
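The binary case the vignette mentions reduces to fitting a sigmoid over a linear score. A minimal gradient-descent sketch with an L2 penalty (a toy stand-in for the elastic-net machinery in `spark.logit`; the data and hyper-parameters are hypothetical):

```python
import math

def fit_logistic(X, y, lr=0.5, reg=0.01, iters=2000):
    """Binary logistic regression by full-batch gradient descent
    with an L2 penalty (toy illustration, not spark.logit)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iters):
        gw = [reg * wi for wi in w]  # gradient of the L2 penalty term
        gb = 0.0
        for xi, yi in zip(X, y):
            # Predicted probability via the sigmoid of the linear score.
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj
                                            for wj, xj in zip(w, xi)) + b)))
            err = p - yi
            gw = [g + err * xj / n for g, xj in zip(gw, xi)]
            gb += err / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

# Separable toy data: label is 1 when x0 + x1 > 0 (hypothetical).
X = [[-2, -1], [-1, -1], [-1, 2], [1, 1], [2, 1], [1, -2]]
y = [0, 0, 1, 1, 1, 0]
w, b = fit_logistic(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
print(preds)
```

Elastic net would add an L1 term on top of the L2 penalty shown here; that mixed penalty is what `spark.logit` and `glmnet` share.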