This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit d0d9c57

[SPARK-18795][ML][SPARKR][DOC] Added KSTest section to SparkR vignettes
## What changes were proposed in this pull request?

Added short section for KSTest. Also added logreg model to list of ML models in vignette. (This will be reorganized under SPARK-18849)

![screen shot 2016-12-14 at 1 37 31 pm](https://cloud.githubusercontent.com/assets/5084283/21202140/7f24e240-c202-11e6-9362-458208bb9159.png)

## How was this patch tested?

Manually tested example locally. Built vignettes locally.

Author: Joseph K. Bradley <[email protected]>

Closes apache#16283 from jkbradley/ksTest-vignette.

(cherry picked from commit 7862742)
Signed-off-by: Joseph K. Bradley <[email protected]>
1 parent c4de90f commit d0d9c57


1 file changed (+28, -1 lines changed)


R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 28 additions & 1 deletion
@@ -469,6 +469,10 @@ SparkR supports the following machine learning models and algorithms.

 * Isotonic Regression Model

+* Logistic Regression Model
+
+* Kolmogorov-Smirnov Test
+
 More will be added in the future.

 ### R Formula
@@ -800,7 +804,7 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
 head(predict(isoregModel, newDF))
 ```

-### Logistic Regression Model
+#### Logistic Regression Model

 (Added in 2.1.0)

@@ -834,6 +838,29 @@ model <- spark.logit(df, Species ~ ., regParam = 0.5)
 summary(model)
 ```

+#### Kolmogorov-Smirnov Test
+
+`spark.kstest` runs a two-sided, one-sample [Kolmogorov-Smirnov (KS) test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
+Given a `SparkDataFrame`, the test compares continuous data in a given column `testCol` with the theoretical distribution
+specified by parameter `nullHypothesis`.
+Users can call `summary` to get a summary of the test results.
+
+In the following example, we test whether the `longley` dataset's `Armed_Forces` column
+follows a normal distribution. We set the parameters of the normal distribution using
+the mean and standard deviation of the sample.
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+afStats <- head(select(df, mean(df$Armed_Forces), sd(df$Armed_Forces)))
+afMean <- afStats[1]
+afStd <- afStats[2]
+
+test <- spark.kstest(df, "Armed_Forces", "norm", c(afMean, afStd))
+testSummary <- summary(test)
+testSummary
+```
+
+
 ### Model Persistence
 The following example shows how to save/load an ML model by SparkR.
 ```{r, warning=FALSE}
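The new vignette section describes `spark.kstest` as a two-sided, one-sample KS test. As a hedged, Spark-free cross-check (not part of this commit), the sketch below runs the same kind of test locally with base R's `stats::ks.test` on `datasets::longley$Armed_Forces`. It also computes the KS statistic by hand as the largest absolute gap between the sample's empirical CDF and the fitted normal CDF. The variable names (`dPlus`, `dMinus`, `dManual`) are illustrative only, and the numbers are not claimed to reproduce `spark.kstest` output exactly.

```r
# Local (Spark-free) sketch: two-sided, one-sample KS test of
# longley$Armed_Forces against a normal distribution whose mean and sd
# are estimated from the same sample, mirroring the vignette's example.
x <- datasets::longley$Armed_Forces
afMean <- mean(x)
afStd <- sd(x)

# Built-in test from the stats package; mean/sd are passed through to pnorm
res <- ks.test(x, "pnorm", mean = afMean, sd = afStd)
print(res)

# Manual KS statistic: D = sup_x |F_n(x) - F(x)|.
# For a continuous F the supremum occurs at a sample point, either just
# after the ECDF jumps (i/n above F) or just before it ((i-1)/n below F).
xs <- sort(x)
n <- length(xs)
Fx <- pnorm(xs, mean = afMean, sd = afStd)
i <- seq_len(n)
dPlus <- max(i / n - Fx)
dMinus <- max(Fx - (i - 1) / n)
dManual <- max(dPlus, dMinus)
cat("manual D =", dManual, "\n")
```

Because the mean and standard deviation are estimated from the same sample, the standard KS p-value here (as in the vignette example) is only approximate and typically conservative.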
