@@ -469,6 +469,10 @@ SparkR supports the following machine learning models and algorithms.

* Isotonic Regression Model

+ * Logistic Regression Model
+
+ * Kolmogorov-Smirnov Test
+
More will be added in the future.

### R Formula
@@ -800,7 +804,7 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
head(predict(isoregModel, newDF))
```

- ### Logistic Regression Model
+ #### Logistic Regression Model

(Added in 2.1.0)

@@ -834,6 +838,29 @@ model <- spark.logit(df, Species ~ ., regParam = 0.5)
summary(model)
```

+ #### Kolmogorov-Smirnov Test
+
+ `spark.kstest` runs a two-sided, one-sample [Kolmogorov-Smirnov (KS) test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
+ Given a `SparkDataFrame`, the test compares continuous data in a given column `testCol` with the theoretical distribution
+ specified by the parameter `nullHypothesis`.
+ Users can call `summary` to get a summary of the test results.
+
+ In the following example, we test whether the `longley` dataset's `Armed_Forces` column
+ follows a normal distribution. We set the parameters of the normal distribution using
+ the mean and standard deviation of the sample.
+
+ ```{r, warning=FALSE}
+ df <- createDataFrame(longley)
+ afStats <- head(select(df, mean(df$Armed_Forces), sd(df$Armed_Forces)))
+ afMean <- afStats[1, 1]  # sample mean, as a plain numeric
+ afStd <- afStats[1, 2]   # sample standard deviation, as a plain numeric
+
+ test <- spark.kstest(df, "Armed_Forces", "norm", c(afMean, afStd))
+ testSummary <- summary(test)
+ testSummary
+ ```
+
+
### Model Persistence
The following example shows how to save/load an ML model by SparkR.
```{r, warning=FALSE}
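The one-sample KS statistic is the largest vertical gap between the sample's empirical CDF and the hypothesized CDF, D = sup |F_n(x) - F(x)|. As a local cross-check of the `Armed_Forces` example (not something SparkR itself runs), the sketch below computes D by hand in base R for the same `longley` column and compares it with `stats::ks.test`. It assumes plain R without Spark; base R names the column `Armed.Forces`, whereas the SparkR example refers to it as `Armed_Forces`, and the variable names (`dManual`, `builtin`, etc.) are illustrative only.

```r
# Sketch (assumption: base R only, no SparkR): compute the one-sample,
# two-sided KS statistic D by hand and cross-check it against stats::ks.test.
x <- sort(longley$Armed.Forces)  # base R spells the column Armed.Forces
n <- length(x)
mu <- mean(x)      # sample mean, as in the SparkR example
sigma <- sd(x)     # sample standard deviation, as in the SparkR example

# Hypothesized normal CDF evaluated at each sorted observation
F0 <- pnorm(x, mean = mu, sd = sigma)

# For sorted data, sup |F_n - F0| is attained at a jump of the empirical CDF:
# check the gap just after each jump (i/n) and just before it ((i - 1)/n).
dPlus <- max(seq_len(n) / n - F0)
dMinus <- max(F0 - (seq_len(n) - 1) / n)
dManual <- max(dPlus, dMinus)

# Built-in one-sample test against the same normal distribution
builtin <- ks.test(longley$Armed.Forces, "pnorm", mean = mu, sd = sigma)

print(c(manual = dManual, ks.test = unname(builtin$statistic)))
print(builtin$p.value)
```

With the same mean and standard deviation passed in, the `statistic` reported by the SparkR summary would be expected to match `dManual` up to floating-point rounding; that agreement is an assumption here, not something the vignette states.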