R library for converting Apache Spark ML pipelines to PMML.
This package is a thin R wrapper for the JPMML-SparkML library.
Prerequisites:

- Apache Spark 3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X, 3.5.X, 4.0.X or 4.1.X.
- R 3.3 or newer.
Install from GitHub using the `devtools` package:

```r
library("devtools")

install_github("jpmml/sparklyr2pmml")
```

Sparklyr2PMML must be paired with JPMML-SparkML according to the following compatibility matrix:
Active development branches:

| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|---|---|---|
| 3.4.X | 3.0.X | 3.0.10 |
| 3.5.X | 3.1.X | 3.1.10 |
| 4.0.X | 3.2.X | 3.2.9 |
| 4.1.X | master | 3.3.2 |
Stale development branches:

| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|---|---|---|
| 3.0.X | 2.0.X | 2.0.6 |
| 3.1.X | 2.1.X | 2.1.6 |
| 3.2.X | 2.2.X | 2.2.6 |
| 3.3.X | 2.3.X | 2.3.5 |
| 3.4.X | 2.4.X | 2.4.4 |
| 3.5.X | 2.5.X | 2.5.3 |
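When scripting a setup, the active-branch rows of the matrix above can be encoded as a lookup table. The following is a minimal sketch: `jpmml_sparkml_version()` is a hypothetical helper, not part of Sparklyr2PMML. It covers only the active development branches and hard-codes the "latest" versions listed in the table, so it goes stale as soon as new JPMML-SparkML releases appear.

```r
# Hypothetical helper (not part of Sparklyr2PMML): map an Apache Spark version
# to the latest JPMML-SparkML version of the matching *active* development branch.
jpmml_sparkml_version = function(spark_version) {
  # Active-branch rows of the compatibility matrix (Spark "major.minor" -> JPMML-SparkML)
  compat = c(
    "3.4" = "3.0.10",
    "3.5" = "3.1.10",
    "4.0" = "3.2.9",
    "4.1" = "3.3.2"
  )
  # Accept either a character string ("3.5.1") or a numeric_version object
  parts = strsplit(as.character(spark_version), ".", fixed = TRUE)[[1]]
  # Reduce the full version to its "major.minor" release line
  key = paste(parts[1:2], collapse = ".")
  if (!(key %in% names(compat))) {
    stop("No active JPMML-SparkML branch for Apache Spark ", spark_version)
  }
  compat[[key]]
}

jpmml_sparkml_version("3.5.1")
```

Spark release lines that only appear among the stale branches (for example 3.2.X) make the helper stop with an error rather than guess a version.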
Launch Sparklyr, using the `sparklyr.connect.packages` configuration option to specify the Maven coordinates of the relevant JPMML-SparkML modules:

- `org.jpmml:pmml-sparkml:${version}` - Core module.
- `org.jpmml:pmml-sparkml-lightgbm:${version}` - LightGBM via SynapseML extension module.
- `org.jpmml:pmml-sparkml-xgboost:${version}` - XGBoost via XGBoost4J-Spark extension module.
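Several modules can be requested together. The sketch below assumes the core module plus the XGBoost extension module, at JPMML-SparkML 3.3.2 (the latest version listed for Apache Spark 4.1.X). Joining the coordinates into one comma-separated string mirrors the format of spark-submit's `--packages` option; that sparklyr forwards this option to `--packages` unchanged is an assumption here. The sketch does not add the XGBoost4J-Spark library itself, which must be available separately.

```r
library("sparklyr")

# Example value: latest JPMML-SparkML version listed for Apache Spark 4.1.X
jpmml_version = "3.3.2"

# Core module plus the XGBoost via XGBoost4J-Spark extension module
modules = c("pmml-sparkml", "pmml-sparkml-xgboost")
packages = paste0("org.jpmml:", modules, ":", jpmml_version)

config = spark_config()
# Assumption: multiple Maven coordinates are given as one comma-separated string
config[["sparklyr.connect.packages"]] = paste(packages, collapse = ",")
```

The resulting `config` is then passed to `spark_connect()` in the same way as in the core launch example.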
Launching core:

```r
library("sparklyr")

config = spark_config()
# Replace ${version} with a JPMML-SparkML version from the compatibility matrix
config[["sparklyr.connect.packages"]] = "org.jpmml:pmml-sparkml:${version}"

sc = spark_connect(master = "local", config = config)
```

Fitting a Spark ML pipeline:
```r
library("dplyr")
library("sparklyr")

data(iris)

iris_df = copy_to(sc, iris)

iris_pipeline = ml_pipeline(sc) %>%
  ft_r_formula(Species ~ .) %>%
  ml_decision_tree_classifier()

iris_pipeline_model = ml_fit(iris_pipeline, iris_df)
```

Exporting the fitted Spark ML pipeline to a PMML file:
```r
library("sparklyr2pmml")

pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

buildFile(pmmlBuilder, "DecisionTreeIris.pmml")
```

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.
If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.
Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact info@openscoring.io