This repository was archived by the owner on Dec 4, 2019. It is now read-only.

Commit d6f6f56

[#54] Update to spark 2.1
Updated spark-sklearn to be compatible with Spark versions >= 2.1.1. This change is not backwards compatible with Spark 2.0.
2 parents 101a956 + 7583ee1 commit d6f6f56

File tree

6 files changed: +6 −10 lines


.travis.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@ cache:
     - $HOME/.cache/spark-versions
 env:
   matrix:
-    - SPARK_VERSION="2.0.0" SPARK_BUILD="spark-$SPARK_VERSION-bin-hadoop2.7" SPARK_BUILD_URL="http://d3kbcqa49mib13.cloudfront.net/$SPARK_BUILD.tgz"
+    - SPARK_VERSION="2.1.1" SPARK_BUILD="spark-$SPARK_VERSION-bin-hadoop2.7" SPARK_BUILD_URL="http://d3kbcqa49mib13.cloudfront.net/$SPARK_BUILD.tgz"

 before_install:
     - ./bin/download_travis_dependencies.sh
```
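The matrix entry above builds the Spark download URL purely by variable expansion; the actual fetch happens in `./bin/download_travis_dependencies.sh`, which is not part of this diff. A minimal sketch of that expansion (no network access):

```shell
# Mirror the Travis env matrix entry's variable expansion.
SPARK_VERSION="2.1.1"
SPARK_BUILD="spark-$SPARK_VERSION-bin-hadoop2.7"
SPARK_BUILD_URL="http://d3kbcqa49mib13.cloudfront.net/$SPARK_BUILD.tgz"

# The download script would then fetch this URL into $HOME/.cache/spark-versions.
echo "Would download: $SPARK_BUILD_URL"
```

Because `SPARK_BUILD` and `SPARK_BUILD_URL` are derived from `SPARK_VERSION`, bumping a Spark version in CI is a one-token change per matrix entry.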

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -27,7 +27,7 @@ This project is also available as as [Spark package](http://spark-packages.org/p

 The developer version has the following requirements:
 - a recent release of scikit-learn. Release 0.17 has been tested, older versions may work too.
-- Spark >= 2.0. Spark may be downloaded from the [Spark official website](http://spark.apache.org/). In order to use this package, you need to use the pyspark interpreter or another Spark-compliant python interpreter. See the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for more details.
+- Spark >= 2.1.1. Spark may be downloaded from the [Spark official website](http://spark.apache.org/). In order to use this package, you need to use the pyspark interpreter or another Spark-compliant python interpreter. See the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for more details.
 - [nose](https://nose.readthedocs.org) (testing dependency only)
 - Pandas, if using the Pandas integration or testing. Pandas==0.18 has been tested.
```

build.sbt

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@

 scalaVersion := "2.10.4"

-sparkVersion := "2.0.0"
+sparkVersion := "2.1.1"

 spName := "databricks/spark-sklearn"
```

python/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@ This package is released under the Apache 2.0 license. See the LICENSE file.

 This package has the following requirements:
 - a recent version of scikit-learn. Version 0.17 has been tested, older versions may work too.
-- Spark >= 2.0. Spark may be downloaded from the
+- Spark >= 2.1.1 Spark may be downloaded from the
 [Spark official website](http://spark.apache.org/). In order to use spark-sklearn, you need to use the pyspark interpreter or another Spark-compliant python interpreter. See the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for more details.
 - [nose](https://nose.readthedocs.org) (testing dependency only)
```

python/spark_sklearn/keyed_models.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -320,7 +320,7 @@ def __init__(self, sklearnEstimator=None, keyCols=["key"], xCol="features",
         self._setDefault(**{paramName: paramSpec["default"]
                             for paramName, paramSpec in KeyedEstimator._paramSpecs.items()
                             if "default" in paramSpec})
-        kwargs = KeyedEstimator._inferredParams(sklearnEstimator, self.__init__._input_kwargs)
+        kwargs = KeyedEstimator._inferredParams(sklearnEstimator, self._input_kwargs)
         self._set(**kwargs)

         self._verifyEstimatorType()
@@ -489,7 +489,7 @@ def implies(a, b):
         if yCol and type(outputType) not in KeyedModel._sql_types:
             raise TypeError("Output type {} is not an AtomicType (expected for {} estimator)"
                             .format(outputType, estimatorType))
-        self._set(**self.__init__._input_kwargs)
+        self._set(**self._input_kwargs)

     def _verifyEstimatorType(self):
         estimatorType = self.getOrDefault("estimatorType")
```
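The two call-site changes above track a change in pyspark's `@keyword_only` decorator: starting with Spark 2.1, the captured keyword arguments are stored on the instance as `self._input_kwargs` rather than on the wrapped method as `self.__init__._input_kwargs`. An illustrative sketch of the newer pattern (simplified; not pyspark's actual source):

```python
import functools

def keyword_only(func):
    """Simplified stand-in for pyspark's @keyword_only: forbid positional
    arguments and record the passed kwargs on the instance (Spark >= 2.1
    behavior), so methods can read self._input_kwargs."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s only takes keyword arguments." % func.__name__)
        self._input_kwargs = kwargs  # attached to the instance, not the method
        return func(self, **kwargs)
    return wrapper

class Example:
    @keyword_only
    def __init__(self, keyCols=None, xCol="features"):
        # Matches the updated call sites in the diff above.
        self.params = self._input_kwargs

e = Example(keyCols=["key"])
print(e.params)  # {'keyCols': ['key']}
```

Pre-2.1 pyspark stored the kwargs on the decorated function object itself, which is why the old `self.__init__._input_kwargs` spelling stops working once the decorator changes.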

python/spark_sklearn/tests/test_gapply.py

Lines changed: 0 additions & 4 deletions
```diff
@@ -85,10 +85,6 @@ def pandasAggFunction(series):
         dataGen = lambda: (random.randrange(GapplyTests.NVALS), random.randrange(GapplyTests.NVALS))
         self.checkGapplyEquivalentToPandas(pandasAggFunction, dataType, dataGen)

-    @unittest.skip("""
-    python only UDTs can't be nested in arraytypes for now, see SPARK-15989
-    this is only available starting in Spark 2.0.1
-    """)
     def test_gapply_python_only_udt_val(self):
         def pandasAggFunction(series):
             x = float(series.apply(lambda pt: int(pt.x) + int(pt.y)).sum())
```

0 commit comments
