This repository was archived by the owner on Dec 4, 2019. It is now read-only.
Commit 3dc69d9
[53] Update to sklearn 0.18.1
Currently, spark-sklearn emits deprecation warnings when used with sklearn 0.18, because several classes in grid_search and cross_validation were refactored into a new module called model_selection. The changes here make spark-sklearn compatible with sklearn 0.18.
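As a rough illustration of the refactoring described above (not code taken from this commit), the sketch below imports a few of the relocated names from sklearn.model_selection and exercises the KFold interface that changed with the move. The toy arrays are made up for the example.

```python
import numpy as np

# sklearn >= 0.18: these names live in sklearn.model_selection.
# Before 0.18 they were imported from sklearn.grid_search
# (GridSearchCV) and sklearn.cross_validation (KFold, train_test_split).
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Made-up toy data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# KFold now takes n_splits and receives the data in split(),
# instead of taking the sample count in its constructor.
folds = list(KFold(n_splits=5).split(X))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Code that still imports from the old modules under 0.18 continues to run but triggers the deprecation warnings mentioned above.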
The most critical changes in the new version of sklearn are as follows.
sklearn.model_selection.GridSearchCV now has:
cv_results_ : dict of numpy (masked) ndarrays - A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.
best_estimator_ : estimator - Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.
best_score_ : float - Score of best_estimator on the left out data.
best_params_ : dict - Parameter setting that gave the best results on the hold out data.
best_index_ : int - The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.
scorer_ : function - Scorer function used on the held out data to choose the best parameters for the model.
n_splits_ : int - The number of cross-validation splits (folds/iterations).
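A minimal sketch of how these attributes fit together, assuming sklearn >= 0.18 and pandas are installed. The estimator, parameter grid, and dataset are arbitrary choices for illustration and do not come from this commit.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small illustrative search: 3 values of C x 2 kernels = 6 candidates.
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)

# cv_results_ is a dict of columns, so it maps directly onto a DataFrame
# with one row per candidate parameter setting.
results = pd.DataFrame(search.cv_results_)

# best_index_ points at the row that holds the winning candidate.
best_row = results.iloc[search.best_index_]
print(best_row["params"], best_row["mean_test_score"])
```

The row selected by best_index_ carries the same values that best_params_ and best_score_ expose directly.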
spark_sklearn.GridSearchCV, by contrast, has:
grid_scores_ : list of named tuples
best_estimator_ : estimator - Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.
best_score_ : float - Score of best_estimator on the left out data.
best_params_ : dict - Parameter setting that gave the best results on the hold out data.
scorer_ : function - Scorer function used on the held out data to choose the best parameters for the model.
The biggest difference is that sklearn added the more comprehensive cv_results_, which carries data that grid_scores_ lacks. Before 0.18, grid_scores_ in spark-sklearn matched sklearn's own attribute.
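To make the relationship between the two attributes concrete, here is a sketch of rebuilding grid_scores_-style tuples from a cv_results_-style dict. The helper to_grid_scores and the CVScore type are hypothetical names defined only for this example, and the numbers are made up. The field names mirror those of the tuples in sklearn <= 0.17.

```python
from collections import namedtuple

# Hypothetical stand-in for the named tuple stored in the old grid_scores_.
CVScore = namedtuple(
    "CVScore",
    ["parameters", "mean_validation_score", "cv_validation_scores"],
)


def to_grid_scores(cv_results, n_splits):
    """Rebuild grid_scores_-style tuples from a cv_results_-style dict."""
    scores = []
    for i, params in enumerate(cv_results["params"]):
        # Gather this candidate's score from each per-split column.
        per_split = [
            cv_results["split%d_test_score" % k][i] for k in range(n_splits)
        ]
        scores.append(
            CVScore(params, cv_results["mean_test_score"][i], per_split)
        )
    return scores


# Made-up numbers laid out in the column format that cv_results_ uses.
example_results = {
    "params": [{"C": 1}, {"C": 10}],
    "split0_test_score": [0.80, 0.90],
    "split1_test_score": [0.70, 0.95],
    "mean_test_score": [0.75, 0.925],
    "std_test_score": [0.05, 0.025],
    "rank_test_score": [2, 1],
}

grid_scores = to_grid_scores(example_results, n_splits=2)
best = max(grid_scores, key=lambda s: s.mean_validation_score)
```

The conversion only works in this direction: columns such as std_test_score and rank_test_score have no counterpart in the tuples, which is the extra data the paragraph above refers to.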
Note: This version of spark-sklearn is not compatible with sklearn <= 0.17.
File tree
7 files changed
+275 -175 lines changed
- python
- spark_sklearn
- tests