Commit ae2b57a

DOC, TST: Wrapping of Keras models (#713)
1 parent 37405f7 commit ae2b57a

4 files changed: +198 -0 lines changed

ci/posix.yaml

Lines changed: 12 additions & 0 deletions
@@ -20,6 +20,7 @@ jobs:
       sklearnDev:
         envFile: 'ci/environment-3.7.yaml'
         SKLEARN_DEV: "yes"
+        WRAPPERS: "no"
 
   steps:
   - bash: echo "##vso[task.prependpath]$CONDA/bin"
@@ -37,9 +38,20 @@ jobs:
       conda install -y -q pytorch cpuonly -c pytorch -n dask-ml-test
       source activate dask-ml-test
       pip install skorch
+      python -c "import torch; print('PyTorch ' + torch.__version__)"
+      python -c "import skorch; print('Skorch ' + skorch.__version__)"
     displayName: "install PyTorch"
     condition: eq(variables['Build.SourceBranch'], 'refs/heads/master')
 
+  - bash: |
+      source activate dask-ml-test
+      pip install "tensorflow>=2.3.0"
+      pip install "scikeras>=0.1.8"
+      python -c "import tensorflow as tf; print('TF ' + tf.__version__)"
+      python -c "import scikeras; print('SciKeras ' + scikeras.__version__)"
+    displayName: "install Tensorflow and SciKeras"
+    condition: eq(variables['Build.SourceBranch'], 'refs/heads/master')
+
   - script: |
       source activate dask-ml-test
       conda uninstall -y --force scikit-learn
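
Note on the version specifiers in the install step: in a POSIX shell an unquoted `>` starts an output redirection, so a specifier such as `tensorflow>=2.3.0` only reaches the command intact when it is quoted. The sketch below illustrates this under a few assumptions: `sh` is available, `echo` stands in for `pip install` so that nothing is installed, and the helper name `files_created_by` is made up for this example.

```python
import os
import subprocess
import tempfile


def files_created_by(command: str) -> list:
    """Run `command` with sh in an empty scratch directory and list the files it left behind."""
    with tempfile.TemporaryDirectory() as scratch:
        subprocess.run(["sh", "-c", command], cwd=scratch, check=True, capture_output=True)
        return sorted(os.listdir(scratch))


# Unquoted: the shell sees the word `tensorflow` followed by a redirect to a file named `=2.3.0`.
unquoted = files_created_by("echo install tensorflow>=2.3.0")

# Quoted: the whole specifier is passed through as a single argument and no file is created.
quoted = files_created_by('echo install "tensorflow>=2.3.0"')

print(unquoted)
print(quoted)
```

With an unquoted specifier, the command receives only the bare package name, so the minimum version is not enforced.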

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@ Scikit-Learn should feel at home with Dask-ML.
    joblib.rst
    xgboost.rst
    pytorch.rst
+   keras.rst
 
 .. toctree::
    :maxdepth: 2

docs/source/keras.rst

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+Keras and Tensorflow
+====================
+
+The package SciKeras_ brings a Scikit-learn API to Keras. This allows Dask-ML
+to be used seamlessly with Keras models.
+
+Installation
+------------
+
+Following the `Tensorflow install directions`_ and `SciKeras install guide`_,
+these packages need to be installed:
+
+.. code-block:: bash
+
+   $ pip install "tensorflow>=2.3.0"
+   $ pip install "scikeras>=0.1.8"
+
+These are the minimum versions that Dask-ML requires to use Tensorflow/Keras.
+
+.. _Tensorflow install directions: https://www.tensorflow.org/install
+.. _SciKeras install guide: https://github.com/adriangb/scikeras#installation
+
+Usage
+-----
+
+First, let's start by defining a normal function to create our model. This is the
+normal way to create a `Keras Sequential model`_:
+
+.. _Keras Sequential model: https://keras.io/api/models/sequential/
+
+.. code-block:: python
+
+    import tensorflow as tf
+    from tensorflow.keras.layers import Dense
+    from tensorflow.keras.models import Sequential
+
+    def build_model(lr=0.01, momentum=0.9):
+        layers = [Dense(512, input_shape=(784,), activation="relu"),
+                  Dense(10, input_shape=(512,), activation="softmax")]
+        model = Sequential(layers)
+
+        opt = tf.keras.optimizers.SGD(
+            learning_rate=lr, momentum=momentum, nesterov=True,
+        )
+        model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
+        return model
+
+Now, we can use SciKeras to create a Scikit-learn compatible model:
+
+.. code-block:: python
+
+    from scikeras.wrappers import KerasClassifier
+    niceties = dict(verbose=False)
+    model = KerasClassifier(build_fn=build_model, lr=0.1, momentum=0.9, **niceties)
+
+This model will work with all of Dask-ML: it can use NumPy arrays as inputs and
+obeys the Scikit-learn API. For example, it's possible to use Dask-ML to do the
+following:
+
+* Use Keras with Dask-ML's model selection, including
+  :class:`~dask_ml.model_selection.HyperbandSearchCV`.
+* Use Keras with Dask-ML's :class:`~dask_ml.wrappers.Incremental`.
+
+If we want to tune ``lr`` and ``momentum``, SciKeras requires that we pass
+``lr`` and ``momentum`` at initialization:
+
+.. code-block:: python
+
+    model = KerasClassifier(build_fn=build_model, lr=None, momentum=None, **niceties)
+
+.. _SciKeras: https://github.com/adriangb/scikeras
+
+SciKeras supports more model creation methods, including some that are
+backwards-compatible with Tensorflow. Refer to their documentation for details.
+
+Example: Hyperparameter Optimization
+------------------------------------
+
+If we wanted to, we could use the model above with
+:class:`~dask_ml.model_selection.HyperbandSearchCV`. Let's tune this model on
+the MNIST dataset:
+
+.. code-block:: python
+
+    from tensorflow.keras.datasets import mnist
+    from tensorflow.keras.utils import to_categorical
+    import numpy as np
+    from typing import Tuple
+
+    def get_mnist() -> Tuple[np.ndarray, np.ndarray]:
+        (X_train, y_train), _ = mnist.load_data()
+        X_train = X_train.reshape(X_train.shape[0], 784)
+        X_train = X_train.astype("float32")
+        X_train /= 255
+        return X_train, y_train
+
+And let's perform the basic task of tuning our SGD implementation:
+
+.. code-block:: python
+
+    from scipy.stats import loguniform, uniform
+    params = {"lr": loguniform(1e-3, 1e-1), "momentum": uniform(0, 1)}
+    X, y = get_mnist()
+
+Now, the search can be run:
+
+.. code-block:: python
+
+    from dask.distributed import Client
+    client = Client()
+
+    from dask_ml.model_selection import HyperbandSearchCV
+    search = HyperbandSearchCV(model, params, max_iter=27)
+    search.fit(X, y)
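
Note on `max_iter=27` in the documentation example: the value determines how much work the search does. The sketch below computes the bracket schedule as described in the Hyperband paper, assuming a reduction factor of 3. It is not Dask-ML's own code: the function name `hyperband_schedule` is hypothetical, and the numbers an actual `HyperbandSearchCV` run reports may differ in detail.

```python
def hyperband_schedule(max_iter: int, eta: int = 3):
    """Return (bracket, n_models, initial_calls) triples for the textbook Hyperband schedule.

    Assumption: `eta` is the reduction factor (3 here), and `initial_calls` is the
    number of training iterations each model receives before the first pruning step.
    """
    # Largest s with eta**s <= max_iter, computed with integers to avoid float log error.
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    schedule = []
    for s in range(s_max, -1, -1):
        numerator = (s_max + 1) * eta ** s
        # Integer ceiling of numerator / (s + 1): models started in bracket s.
        n_models = (numerator + s) // (s + 1)
        # Iterations each model gets before the first round of pruning.
        initial_calls = max_iter // eta ** s
        schedule.append((s, n_models, initial_calls))
    return schedule


schedule = hyperband_schedule(27)
for bracket, n_models, initial_calls in schedule:
    print(f"bracket {bracket}: {n_models} models, {initial_calls} initial iteration(s) each")
```

Under these assumptions, the most exploratory bracket starts 27 models with one iteration each, while the most conservative bracket trains 4 models for the full 27 iterations.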

tests/model_selection/test_keras.py

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+import numpy as np
+import pandas as pd
+import pytest
+from distributed import Nanny
+from distributed.utils_test import gen_cluster
+from packaging import version
+from scipy.stats import loguniform
+from sklearn.datasets import make_classification
+
+from dask_ml.model_selection import IncrementalSearchCV
+
+try:
+    import scikeras
+    import tensorflow as tf
+    from scikeras.wrappers import KerasClassifier
+    from tensorflow.keras.layers import Dense
+    from tensorflow.keras.models import Sequential
+
+    pytestmark = [
+        pytest.mark.skipif(
+            version.parse(tf.__version__) < version.parse("2.3.0"),
+            reason="pickle support",
+        ),
+        pytest.mark.skipif(
+            version.parse(scikeras.__version__) < version.parse("0.1.8"),
+            reason="partial_fit support",
+        ),
+    ]
+except ImportError:
+    pytestmark = pytest.mark.skip(reason="Missing tensorflow or scikeras")
+
+
+def _keras_build_fn(lr=0.01):
+    layers = [
+        Dense(512, input_shape=(784,), activation="relu"),
+        Dense(10, input_shape=(512,), activation="softmax"),
+    ]
+
+    model = Sequential(layers)
+
+    opt = tf.keras.optimizers.SGD(learning_rate=lr)
+    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
+    return model
+
+
+@gen_cluster(client=True, Worker=Nanny)
+def test_keras(c, s, a, b):
+    # Mirror the mnist dataset
+    X, y = make_classification(n_classes=10, n_features=784, n_informative=100)
+    X = X.astype("float32")
+    assert y.dtype == np.dtype("int64")
+
+    model = KerasClassifier(build_fn=_keras_build_fn, lr=0.01, verbose=False)
+    params = {"lr": loguniform(1e-3, 1e-1)}
+
+    search = IncrementalSearchCV(
+        model, params, max_iter=3, n_initial_parameters=5, decay_rate=None
+    )
+    yield search.fit(X, y)
+    #  search.fit(X, y)
+
+    assert search.best_score_ >= 0
+
+    # Make sure the model trains, and scores aren't constant
+    scores = {
+        ident: [h["score"] for h in hist]
+        for ident, hist in search.model_history_.items()
+    }
+    assert all(len(hist) == 3 for hist in scores.values())
+    nuniq_scores = [pd.Series(v).nunique() for v in scores.values()]
+    assert max(nuniq_scores) > 1
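
Note on the final assertions in `test_keras`: they read one score per training round from each model's history and require that at least one model's score changes between rounds. The sketch below replays that check on a made-up history. The dictionary contents and the helper name `distinct_scores_per_model` are hypothetical, only the `"score"` key is taken from the test, and a plain `set` stands in for `pandas.Series.nunique`.

```python
# Hypothetical history in the shape the test reads: model id -> list of per-round records.
model_history = {
    "model-0": [{"score": 0.10}, {"score": 0.12}, {"score": 0.15}],
    "model-1": [{"score": 0.09}, {"score": 0.09}, {"score": 0.09}],
}


def distinct_scores_per_model(history):
    """Count how many different score values each model recorded."""
    return {
        ident: len({record["score"] for record in records})
        for ident, records in history.items()
    }


counts = distinct_scores_per_model(model_history)

# Every model was scored once per round (three rounds in this made-up history).
all_three_rounds = all(len(records) == 3 for records in model_history.values())

# At least one model's score moved, so training changed something.
some_model_trained = max(counts.values()) > 1

print(counts, all_three_rounds, some_model_trained)
```

The check is deliberately weak: it passes as long as any single model shows more than one distinct score, which guards against a wrapper whose `partial_fit` silently does nothing without requiring that accuracy improve.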
