Commit 7191a7d

Merge pull request #21 from big-o/develop
v0.0.5
2 parents d65705f + 7df1a10 commit 7191a7d

14 files changed: +224 -117 lines changed

.coveragerc

Lines changed: 0 additions & 21 deletions
This file was deleted.

doc/_static/img/cover.png

-10.1 KB

doc/_static/img/dag2.png

-55.4 KB

doc/_static/img/dag2a.png

50.1 KB

doc/_static/img/dag3.png

-80.2 KB

doc/_static/img/dag3a.png

62.3 KB

doc/quick_start.rst

Lines changed: 33 additions & 31 deletions
@@ -26,23 +26,26 @@ The simplest DAGs are just a chain of singular dependencies. These DAGs may be
  created from the :meth:`skdag.dag.DAG.from_pipeline` method in the same way as a
  DAG:

- >>> from sklearn.decomposition import PCA
- >>> from sklearn.impute import SimpleImputer
- >>> from sklearn.linear_model import LogisticRegression
- >>> dag = DAG.from_pipeline(
- ... steps=[
- ... ("impute", SimpleImputer()),
- ... ("pca", PCA()),
- ... ("lr", LogisticRegression())
- ... ]
- ... )
- >>> dag.draw()
- o impute
- |
- o pca
- |
- o lr
- <BLANKLINE>
+ .. code-block:: python
+
+ >>> from skdag import DAGBuilder
+ >>> from sklearn.decomposition import PCA
+ >>> from sklearn.impute import SimpleImputer
+ >>> from sklearn.linear_model import LogisticRegression
+ >>> dag = DAGBuilder().from_pipeline(
+ ... steps=[
+ ... ("impute", SimpleImputer()),
+ ... ("pca", PCA()),
+ ... ("lr", LogisticRegression())
+ ... ]
+ ... ).make_dag()
+ >>> dag.show()
+ o impute
+ |
+ o pca
+ |
+ o lr
+ <BLANKLINE>

  .. image:: _static/img/dag1.png

@@ -52,7 +55,6 @@ estimator:

  .. code-block:: python

- >>> from skdag import DAGBuilder
  >>> dag = (
  ... DAGBuilder(infer_dataframe=True)
  ... .add_step("impute", SimpleImputer())
@@ -61,15 +63,15 @@ estimator:
  ... .add_step("lr", LogisticRegression(random_state=0), deps=["blood", "vitals"])
  ... .make_dag()
  ... )
- >>> dag.draw()
+ >>> dag.show()
  o impute
  |\
  o o blood,vitals
  |/
  o lr
  <BLANKLINE>

- .. image:: _static/img/dag2.png
+ .. image:: _static/img/dag2a.png

  In the above examples we pass the first four columns directly to a regressor, but
  the remaining columns have dimensionality reduction applied first before being
@@ -82,36 +84,36 @@ on how to control this behaviour, see the `User Guide <user_guide.html>`_.
  The DAG may now be used as an estimator in its own right:

  >>> from sklearn import datasets
- >>> X, y = datasets.load_diabetes(return_X_y=True)
- >>> dag.fit_predict(X, y)
- array([...
+ >>> X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)
+ >>> type(dag.fit_predict(X, y))
+ <class 'pandas.core.series.Series'>

  In an extension to the scikit-learn estimator interface, DAGs also support multiple
  inputs and multiple outputs. Let's say we want to compare two different classifiers:

  >>> from sklearn.ensemble import RandomForestClassifier
- >>> cal = DAG.from_pipeline(
+ >>> cal = DAGBuilder(infer_dataframe=True).from_pipeline(
  ... [("rf", RandomForestClassifier(random_state=0))]
- ... )
+ ... ).make_dag()
  >>> dag2 = dag.join(cal, edges=[("blood", "rf"), ("vitals", "rf")])
- >>> dag2.draw()
+ >>> dag2.show()
  o impute
  |\
  o o blood,vitals
  |x|
  o o lr,rf
  <BLANKLINE>

- .. image:: _static/img/dag3.png
+ .. image:: _static/img/dag3a.png

  Now our DAG will return two outputs: one from each classifier. Multiple outputs are
  returned as a :class:`sklearn.utils.Bunch<Bunch>`:

  >>> y_pred = dag2.fit_predict(X, y)
- >>> y_pred.lr
- array([...
- >>> y_pred.rf
- array([...
+ >>> type(y_pred.lr)
+ <class 'pandas.core.series.Series'>
+ >>> type(y_pred.rf)
+ <class 'pandas.core.series.Series'>

  Similarly, multiple inputs are also acceptable and inputs can be provided by
  specifying ``X`` and ``y`` as ``dict``-like objects.
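
The diagrams in the quick start imply an execution order: `impute` runs first, `blood` and `vitals` consume its output, and `lr` depends on both. As a rough illustration of that ordering only, the standard-library `graphlib` module can group the steps into layers. This is not skdag's implementation; the `deps` mapping and the `execution_layers` helper are hypothetical names, and the dependencies of `blood` and `vitals` are inferred from the ASCII diagram because the diff elides those lines.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map mirroring the quick-start diagram:
# each step name maps to the steps it depends on.
deps = {
    "impute": [],
    "blood": ["impute"],   # assumed from the diagram, elided in the diff
    "vitals": ["impute"],  # assumed from the diagram, elided in the diff
    "lr": ["blood", "vitals"],
}


def execution_layers(deps):
    """Group steps into layers whose members could run side by side."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are done
        layers.append(ready)
        sorter.done(*ready)
    return layers


print(execution_layers(deps))
# [['impute'], ['blood', 'vitals'], ['lr']]
```

The middle layer corresponds to the `o o blood,vitals` row in the diagram: both steps depend only on `impute`, so neither has to wait for the other.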

doc/user_guide.rst

Lines changed: 20 additions & 14 deletions
@@ -18,17 +18,17 @@ scikit-learn :class:`~sklearn.pipeline.Pipeline`. These DAGs may be created from

  .. code-block:: python

+ >>> from skdag import DAGBuilder
  >>> from sklearn.decomposition import PCA
  >>> from sklearn.impute import SimpleImputer
  >>> from sklearn.linear_model import LogisticRegression
- >>> dag = DAG.from_pipeline(
+ >>> dag = DAGBuilder(infer_dataframe=True).from_pipeline(
  ... steps=[
  ... ("impute", SimpleImputer()),
  ... ("pca", PCA()),
  ... ("lr", LogisticRegression())
- ... ],
- ... infer_dataframe=True,
- ... )
+ ... ]
+ ... ).make_dag()

  You may view a diagram of the DAG with the :meth:`~skdag.dag.DAG.show` method. In a
  notebook environment this will display an image, whereas in a terminal it will generate
@@ -97,19 +97,20 @@ The DAG may now be used as an estimator in its own right:
  .. code-block:: python

  >>> from sklearn import datasets
- >>> X, y = datasets.load_diabetes(return_X_y=True)
- >>> dag.fit_predict(X, y)
- array([...
+ >>> X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)
+ >>> y_hat = dag.fit_predict(X, y)
+ >>> type(y_hat)
+ <class 'pandas.core.series.Series'>

  In an extension to the scikit-learn estimator interface, DAGs also support multiple
  inputs and multiple outputs. Let's say we want to compare two different classifiers:

  .. code-block:: python

  >>> from sklearn.ensemble import RandomForestClassifier
- >>> rf = DAG.from_pipeline(
+ >>> rf = DAGBuilder().from_pipeline(
  ... [("rf", RandomForestClassifier(random_state=0))]
- ... )
+ ... ).make_dag()
  >>> dag2 = dag.join(rf, edges=[("blood", "rf"), ("vitals", "rf")])
  >>> dag2.show()
  o impute
@@ -126,10 +127,14 @@ returned as a :class:`sklearn.utils.Bunch<Bunch>`:
  .. code-block:: python

  >>> y_pred = dag2.fit_predict(X, y)
- >>> y_pred.lr
- array([...
- >>> y_pred.rf
- array([...
+ >>> type(y_pred.lr)
+ <class 'pandas.core.series.Series'>
+ >>> type(y_pred.rf)
+ <class 'numpy.ndarray'>
+
+ Note that we have different types of output here because ``LogisticRegression`` natively
+ supports dataframe input whereas ``RandomForestClassifier`` does not. We could fix this
+ by specifying ``infer_dataframe=True`` when we created our ``rf`` DAG extension.

  Similarly, multiple inputs are also acceptable and inputs can be provided by
  specifying ``X`` and ``y`` as ``dict``-like objects.
@@ -174,6 +179,7 @@ the next step(s).
  ... .make_dag()
  ... )
  >>> stack.fit(X_train, y_train)
+ DAG(...

  .. image:: _static/img/stack.png

@@ -210,7 +216,7 @@ as a dictionary of step name to column indices instead:
  ... .add_step("pass", "passthrough")
  ... .add_step("rf", RandomForestClassifier(), deps=["pass"])
  ... .add_step("svr", SVC(), deps=["pass"])
- ... .add_step("meta", LinearRegression(), deps={"rf": 1, "svc": 1}])
+ ... .add_step("meta", LinearRegression(), deps={"rf": 1, "svr": 1})
  ... .make_dag()
  ... )
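
The `DAG(...` line added after `stack.fit(X_train, y_train)` is there because a doctest compares whatever an expression echoes against the expected text, and `fit` returns the fitted object. The truncated `(...` form only passes when the `ELLIPSIS` flag is active, which this commit enables through `doctest_optionflags` in setup.cfg. A minimal standard-library sketch of that behaviour follows, using a hypothetical `Estimator` class and `run` helper in place of a real skdag DAG:

```python
import doctest


class Estimator:
    """Hypothetical stand-in for a fitted estimator; not part of skdag."""

    def fit(self):
        # Like scikit-learn estimators, fit returns the object itself,
        # so an interactive session echoes its repr.
        return self

    def __repr__(self):
        return "Estimator(steps=['impute', 'pca', 'lr'])"


# A doctest whose expected output is truncated with "...".
SNIPPET = """
>>> Estimator().fit()
Estimator(...
"""


def run(optionflags):
    """Run SNIPPET as a doctest and return the number of failed examples."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(SNIPPET, {"Estimator": Estimator}, "snippet", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    runner.run(test, out=lambda text: None)  # discard the failure report
    return runner.failures


print(run(doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE))  # flags from setup.cfg
print(run(0))  # no flags: the truncated repr no longer matches
```

With the flags set, the example passes; without them the literal `Estimator(...` is compared character by character against the full repr and the example fails.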

setup.cfg

Lines changed: 27 additions & 0 deletions
@@ -5,11 +5,38 @@ description-file = README.rst
  test = pytest

  [tool:pytest]
+ doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS
+ testpaths = .
  addopts =
      -s
      --doctest-modules
+     --doctest-glob="*.rst"
      --cov=skdag
      --ignore setup.py
      --ignore doc/_build
      --ignore doc/_templates
      --no-cov-on-fail
+
+ [coverage:run]
+ branch = True
+ source = skdag
+ include = */skdag/*
+ omit =
+     */tests/*
+     *_test.py
+     test_*.py
+     */setup.py
+
+ [coverage:report]
+ exclude_lines =
+     pragma: no cover
+     def __repr__
+     if self.debug:
+     if settings.DEBUG
+     raise AssertionError
+     raise NotImplementedError
+     if 0:
+     if __name__ == .__main__.:
+     if self.verbose:
+ show_missing = True
+
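
Entries under `exclude_lines` are regular expressions that coverage.py searches for in each source line, which is why the `__main__` guard is written with `.` in place of the quote characters: the dot matches either quote style. A small sketch of that matching with Python's `re` module (the `is_excluded` helper is hypothetical and only approximates how coverage.py applies the pattern):

```python
import re

# Pattern copied from the [coverage:report] section above.
# Each "." matches any single character, so both ' and " are accepted.
MAIN_GUARD = r"if __name__ == .__main__.:"


def is_excluded(line, pattern=MAIN_GUARD):
    """Return True if the pattern is found anywhere in the source line."""
    return re.search(pattern, line) is not None


print(is_excluded('if __name__ == "__main__":'))  # True
print(is_excluded("if __name__ == '__main__':"))  # True
print(is_excluded("if name == 'main':"))          # False
```

Because the entries are regexes rather than literal strings, a single line covers both quoting conventions without listing each variant.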

skdag/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
- __version__ = "0.0.4"
+ __version__ = "0.0.5"
