Commit 1691bbb

Merge branch 'main' of github.com:big-o/skdag into main
2 parents 80f041d + 1eab9c9 commit 1691bbb

3 files changed: +224 -5 lines changed

doc/_static/img/cover.svg

Lines changed: 154 additions & 0 deletions

doc/index.rst

Lines changed: 64 additions & 1 deletion
@@ -5,7 +5,70 @@ scikit-dag (``skdag``) is an open-sourced, MIT-licenced library that provides ad
 workflow management to any machine learning operations that follow
 :mod:`sklearn` conventions. It does this by introducing Directed Acyclic
 Graphs (:class:`skdag.dag.DAG`) as a drop-in replacement for traditional scikit-learn
-:mod:`sklearn.pipeline.Pipeline`.
+:mod:`sklearn.pipeline.Pipeline`. This gives you a simple interface for a range of use
+cases including complex pre-processing, model stacking and benchmarking.
+
+.. code-block:: python
+
+    from skdag import DAGBuilder
+
+    dag = (
+        DAGBuilder()
+        .add_step("impute", SimpleImputer())
+        .add_step("vitals", "passthrough", deps={"impute": slice(0, 4)})
+        .add_step(
+            "blood",
+            PCA(n_components=2, random_state=0),
+            deps={"impute": slice(4, 10)}
+        )
+        .add_step(
+            "rf",
+            RandomForestRegressor(max_depth=5, random_state=0),
+            deps=["blood", "vitals"]
+        )
+        .add_step("svm", SVR(C=0.7), deps=["blood", "vitals"])
+        .add_step(
+            "knn",
+            KNeighborsRegressor(n_neighbors=5),
+            deps=["blood", "vitals"]
+        )
+        .add_step("meta", LinearRegression(), deps=["rf", "svm", "knn"])
+        .make_dag(n_jobs=2, verbose=True)
+    )
+
+    dag.show(detailed=True)
+
+.. image:: _static/img/cover.svg
+
+The above DAG imputes missing values, runs PCA on the columns relating to blood test
+results and leaves the other columns as they are. Then they get passed to three
+different regressors before being passed on to a final meta-estimator. Because DAGs
+(unlike pipelines) allow predictors in the middle of a workflow, you can use them to
+implement model stacking. We also chose to run the DAG steps in parallel wherever
+possible.
+
+After building our DAG, we can treat it as any other estimator:
+
+.. code-block:: python
+
+    from sklearn import datasets
+
+    X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)
+    X_train, X_test, y_train, y_test = train_test_split(
+        X, y, test_size=0.2, random_state=0
+    )
+
+    dag.fit(X_train, y_train)
+    dag.predict(X_test)
+
+Just like a pipeline, you can optimise it with a gridsearch, pickle it etc.
+
+Note that this package does not deal with things like delayed dependencies and
+distributed architectures - consider an `established <https://airflow.apache.org/>`_
+`solution <https://dagster.io/>`_ for such use cases. ``skdag`` is just for building and
+executing local ensembles from estimators.
+
+:ref:`Read on<quickstart>` to learn more about ``skdag``...

 .. toctree::
    :maxdepth: 2
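
Reviewer note: the text added to ``doc/index.rst`` says the DAG steps run in parallel wherever possible. The sketch below illustrates what that means for this particular graph, using only the Python standard library (``graphlib``, Python 3.9+). It is not how ``skdag`` schedules work internally; the ``parallel_levels`` helper is hypothetical, and the dependency mapping is copied by hand from the ``add_step`` calls in the diff, ignoring the column slices.

```python
from graphlib import TopologicalSorter

# Step name -> names of the steps it depends on (copied from the example DAG).
deps = {
    "impute": [],
    "vitals": ["impute"],
    "blood": ["impute"],
    "rf": ["blood", "vitals"],
    "svm": ["blood", "vitals"],
    "knn": ["blood", "vitals"],
    "meta": ["rf", "svm", "knn"],
}


def parallel_levels(graph):
    """Group steps into levels whose members do not depend on each other.

    Hypothetical helper for illustration only; not part of skdag.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    levels = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels


levels = parallel_levels(deps)
for number, level in enumerate(levels):
    print(number, level)
```

Under these assumptions the three regressors land in the same level, so they are the steps that could be fitted concurrently before the meta-estimator runs.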

doc/quick_start.rst

Lines changed: 6 additions & 4 deletions
@@ -1,6 +1,8 @@
-#####################################
+.. _quickstart:
+
+######################
 Quick Start with skdag
-#####################################
+######################

 The following tutorial shows you how to write some simple directed acyclic graphs (DAGs)
 with ``skdag``.
@@ -21,8 +23,8 @@ to do this in Ubuntu:

     sudo apt install graphviz graphviz-dev

-Creating your own scikit-learn contribution package
-===================================================
+Creating a DAG
+==============

 The simplest DAGs are just a chain of singular dependencies. These DAGs may be
 created from the :meth:`skdag.dag.DAG.from_pipeline` method in the same way as a
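
Reviewer note: the quick start context above describes the simplest DAGs as a chain of singular dependencies. A minimal sketch of that idea in plain Python follows. The ``chain_dependencies`` helper and the step names are hypothetical and only show the shape of such a graph; this is not the implementation of ``skdag.dag.DAG.from_pipeline``.

```python
def chain_dependencies(step_names):
    """Map each step to the single step before it; the first step has none.

    Hypothetical helper for illustration only; not part of skdag.
    """
    deps = {}
    previous = None
    for name in step_names:
        deps[name] = [] if previous is None else [previous]
        previous = name
    return deps


# Illustrative step names, in the order a linear pipeline would run them.
chain = chain_dependencies(["impute", "scale", "model"])
print(chain)
```

Each step depends on exactly one predecessor, which is why a linear pipeline can be viewed as a special case of a DAG.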
