This repository was archived by the owner on Aug 25, 2024. It is now read-only.

Commit da2d714

model: scikit: examples: Testable LR
Signed-off-by: John Andersen <[email protected]>
1 parent 3572c6e commit da2d714

File tree

10 files changed

+191
-141
lines changed


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Use randomly generated data for scikit tests
 - Change Core to Official to clarify who maintains each plugin
 - Name of output of unsupervised model from "Prediction" to "cluster"
+- Test scikit LR documentation examples in CI
 
 ## [0.3.4] - 2020-02-28
 ### Added

docs/plugins/dffml_model.rst

Lines changed: 35 additions & 70 deletions
@@ -684,7 +684,7 @@ Predicting with trained model:
 
 **Usage Example:**
 
-Example below uses LinearRegression Model on a small dataset.
+Example below uses LinearRegression Model using the command line.
 
 Let us take a simple example:

@@ -704,43 +704,34 @@ Let us take a simple example:
 | 5                    | 11         | 1.2          | 60     |
 +----------------------+------------+--------------+--------+
 
-.. code-block:: console
+First we create the files
+
+.. literalinclude:: /../model/scikit/examples/lr/train_data.sh
+
+.. literalinclude:: /../model/scikit/examples/lr/test_data.sh
+
+Train the model
+
+.. literalinclude:: /../model/scikit/examples/lr/train.sh
+
+Assess accuracy
+
+.. literalinclude:: /../model/scikit/examples/lr/accuracy.sh
+
+Output:
+
+.. code-block::
 
-    $ cat > train.csv << EOF
-    Years,Expertise,Trust,Salary
-    0,1,0.2,10
-    1,3,0.4,20
-    2,5,0.6,30
-    3,7,0.8,40
-    EOF
-    $ cat > test.csv << EOF
-    Years,Expertise,Trust,Salary
-    4,9,1.0,50
-    5,11,1.2,60
-    EOF
-    $ dffml train \
-        -model scikitlr \
-        -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
-        -model-predict Salary:float:1 \
-        -sources f=csv \
-        -source-filename train.csv \
-        -log debug
-    $ dffml accuracy \
-        -model scikitlr \
-        -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
-        -model-predict Salary:float:1 \
-        -sources f=csv \
-        -source-filename test.csv \
-        -log debug
     1.0
-    $ echo -e 'Years,Expertise,Trust\n6,13,1.4\n' | \
-        dffml predict all \
-            -model scikitlr \
-            -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
-            -model-predict Salary:float:1 \
-            -sources f=csv \
-            -source-filename /dev/stdin \
-            -log debug
+
+Make a prediction
+
+.. literalinclude:: /../model/scikit/examples/lr/predict.sh
+
+Output:
+
+.. code-block:: json
+
     [
         {
             "extra": {},
@@ -749,46 +740,20 @@ Let us take a simple example:
                 "Trust": 1.4,
                 "Years": 6
             },
-            "last_updated": "2019-09-18T19:04:18Z",
+            "key": "0",
+            "last_updated": "2020-02-07T14:17:08Z",
             "prediction": {
-                "confidence": 1.0,
-                "value": 70.00000000000001
-            },
-            "key": 0
+                "Salary": {
+                    "confidence": 1.0,
+                    "value": 70.13972055888223
+                }
+            }
         }
     ]
 
 Example usage of Linear Regression Model using python API:
 
-.. code-block:: python
-
-    from dffml import CSVSource, Features, DefFeature
-    from dffml.noasync import train, accuracy, predict
-    from dffml_model_scikit import LinearRegressionModel
-
-    model = LinearRegressionModel(
-        features=Features(
-            DefFeature("Years", int, 1),
-            DefFeature("Expertise", int, 1),
-            DefFeature("Trust", float, 1),
-        ),
-        predict=DefFeature("Salary", int, 1),
-    )
-
-    # Train the model
-    train(model, "train.csv")
-
-    # Assess accuracy (alternate way of specifying data source)
-    print("Accuracy:", accuracy(model, CSVSource(filename="test.csv")))
-
-    # Make prediction
-    for i, features, prediction in predict(
-        model,
-        {"Years": 6, "Expertise": 13, "Trust": 0.7},
-        {"Years": 7, "Expertise": 15, "Trust": 0.8},
-    ):
-        features["Salary"] = prediction["Salary"]["value"]
-        print(features)
+.. literalinclude:: /../model/scikit/examples/lr/lr.py
 
 Example below uses KMeans Clustering Model on a small randomly generated dataset.

model/scikit/dffml_model_scikit/__init__.py

Lines changed: 37 additions & 71 deletions
@@ -122,7 +122,7 @@
 
 **Usage Example:**
 
-Example below uses LinearRegression Model on a small dataset.
+Example below uses LinearRegression Model using the command line.
 
 Let us take a simple example:
 
@@ -142,91 +142,57 @@
 | 5                    | 11         | 1.2          | 60     |
 +----------------------+------------+--------------+--------+
 
-.. code-block:: console
+First we create the files
+
+.. literalinclude:: /../model/scikit/examples/lr/train_data.sh
+
+.. literalinclude:: /../model/scikit/examples/lr/test_data.sh
+
+Train the model
+
+.. literalinclude:: /../model/scikit/examples/lr/train.sh
+
+Assess accuracy
+
+.. literalinclude:: /../model/scikit/examples/lr/accuracy.sh
+
+Output:
+
+.. code-block::
 
-    $ cat > train.csv << EOF
-    Years,Expertise,Trust,Salary
-    0,1,0.2,10
-    1,3,0.4,20
-    2,5,0.6,30
-    3,7,0.8,40
-    EOF
-    $ cat > test.csv << EOF
-    Years,Expertise,Trust,Salary
-    4,9,1.0,50
-    5,11,1.2,60
-    EOF
-    $ dffml train \\
-        -model scikitlr \\
-        -model-features Years:int:1 Expertise:int:1 Trust:float:1 \\
-        -model-predict Salary:float:1 \\
-        -sources f=csv \\
-        -source-filename train.csv \\
-        -log debug
-    $ dffml accuracy \\
-        -model scikitlr \\
-        -model-features Years:int:1 Expertise:int:1 Trust:float:1 \\
-        -model-predict Salary:float:1 \\
-        -sources f=csv \\
-        -source-filename test.csv \\
-        -log debug
     1.0
-    $ echo -e 'Years,Expertise,Trust\\n6,13,1.4\\n' | \\
-        dffml predict all \\
-            -model scikitlr \\
-            -model-features Years:int:1 Expertise:int:1 Trust:float:1 \\
-            -model-predict Salary:float:1 \\
-            -sources f=csv \\
-            -source-filename /dev/stdin \\
-            -log debug
+
+Make a prediction
+
+.. literalinclude:: /../model/scikit/examples/lr/predict.sh
+
+Output:
+
+.. code-block:: json
+
     [
         {
             "extra": {},
             "features": {
                 "Expertise": 13,
-                "Trust": 1.4,
+                "Trust": 0.7,
                 "Years": 6
             },
-            "last_updated": "2019-09-18T19:04:18Z",
+            "key": "0",
+            "last_updated": "2020-03-01T22:26:46Z",
             "prediction": {
-                "confidence": 1.0,
-                "value": 70.00000000000001
-            },
-            "key": 0
+                "Salary": {
+                    "confidence": 1.0,
+                    "value": 70.0
+                }
+            }
         }
     ]
 
-Example usage of Linear Regression Model using python API:
-
-.. code-block:: python
 
-    from dffml import CSVSource, Features, DefFeature
-    from dffml.noasync import train, accuracy, predict
-    from dffml_model_scikit import LinearRegressionModel
-
-    model = LinearRegressionModel(
-        features=Features(
-            DefFeature("Years", int, 1),
-            DefFeature("Expertise", int, 1),
-            DefFeature("Trust", float, 1),
-        ),
-        predict=DefFeature("Salary", int, 1),
-    )
-
-    # Train the model
-    train(model, "train.csv")
-
-    # Assess accuracy (alternate way of specifying data source)
-    print("Accuracy:", accuracy(model, CSVSource(filename="test.csv")))
+Example usage of Linear Regression Model using python API:
 
-    # Make prediction
-    for i, features, prediction in predict(
-        model,
-        {"Years": 6, "Expertise": 13, "Trust": 0.7},
-        {"Years": 7, "Expertise": 15, "Trust": 0.8},
-    ):
-        features["Salary"] = prediction["Salary"]["value"]
-        print(features)
+.. literalinclude:: /../model/scikit/examples/lr/lr.py
 
 Example below uses KMeans Clustering Model on a small randomly generated dataset.
 
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+dffml accuracy \
+    -model scikitlr \
+    -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
+    -model-predict Salary:float:1 \
+    -sources f=csv \
+    -source-filename test.csv

model/scikit/examples/lr/lr.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+from dffml import CSVSource, Features, DefFeature
+from dffml.noasync import train, accuracy, predict
+from dffml_model_scikit import LinearRegressionModel
+
+model = LinearRegressionModel(
+    features=Features(
+        DefFeature("Years", int, 1),
+        DefFeature("Expertise", int, 1),
+        DefFeature("Trust", float, 1),
+    ),
+    predict=DefFeature("Salary", int, 1),
+)
+
+# Train the model
+train(model, "train.csv")
+
+# Assess accuracy (alternate way of specifying data source)
+print("Accuracy:", accuracy(model, CSVSource(filename="test.csv")))
+
+# Make prediction
+for i, features, prediction in predict(
+    model,
+    {"Years": 6, "Expertise": 13, "Trust": 0.7},
+    {"Years": 7, "Expertise": 15, "Trust": 0.8},
+):
+    features["Salary"] = prediction["Salary"]["value"]
+    print(features)
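
The test added in this commit expects the two predictions above to round to 70 and 80. The sketch below gives one way to see why, using a plain-Python least-squares fit on a single feature. It is a simplification and not what scikit-learn's `LinearRegression` does internally (that fits all three features at once). The training pairs are an assumption: they are the Years and Salary columns of the heredoc this commit removes from the docs, since `train_data.sh` itself is not shown in this diff.

```python
# Illustrative only: one-feature ordinary least squares in plain Python.
# Assumption: train.csv still holds the Years/Salary pairs (0, 10) .. (3, 40)
# that appeared in the heredoc removed from the documentation.
years = [0, 1, 2, 3]
salary = [10, 20, 30, 40]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(salary) / n

# Closed-form slope and intercept for simple linear regression
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, salary)) / sum(
    (x - mean_x) ** 2 for x in years
)
intercept = mean_y - slope * mean_x


def predict_salary(x):
    """Predict Salary from Years using the fitted line."""
    return slope * x + intercept


# The two inputs lr.py predicts for have Years = 6 and Years = 7
for x in (6, 7):
    print(x, round(predict_salary(x)))
```

Because Expertise and Trust are themselves linear in Years in this toy data set, a fit on Years alone is enough to illustrate the relationship the test relies on.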
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+echo -e 'Years,Expertise,Trust\n6,13,0.7\n' | \
+    dffml predict all \
+        -model scikitlr \
+        -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
+        -model-predict Salary:float:1 \
+        -sources f=csv \
+        -source-filename /dev/stdin
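
As a rough illustration of what a consumer of this command's output has to do, the sketch below parses a record shaped like the JSON shown in the updated docstring and pulls out the predicted salary. The `SAMPLE_OUTPUT` string is copied from that documented output rather than captured from a real run, and `predicted_value` is a hypothetical helper name, not part of DFFML.

```python
import json

# Copied from the JSON output documented in this commit; not a live capture.
SAMPLE_OUTPUT = """
[
    {
        "extra": {},
        "features": {"Expertise": 13, "Trust": 0.7, "Years": 6},
        "key": "0",
        "last_updated": "2020-03-01T22:26:46Z",
        "prediction": {"Salary": {"confidence": 1.0, "value": 70.0}}
    }
]
"""


def predicted_value(record, feature):
    """Hypothetical helper: return the predicted value for one feature."""
    return record["prediction"][feature]["value"]


records = json.loads(SAMPLE_OUTPUT)
print(predicted_value(records[0], "Salary"))
```

Note that the prediction is now nested under the name of the predicted feature, which is the shape change visible in the documentation hunks of this commit.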
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+cat > test.csv << EOF
+Years,Expertise,Trust,Salary
+4,9,0.5,50
+5,11,0.6,60
+EOF
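
For readers who would rather create the file without a shell, the following is a minimal sketch that writes the same rows with Python's `csv` module and reads them back. The helper name `write_test_csv` is made up for this illustration, and the temporary directory simply keeps the example from leaving files behind.

```python
import csv
import os
import tempfile

# Same header and rows as the heredoc in the script above
HEADER = ["Years", "Expertise", "Trust", "Salary"]
ROWS = [[4, 9, 0.5, 50], [5, 11, 0.6, 60]]


def write_test_csv(directory):
    """Hypothetical helper: write test.csv into directory, return its path."""
    path = os.path.join(directory, "test.csv")
    with open(path, "w", newline="") as fileobj:
        # Use "\n" line endings so the file matches what the heredoc produces
        writer = csv.writer(fileobj, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


with tempfile.TemporaryDirectory() as tempdir:
    with open(write_test_csv(tempdir), newline="") as fileobj:
        records = list(csv.DictReader(fileobj))

print(records)
```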
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+import os
+import ast
+import sys
+import json
+import tempfile
+import contextlib
+import subprocess
+import unittest.mock
+
+from dffml.util.os import chdir
+
+
+def sh_filepath(filename):
+    return os.path.join(os.path.dirname(__file__), filename)
+
+
+@contextlib.contextmanager
+def directory_with_csv_files():
+    with tempfile.TemporaryDirectory() as tempdir:
+        with chdir(tempdir):
+            subprocess.check_output(["bash", sh_filepath("train_data.sh")])
+            subprocess.check_output(["bash", sh_filepath("test_data.sh")])
+            yield tempdir
+
+
+class TestExample(unittest.TestCase):
+    def python_test(self, filename):
+        # Path to target file
+        filepath = os.path.join(os.path.dirname(__file__), filename)
+        # Capture output
+        stdout = subprocess.check_output([sys.executable, filepath])
+        lines = stdout.decode().split("\n")
+        # Check the Accuracy
+        self.assertIn("Accuracy: 1.0", lines[0])
+        # Check the salary
+        self.assertEqual(round(ast.literal_eval(lines[1])["Salary"]), 70.0)
+        self.assertEqual(round(ast.literal_eval(lines[2])["Salary"]), 80.0)
+
+    def test_python_filenames(self):
+        with directory_with_csv_files() as tempdir:
+            self.python_test("lr.py")
+
+    def test_shell(self):
+        with directory_with_csv_files() as tempdir:
+            # Run training
+            subprocess.check_output(["bash", sh_filepath("train.sh")])
+            # Check the Accuracy
+            stdout = subprocess.check_output(
+                ["bash", sh_filepath("accuracy.sh")]
+            )
+            self.assertEqual(stdout.decode().strip(), "1.0")
+            # Make the prediction
+            stdout = subprocess.check_output(
+                ["bash", sh_filepath("predict.sh")]
+            )
+            records = json.loads(stdout.decode())
+            # Check the salary
+            self.assertEqual(
+                round(records[0]["prediction"]["Salary"]["value"]), 70.0
+            )
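
The parsing step in `python_test` can be tried in isolation, without DFFML installed. The sketch below applies the same `ast.literal_eval` and `round` logic to a made-up stdout. `SAMPLE_STDOUT` and its floating-point values are hypothetical, chosen only to show why the test rounds before comparing, and `parse_example_output` is a name invented for this illustration.

```python
import ast

# Hypothetical stdout of lr.py; the exact float values are invented to show
# why the test rounds the predictions before comparing them.
SAMPLE_STDOUT = (
    b"Accuracy: 1.0\n"
    b"{'Years': 6, 'Expertise': 13, 'Trust': 0.7, 'Salary': 70.00000000000001}\n"
    b"{'Years': 7, 'Expertise': 15, 'Trust': 0.8, 'Salary': 79.99999999999999}\n"
)


def parse_example_output(stdout):
    """Split captured stdout into the accuracy line and rounded salaries."""
    lines = stdout.decode().split("\n")
    accuracy_line = lines[0]
    # Each remaining line is the repr of a dict, so literal_eval can read it
    salaries = [round(ast.literal_eval(line)["Salary"]) for line in lines[1:3]]
    return accuracy_line, salaries


accuracy_line, salaries = parse_example_output(SAMPLE_STDOUT)
print(accuracy_line, salaries)
```

`ast.literal_eval` is used rather than `json.loads` because `print(features)` emits a Python dict repr with single quotes, which is not valid JSON.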

model/scikit/examples/lr/train.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+dffml train \
+    -model scikitlr \
+    -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
+    -model-predict Salary:float:1 \
+    -sources f=csv \
+    -source-filename train.csv
