
Commit 5be037c

committed
update example
1 parent 743e2a2 commit 5be037c


1 file changed: +130 -2 lines changed


Examples/LogisticExample/Logistic2.ipynb

Lines changed: 130 additions & 2 deletions
@@ -1,5 +1,19 @@
11
{
22
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"source": [
6+
"The [`data_algebra`](https://github.com/WinVector/data_algebra/tree/master/data_algebra) `Locum` (\"stand-in\") class lets us build up pipelines out of larger, reusable pieces.\n",
7+
"\n",
8+
"A traditional, all-in-one way of building up a pipeline looks like the following."
9+
],
10+
"metadata": {
11+
"collapsed": false,
12+
"pycharm": {
13+
"name": "#%% md\n"
14+
}
15+
}
16+
},
317
{
418
"cell_type": "code",
519
"execution_count": 1,
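The `Locum` idea in the first markdown cell (a stand-in that collects operations before any table is attached) can be illustrated outside the library. The following is a minimal pure-Python sketch, assuming rows are plain dicts; `ToyLocum`, `ToyPipeline`, `scale_by_ten`, and `keep_large` are hypothetical names for illustration and are not the `data_algebra` API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class ToyPipeline:
    """Steps from a ToyLocum, bound to a table with declared column names."""

    def __init__(self, column_names: Sequence[str], steps: Tuple[Step, ...]):
        self.column_names = list(column_names)
        self.steps = steps

    def transform(self, rows: List[Row]) -> List[Row]:
        # Fail early if an input row lacks a declared column.
        for row in rows:
            missing = [c for c in self.column_names if c not in row]
            if missing:
                raise KeyError(f"missing columns: {missing}")
        # Run the steps in the order they were appended.
        for step in self.steps:
            rows = step(rows)
        return rows


class ToyLocum:
    """Hypothetical stand-in: collects steps before any table is attached."""

    def __init__(self, steps: Tuple[Step, ...] = ()):
        self.steps = steps

    def append(self, step: Step) -> "ToyLocum":
        # Return a new object so partially built pipelines can be reused.
        return ToyLocum(self.steps + (step,))

    def apply_to(self, column_names: Sequence[str]) -> ToyPipeline:
        # Bind the collected steps to a concrete table description.
        return ToyPipeline(column_names, self.steps)


def scale_by_ten(rows: List[Row]) -> List[Row]:
    return [{**r, "y": r["x"] * 10} for r in rows]


def keep_large(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["y"] > 15]


ops = ToyLocum().append(scale_by_ten).append(keep_large).apply_to(["x"])
print(ops.transform([{"x": 1}, {"x": 2}]))  # [{'x': 2, 'y': 20}]
```

Returning a new object from `append` keeps each partial pipeline immutable, so the same prefix can be extended in different directions.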
@@ -51,6 +65,18 @@
5165
"print(ops)"
5266
]
5367
},
68+
{
69+
"cell_type": "markdown",
70+
"source": [
71+
"Instead, we can build up this calculation in three major steps: computing probabilities, rank-based selection, and cleanup."
72+
],
73+
"metadata": {
74+
"collapsed": false,
75+
"pycharm": {
76+
"name": "#%% md\n"
77+
}
78+
}
79+
},
5480
{
5581
"cell_type": "code",
5682
"execution_count": 2,
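The three steps named in the markdown cell above (computing probabilities, rank-based selection, cleanup) follow the operator chain printed later in this notebook: exponentiate `assessmentTotal * 0.237`, normalize within each `subjectID`, keep the top-ranked row, then select and rename columns. Below is a sketch of those steps as plain Python functions on dict rows. The input rows are hypothetical, and the final sort is ascending for simplicity, whereas the printed chain also lists `reverse=['subjectID']`.

```python
import math
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def prob_calculation(rows: List[Row], scale: float = 0.237) -> List[Row]:
    # exp(assessmentTotal * 0.237), then divide by the per-subjectID total.
    scored = [{**r, "probability": math.exp(r["assessmentTotal"] * scale)} for r in rows]
    totals: Dict[object, float] = defaultdict(float)
    for r in scored:
        totals[r["subjectID"]] += r["probability"]
    return [{**r, "probability": r["probability"] / totals[r["subjectID"]]} for r in scored]


def top_rank(rows: List[Row]) -> List[Row]:
    # Keep the highest-probability row per subjectID (the row_number == 1 filter).
    best: Dict[object, Row] = {}
    for r in rows:
        current = best.get(r["subjectID"])
        if current is None or r["probability"] > current["probability"]:
            best[r["subjectID"]] = r
    return list(best.values())


def clean_up_columns(rows: List[Row]) -> List[Row]:
    # Keep three columns, rename surveyCategory to diagnosis, sort by subjectID.
    cleaned = [
        {"subjectID": r["subjectID"], "diagnosis": r["surveyCategory"], "probability": r["probability"]}
        for r in rows
    ]
    return sorted(cleaned, key=lambda r: r["subjectID"])


# Hypothetical input rows for illustration (not the notebook's d_local).
d = [
    {"subjectID": 1, "surveyCategory": "withdrawal behavior", "assessmentTotal": 5},
    {"subjectID": 1, "surveyCategory": "positive re-framing", "assessmentTotal": 2},
    {"subjectID": 2, "surveyCategory": "withdrawal behavior", "assessmentTotal": 3},
    {"subjectID": 2, "surveyCategory": "positive re-framing", "assessmentTotal": 4},
]

result = clean_up_columns(top_rank(prob_calculation(d)))
for row in result:
    print(row)
```

On these rows the sketch keeps `withdrawal behavior` (about 0.6706) for subject 1 and `positive re-framing` (about 0.5590) for subject 2.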
@@ -138,6 +164,18 @@
138164
}
139165
}
140166
},
167+
{
168+
"cell_type": "markdown",
169+
"source": [
170+
"We can then combine these steps into a new pipeline as follows."
171+
],
172+
"metadata": {
173+
"collapsed": false,
174+
"pycharm": {
175+
"name": "#%% md\n"
176+
}
177+
}
178+
},
141179
{
142180
"cell_type": "code",
143181
"execution_count": 5,
@@ -150,6 +188,84 @@
150188
"output_type": "stream"
151189
}
152190
],
191+
"source": [
192+
"ops = Locum(). \\\n",
193+
" append(prob_calculation). \\\n",
194+
" append(top_rank). \\\n",
195+
" append(clean_up_columns). \\\n",
196+
" apply_to(data_algebra.data_ops.describe_table(d_local, 'd'))\n",
197+
"\n",
198+
"print(ops)"
199+
],
200+
"metadata": {
201+
"collapsed": false,
202+
"pycharm": {
203+
"name": "#%%\n",
204+
"is_executing": false
205+
}
206+
}
207+
},
208+
{
209+
"cell_type": "markdown",
210+
"source": [
211+
"The pipeline is applied to data as follows."
212+
],
213+
"metadata": {
214+
"collapsed": false,
215+
"pycharm": {
216+
"name": "#%% md\n"
217+
}
218+
}
219+
},
220+
{
221+
"cell_type": "code",
222+
"execution_count": 6,
223+
"outputs": [
224+
{
225+
"data": {
226+
"text/plain": " subjectID diagnosis probability\n0 1 withdrawal behavior 0.670622\n1 2 positive re-framing 0.558974",
227+
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>subjectID</th>\n <th>diagnosis</th>\n <th>probability</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>withdrawal behavior</td>\n <td>0.670622</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>positive re-framing</td>\n <td>0.558974</td>\n </tr>\n </tbody>\n</table>\n</div>"
228+
},
229+
"metadata": {},
230+
"output_type": "execute_result",
231+
"execution_count": 6
232+
}
233+
],
234+
"source": [
235+
"ops.transform(d_local)"
236+
],
237+
"metadata": {
238+
"collapsed": false,
239+
"pycharm": {
240+
"name": "#%%\n",
241+
"is_executing": false
242+
}
243+
}
244+
},
245+
{
246+
"cell_type": "markdown",
247+
"source": [
248+
"Or we can use `+` notation, in place of `append()`, to build up the pipeline."
249+
],
250+
"metadata": {
251+
"collapsed": false,
252+
"pycharm": {
253+
"name": "#%% md\n"
254+
}
255+
}
256+
},
257+
{
258+
"cell_type": "code",
259+
"execution_count": 7,
260+
"outputs": [
261+
{
262+
"name": "stdout",
263+
"text": [
264+
"TableDescription(table_name='d', column_names=['subjectID', 'surveyCategory', 'assessmentTotal', 'irrelevantCol1', 'irrelevantCol2']) .\\\n extend({'probability': '(assessmentTotal * 0.237).exp()'}) .\\\n extend({'total': 'probability.sum()'}, partition_by=['subjectID']) .\\\n extend({'probability': 'probability / total'}) .\\\n extend({'sort_key': '-probability'}) .\\\n extend({'row_number': '_row_number()'}, partition_by=['subjectID'], order_by=['sort_key']) .\\\n select_rows('row_number == 1') .\\\n select_columns(['subjectID', 'surveyCategory', 'probability']) .\\\n rename_columns({'diagnosis': 'surveyCategory'}) .\\\n order_rows(['subjectID'], reverse=['subjectID'])\n"
265+
],
266+
"output_type": "stream"
267+
}
268+
],
153269
"source": [
154270
"ops = data_algebra.data_ops.describe_table(d_local, 'd') +\\\n",
155271
" prob_calculation +\\\n",
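The `+` notation described above is ordinary operator overloading: `a + b` yields a new pipeline that runs `a`'s steps and then `b`'s. The sketch below assumes plain dict rows and a hypothetical `ToyOps` class. In the notebook, the left operand is a table description, while here both operands are step sequences.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class ToyOps:
    """Hypothetical operator sequence that composes with `+`."""

    def __init__(self, *steps: Step):
        self.steps: Tuple[Step, ...] = tuple(steps)

    def __add__(self, other: "ToyOps") -> "ToyOps":
        # Concatenate the step sequences; neither operand is modified.
        if not isinstance(other, ToyOps):
            return NotImplemented
        return ToyOps(*self.steps, *other.steps)

    def transform(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


add_one = ToyOps(lambda rows: [{**r, "x": r["x"] + 1} for r in rows])
double = ToyOps(lambda rows: [{**r, "x": r["x"] * 2} for r in rows])
keep_large = ToyOps(lambda rows: [r for r in rows if r["x"] > 4])

ops = add_one + double + keep_large
print(ops.transform([{"x": 1}, {"x": 2}, {"x": 3}]))  # [{'x': 6}, {'x': 8}]
```

Because composition is sequential, `+` here is associative but not commutative: `double + add_one` gives a different result from `add_one + double`.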
@@ -166,9 +282,21 @@
166282
}
167283
}
168284
},
285+
{
286+
"cell_type": "markdown",
287+
"source": [
288+
"We could also \"pipe\" the data into the operators, but that is less \"Pythonic\" (idiomatic for Python)."
289+
],
290+
"metadata": {
291+
"collapsed": false,
292+
"pycharm": {
293+
"name": "#%% md\n"
294+
}
295+
}
296+
},
169297
{
170298
"cell_type": "code",
171-
"execution_count": 6,
299+
"execution_count": 8,
172300
"outputs": [
173301
{
174302
"data": {
@@ -177,7 +305,7 @@
177305
},
178306
"metadata": {},
179307
"output_type": "execute_result",
180-
"execution_count": 6
308+
"execution_count": 8
181309
}
182310
],
183311
"source": [
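The last markdown cell mentions "piping" data into the operators. Python has no built-in pipe operator, so piping usually means overloading an existing one, which is part of why it reads as less idiomatic than calling `transform`. The following is a minimal sketch that assumes `>>` as the pipe symbol and uses plain dict rows. `PipeableOps` is hypothetical, and this hunk does not show which syntax the notebook cell actually uses.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class PipeableOps:
    """Hypothetical operator sequence that also accepts `rows >> ops`."""

    def __init__(self, *steps: Step):
        self.steps: Tuple[Step, ...] = tuple(steps)

    def transform(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows

    def __rrshift__(self, rows: List[Row]) -> List[Row]:
        # Python calls this for `rows >> ops` because list defines no `>>`.
        return self.transform(rows)


ops = PipeableOps(lambda rows: [{**r, "y": r["x"] * 10} for r in rows])
rows = [{"x": 1}, {"x": 2}]
print(rows >> ops)  # [{'x': 1, 'y': 10}, {'x': 2, 'y': 20}]
```

The reflected method `__rrshift__` is what makes the data-first spelling possible: the left operand (a list) has no `>>`, so Python falls back to the right operand's implementation.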
