update example

JohnMount · JohnMount · commit 5be037c503dd · 2019-09-29T10:13:25.000-07:00
diff --git a/Examples/LogisticExample/Logistic2.ipynb b/Examples/LogisticExample/Logistic2.ipynb
@@ -1,5 +1,19 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "The [`data_algebra`](https://github.com/WinVector/data_algebra/tree/master/data_algebra) locum stand-in gives us the ability to build up pipelines out of larger pieces.\n",
+    "\n",
+    "A traditional all-in-one way of building up a pipeline looks like the following."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -51,6 +65,18 @@
     "print(ops)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Instead, we can build up this calculation as three major steps: computing probability, rank-based selection, and cleanup."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -138,6 +164,18 @@
     }
    }
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "We can then combine these steps into a new pipeline as follows."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
@@ -150,6 +188,84 @@
      "output_type": "stream"
     }
    ],
+   "source": [
+    "ops = Locum(). \\\n",
+    "    append(prob_calculation). \\\n",
+    "    append(top_rank). \\\n",
+    "    append(clean_up_columns). \\\n",
+    "    apply_to(data_algebra.data_ops.describe_table(d_local, 'd'))\n",
+    "\n",
+    "print(ops)"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%%\n",
+     "is_executing": false
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "The pipeline is applied to data as follows."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "   subjectID            diagnosis  probability\n0          1  withdrawal behavior     0.670622\n1          2  positive re-framing     0.558974",
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>subjectID</th>\n      <th>diagnosis</th>\n      <th>probability</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>withdrawal behavior</td>\n      <td>0.670622</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n      <td>positive re-framing</td>\n      <td>0.558974</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+     },
+     "metadata": {},
+     "output_type": "execute_result",
+     "execution_count": 6
+    }
+   ],
+   "source": [
+    "ops.transform(d_local)"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%%\n",
+     "is_executing": false
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Alternatively, we can use `+`/append notation to build up the pipeline."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "outputs": [
+    {
+     "name": "stdout",
+     "text": [
+      "TableDescription(table_name='d', column_names=['subjectID', 'surveyCategory', 'assessmentTotal', 'irrelevantCol1', 'irrelevantCol2']) .\\\n   extend({'probability': '(assessmentTotal * 0.237).exp()'}) .\\\n   extend({'total': 'probability.sum()'}, partition_by=['subjectID']) .\\\n   extend({'probability': 'probability / total'}) .\\\n   extend({'sort_key': '-probability'}) .\\\n   extend({'row_number': '_row_number()'}, partition_by=['subjectID'], order_by=['sort_key']) .\\\n   select_rows('row_number == 1') .\\\n   select_columns(['subjectID', 'surveyCategory', 'probability']) .\\\n   rename_columns({'diagnosis': 'surveyCategory'}) .\\\n   order_rows(['subjectID'], reverse=['subjectID'])\n"
+     ],
+     "output_type": "stream"
+    }
+   ],
    "source": [
     "ops =  data_algebra.data_ops.describe_table(d_local, 'd') +\\\n",
     "    prob_calculation +\\\n",
@@ -166,9 +282,21 @@
     }
    }
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "We could also \"pipe\" the data into the operators, but that is less \"Pythonic\" (that is, less idiomatic for Python)."
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   }
+  },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 8,
    "outputs": [
     {
      "data": {
@@ -177,7 +305,7 @@
      },
      "metadata": {},
      "output_type": "execute_result",
-     "execution_count": 6
+     "execution_count": 8
     }
    ],
    "source": [