Commit c880fdf

committed
add example
rebuild and retest
1 parent 23b322a commit c880fdf

File tree

4 files changed

+148
-10
lines changed

Examples/ValuesAsColumns/ValuesAsColumns.ipynb

Lines changed: 147 additions & 9 deletions
@@ -1,23 +1,73 @@
11
{
22
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"source": [
6+
"# Values as Columns\n",
7+
"\n",
8+
"A [SQL](https://en.wikipedia.org/wiki/SQL) feature I really like is the interchangeability of values and columns: in many expressions a constant value can stand in wherever a column could appear. It is a small convenience, but a nice one.\n",
9+
"\n",
10+
"Let's work through an example to illustrate the point. Our task is to count how many rows are in each group of a data frame.\n",
11+
"\n",
12+
"In the [data algebra](https://github.com/WinVector/data_algebra) over [Pandas](https://pandas.pydata.org) this looks like the following.\n",
13+
"\n",
14+
"First we import our packages and set up our example Pandas data frame."
15+
],
16+
"metadata": {
17+
"collapsed": false,
18+
"pycharm": {
19+
"name": "#%% md\n"
20+
}
21+
}
22+
},
323
{
424
"cell_type": "code",
5-
"execution_count": null,
25+
"execution_count": 1,
626
"metadata": {
727
"collapsed": true
828
},
9-
"outputs": [],
29+
"outputs": [
30+
{
31+
"data": {
32+
"text/plain": " group one\n0 a 1\n1 a 1\n2 b 1\n3 b 1\n4 b 1",
33+
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>group</th>\n <th>one</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>b</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>b</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>"
34+
},
35+
"execution_count": 1,
36+
"metadata": {},
37+
"output_type": "execute_result"
38+
}
39+
],
1040
"source": [
1141
"import pandas as pd\n",
12-
"from data_algebra.data_ops import data, descr, ex\n",
13-
"from data_algebra.BigQuery import BigQueryModel, BigQuery_DBHandle\n",
42+
"from data_algebra.data_ops import descr\n",
43+
"from data_algebra.BigQuery import BigQueryModel\n",
1444
"\n",
1545
"\n",
1646
"d = pd.DataFrame({\n",
1747
" 'group': ['a', 'a', 'b', 'b', 'b'],\n",
1848
" 'one': [1, 1, 1, 1, 1],\n",
1949
"})\n",
2050
"\n",
51+
"d\n"
52+
]
53+
},
54+
{
55+
"cell_type": "markdown",
56+
"source": [
57+
"Now we specify our grouped counting operations using a data algebra `project` step."
58+
],
59+
"metadata": {
60+
"collapsed": false,
61+
"pycharm": {
62+
"name": "#%% md\n"
63+
}
64+
}
65+
},
66+
{
67+
"cell_type": "code",
68+
"execution_count": 2,
69+
"outputs": [],
70+
"source": [
2171
"ops = (\n",
2272
" descr(d=d)\n",
2373
" .project(\n",
@@ -27,26 +77,114 @@
2777
" },\n",
2878
" group_by=['group']\n",
2979
" )\n",
30-
")\n",
80+
")"
81+
],
82+
"metadata": {
83+
"collapsed": false,
84+
"pycharm": {
85+
"name": "#%%\n"
86+
}
87+
}
88+
},
89+
{
90+
"cell_type": "markdown",
91+
"source": [
92+
"The point is that we are free to count either by summing a value stored in a column (such as the column `one`) *or* by summing a constant value directly (such as `1`, written `(1).sum()`; the parentheses ensure the dot is parsed as an attribute lookup rather than as a decimal point).\n",
3193
"\n",
94+
"As desired, both calculations return the same result."
95+
],
96+
"metadata": {
97+
"collapsed": false,
98+
"pycharm": {
99+
"name": "#%% md\n"
100+
}
101+
}
102+
},
103+
{
104+
"cell_type": "code",
105+
"execution_count": 3,
106+
"outputs": [
107+
{
108+
"data": {
109+
"text/plain": " group sum_one sum_1\n0 a 2 2\n1 b 3 3",
110+
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>group</th>\n <th>sum_one</th>\n <th>sum_1</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>2</td>\n <td>2</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>3</td>\n <td>3</td>\n </tr>\n </tbody>\n</table>\n</div>"
111+
},
112+
"execution_count": 3,
113+
"metadata": {},
114+
"output_type": "execute_result"
115+
}
116+
],
117+
"source": [
32118
"ops.transform(d)"
33-
]
119+
],
120+
"metadata": {
121+
"collapsed": false,
122+
"pycharm": {
123+
"name": "#%%\n"
124+
}
125+
}
126+
},
127+
{
128+
"cell_type": "markdown",
129+
"source": [
130+
"And the equivalent SQL is given as follows."
131+
],
132+
"metadata": {
133+
"collapsed": false,
134+
"pycharm": {
135+
"name": "#%% md\n"
136+
}
137+
}
34138
},
35139
{
36140
"cell_type": "code",
37-
"execution_count": null,
38-
"outputs": [],
141+
"execution_count": 4,
142+
"outputs": [
143+
{
144+
"name": "stdout",
145+
"output_type": "stream",
146+
"text": [
147+
"-- data_algebra SQL https://github.com/WinVector/data_algebra\n",
148+
"-- dialect: BigQueryModel\n",
149+
"-- string quote: \"\n",
150+
"-- identifier quote: `\n",
151+
"SELECT -- .project({ 'sum_one': 'one.sum()', 'sum_1': '(1).sum()'}, group_by=['group'])\n",
152+
" SUM(`one`) AS `sum_one` ,\n",
153+
" SUM(1) AS `sum_1` ,\n",
154+
" `group`\n",
155+
"FROM\n",
156+
" `d`\n",
157+
"GROUP BY\n",
158+
" `group`\n",
159+
"\n"
160+
]
161+
}
162+
],
39163
"source": [
40164
"db_model = BigQueryModel()\n",
41165
"\n",
42-
"print(db_model.to_sql(ops))\n"
166+
"sql_str = db_model.to_sql(ops)\n",
167+
"\n",
168+
"print(sql_str)"
43169
],
44170
"metadata": {
45171
"collapsed": false,
46172
"pycharm": {
47173
"name": "#%%\n"
48174
}
49175
}
176+
},
177+
{
178+
"cell_type": "markdown",
179+
"source": [
180+
"SQL is where this equivalence of values and columns is borrowed from."
181+
],
182+
"metadata": {
183+
"collapsed": false,
184+
"pycharm": {
185+
"name": "#%% md\n"
186+
}
187+
}
50188
}
51189
],
52190
"metadata": {
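For readers without `data_algebra` installed, the grouped count from the notebook can be sketched in plain Pandas. This is an assumed Pandas analog, not the data algebra's own implementation: Pandas named aggregations operate on columns, so the constant `1` is first broadcast into a temporary column (the name `const_1` is hypothetical).

```python
import pandas as pd

# Same example data as the notebook.
d = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'b'],
    'one': [1, 1, 1, 1, 1],
})

# Pandas aggregates named columns, so the constant 1 is broadcast
# into a temporary column (hypothetical name `const_1`) before grouping.
res = (
    d.assign(const_1=1)
    .groupby('group', as_index=False)
    .agg(sum_one=('one', 'sum'), sum_1=('const_1', 'sum'))
)

print(res)
```

Both summed columns give the same per-group counts; the extra `assign` step is exactly the bookkeeping that the values-as-columns convenience lets the data algebra and SQL skip.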

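The notebook closes by noting that the principle comes from SQL. As a quick local check, the sketch below runs a hand-written query modeled on the generated BigQuery SQL against Python's built-in `sqlite3` module. The SQLite backend and its double-quote identifier quoting are assumptions of this sketch, not part of the commit; `group` is a reserved word, so it must be quoted.

```python
import sqlite3

# In-memory database holding the same example rows as the notebook.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE d ("group" TEXT, one INTEGER)')
con.executemany(
    'INSERT INTO d VALUES (?, ?)',
    [('a', 1), ('a', 1), ('b', 1), ('b', 1), ('b', 1)],
)

# SUM(one) aggregates a column; SUM(1) aggregates a constant value.
rows = con.execute(
    'SELECT "group", SUM(one) AS sum_one, SUM(1) AS sum_1 '
    'FROM d GROUP BY "group" ORDER BY "group"'
).fetchall()
con.close()

print(rows)  # [('a', 2, 2), ('b', 3, 3)]
```

The `ORDER BY` is added only to make the output order deterministic; the generated BigQuery query does not include it.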
coverage.txt

Lines changed: 1 addition & 1 deletion
@@ -135,4 +135,4 @@ data_algebra/util.py 140 29 79%
135135
TOTAL 5263 909 83%
136136

137137

138-
============================= 250 passed in 27.40s =============================
138+
============================= 250 passed in 33.25s =============================
0 Bytes
Binary file not shown.

dist/data_algebra-1.2.1.tar.gz

-1 Bytes
Binary file not shown.