This repository was archived by the owner on Jul 15, 2024. It is now read-only.

Commit 9fd1349

Update notebooks
- Remove use of curl
- Sync with changes made in ibis repo
1 parent d775755 commit 9fd1349

10 files changed

+70 -156 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -127,3 +127,7 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+# Tutorial and examples artifacts
+geography.db
+*.log

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ RUN adduser --disabled-password \
 ${NB_USER}
 
 RUN apt-get update && \
-apt-get install -y git curl && \
+apt-get install -y git && \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*

tutorial/01-Introduction-to-Ibis.ipynb

Lines changed: 21 additions & 17 deletions
@@ -6,9 +6,9 @@
 "source": [
 "# Getting started\n",
 "\n",
-"To start using Ibis, you need a Python environment with Ibis installed. Follow the <a href='../../backends/SQLite#install'>installation instructions for SQLite</a> to setup an environment.\n",
+"To start using Ibis, you need a Python environment with Ibis installed. If you're running through this tutorial on your own machine (rather than binder) please follow the [installation instructions for SQLite](https://ibis-project.org/docs/latest/backends/SQLite/) to setup an environment.\n",
 "\n",
-"Once you have your environment ready, to start using Ibis simply import the `ibis` module:"
+"You'll also need access to the `geography.db` database hosted [here](https://storage.googleapis.com/ibis-tutorial-data/geography.db). Every notebook in the tutorial starts with the following code to download the database if it doesn't already exist."
 ]
 },
 {
@@ -17,16 +17,15 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"import ibis"
+"from tutorial_utils import setup\n",
+"setup()"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"To make it things easier in this tutorial, we will be using _Ibis interactive mode_. For production code, that will rarely be the case. More details on Ibis non-interactive (aka lazy) mode are covered in the third tutorial, _Expressions, lazy mode and logging queries_.\n",
-"\n",
-"To set the interactive mode, use:"
+"You should now have `ibis` and the tutorial data all setup. We're ready to get started. First lets import `ibis`."
 ]
 },
 {
@@ -35,20 +34,16 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"ibis.options.interactive = True"
+"import ibis"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next thing we need is to create a connection object. The connection defines where the data is stored and where the computations will be performed.\n",
+"To make it things easier in this tutorial, we will be using Ibis's \"interactive mode\". This is the recommended mode to use when doing interactive/iterative work with `ibis`. When deploying production code you'll typically run in non-interactive mode. More details on Ibis non-interactive mode are covered in [a later notebook](./03-Expressions-Lazy-Mode-Logging.ipynb).\n",
 "\n",
-"For a comparison to pandas, this is not the same as where the data is imported from (e.g. `pandas.read_sql`). pandas loads data into memory and performs the computations itself. Ibis won't load the data and perform any computation, but instead will leave the data in the backend defined in the connection, and will _ask_ the backend to perform the computations.\n",
-"\n",
-"In this tutorial we will be using a SQLite connection for its simplicity (no installation is needed). But Ibis can work with many different backends, including big data systems, or GPU-accelerated analytical databases. As well as most common relational databases (PostgreSQL, MySQL,...).\n",
-"\n",
-"Let's download the SQLite database from the `ibis-tutorial-data` GCS (Google Cloud Storage) bucket, then connect to it using `ibis`."
+"To enable interactive mode, run:"
 ]
 },
 {
@@ -57,7 +52,18 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'"
+"ibis.options.interactive = True"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Next thing we need is to create a connection object. The connection defines where the data is stored and where the computations will be performed.\n",
+"\n",
+"For a comparison to pandas, this is not the same as where the data is imported from (e.g. `pandas.read_sql`). pandas loads data into memory and performs the computations itself. Ibis won't load the data and perform any computation, but instead will leave the data in the backend defined in the connection, and will _ask_ the backend to perform the computations.\n",
+"\n",
+"In this tutorial we will be using a SQLite connection for its simplicity (no installation is needed). But Ibis can work with many different backends, including big data systems, or GPU-accelerated analytical databases. As well as most common relational databases (PostgreSQL, MySQL,...)."
 ]
 },
 {
@@ -73,8 +79,6 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Note that if you installed Ibis with `pip` instead of `conda`, you may need to install the SQLite backend separately with `pip install 'ibis-framework[sqlite]'`.\n",
-"\n",
 "### Exploring the data\n",
 "\n",
 "To list the tables in the `connection` object, we can use the `.list_tables()` method. If you are using Jupyter, you can see all the methods and attributes of the `connection` object by writing `connection.` and pressing the `<TAB>` key."
@@ -312,7 +316,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.5"
+"version": "3.10.8"
 }
 },
 "nbformat": 4,

tutorial/02-Aggregates-Joins.ipynb

Lines changed: 5 additions & 6 deletions
@@ -6,11 +6,9 @@
 "source": [
 "# Aggregating and joining data\n",
 "\n",
-"This is the second introductory tutorial to Ibis. If you are new to Ibis, you may want to start\n",
-"by the first tutorial, _01-Introduction-to-Ibis_.\n",
+"This is the second introductory tutorial to Ibis. If you are new to Ibis, you may want to start at [the beginning of this tutorial](./01-Introduction-to-Ibis.ipynb).\n",
 "\n",
-"In the first tutorial, we saw how to operate on the data of a table. We will work again with\n",
-"the `countries` table as we did previously."
+"In the first notebook we saw how to load and query data using `ibis`. In this notebook we'll continue with the same dataset, building up some more complicated queries."
 ]
 },
 {
@@ -19,7 +17,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'"
+"from tutorial_utils import setup\n",
+"setup()"
 ]
 },
 {
@@ -369,7 +368,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.5"
+"version": "3.10.8"
 }
 },
 "nbformat": 4,

tutorial/03-Expressions-Lazy-Mode-Logging.ipynb

Lines changed: 11 additions & 23 deletions
@@ -13,7 +13,7 @@
 "In lazy mode, Ibis won't be executing the operations automatically, but instead, will generate an\n",
 "expression to be executed at a later time.\n",
 "\n",
-"Let's see this in practice, starting with the same example as in previous tutorials - the geography database."
+"Let's see this in practice, starting with the same database as in previous tutorials."
 ]
 },
 {
@@ -22,7 +22,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'"
+"from tutorial_utils import setup\n",
+"setup()"
 ]
 },
 {
@@ -236,26 +237,16 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"import datetime\n",
-"import os\n",
-"import tempfile\n",
 "from pathlib import Path\n",
 "\n",
 "\n",
 "def log_query_to_file(query: str) -> None:\n",
-"    \"\"\"\n",
-"    Log queries to `data/tutorial_queries.log`.\n",
-"\n",
-"    Each file is a query. Line breaks in the query are\n",
-"    represented with the string '\\n'.\n",
-"\n",
-"    A timestamp of when the query is executed is added.\n",
-"    \"\"\"\n",
-"    dirname = Path(tempfile.gettempdir())\n",
-"    fname = dirname / 'tutorial_queries.log'\n",
-"    query_in_a_single_line = query.replace('\\n', r'\\n')\n",
+"    \"\"\"Log queries to `./tutorial_queries.log`.\"\"\"\n",
+"    fname = Path() / 'tutorial_queries.log'\n",
+"    query = query.replace(\"\\n\", \" \")\n",
 "    with fname.open(mode='a') as f:\n",
-"        f.write(f'{query_in_a_single_line}\\n')"
+"        # log on a single line\n",
+"        f.write(f\"{query}\\n\")"
 ]
 },
 {
@@ -272,20 +263,17 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"import time\n",
-"\n",
 "ibis.options.verbose_log = log_query_to_file\n",
 "\n",
 "countries.execute()\n",
-"time.sleep(1.0)\n",
 "countries['name', 'continent', population_in_millions].limit(3).execute()"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This has created a log file in `data/tutorial_queries.log` where the executed queries have been logged."
+"This has created a log file in `$PWD/tutorial_queries.log` where the executed queries have been logged."
 ]
 },
 {
@@ -294,7 +282,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!cat -n data/tutorial_queries.log"
+"!cat -n $PWD/tutorial_queries.log"
 ]
 }
 ],
@@ -314,7 +302,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.5"
+"version": "3.10.8"
 }
 },
 "nbformat": 4,
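
The rewritten `log_query_to_file` callback can be tried on its own, without Ibis or a database. The sketch below copies the new function body from the hunk and feeds it a made-up multi-line SQL string; the sample query is an assumption standing in for whatever text the backend would pass to the callback:

```python
from pathlib import Path


def log_query_to_file(query: str) -> None:
    """Log queries to `./tutorial_queries.log`."""
    fname = Path() / "tutorial_queries.log"
    # Collapse the query onto one line so each log line is one query.
    query = query.replace("\n", " ")
    with fname.open(mode="a") as f:
        f.write(f"{query}\n")


# Made-up SQL for illustration only.
log_query_to_file("SELECT name\nFROM countries\nLIMIT 3")
print(Path("tutorial_queries.log").read_text())
```

Because the file is opened in append mode, rerunning the cell adds lines rather than overwriting the log, which is presumably why `*.log` was added to `.gitignore` in the same commit.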

tutorial/04-More-Value-Expressions.ipynb

Lines changed: 4 additions & 2 deletions
@@ -21,7 +21,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'"
+"from tutorial_utils import setup\n",
+"setup()"
 ]
 },
 {
@@ -31,6 +32,7 @@
 "outputs": [],
 "source": [
 "import ibis\n",
+"\n",
 "ibis.options.interactive = True\n",
 "\n",
 "connection = ibis.sqlite.connect('geography.db')"
@@ -512,7 +514,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.5"
+"version": "3.10.8"
 }
 },
 "nbformat": 4,

tutorial/05-IO-Create-Insert-External-Data.ipynb

Lines changed: 5 additions & 103 deletions
@@ -20,7 +20,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'"
+"from tutorial_utils import setup\n",
+"setup()"
 ]
 },
 {
@@ -30,9 +31,10 @@
 "outputs": [],
 "source": [
 "import ibis\n",
+"\n",
 "ibis.options.interactive = True\n",
 "\n",
-"connection = ibis.sqlite.connect('geography.db')"
+"connection = ibis.sqlite.connect(\"geography.db\")"
 ]
 },
 {
@@ -112,106 +114,6 @@
 "source": [
 "connection.drop_table('continents')"
 ]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Creating new tables from in-memory Pandas dataframes\n",
-"\n",
-"Pandas and NumPy are convenient to create test data in memory as a dataframe. This can then be turned into an Ibis expression using `ibis.memtable`."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"import pandas as pd\n",
-"import numpy as np\n",
-"\n",
-"\n",
-"def make_students_df(num_records, random_seed=None):\n",
-"    rng = np.random.default_rng(random_seed)\n",
-"    return pd.DataFrame(\n",
-"        {\n",
-"            \"firstname\": rng.choice([\"Alice\", \"Bob\", \"Jane\", \"John\"], size=num_records),\n",
-"            \"birth_date\": (\n",
-"                pd.to_datetime(\"2021-01-01\")\n",
-"                + pd.to_timedelta(rng.integers(0, 365, size=num_records), unit=\"D\")\n",
-"            ),\n",
-"            \"math_grade\": rng.normal(55, 10, size=num_records).clip(0, 100).round(1),\n",
-"        }\n",
-"    )\n",
-"\n",
-"students_df = make_students_df(21, random_seed=42)\n",
-"students_memtable = ibis.memtable(students_df)\n",
-"students_memtable\n"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"By default `ibis.memtable` uses the `duckdb` in-memory backend to execute queries against the Pandas dataframe data efficiently.\n",
-"\n",
-"We can then materialize it as a physical table for a specific backend if necessary:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"connection = ibis.duckdb.connect(\"ibis_tutorial_students.duckdb\")\n",
-"connection.create_table('students', students_memtable)\n",
-"students = connection.table('students')\n",
-"students.group_by(students.birth_date.month()).aggregate(\n",
-"    count=students.count(),\n",
-"    avg_math_grade=students.math_grade.mean(),\n",
-")"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Note that NumPy, Pandas and `ibis.memtable` are only suitable to generate data that fits in memory. To generate data larger than memory, we can generate data in chunks and iteratively insert the chunks using `connection.insert(tablename, pandas_dataframe)`:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"connection.insert(students.get_name(), make_students_df(10_000, random_seed=43))\n",
-"students.count()"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"connection.insert(students.get_name(), make_students_df(10_000, random_seed=44))\n",
-"students.count()"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"students.group_by(students.birth_date.month()).aggregate(\n",
-"    count=students.count(),\n",
-"    avg_math_grade=students.math_grade.mean(),\n",
-")"
-]
 }
 ],
 "metadata": {
@@ -230,7 +132,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.5"
+"version": "3.10.8"
 },
 "vscode": {
 "interpreter": {