This repository was archived by the owner on Jul 15, 2024. It is now read-only.

Commit 4ece0e9

AlenkaF authored and jcrist committed
First run-through
1 parent c1cdd6e commit 4ece0e9

1 file changed

+59
-16
lines changed

tutorial/01-Introduction-to-Ibis.ipynb

Lines changed: 59 additions & 16 deletions
Original file line number, diff line number, diff line change
@@ -1,14 +1,33 @@
11
{
22
"cells": [
33
{
4+
"attachments": {},
45
"cell_type": "markdown",
56
"metadata": {},
67
"source": [
7-
"# Getting started\n",
8+
"# Getting started"
9+
]
10+
},
11+
{
12+
"attachments": {},
13+
"cell_type": "markdown",
14+
"metadata": {},
15+
"source": [
16+
"### Setting up"
17+
]
18+
},
19+
{
20+
"attachments": {},
21+
"cell_type": "markdown",
22+
"metadata": {},
23+
"source": [
24+
"To start using `ibis`, you need a Python environment with `ibis` installed.\n",
25+
"\n",
26+
"If you're running through this tutorial on your own machine (rather than binder) please follow the [installation instructions](https://ibis-project.org/install/ to setup an environment with the `SQLite` backend.\n",
827
"\n",
9-
"To start using Ibis, you need a Python environment with Ibis installed. If you're running through this tutorial on your own machine (rather than binder) please follow the [installation instructions for SQLite](https://ibis-project.org/docs/latest/backends/SQLite/) to setup an environment.\n",
28+
"You'll also need access to the `geography.db` database hosted [here](https://storage.googleapis.com/ibis-tutorial-data/geography.db).\n",
1029
"\n",
11-
"You'll also need access to the `geography.db` database hosted [here](https://storage.googleapis.com/ibis-tutorial-data/geography.db). Every notebook in the tutorial starts with the following code to download the database if it doesn't already exist."
30+
"Every notebook in the tutorial starts with the following code to download the database if it doesn't already exist:"
1231
]
1332
},
1433
{
@@ -22,10 +41,12 @@
2241
]
2342
},
2443
{
44+
"attachments": {},
2545
"cell_type": "markdown",
2646
"metadata": {},
2747
"source": [
28-
"You should now have `ibis` and the tutorial data all setup. We're ready to get started. First lets import `ibis`."
48+
"You should now have `ibis` and the tutorial data all set up.\n",
49+
"We're ready to get started. First, let's import `ibis`."
2950
]
3051
},
3152
{
@@ -38,10 +59,14 @@
3859
]
3960
},
4061
{
62+
"attachments": {},
4163
"cell_type": "markdown",
4264
"metadata": {},
4365
"source": [
44-
"To make it things easier in this tutorial, we will be using Ibis's \"interactive mode\". This is the recommended mode to use when doing interactive/iterative work with `ibis`. When deploying production code you'll typically run in non-interactive mode. More details on Ibis non-interactive mode are covered in [a later notebook](./03-Expressions-Lazy-Mode-Logging.ipynb).\n",
66+
"To make things easier, we will be using `ibis`'s **interactive mode** in order to see the results of an operation immediately.\n",
67+
"This is the recommended mode to use when doing interactive/iterative work with `ibis`.\n",
68+
"\n",
69+
"When deploying production code you'll typically run in **non-interactive/lazy mode**. More details on `ibis` non-interactive mode are covered in [a later notebook](./03-Expressions-Lazy-Mode-Logging.ipynb).\n",
4570
"\n",
4671
"To enable interactive mode, run:"
4772
]
@@ -56,14 +81,25 @@
5681
]
5782
},
5883
{
84+
"attachments": {},
85+
"cell_type": "markdown",
86+
"metadata": {},
87+
"source": [
88+
"### Creating a connection"
89+
]
90+
},
91+
{
92+
"attachments": {},
5993
"cell_type": "markdown",
6094
"metadata": {},
6195
"source": [
62-
"Next thing we need is to create a connection object. The connection defines where the data is stored and where the computations will be performed.\n",
96+
"The next thing we need is to create a **connection object**.\n",
97+
"\n",
98+
"The connection defines where the data is stored and where the computations will be performed.\n",
6399
"\n",
64-
"For a comparison to pandas, this is not the same as where the data is imported from (e.g. `pandas.read_sql`). pandas loads data into memory and performs the computations itself. Ibis won't load the data and perform any computation, but instead will leave the data in the backend defined in the connection, and will _ask_ the backend to perform the computations.\n",
100+
"This is not the same as in `pandas` when we import the data from an external source (e.g. `pandas.read_sql`). In this case `pandas` loads data into memory and performs the computations itself. `ibis` will not load the data or perform any computation itself, but instead will leave the data in the backend defined in the connection, and will _ask_ the backend to perform the computations.\n",
65101
"\n",
66-
"In this tutorial we will be using a SQLite connection for its simplicity (no installation is needed). But Ibis can work with many different backends, including big data systems, or GPU-accelerated analytical databases. As well as most common relational databases (PostgreSQL, MySQL,...)."
102+
"In this tutorial we will be using a `SQLite` connection for its simplicity (no installation is needed). But `ibis` can work with many different backends, including big data systems and GPU-accelerated analytical databases, as well as most common relational databases (`PostgreSQL`, `MySQL`, ...)."
67103
]
68104
},
69105
{
@@ -94,14 +130,16 @@
94130
]
95131
},
96132
{
133+
"attachments": {},
97134
"cell_type": "markdown",
98135
"metadata": {},
99136
"source": [
100-
"These two tables include data about countries, and about GDP by country and year.\n",
137+
"These three tables contain data about the world's countries, their GDP by year, and their independence.\n",
101138
"\n",
102-
"The data from countries has been obtained from [GeoNames](https://www.geonames.org/countries/).\n",
103-
"The GDP table will be used in the next tutorial, and the data has been obtained from the\n",
139+
"* The data for the countries table has been obtained from [GeoNames](https://www.geonames.org/countries/).\n",
140+
"* The GDP table will be used in the next tutorial, and the data for it has been obtained from the\n",
104141
"[World Bank website](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD).\n",
142+
"* The data for the `independence` table has been obtained from [Wikipedia](https://en.wikipedia.org/wiki/List_of_national_independence_days) and will be used in one of the following tutorials.\n",
105143
"\n",
106144
"Next, we want to access a specific table in the database. We can create a handler to the `countries` table with:"
107145
]
@@ -150,10 +188,11 @@
150188
]
151189
},
152190
{
191+
"attachments": {},
153192
"cell_type": "markdown",
154193
"metadata": {},
155194
"source": [
156-
"The table is too big for all the results to be displayed, and we probably don't want to see all of them at once anyway. For this reason, just the beginning and the end of the results is displayed. Often, the number of rows will be so large that this operation could take a long time.\n",
195+
"The table is too big for all the results to be displayed, and we probably don't want to see all of them at once anyway. For this reason, just the first 10 rows of the results are displayed.\n",
157196
"\n",
158197
"To check how many rows a table has, we can use the `.count()` method:"
159198
]
@@ -204,10 +243,11 @@
204243
]
205244
},
206245
{
246+
"attachments": {},
207247
"cell_type": "markdown",
208248
"metadata": {},
209249
"source": [
210-
"We will focus on Asia (`AS` in the table). We can identify which rows belong to Asian countries using the standard Python `==` operator:"
250+
"We will focus on Asia (`AS`). We can identify which rows belong to Asian countries using the standard Python `==` operator:"
211251
]
212252
},
213253
{
@@ -257,10 +297,11 @@
257297
]
258298
},
259299
{
300+
"attachments": {},
260301
"cell_type": "markdown",
261302
"metadata": {},
262303
"source": [
263-
"Next, we want to find the most populated countries in Asia. To obtain them, we are going to sort the countries by the column `population`, and just fetch the first 10. To sort by a column in Ibis, we can use the `.order_by()` method:"
304+
"Next, we want to find the most populated countries in Asia. We are going to sort the countries by the column `population` and fetch the first 10. We can use the `.order_by()` method to sort by a column:"
264305
]
265306
},
266307
{
@@ -273,10 +314,11 @@
273314
]
274315
},
275316
{
317+
"attachments": {},
276318
"cell_type": "markdown",
277319
"metadata": {},
278320
"source": [
279-
"This will return the least populated countries, since `.order_by` will by default order in ascending order (ascending order like in `1, 2, 3, 4`). This behavior is consistent with SQL `ORDER BY`.\n",
321+
"Because the default for `.order_by` is ascending order (as in `1, 2, 3, 4`), the operation will return the least populated countries. This behavior is consistent with SQL `ORDER BY`.\n",
280322
"\n",
281323
"To order in descending order we can use `ibis.desc()`:"
282324
]
@@ -291,12 +333,13 @@
291333
]
292334
},
293335
{
336+
"attachments": {},
294337
"cell_type": "markdown",
295338
"metadata": {},
296339
"source": [
297340
"This is the list of the 10 most populated countries based on the data from [GeoNames](https://www.geonames.org/).\n",
298341
"\n",
299-
"To learn more about Ibis, continue to the next tutorial."
342+
"**_To learn more about Ibis, continue to our next tutorial: [Aggregating and joining data](./02-Aggregates-Joins.ipynb)._**"
300343
]
301344
}
302345
],
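
The notebook text in this diff says that the backend performs the computations, and that `.order_by` defaults to ascending order, consistent with SQL `ORDER BY`. As a rough illustration of that SQL behavior, here is a minimal sketch that uses only Python's standard `sqlite3` module rather than `ibis`. The in-memory `countries` table, its three columns, and its rows are hypothetical toy data, not the contents of `geography.db`, and the queries are not claimed to be what `ibis` generates.

```python
import sqlite3

# Hypothetical toy rows (name, continent, population); the values are made up
# for illustration and are NOT taken from geography.db.
rows = [
    ("Alpha", "AS", 300),
    ("Beta", "AS", 100),
    ("Gamma", "EU", 250),
    ("Delta", "AS", 200),
]

# An in-memory SQLite database stands in for the backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (name TEXT, continent TEXT, population INTEGER)"
)
conn.executemany("INSERT INTO countries VALUES (?, ?, ?)", rows)

# Row count, computed by the database itself.
total = conn.execute("SELECT COUNT(*) FROM countries").fetchone()[0]

# ORDER BY without a direction sorts ascending, so the least populated
# Asian rows come first.
ascending = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM countries WHERE continent = 'AS' "
        "ORDER BY population LIMIT 10"
    )
]

# Adding DESC reverses the order, so the most populated rows come first.
descending = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM countries WHERE continent = 'AS' "
        "ORDER BY population DESC LIMIT 10"
    )
]

conn.close()

print(total)
print(ascending)
print(descending)
```

In every query the filtering, sorting and counting happen inside SQLite, and only the small result comes back to Python. That is the division of labor the tutorial text describes for a backend.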
