
Commit 9cd0d18

committed
Updated notebooks with proofreading fixes.
1 parent fbb5d59

File tree

4 files changed: +43 additions, -1656 deletions

notebooks/1c_visualization.ipynb

Lines changed: 34 additions & 34 deletions
Large diffs are not rendered by default.

notebooks/2a_planning.ipynb

Lines changed: 5 additions & 5 deletions
@@ -10,7 +10,7 @@
 "\n",
 "1. Identifier transformation \n",
 "1. Identifier coding\n",
-"1. Data mock ups"
+"1. Data mockups"
 ]
 },
 {
@@ -718,18 +718,18 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Data mock ups\n",
+"# Data mockups\n",
 "\n",
 "At the planning and pilot study stage, we may have a complex and labor-intensive data collection yet to do.\n",
 "As a result, we will not have some of the data that we need in order to make sure that we can fit everything together.\n",
 "\n",
-"A data mock up is a form of data that we create—often manually—to simulate the form of the data that we will retrieve in a subsequent collection.\n",
+"A data mockup is a form of data that we create—often manually—to simulate the form of the data that we will retrieve in a subsequent collection.\n",
 "This is common for data obtained by web scraping, human coding, or other time-intensive processes.\n",
 "Before starting such a collection, we need to know that it will produce the data that we need.\n",
 "If we are designing the collection ourselves, it may serve as a target for the form of data produced.\n",
 "\n",
 "My favorite tool for producing data mockups is a manually-created CSV file.\n",
-"Unlike Excel spreadsheets (with a lot of internal complexity and sometimes well-intended but harmful automatic behavior), a CSV file is what the name describes: comma separated values.\n",
+"Unlike Excel spreadsheets (with a lot of internal complexity and sometimes well-intended but harmful automatic behavior), a CSV file is what the name describes: comma-separated values.\n",
 "To make one manually, we simply type (or, more likely, copy and paste) into a file in a text editor."
 ]
 },
@@ -739,7 +739,7 @@
 "source": [
 "## CSV example\n",
 "\n",
-"The contents of a CSV file looks like this:\n",
+"The contents of a CSV file look like this:\n",
 "\n",
 "```csv\n",
 "price,tic,yr\n",
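The workflow this cell describes, typing a small comma-separated file by hand and reading it back, can be sketched with Python's standard csv module. The column names `price`, `tic`, and `yr` come from the diff; the values below are invented placeholders, not data from the commit:

```python
import csv
import io

# A hand-typed data mockup with the same three columns shown in the
# notebook's CSV example; the rows are invented placeholders.
mockup = """price,tic,yr
10.5,AAA,2020
12.0,BBB,2020
"""

# Reading the mockup back confirms it has the shape that later
# pipeline code will expect from the real collection.
rows = list(csv.DictReader(io.StringIO(mockup)))
print(rows[0])  # {'price': '10.5', 'tic': 'AAA', 'yr': '2020'}
```

In practice the mockup would live in a small `.csv` file opened in a text editor, as the cell says; the in-memory string here just keeps the sketch self-contained.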

notebooks/3a_retrieval3.ipynb

Lines changed: 4 additions & 4 deletions
@@ -480,7 +480,7 @@
 "\n",
 "First, I used the `LIMIT` keyword with a value of `10`.\n",
 "Compustat is a huge dataset, and retrieving everything would be a big download.\n",
-"While we are experimenting or iterating on a query, using `LIMIT` asks the server to provide only a number of results up to the parameter to limit.\n",
+"When we are experimenting or iterating on a query, using `LIMIT` asks the server to provide only a number of results up to the parameter to limit.\n",
 "This is a strong norm when using this kind of data, as it dramatically reduces the load on the server.\n",
 "`LIMIT` becomes more important as we ask the server to do transformation work for us, which increases the computational demand.\n",
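The `LIMIT` behavior the cell describes is easy to demonstrate locally. This sketch uses an in-memory SQLite database as a stand-in for the remote Compustat server; the table name `funda` and its contents are assumptions for illustration:

```python
import sqlite3

# In-memory stand-in for a large remote table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE funda (tic TEXT, price REAL)")
con.executemany("INSERT INTO funda VALUES (?, ?)",
                [(f"T{i}", float(i)) for i in range(100)])

# LIMIT caps the number of rows the engine returns, which is why it is
# polite to use it while iterating on a query against a shared server.
rows = con.execute("SELECT tic, price FROM funda LIMIT 10").fetchall()
print(len(rows))  # 10
```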
@@ -894,7 +894,7 @@
 "First, we asked for the `cusip` column to be called `cusip9` in our results using `AS`.\n",
 "Second, we used a function to transform the `cusip` column (using the `SUBSTRING()` function) to give us only eight characters and to name it `cusip8`.\n",
 "This is a simple example of having the server do prep work for us.\n",
-"Finally, we added a second condition to `WHERE`, a year restriction."
+"Finally, we added a second condition to `WHERE`: a year restriction."
 ]
 },
 {
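The rename-and-truncate pattern this cell discusses can be sketched in SQLite, which spells the function `SUBSTR()` rather than `SUBSTRING()`. The table name and the 9-character value below are made-up placeholders, not real identifiers from the commit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE names (cusip TEXT)")
con.execute("INSERT INTO names VALUES ('123456789')")  # placeholder 9-char id

# AS renames a column in the result set; SUBSTR has the server (here,
# SQLite) truncate the identifier to its first eight characters for us.
row = con.execute(
    "SELECT cusip AS cusip9, SUBSTR(cusip, 1, 8) AS cusip8 FROM names"
).fetchone()
print(row)  # ('123456789', '12345678')
```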
@@ -903,14 +903,14 @@
 "source": [
 "# Aggregation\n",
 "\n",
-"Sometimes, the data in a table is more granular than the data that we want out.\n",
+"Sometimes, the data in a table is more granular than the data that we want returned to us.\n",
 "So, we can ask the server to aggregate it for us, returning an aggregated dataset.\n",
 "\n",
 "There are a few important things to know:\n",
 "\n",
 "1. We use `GROUP BY` to tell the DBMS how to group rows before aggregating.\n",
 "2. Every column must either be in the `GROUP BY` or have an aggregation function applied. A notable example here is that we ask for the `MAX` of the company name. If the name changes in the rows of the search, the DBMS would need to know how to choose. However, this is enforced as a general rule, not only when there is an actual conflict to resolve.\n",
-"3. Order of the statements matter. For example, `WHERE` needs to be after `FROM` and before `GROUP BY`. I've done them here, so it will work, but this is a topic better explored in a book on the topic."
+"3. The order of the statements matters. For example, `WHERE` needs to be after `FROM` and before `GROUP BY`. I've ordered them correctly here, so it will work, but this is a topic better explored in an introductory book on SQL."
 ]
 },
 {
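The three rules in this cell can be sketched with an in-memory SQLite table. The column names (`tic`, `conml`, `yr`, `price`) and all values below are assumptions for illustration, not data from the commit:

```python
import sqlite3

# A granular table with one row per (tic, yr); we aggregate down to
# one row per tic. All names and values here are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE funda (tic TEXT, conml TEXT, yr INT, price REAL)")
con.executemany("INSERT INTO funda VALUES (?, ?, ?, ?)", [
    ("AAA", "Alpha Inc",  2020, 10.0),
    ("AAA", "Alpha Corp", 2021, 12.0),  # company name changed between years
    ("BBB", "Beta Inc",   2021,  7.0),
])

# Every selected column is either in GROUP BY (tic) or aggregated
# (MAX(conml), AVG(price)); note WHERE comes after FROM and before
# GROUP BY, matching the statement-order rule above.
rows = con.execute(
    "SELECT tic, MAX(conml) AS conml, AVG(price) AS avg_price "
    "FROM funda WHERE yr >= 2020 GROUP BY tic ORDER BY tic"
).fetchall()
print(rows)
```

Wrapping the name column in `MAX()` satisfies the rule even for `BBB`, where there is no actual conflict to resolve.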
