Commit 35c1509

Merge pull request #152 from advanced-computing/week-8-rearrange
no guest speaker during Lecture 8
2 parents c8e1a1d + b706f01 commit 35c1509

3 files changed: 23 additions & 51 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ You can find the rubric under the [Assignment](https://courseworks2.columbia.edu
 | 5 | 2/19 | [Automated testing](lectures/lecture_05.md) | [Readings](readings/week_05.md), [Project Part 2](docs/project.md#part-2) | [Data profiling/quality](labs/lab_05.md) | [Lab 4](labs/lab_04.md) |
 | 6 | 2/26 | [Organizing code](lectures/lecture_06.md) | [Readings](readings/week_06.md), [Project Part 3](docs/project.md#part-3) | [Continuous integration](labs/lab_06.md) | [Lab 5](labs/lab_05.md) |
 | 7 | 3/5 | [Databases](lectures/lecture_07.md) | [Readings](readings/week_07.md) | [Databases](labs/lab_07.md) | [Lab 6](labs/lab_06.md) |
-| 8 | 3/12 | [Guest speaker; data warehousing](lectures/lecture_08.md) | [Project Part 4](docs/project.md#part-4) | [Data loading](labs/lab_08.md) | [Lab 7](labs/lab_07.md) |
+| 8 | 3/12 | [Data warehousing](lectures/lecture_08.md) | [Project Part 4](docs/project.md#part-4) | [Data loading](labs/lab_08.md) | [Lab 7](labs/lab_07.md) |
 | 9 | 3/19 | none ([Spring Recess][recess]) | none | none ([Spring Recess][recess]) | none |
 | 10 | 3/26 | [Data engineering (ETL)](lectures/lecture_10.md) | [Project Part 5](docs/project.md#part-5) | [Data loading, continued](labs/lab_10.md) | [Lab 8](labs/lab_08.md) |
 | 11 | 4/2 | [Data engineering, continued (pipelines)](lectures/lecture_11.md) | [Readings](readings/week_11.md), [Project check-in](docs/project.md#check-in) | [Process mapping](labs/lab_11.md) | [Lab 10](labs/lab_10.md) |

labs/lab_08.md

Lines changed: 1 addition & 22 deletions
@@ -1,27 +1,6 @@
 # Lab 8
 
-**Goal:** Understand different methods of loading data
-
----
-
-## Data loading
-
-- Append load
-- Trunc(ate) and load
-- Incremental load
-
----
-
-Let's say you were given access to a random table that uses one of the three data loading methods above. How would you tell which it was?
-
----
-
-### Incremental load
-
-The trick is avoiding duplicates. Your script might then need to say something like:
-
-1. What's the latest timestamp in the database?
-1. Pull data from the API that's more recent than that.
+**Goal:** Practice data warehousing
 
 ---
 

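The three loading methods in the removed lab content (now moved to Lecture 8) differ in what happens to rows already in the table. As a minimal sketch, assuming Python's built-in `sqlite3` module and a made-up `target` table (the schema and the `append_load`, `truncate_and_load`, and `row_count` helpers are hypothetical, not course code), the following loads the same snapshot twice with append load and with truncate-and-load:

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema for illustration: each row is (id, value, updated_at).
SCHEMA = "CREATE TABLE target (id INTEGER, value TEXT, updated_at TEXT)"


def append_load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Append load: insert every incoming row, leaving existing rows untouched."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO target VALUES (?, ?, ?)", rows)


def truncate_and_load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Truncate and load: empty the table, then insert a full snapshot."""
    with conn:
        # SQLite has no TRUNCATE statement; an unfiltered DELETE empties the table.
        conn.execute("DELETE FROM target")
        conn.executemany("INSERT INTO target VALUES (?, ?, ?)", rows)


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


snapshot = [
    (1, "a", "2024-03-01T00:00:00"),
    (2, "b", "2024-03-02T00:00:00"),
]

appended = sqlite3.connect(":memory:")
appended.execute(SCHEMA)
truncated = sqlite3.connect(":memory:")
truncated.execute(SCHEMA)

# Run each method twice with the same snapshot, as a scheduled job might.
for _ in range(2):
    append_load(appended, snapshot)
    truncate_and_load(truncated, snapshot)

print(row_count(appended))   # 4: every run adds the same rows again
print(row_count(truncated))  # 2: only the latest snapshot survives
```

Rerunning the same batch hints at one way to approach the question posed in the lab text: an append-loaded table tends to accumulate duplicate rows over time, while a truncate-and-load table only ever holds the most recent snapshot.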
lectures/lecture_08.md

Lines changed: 21 additions & 28 deletions
@@ -2,34 +2,6 @@
 
 ---
 
-## _gestures at everything_
-
----
-
-## Feedback
-
-- Getting a lot of new information
-- Don't understand where we're going
-
----
-
-Next lecture, we'll zoom out.
-
----
-
-## Guest speaker
-
-> [John Paul Farmer](https://www.linkedin.com/in/johnpaulfarmer) served as the 3rd Chief Technology Officer of New York City, taking point on everything from broadband to digital services to AI. Prior to that, he spent a handful of years at Microsoft, building connections with cities and the civic tech community. Previously, he was Senior Advisor for Innovation in the White House Office of Science and Technology Policy under President Obama, where he confounded and led the Presidential Innovation Fellows. He has also served as an adjunct associate professor at a Columbia and a Fellow of the University of Pennsylvania’s Institute for Urban Research. Most recently, he served as President of a next-gen broadband technology company and is now the President of Smart City Expo USA.
-
----
-
-## Intros
-
-- Name
-- What you're passionate about
-
----
-
 ## [Retro](../docs/project.md#retro)
 
 Anything you'd like to share?
@@ -107,6 +79,27 @@ COMMIT;
 
 ---
 
+## Data loading
+
+- Append load
+- Trunc(ate) and load
+- Incremental load
+
+---
+
+Let's say you were given access to a random table that uses one of the three data loading methods above. How would you tell which it was?
+
+---
+
+### Incremental load
+
+The trick is avoiding duplicates. Your script might then need to say something like:
+
+1. What's the latest timestamp in the database?
+1. Pull data from the API that's more recent than that.
+
+---
+
 ## [Project Part 5](../docs/project.md#part-5)
 
 ---
