
Commit eeab46c

Merge branch 'main' into add_twitter_profile

2 parents: 107713a + b99a44b


58 files changed: +1344324 −13 lines

README.md

Lines changed: 12 additions & 1 deletion

```diff
@@ -31,7 +31,7 @@ Top 3 must read books are:
 ### Great [list of over 10 communities to join](communities.md):
 
 Top must-join communities for DE:
-- [EcZachly Data Engineering Discord](https://discord.gg/JGumAXncAK)
+- [DataExpert.io Community Discord](https://discord.gg/JGumAXncAK)
 - [Data Talks Club Slack](https://datatalks.club/slack)
 - [Data Engineer Things Community](https://www.dataengineerthings.org/aboutus/)
 
@@ -80,6 +80,7 @@ Top must-join communities for ML:
 - [Looker Studio](https://lookerstudio.google.com/overview)
 - [Tableau](https://www.tableau.com/)
 - [Power BI](https://powerbi.microsoft.com/)
+- [Hex](https://hex.ai/)
 - [Apache Superset](https://superset.apache.org/)
 - [Evidence](https://evidence.dev)
 - Data Integration
@@ -96,6 +97,7 @@ Top must-join communities for ML:
 - [Apache Kylin](https://kylin.apache.org/)
 - [DuckDB](https://duckdb.org/)
 - [QuestDB](https://questdb.io/)
+- [StarRocks](https://www.starrocks.io/)
 - LLM application library
 - [AdalFlow](https://github.com/SylphAI-Inc/AdalFlow)
 - [LangChain](https://github.com/langchain-ai/langchain)
@@ -131,6 +133,7 @@ Top must-join communities for ML:
 - [Building a Universal Data Lakehouse](https://www.onehouse.ai/whitepaper/onehouse-universal-data-lakehouse-whitepaper)
 - [XTable in Action: Seamless Interoperability in Data Lakes](https://arxiv.org/abs/2401.09621)
 - [MapReduce: Simplified Data Processing on Large Clusters](https://research.google/pubs/mapreduce-simplified-data-processing-on-large-clusters/)
+- [Tidy Data](https://vita.had.co.nz/papers/tidy-data.pdf)
 
 ## Social Media Accounts
 
@@ -153,6 +156,7 @@ Here's the mostly comprehensive list of data engineering creators:
 | TECHTFQ by Thoufiq | [TECHTFQ by Thoufiq](https://www.youtube.com/@techTFQ) (100k+) | | | | |
 | SQLBI | [SQLBI](https://www.youtube.com/@SQLBI) (100k+) | [Marco Russo](https://www.linkedin.com/in/sqlbi) (50k+) | [marcorus](https://x.com/marcorus) (10k+) | | |
 | Azure Lib | [Azure Lib](https://www.youtube.com/@azurelib-academy) (10k+) | [Deepak Goyal](https://www.linkedin.com/in/deepak-goyal-93805a17/) (100k+) | | | |
+| Prashanth Kumar Pandey | [ScholarNest](https://www.youtube.com/@ScholarNest) (77k+) | [Prashanth Kumar Pandey](https://www.linkedin.com/in/prashant-kumar-pandey/) (37K+) | | | |
 | Advancing Analytics | [Advancing Analytics](https://www.youtube.com/@AdvancingAnalytics) (10k+) | [Simon Whiteley](https://www.linkedin.com/in/simon-whiteley-uk/) (10k+) | | | |
 | Kahan Data Solutions | [Kahan Data Solutions](https://www.youtube.com/@KahanDataSolutions) (10k+) | | | | |
 | Ankit Bansal | [Ankit Bansal](https://youtube.com/@ankitbansal6) (10k+) | [Ankit Bansal](https://www.linkedin.com/in/ankitbansal6/) (50k+) | | | |
@@ -180,8 +184,15 @@ Here's the mostly comprehensive list of data engineering creators:
 | Ijaz Ali | | [Ijaz Ali](https://www.linkedin.com/in/ijaz-ali-6aaa87122/) (24K+)
 | Subhankar | | [Subhankar](https://www.linkedin.com/in/subhankarumass/) (5k+) | | | |
 | Ankur Ranjan | [Big Data Show](https://www.youtube.com/@TheBigDataShow) (100k+) | [Ankur Ranjan](https://www.linkedin.com/in/thebigdatashow/) (48k+) | | | |
+| Lenny | | [Lenny A](https://www.linkedin.com/in/lennyardiles/) (6k+) | | | |
+| Mehdi Ouazza | [Mehdio DataTV](https://www.youtube.com/@mehdio) (3k+) | [Mehdi Ouazza](https://www.linkedin.com/in/mehd-io/) (20k+) | [mehd_io](https://x.com/mehd_io) | | [@mehdio_datatv](https://www.tiktok.com/@mehdio_datatv) |
+| ITVersity | [ITVersity](https://www.youtube.com/@itversity) (67k+) | [Durga Gadiraju](https://www.linkedin.com/in/durga0gadiraju/) (48k+) | | |
+| Arnaud Milleker | | [Arnaud Milleker](https://www.linkedin.com/in/arnaudmilleker/) (7k+) | | | |
+| Soumil Shah | [Soumil Shah](https://www.youtube.com/@SoumilShah) (50k) | [Soumil Shah](https://www.linkedin.com/in/shah-soumil/) (8k+) | | | |
+| Ananth Packkildurai | | [Ananth Packkildurai](https://www.linkedin.com/in/ananthdurai/) (18k+) | | | |
 | Dan Kornas | | | [dankornas](https://www.twitter.com/dankornas) (66k+) | |
 
+
 ### Great Podcasts
 
 - [The Data Engineering Show](https://www.dataengineeringshow.com/)
```

bootcamp/introduction.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -8,6 +8,8 @@ This will be six weeks of curricula
 - Day 1 Lab is [here](https://www.dataexpert.io/lesson/dimensional-data-modeling-lab-day-1-yt)
 - Day 2 Lecture is [here](https://www.dataexpert.io/lesson/dimensional-data-modeling-day-2-lecture-yt)
 - Day 2 Lab is [here](https://www.dataexpert.io/lesson/dimensional-data-modeling-day-2-lab-yt)
+- Day 3 Lecture is [here](https://www.dataexpert.io/lesson/dimensional-data-modeling-day-3-lecture-yt)
+- Day 3 Lab is [here](https://www.dataexpert.io/lesson/dimensional-data-modeling-day-3-lab-yt)
 - Fact Data Modeling
 - Homework is (to be added)
 - Data Quality (analytics)
```
(binary file added, 61.6 KB; filename and preview not captured)

bootcamp/materials/1-dimensional-data-modeling/Makefile

Lines changed: 4 additions & 4 deletions

```diff
@@ -7,21 +7,21 @@ up:
 		cp example.env .env; \
 		exit 1; \
 	fi
-	docker-compose up -d;
+	docker compose up -d;
 
 .PHONY: down
 down:
-	docker-compose down -v
+	docker compose down -v
 	@if [[ "$(docker ps -q -f name=${DOCKER_CONTAINER})" ]]; then \
 		echo "Terminating running container..."; \
 		docker rm ${DOCKER_CONTAINER}; \
 	fi
 
 .PHONY: restart
 restart:
-	docker-compose down -v; \
+	docker compose down -v; \
 	sleep 5; \
-	docker-compose up -d;
+	docker compose up -d;
 
 .PHONY: logs
 logs:
```
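The `docker-compose` → `docker compose` change in the Makefile above tracks Docker's move from the standalone Compose V1 binary to the Compose V2 CLI plugin. On machines that still ship only the legacy binary, a small guard can pick whichever is available — a minimal sketch, not part of this repo's Makefile:

```shell
#!/bin/sh
# Pick the Compose V2 plugin ("docker compose") when available; otherwise fall
# back to the legacy standalone binary. Defaults to V2 syntax if neither is
# installed, since that is what the updated Makefile assumes.
if docker compose version >/dev/null 2>&1; then
  COMPOSE="docker compose"
elif command -v docker-compose >/dev/null 2>&1; then
  COMPOSE="docker-compose"
else
  COMPOSE="docker compose"   # assume V2, as the Makefile does
fi
echo "using: $COMPOSE"
```

A Makefile could then invoke `$(COMPOSE) up -d` everywhere instead of hard-coding one spelling.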

bootcamp/materials/1-dimensional-data-modeling/README.md

Lines changed: 16 additions & 3 deletions

````diff
@@ -47,13 +47,13 @@ There are two methods to get Postgres running locally.
 2. Run this command after replacing **`<computer-username>`** with your computer's username:
 
    ```bash
-   psql -U <computer-username> postgres < data.dump
+   pg_restore -U <computer-username> postgres data.dump
    ```
 
 3. Set up DataGrip, DBeaver, or your VS Code extension to point at your locally running Postgres instance.
 4. Have fun querying!
 
-### 🐳 **Option 2: Run Postgres in Docker**
+### 🐳 **Option 2: Run Postgres and PGAdmin in Docker**
 
 - Install Docker Desktop from **[here](https://www.docker.com/products/docker-desktop/)**.
 - Copy **`example.env`** to **`.env`**:
@@ -79,6 +79,15 @@ There are two methods to get Postgres running locally.
 - You can check that your Docker Compose stack is running by either:
   - Going into Docker Desktop: you should see an entry there with a drop-down for each of the containers running in your Docker Compose stack.
   - Running **`docker ps -a`** and looking for the containers with the name **`postgres`**.
+- If you navigate to **`http://localhost:5050`** you will be able to see the PGAdmin instance up and running and should be able to connect to the following server:
+  ![Image showing the setup for PGAdmin](.attachments/pgadmin-server.png)
+  Where:
+  - Host name: host.docker.internal (Or container name i.e my-postgres-container)
+  - Port: 5432
+  - Username: postgres
+  - Password: postgres
+
 - When you're finished with your Postgres instance, you can stop the Docker Compose containers with:
 
   ```bash
@@ -115,6 +124,10 @@ There are two methods to get Postgres running locally.
 - If the test connection is successful, click "Finish" or "Save" to save the connection. You should now be able to use the database client to manage your PostgreSQL database locally.
 
 ## **🚨 Tables not loading!? 🚨**
+- If you're seeing errors about `error: invalid command \N`, you should use `pg_restore` to load `data.dump`.
+  ```bash
+  pg_restore -U $POSTGRES_USER -d $POSTGRES_DB data.dump
+  ```
 - If you are on Windows and used **`docker compose up`**, table creation and data load will not take place with container creation. Once you have docker container up and verified that you are able to connect to empty postgres database with your own choice of client, follow the following steps:
   1. On Docker desktop, connect to my-postgres-container terminal.
   2. Run:
@@ -123,7 +136,7 @@ There are two methods to get Postgres running locally.
     -v ON_ERROR_STOP=1 \
     --username $POSTGRES_USER \
     --dbname $POSTGRES_DB \
-    < /docker-entrypoint-initdb.d/data.dump>
+    < /docker-entrypoint-initdb.d/data.dump
     ```
 - → This will run the file `data.dump` from inside your docker container.
````
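The `error: invalid command \N` symptom called out in the README change above typically means `psql` was fed a dump it cannot execute as a plain SQL script; `pg_restore` is the loader for pg_dump's custom-format archives, which begin with the magic bytes `PGDMP`. A hedged sketch (the `dump_loader` helper name is made up for illustration) of telling the two apart:

```shell
#!/bin/sh
# Hypothetical helper: inspect the first five bytes of a dump file.
# pg_dump custom-format archives start with "PGDMP" and must go through
# pg_restore; plain-text dumps are SQL scripts and go through psql.
dump_loader() {
  if [ "$(head -c 5 "$1")" = "PGDMP" ]; then
    echo "pg_restore"
  else
    echo "psql"
  fi
}

printf 'PGDMP rest of archive' > /tmp/custom.dump    # fake custom-format header
printf -- '-- PostgreSQL database dump\n' > /tmp/plain.sql
dump_loader /tmp/custom.dump   # prints: pg_restore
dump_loader /tmp/plain.sql     # prints: psql
```

Under this reading, the restore command this commit adds to the README (`pg_restore -U $POSTGRES_USER -d $POSTGRES_DB data.dump`) corresponds to the `pg_restore` branch.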

bootcamp/materials/1-dimensional-data-modeling/docker-compose.yml

Lines changed: 12 additions & 2 deletions

```diff
@@ -5,7 +5,6 @@ services:
     container_name: ${DOCKER_CONTAINER}
     env_file:
       - .env
-      - example.env
     environment:
       - POSTGRES_DB=${POSTGRES_SCHEMA}
       - POSTGRES_USER=${POSTGRES_USER}
@@ -17,6 +16,17 @@ services:
       - ./data.dump:/docker-entrypoint-initdb.d/data.dump
       - ./scripts/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh
       - postgres-data:/var/lib/postgresql/data
-
+  pgadmin:
+    image: dpage/pgadmin4
+    restart: on-failure
+    container_name: pgadmin
+    environment:
+      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_EMAIL}
+      - PGADMIN_DEFAULT_PASSWORD=${PGADMIN_PASSWORD}
+    ports:
+      - "${PGADMIN_PORT}:80"
+    volumes:
+      - pgadmin-data:/var/lib/pgadmin
 volumes:
   postgres-data:
+  pgadmin-data:
```
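The new `pgadmin` service interpolates `PGADMIN_EMAIL`, `PGADMIN_PASSWORD`, and `PGADMIN_PORT` from `.env`; if any is unset, Compose substitutes an empty string and the `"${PGADMIN_PORT}:80"` port mapping becomes invalid. A minimal pre-flight sketch (the values below are illustrative stand-ins, not the repo's actual `.env`):

```shell
#!/bin/sh
# Fail early if any variable the pgadmin service interpolates is unset.
# These assignments stand in for what sourcing .env would normally provide.
PGADMIN_EMAIL="someone@example.invalid"
PGADMIN_PASSWORD="postgres"
PGADMIN_PORT="5050"

for var in PGADMIN_EMAIL PGADMIN_PASSWORD PGADMIN_PORT; do
  eval "val=\${$var}"
  if [ -z "$val" ]; then
    echo "missing: $var" >&2
    exit 1
  fi
done
echo "env ok"   # prints: env ok
```

Running a check like this before `docker compose up -d` surfaces a missing variable as one clear error instead of a confusing Compose interpolation failure.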

bootcamp/materials/1-dimensional-data-modeling/example.env

Lines changed: 5 additions & 1 deletion

```diff
@@ -7,4 +7,8 @@ HOST_PORT=5432
 CONTAINER_PORT=5432
 
 DOCKER_CONTAINER=my-postgres-container
-DOCKER_IMAGE=my-postgres-image
+DOCKER_IMAGE=my-postgres-image
+
+PGADMIN_EMAIL=[email protected]
+PGADMIN_PASSWORD=postgres
+PGADMIN_PORT=5050
```

bootcamp/materials/1-dimensional-data-modeling/lecture-lab/players.sql

Lines changed: 1 addition & 1 deletion

```diff
@@ -18,7 +18,7 @@
     draft_round TEXT,
     draft_number TEXT,
     seasons season_stats[],
-    scorer_class scoring_class,
+    scoring_class scoring_class,
     years_since_last_active INTEGER,
     is_active BOOLEAN,
     current_season INTEGER,
```

bootcamp/materials/1-dimensional-data-modeling/sql/load_players_table_day2.sql

Lines changed: 1 addition & 1 deletion

```diff
@@ -63,7 +63,7 @@ SELECT
         WHEN (seasons[CARDINALITY(seasons)]::season_stats).pts > 15 THEN 'good'
         WHEN (seasons[CARDINALITY(seasons)]::season_stats).pts > 10 THEN 'average'
         ELSE 'bad'
-    END::scorer_class AS scorer_class,
+    END::scoring_class AS scoring_class,
     w.season - (seasons[CARDINALITY(seasons)]::season_stats).season as years_since_last_active,
     w.season,
     (seasons[CARDINALITY(seasons)]::season_stats).season = season AS is_active
```
(binary file changed, 4.51 MB; filename and preview not captured)
