Commit 6f1d07e

Merge branch 'DataExpert-io:main' into main

2 parents: 092a261 + 198e9dd

Note: large commits have some content hidden by default, so several file names below are not shown.

50 files changed (+1344223, -10 lines)

README.md

Lines changed: 5 additions & 1 deletion

```diff
@@ -31,7 +31,7 @@ Top 3 must read books are:
 ### Great [list of over 10 communities to join](communities.md):
 
 Top must-join communities for DE:
-- [EcZachly Data Engineering Discord](https://discord.gg/JGumAXncAK)
+- [DataExpert.io Community Discord](https://discord.gg/JGumAXncAK)
 - [Data Talks Club Slack](https://datatalks.club/slack)
 - [Data Engineer Things Community](https://www.dataengineerthings.org/aboutus/)
 
@@ -80,6 +80,7 @@ Top must-join communities for ML:
 - [Looker Studio](https://lookerstudio.google.com/overview)
 - [Tableau](https://www.tableau.com/)
 - [Power BI](https://powerbi.microsoft.com/)
+- [Hex](https://hex.ai/)
 - [Apache Superset](https://superset.apache.org/)
 - [Evidence](https://evidence.dev)
 - Data Integration
@@ -183,6 +184,9 @@ Here's the mostly comprehensive list of data engineering creators:
 | Lenny | | [Lenny A](https://www.linkedin.com/in/lennyardiles/) (6k+) | | | |
 | Mehdi Ouazza | [Mehdio DataTV](https://www.youtube.com/@mehdio) (3k+) | [Mehdi Ouazza](https://www.linkedin.com/in/mehd-io/) (20k+) | [mehd_io](https://x.com/mehd_io) | | [@mehdio_datatv](https://www.tiktok.com/@mehdio_datatv) |
 | ITVersity | [ITVersity](https://www.youtube.com/@itversity) (67k+) | [Durga Gadiraju](https://www.linkedin.com/in/durga0gadiraju/) (48k+) | | |
+| Arnaud Milleker | | [Arnaud Milleker](https://www.linkedin.com/in/arnaudmilleker/) (7k+) | | | |
+| Soumil Shah | [Soumil Shah](https://www.youtube.com/@SoumilShah) (50k) | [Soumil Shah](https://www.linkedin.com/in/shah-soumil/) (8k+) | | | |
+| Ananth Packkildurai | | [Ananth Packkildurai](https://www.linkedin.com/in/ananthdurai/) (18k+) | | | |
 
 ### Great Podcasts
```

bootcamp/materials/1-dimensional-data-modeling/Makefile

Lines changed: 4 additions & 4 deletions

```diff
@@ -7,21 +7,21 @@ up:
 		cp example.env .env; \
 		exit 1; \
 	fi
-	docker-compose up -d;
+	docker compose up -d;
 
 .PHONY: down
 down:
-	docker-compose down -v
+	docker compose down -v
 	@if [[ "$(docker ps -q -f name=${DOCKER_CONTAINER})" ]]; then \
 		echo "Terminating running container..."; \
 		docker rm ${DOCKER_CONTAINER}; \
 	fi
 
 .PHONY: restart
 restart:
-	docker-compose down -v; \
+	docker compose down -v; \
 	sleep 5; \
-	docker-compose up -d;
+	docker compose up -d;
 
 .PHONY: logs
 logs:
```

bootcamp/materials/1-dimensional-data-modeling/README.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -47,7 +47,7 @@ There are two methods to get Postgres running locally.
 2. Run this command after replacing **`<computer-username>`** with your computer's username:
 
    ```bash
-   psql -U <computer-username> postgres < data.dump
+   pg_restore -U <computer-username> -d postgres data.dump
    ```
 
 3. Set up DataGrip, DBeaver, or your VS Code extension to point at your locally running Postgres instance.
@@ -124,6 +124,10 @@ Where:
 - If the test connection is successful, click "Finish" or "Save" to save the connection. You should now be able to use the database client to manage your PostgreSQL database locally.
 
 ## **🚨 Tables not loading!? 🚨**
+- If you're seeing errors like `error: invalid command \N`, use `pg_restore` to load `data.dump`:
+  ```bash
+  pg_restore -U $POSTGRES_USER -d $POSTGRES_DB data.dump
+  ```
 - If you are on Windows and used **`docker compose up`**, table creation and data load do not happen at container creation. Once the Docker container is up and you have verified that you can connect to the empty Postgres database with a client of your choice, follow these steps:
 1. On Docker Desktop, connect to the my-postgres-container terminal.
 2. Run:
@@ -132,7 +136,7 @@ Where:
    -v ON_ERROR_STOP=1 \
    --username $POSTGRES_USER \
    --dbname $POSTGRES_DB \
-   < /docker-entrypoint-initdb.d/data.dump>
+   < /docker-entrypoint-initdb.d/data.dump
 ```
 - → This will run the file `data.dump` from inside your docker container.
````

bootcamp/materials/1-dimensional-data-modeling/docker-compose.yml

Lines changed: 0 additions & 1 deletion

```diff
@@ -5,7 +5,6 @@ services:
     container_name: ${DOCKER_CONTAINER}
     env_file:
       - .env
-      - example.env
     environment:
       - POSTGRES_DB=${POSTGRES_SCHEMA}
       - POSTGRES_USER=${POSTGRES_USER}
```

bootcamp/materials/1-dimensional-data-modeling/lecture-lab/players.sql

Lines changed: 1 addition & 1 deletion

```diff
@@ -18,7 +18,7 @@
     draft_round TEXT,
     draft_number TEXT,
     seasons season_stats[],
-    scorer_class scoring_class,
+    scoring_class scoring_class,
     years_since_last_active INTEGER,
     is_active BOOLEAN,
     current_season INTEGER,
```

bootcamp/materials/1-dimensional-data-modeling/sql/load_players_table_day2.sql

Lines changed: 1 addition & 1 deletion

```diff
@@ -63,7 +63,7 @@ SELECT
         WHEN (seasons[CARDINALITY(seasons)]::season_stats).pts > 15 THEN 'good'
         WHEN (seasons[CARDINALITY(seasons)]::season_stats).pts > 10 THEN 'average'
         ELSE 'bad'
-    END::scorer_class AS scorer_class,
+    END::scoring_class AS scoring_class,
     w.season - (seasons[CARDINALITY(seasons)]::season_stats).season as years_since_last_active,
     w.season,
     (seasons[CARDINALITY(seasons)]::season_stats).season = season AS is_active
```
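For context on this rename: `scoring_class` is a Postgres enum type. Only the 'good', 'average', and 'bad' labels are visible in the hunk above, so the definition below is a hedged sketch rather than something taken from this commit; in particular, the 'star' tier for the highest scorers is an assumption.

```sql
-- Hedged sketch of the enum behind the scorer_class -> scoring_class rename.
-- 'good', 'average', and 'bad' appear in the hunk above; 'star' is assumed.
CREATE TYPE scoring_class AS ENUM ('star', 'good', 'average', 'bad');
```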
New file (name hidden; a Python `.gitignore`)

Lines changed: 138 additions & 0 deletions

```gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

dump.sql

# Personal workspace files
.idea/*
.vscode/*

postgres-data/*
homework/your_username
```
New file (name hidden; the week 2 README)

Lines changed: 3 additions & 0 deletions

# Week 2 Fact Data Modeling

This repo follows the same setup as week 1. Please go to the dimensional data modeling [README](../1-dimensional-data-modeling/README.md) for instructions.

bootcamp/materials/2-fact-data-modeling/homework/.gitkeep

Whitespace-only changes.
New file (name hidden; the week 2 homework instructions)

Lines changed: 31 additions & 0 deletions

# Week 2 Fact Data Modeling

The homework this week uses the `devices` and `events` datasets.

Construct the following eight queries:

- A query to deduplicate `game_details` from Day 1 so there are no duplicates (see the sketch after this list)

- A DDL for a `user_devices_cumulated` table that has:
  - a `device_activity_datelist` which tracks a user's active days by `browser_type`
    - the data type here should look similar to `MAP<STRING, ARRAY[DATE]>`
    - or you could have `browser_type` as a column with multiple rows for each user (either way works, just be consistent! — see the DDL sketch after this list)

- A cumulative query to generate `device_activity_datelist` from `events`

- A `datelist_int` generation query that converts the `device_activity_datelist` column into a `datelist_int` column

- A DDL for a `hosts_cumulated` table with:
  - a `host_activity_datelist` which logs which dates each host experienced any activity

- The incremental query to generate `host_activity_datelist`

- A monthly, reduced fact table DDL `host_activity_reduced` with:
  - month
  - host
  - hit_array - think COUNT(1)
  - unique_visitors array - think COUNT(DISTINCT user_id)

- An incremental query that loads `host_activity_reduced` day by day

Please add these queries to a folder, zip them up, and submit [here](https://bootcamp.techcreator.io).
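For the deduplication item, a common Postgres pattern is `ROW_NUMBER()` over the table's natural key. Here is a minimal sketch, assuming `game_id`, `team_id`, and `player_id` form the grain of `game_details`; the assignment does not specify the key, so treat the `PARTITION BY` list as a placeholder:

```sql
-- Hedged sketch: keep one row per (game_id, team_id, player_id).
-- The key columns are an assumption; adjust PARTITION BY to the real grain.
WITH ranked AS (
    SELECT
        gd.*,
        ROW_NUMBER() OVER (
            PARTITION BY game_id, team_id, player_id
        ) AS row_num
    FROM game_details AS gd
)
SELECT *
FROM ranked
WHERE row_num = 1;  -- drop row_num by listing columns explicitly if desired
```

And one possible shape for the `user_devices_cumulated` DDL, using the row-per-`browser_type` variant the assignment allows; the `user_id` type and the snapshot `date` column are assumptions, not part of the assignment:

```sql
-- Hedged sketch of the row-per-browser_type variant.
CREATE TABLE user_devices_cumulated (
    user_id NUMERIC,                  -- type assumed; match events.user_id
    browser_type TEXT,
    device_activity_datelist DATE[],  -- all active dates seen so far
    date DATE,                        -- snapshot date the cumulation ran up to
    PRIMARY KEY (user_id, browser_type, date)
);
```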
