Data Modeling using Postgres (Fact and Dimension data model)
Data Modeling using Apache Cassandra
When to use RDBMS (OLTP, OLAP) or NoSQL data models, and the limitations of each.
- https://github.com/Kameshwarsingh/data-engineering-nanodegree-udacity/blob/main/data-modeling/project_nonrelationaldatabase/README.md
- https://github.com/Kameshwarsingh/data-engineering-nanodegree-udacity/blob/main/data-modeling/project_relationaldatabase/README.md
Big data ecosystem and how to use Spark to work with massive datasets.
Difference between a data warehouse and a data lake: business need and justification
Redshift, AWS
Why does the traditional data warehouse approach not meet the demands of data scientists and real-time business analytics?
Spark on an EMR cluster, AWS
How does Airflow simplify big data workflows and fit into cloud and big data contexts?
Data Pipelines with Airflow (DAG, Operators, Hooks), AWS
Store big data in a data lake and query it with Spark, AWS
Run data quality checks, track data lineage, and work with data pipelines in production.
Data Engineering Capstone