Kameshwarsingh/data-engineering-nanodegree-udacity

Data Engineering

1 Data Modeling

1.1 Relational and NoSQL data models to fit the diverse needs of data consumers

  Data modeling using Postgres (fact and dimension data model)
  Data modeling using Apache Cassandra
  When to use an RDBMS (OLTP, OLAP) or a NoSQL data model, and the limitations of each
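
As a rough illustration of the fact and dimension model listed above, the sketch below builds a tiny star schema and runs one analytical join against it. It uses Python's built-in `sqlite3` module as a stand-in for Postgres so that it runs without a database server. The table and column names (`dim_user`, `dim_song`, `fact_songplay`, `duration_s`) and the sample rows are hypothetical and are not taken from the project.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables, one fact table, and a few sample rows."""
    conn.executescript(
        """
        CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- The fact table stores measurements plus foreign keys to the dimensions.
        CREATE TABLE fact_songplay (
            songplay_id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES dim_user(user_id),
            song_id INTEGER NOT NULL REFERENCES dim_song(song_id),
            duration_s REAL NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO dim_user VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
    conn.executemany("INSERT INTO dim_song VALUES (?, ?)", [(10, "Intro"), (11, "Outro")])
    conn.executemany(
        "INSERT INTO fact_songplay VALUES (?, ?, ?, ?)",
        [(100, 1, 10, 30.0), (101, 1, 11, 45.0), (102, 2, 10, 30.0)],
    )


def total_listening_time(conn):
    """Join the fact table to a dimension and aggregate per user."""
    return conn.execute(
        """
        SELECT u.name, SUM(f.duration_s)
        FROM fact_songplay AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        GROUP BY u.name
        ORDER BY u.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(total_listening_time(conn))
```

The fact table holds only keys and numeric measures, while descriptive attributes live in the dimensions, which is what keeps analytical queries down to a small number of joins.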

Data modeling project READMEs are at:


2 Cloud Data Warehouses and Data Lake

The big data ecosystem and how to use Spark to work with massive datasets.
The difference between a data warehouse and a data lake, and the business need and justification for each.
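
One common way to frame the warehouse versus lake difference is schema-on-write versus schema-on-read. The sketch below is a minimal, plain-Python illustration of that idea under stated assumptions: the sample events, the `WAREHOUSE_COLUMNS` schema, and both function names are hypothetical, and no real warehouse or lake service is involved.

```python
import json

# Hypothetical raw events; the second one does not match the warehouse schema.
RAW_EVENTS = [
    '{"user": "ada", "song": "Intro", "duration_s": 30}',
    '{"user": "linus", "device": "phone"}',
]

# Assumed fixed schema for the warehouse table.
WAREHOUSE_COLUMNS = ("user", "song", "duration_s")


def load_into_warehouse(lines):
    """Schema-on-write: validate each record against the schema at load time."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if set(record) == set(WAREHOUSE_COLUMNS):
            accepted.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))
        else:
            rejected.append(line)
    return accepted, rejected


def read_from_lake(lines, fields):
    """Schema-on-read: keep raw records and pick fields only at query time."""
    return [tuple(json.loads(line).get(f) for f in fields) for line in lines]


accepted, rejected = load_into_warehouse(RAW_EVENTS)
print(accepted, len(rejected))
print(read_from_lake(RAW_EVENTS, ("user", "device")))
```

The warehouse path rejects the record that does not fit the schema, while the lake path keeps everything and returns `None` for fields a record does not carry.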

2.1 Cloud Data Warehouse

    Amazon Redshift (AWS)

Cloud Data Warehouses project README is at:

3 Data Lake

  Why the traditional data warehouse approach does not meet the demands of data scientists and real-time business analytics

3.1 Data Lake with Spark

        Spark on an Amazon EMR cluster (AWS)

Data Lake project README is at:


4 Data Pipeline

4.1 Data Pipeline using Apache Airflow

How Airflow simplifies big data workflows and fits into cloud and big data contexts

  Data Pipelines with Airflow (DAG, Operators, Hooks), AWS
  Store big data in a data lake and query it with Spark, AWS
  Run data quality checks, track data lineage, and work with data pipelines in production.
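
The DAG and data quality ideas listed above can be sketched without Airflow itself. The example below is not Airflow code: it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to show how task dependencies determine a valid execution order, plus two plain-Python checks of the kind a pipeline might run as its last task. The task names, `PIPELINE` mapping, and check functions are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "stage_events": set(),
    "stage_songs": set(),
    "load_fact": {"stage_events", "stage_songs"},
    "load_dimensions": {"load_fact"},
    "quality_checks": {"load_dimensions"},
}


def execution_order(dag):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def check_not_empty(table_name, rows):
    """Fail loudly if a load produced no rows; otherwise return the row count."""
    if not rows:
        raise ValueError(f"Data quality check failed: {table_name} is empty")
    return len(rows)


def check_no_nulls(table_name, rows, column):
    """Fail if any row (a dict) is missing a value in the given column."""
    bad = [row for row in rows if row.get(column) is None]
    if bad:
        raise ValueError(
            f"Data quality check failed: {len(bad)} null(s) in {table_name}.{column}"
        )
    return True


order = execution_order(PIPELINE)
print(order)

sample_rows = [{"user_id": 1}, {"user_id": 2}]
print(check_not_empty("dim_user", sample_rows))
print(check_no_nulls("dim_user", sample_rows, "user_id"))
```

In Airflow the same dependency structure would be declared between operators inside a DAG, and the scheduler would decide the order; the checks would typically run as a final task so that a failed check fails the run.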

Data Pipeline project README is at:


5 Capstone Project

  Data Engineering Capstone

Capstone project README is at:

About

Data Engineering, Udacity, Nanodegree
