This project is a hands-on learning experience with dbt (data build tool) and Snowflake. It includes setting up the environment, configuring dbt, creating and transforming models, using macro functions, applying tests, and deploying models using Airflow.
- Setup dbt + Snowflake
- Configure dbt_project.yml and Packages
- Create Source and Staging Tables
- Transformed Models (Fact Tables, Data Marts)
- Macro Functions
- Generic and Singular Tests
- Deploy Models Using Airflow
- Lessons learnt
- Credit
- Restriction
- Disclaimer
- Install dbt: Follow the official dbt installation guide to install dbt on your local machine.
- Snowflake Account: Ensure you have access to a Snowflake account.
- Set Up Connection: Configure the connection to Snowflake by setting up your `profiles.yml` file with the necessary Snowflake credentials.
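A minimal `profiles.yml` sketch for a Snowflake target is shown below; the profile name and all credential, role, database, warehouse, and schema values are placeholders and must match your own Snowflake setup.

```yaml
# profiles.yml — sketch only; replace every placeholder with your own values
dbt_snowflake_project:            # must match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: ORGNAME-ACCNAME    # see the account-name note under Lessons learnt
      user: YOUR_USER
      password: YOUR_PASSWORD
      role: TRANSFORM_ROLE
      database: ANALYTICS
      warehouse: TRANSFORM_WH
      schema: DBT_SCHEMA
      threads: 4
```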
- dbt_project.yml: Configure the `dbt_project.yml` file to define the project's configuration, such as the project name, version, and profile.
- Install Packages: Use the `packages.yml` file to define and install any required dbt packages.
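As a sketch, assuming the hypothetical project name used above, the two files might look like this (run `dbt deps` after editing `packages.yml`):

```yaml
# dbt_project.yml — sketch; names and materializations are placeholders
name: dbt_snowflake_project
version: "1.0.0"
profile: dbt_snowflake_project   # must match the profile in profiles.yml

models:
  dbt_snowflake_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```

```yaml
# packages.yml — example pulling in dbt_utils; pin the version you need
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
```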
- Source Tables: Declare the raw tables as dbt sources so downstream models can reference them.
- Staging Tables: Create staging models in the `models/staging` directory to clean and prepare your source data for further transformations.
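As a sketch, assuming a source named `raw` with an `orders` table has been declared in a sources YAML file, a staging model might look like:

```sql
-- models/staging/stg_orders.sql — hypothetical staging model
-- Renames columns and casts types; the source('raw', 'orders') reference
-- assumes a matching entry in a sources YAML file.
select
    order_id,
    customer_id,
    cast(order_date as date) as order_date,
    lower(status)            as order_status
from {{ source('raw', 'orders') }}
```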
Data Marts: Create data marts in the `models/marts` directory to organize data in a way that is useful for analysis and reporting.
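A hedged sketch of a fact table in `models/marts`, assuming hypothetical staging models `stg_orders` and `stg_order_items`:

```sql
-- models/marts/fct_orders.sql — hypothetical fact table built from staging models
select
    o.order_id,
    o.customer_id,
    o.order_date,
    sum(i.amount) as order_total
from {{ ref('stg_orders') }} o
left join {{ ref('stg_order_items') }} i
    on o.order_id = i.order_id
group by 1, 2, 3
```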
Create reusable SQL snippets using dbt's Jinja-based macros. Store these macros in the `macros` directory.
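For illustration, a small hypothetical macro that converts a cents column to dollars:

```sql
-- macros/cents_to_dollars.sql — hypothetical reusable macro
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

It could then be called in any model, e.g. `select {{ cents_to_dollars('amount_cents') }} as amount`.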
- Generic Tests: Apply built-in or custom generic tests to ensure data quality.
- Singular Tests: Write singular tests to validate specific business logic or data constraints.
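Both kinds of test can be sketched as follows, using the hypothetical model names from above. Generic tests such as `unique` and `not_null` are attached to columns in a YAML file, while a singular test is a SQL file in `tests/` that fails if it returns any rows:

```yaml
# models/staging/schema.yml — generic tests on a hypothetical staging model
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

```sql
-- tests/assert_order_total_not_negative.sql — singular test
-- Fails if any order has a negative total.
select *
from {{ ref('fct_orders') }}
where order_total < 0
```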
- Setup Airflow: Install and configure Apache Airflow.
- Create DAGs: Define Directed Acyclic Graphs (DAGs) in Airflow to orchestrate the execution of dbt models.
- Deploy: Use Airflow to schedule and run your dbt transformations.
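The steps above can be sketched as a minimal DAG that shells out to the dbt CLI; the paths, IDs, and schedule are placeholders, and production setups often use a dedicated dbt integration instead of `BashOperator`:

```python
# dags/dbt_dag.py — minimal sketch; assumes dbt is installed on the Airflow workers
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Fixed, absolute project path (see Lessons learnt); placeholder value
DBT_PROJECT_DIR = "/opt/airflow/dbt/dbt_snowflake_project"

with DAG(
    dag_id="dbt_snowflake_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_PROJECT_DIR}",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"dbt test --project-dir {DBT_PROJECT_DIR}",
    )
    dbt_run >> dbt_test  # run the models first, then the tests
```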
- During configuration, the Snowflake account name should be formatted as `ORGNAME-ACCNAME` (refer to the documentation here).
- Ensure the `dbt_project.yml` file is located at a fixed path to prevent errors when importing the DAG into Airflow.
- Used the audit log to debug an Airflow failure.
Code along with Jayzern
The code is provided for educational purposes and demonstration use only.
This code is provided "as is" without warranty of any kind, and I am not liable for any issues that arise from its use. While you are welcome to learn from it, please do not copy or distribute it for your own coursework or assignments without permission.