A hands-on dbt project for learning data pipeline testing and monitoring using the complete Chinook database with real production-like data.
This repository accompanies Module 4: Data Pipeline Testing and Monitoring and contains 8 hands-on labs across 4 decks:
| Deck | Topic | Labs |
|---|---|---|
| 1 | Introduction to Testing Tools | Lab 1.1, Lab 1.2 |
| 2 | Testing Data Pipelines | Lab 2.1, Lab 2.2 |
| 3 | Writing Unit and Integration Tests | Lab 3.1, Lab 3.2 |
| 4 | Monitoring and Maintenance | Lab 4.1, Lab 4.2 |
- ✅ Connect dbt to real data in BigQuery
- ✅ Write schema tests (unique, not_null, accepted_values, relationships)
- ✅ Create singular tests for complex business logic
- ✅ Build custom generic tests (reusable macros)
- ✅ Implement unit tests for transformation logic
- ✅ Set up integration tests across models
- ✅ Configure monitoring and alerts
- ✅ Debug pipeline failures systematically
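To make the four built-in schema tests concrete, here is a minimal Python sketch of what each one checks. It is an illustration of the semantics only, not how dbt implements them (dbt compiles each test to SQL and passes when the query returns zero rows). The sample rows and the `format` column are made up for the example and are not real Chinook data.

```python
from collections import Counter


def unique(rows, column):
    """Return non-null values that appear more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sorted(v for v, n in counts.items() if n > 1)


def not_null(rows, column):
    """Return the rows whose column is NULL (None in Python)."""
    return [r for r in rows if r[column] is None]


def accepted_values(rows, column, values):
    """Return non-null values that fall outside the allowed set."""
    allowed = set(values)
    return sorted(
        {r[column] for r in rows if r[column] is not None and r[column] not in allowed}
    )


def relationships(child_rows, column, parent_rows, parent_column):
    """Return non-null child keys that have no matching parent key."""
    parent_keys = {r[parent_column] for r in parent_rows}
    return sorted(
        {r[column] for r in child_rows
         if r[column] is not None and r[column] not in parent_keys}
    )


# Hand-written sample rows (hypothetical, for illustration only).
artists = [{"artist_id": 1}, {"artist_id": 2}]
albums = [
    {"album_id": 10, "artist_id": 1, "format": "digital"},
    {"album_id": 11, "artist_id": 2, "format": "vinyl"},
    {"album_id": 11, "artist_id": None, "format": "digital"},
    {"album_id": 12, "artist_id": 99, "format": "digital"},
]

print(unique(albums, "album_id"))                                # [11]
print(len(not_null(albums, "artist_id")))                        # 1
print(accepted_values(albums, "format", ["digital"]))            # ['vinyl']
print(relationships(albums, "artist_id", artists, "artist_id"))  # [99]
```

Each function returns the offending values or rows, mirroring the dbt convention that a test passes when nothing comes back.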
The Chinook database represents a digital music store (like iTunes) with real, complete data:
- Artist (275 records) → Album (347 records) → Track (3,503 records)
- Customer (59 records) → Invoice (412 records) → InvoiceLine (2,240 records)
- Track → InvoiceLine (each invoice line references a track)
Total: 15,607 records across 11 tables (the sum of the counts below), enough to exercise joins and relationship tests on realistic data.
| Table | Records | Description |
|---|---|---|
| Artist | 275 | Music artists and bands |
| Album | 347 | Albums linked to artists |
| Track | 3,503 | Individual songs with pricing |
| Genre | 25 | Music genres |
| MediaType | 5 | File format types |
| Customer | 59 | Customer information |
| Employee | 8 | Sales representatives |
| Invoice | 412 | Purchase transactions |
| InvoiceLine | 2,240 | Line items per invoice |
| Playlist | 18 | Music playlists |
| PlaylistTrack | 8,715 | Playlist-track associations |
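The counts in the table above can double as a simple load check. The sketch below copies them into a dictionary and compares them with whatever counts you observe. The function name and the shape of `actual_counts` are assumptions made for this example; the loader script's own verification logic is not shown in this README and may work differently.

```python
# Expected row counts, copied from the table above.
EXPECTED_ROW_COUNTS = {
    "Artist": 275,
    "Album": 347,
    "Track": 3503,
    "Genre": 25,
    "MediaType": 5,
    "Customer": 59,
    "Employee": 8,
    "Invoice": 412,
    "InvoiceLine": 2240,
    "Playlist": 18,
    "PlaylistTrack": 8715,
}


def find_count_mismatches(actual_counts, expected=EXPECTED_ROW_COUNTS):
    """Return {table: (expected, actual)} for tables that differ or are missing."""
    mismatches = {}
    for table, want in expected.items():
        got = actual_counts.get(table)  # None when the table was not loaded
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


print(len(EXPECTED_ROW_COUNTS))           # 11
print(sum(EXPECTED_ROW_COUNTS.values()))  # 15607

# Hypothetical observed counts: one short table and one missing table.
observed = dict(EXPECTED_ROW_COUNTS, Track=3500)
del observed["Genre"]
print(find_count_mismatches(observed))
```

In practice `actual_counts` would come from a `COUNT(*)` query per table in the chinook_raw dataset.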
Before starting, ensure you have:
- Google Cloud account with a GCP project
- BigQuery API enabled in your project
- Python 3.8+ installed
- Google Cloud SDK installed (gcloud)
git clone https://github.com/your-org/chinook-dbt-testing-labs.git
cd chinook-dbt-testing-labs

# Create virtual environment
python -m venv venv
# Activate it
# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Login to Google Cloud
gcloud auth login
# Set your project
gcloud config set project YOUR_PROJECT_ID
# Create application default credentials
gcloud auth application-default login

This is the key step! Run our data loader to populate your BigQuery project with the complete Chinook database:

python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

This script will:
- Download the official Chinook database
- Create a chinook_raw dataset in your BigQuery project
- Load all 11 tables with complete data
- Verify the load was successful
Expected output:
🎵 Chinook Database Loader for BigQuery
==========================================
📥 Downloading Chinook database...
✅ Downloaded
Loading tables to BigQuery...
✅ Artist: 275 rows loaded
✅ Album: 347 rows loaded
✅ Track: 3,503 rows loaded
...
SUCCESS! Chinook database loaded to BigQuery
Create or edit ~/.dbt/profiles.yml:
chinook_testing:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: YOUR_PROJECT_ID  # <-- Your GCP project ID
      dataset: chinook_dev      # dbt creates this for transformed models
      location: US
      threads: 4
      timeout_seconds: 300

# Test the connection
dbt debug
# Install dbt packages
dbt deps
# Build all models
dbt run
# Run all tests
dbt test

You should see:
Completed successfully
Done. PASS=X WARN=0 ERROR=0
You're ready to start the labs!
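The PASS/WARN/ERROR summary that dbt prints is also written in machine-readable form to target/run_results.json, which is handy when you want to check results from a script. The sketch below tallies statuses and lists failing nodes. It assumes each entry in `results` carries `status` and `unique_id` fields, which is my understanding of the artifact; verify against the schema for your dbt version. The sample payload is hand-written, not real output.

```python
import json
from collections import Counter
from pathlib import Path


def load_run_results(path="target/run_results.json"):
    """Read the artifact dbt writes after run, test, or build."""
    return json.loads(Path(path).read_text())


def summarize_run_results(run_results):
    """Return (status counts, unique_ids of failed or errored nodes)."""
    results = run_results.get("results", [])
    status_counts = Counter(r.get("status") for r in results)
    failing = [
        r.get("unique_id") for r in results if r.get("status") in ("fail", "error")
    ]
    return status_counts, failing


# Hand-written sample payload with hypothetical node ids.
sample = {
    "results": [
        {"unique_id": "test.chinook.example_a", "status": "pass"},
        {"unique_id": "test.chinook.example_b", "status": "fail"},
        {"unique_id": "test.chinook.example_c", "status": "pass"},
    ]
}
counts, failing = summarize_run_results(sample)
print(dict(counts))  # {'pass': 2, 'fail': 1}
print(failing)       # ['test.chinook.example_b']
```

After a real run, call `summarize_run_results(load_run_results())` from the project root.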
chinook-dbt-testing-labs/
├── README.md                           # This file
├── dbt_project.yml                     # dbt project configuration
├── packages.yml                        # dbt package dependencies
├── requirements.txt                    # Python dependencies
│
├── scripts/
│   ├── load_chinook_to_bigquery.py     # ⬅️ Run this first!
│   ├── setup.sh                        # Quick setup script
│   └── run_tests_with_retry.sh         # Test runner with retries
│
├── models/
│   ├── staging/                        # Source → Staging transformations
│   │   ├── _stg_chinook.yml            # Source definitions & tests
│   │   ├── stg_artists.sql
│   │   ├── stg_albums.sql
│   │   ├── stg_tracks.sql
│   │   ├── stg_genres.sql
│   │   ├── stg_media_types.sql
│   │   ├── stg_customers.sql
│   │   ├── stg_employees.sql
│   │   ├── stg_invoices.sql
│   │   └── stg_invoice_lines.sql
│   │
│   ├── intermediate/                   # Business logic transformations
│   │   ├── _int_chinook.yml
│   │   ├── int_tracks_enriched.sql
│   │   └── int_invoice_totals.sql
│   │
│   ├── marts/                          # Analytics-ready models
│   │   ├── _marts_chinook.yml
│   │   ├── dim_customers.sql
│   │   ├── dim_tracks.sql
│   │   └── fct_sales.sql
│   │
│   └── unit_tests/                     # Unit test models
│       └── ...
│
├── tests/                              # Test SQL files
│   ├── assert_*.sql                    # Singular tests
│   ├── integration/                    # Integration tests
│   ├── monitoring/                     # Monitoring tests
│   └── unit_tests/                     # Unit test assertions
│
├── macros/tests/                       # Custom generic tests
│   ├── test_is_positive.sql
│   ├── test_valid_email.sql
│   └── test_within_range.sql
│
├── seeds/                              # Test fixtures only (not source data)
│   ├── test_tracks_input.csv
│   └── test_tracks_expected.csv
│
├── labs/                               # Lab instructions
│   ├── deck1/
│   │   ├── LAB_1_1_explore_chinook.md
│   │   └── LAB_1_2_first_tests.md
│   ├── deck2/
│   │   ├── LAB_2_1_schema_tests.md
│   │   └── LAB_2_2_singular_tests.md
│   ├── deck3/
│   │   ├── LAB_3_1_unit_tests.md
│   │   └── LAB_3_2_integration_tests.md
│   └── deck4/
│       ├── LAB_4_1_monitoring_setup.md
│       └── LAB_4_2_debugging_failures.md
│
└── analyses/                           # Ad-hoc debug queries
    └── debug_invoice_issues.sql
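The tree lists scripts/run_tests_with_retry.sh, whose contents are not shown in this README. As a rough illustration of the retry idea only, here is a Python sketch that reruns a command until it exits with code 0 or the attempts run out. The function name, parameters, and defaults are assumptions for this example and are not taken from the actual script.

```python
import subprocess
import sys
import time


def run_with_retry(command, max_attempts=3, delay_seconds=0.0):
    """Run `command` until it exits 0 or attempts are exhausted.

    Returns (last_exit_code, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    exit_code = 1
    for attempt in range(1, max_attempts + 1):
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            return 0, attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause before the next attempt
    return exit_code, max_attempts


if __name__ == "__main__":
    # Demo with a command that always succeeds; for the labs you would pass
    # something like ["dbt", "test"] instead.
    print(run_with_retry([sys.executable, "-c", "print('ok')"]))
```

Retries help with transient problems such as a dropped connection. A test that fails because of the data itself will fail on every attempt, so a retry wrapper should not be used to mask real failures.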
- Lab 1.1: Explore Chinook & Build Your First Models
- Lab 1.2: Write Your First dbt Tests
- Lab 2.1: Master Schema Tests (unique, not_null, relationships)
- Lab 2.2: Create Singular Tests for Business Logic
- Lab 3.1: Build Unit Tests with Test Fixtures
- Lab 3.2: Implement Integration Tests Across Models
- Lab 4.1: Set Up Freshness Monitoring & Alerts
- Lab 4.2: Debug Pipeline Failures Systematically
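Freshness monitoring (Lab 4.1) comes down to comparing the age of the most recently loaded row against a warning threshold and an error threshold. The sketch below shows that comparison in plain Python. The function name, the thresholds, and the timestamps are assumptions chosen for the example; in the labs the equivalent thresholds would be configured in dbt rather than written by hand.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(latest_loaded_at, now, warn_after, error_after):
    """Classify data age as 'pass', 'warn', or 'error'."""
    age = now - latest_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Hypothetical timestamps: the newest row is 36 hours old.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)

print(freshness_status(latest, now, timedelta(hours=24), timedelta(hours=48)))  # warn
print(freshness_status(latest, now, timedelta(hours=48), timedelta(hours=72)))  # pass
print(freshness_status(latest, now, timedelta(hours=6), timedelta(hours=12)))   # error
```

Keeping the warning threshold below the error threshold gives you a chance to investigate stale data before downstream jobs are blocked.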
# Load data to BigQuery (run once)
python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID
# Build all models
dbt run
# Run all tests
dbt test
# Build and test together
dbt build
# Run specific model
dbt run --select stg_customers
# Test specific model
dbt test --select stg_customers
# Run only schema (generic) tests
dbt test --select test_type:generic
# Run only singular tests
dbt test --select test_type:singular
# Generate and view documentation
dbt docs generate && dbt docs serve

# Make sure you're authenticated
gcloud auth application-default login

The loader creates the chinook_raw dataset. Make sure you ran:

python scripts/load_chinook_to_bigquery.py --project YOUR_PROJECT_ID

Check that your profile points to the project where you loaded the data.
# Store failures for investigation
dbt test --store-failures
# Then query the failure table in BigQuery

This project is for educational purposes as part of the CBF Data Engineering curriculum.

Happy Testing!