Teamster is the data engineering platform powering analytics and reporting across KIPP Newark, Camden, Miami, and Paterson. It ingests data from 30+ source systems, transforms it through dbt, and delivers it to Tableau, Google Sheets, PowerSchool, and other consumers — all orchestrated by Dagster.
- 🎻 Dagster — orchestrates every ETL step across five code locations, one per school network; observe and run pipelines in Dagster Cloud
- 🔧 dbt — transforms raw source data into staging, intermediate, mart, and extract models in Google BigQuery
- 🚿 dlt — loads data from API sources into BigQuery alongside dbt
- 🔀 Airbyte — managed connector pipelines for select integrations
- 🪣 Google Cloud Storage — intermediate storage layer between pipeline steps
- ☸️ Google Kubernetes Engine — runs each code location in its own container in production
- ⚙️ GitHub Actions — CI/CD for building and deploying code locations
- 📊 Tableau — primary BI consumer; Dagster manages workbook extract refreshes
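The core idea behind the asset-based orchestration above is that each dataset declares its upstream dependencies, and the orchestrator derives the execution order from that graph. As an illustration only — this is not Teamster or Dagster code, and the asset names are hypothetical — here is a minimal stdlib sketch of that dependency-resolution idea using Python's `graphlib`:

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each key maps to the set of assets it depends on.
# Mirrors the flow described above: source system -> staging -> mart -> BI extract.
assets = {
    "raw_powerschool": set(),                # ingested from a source system
    "stg_students": {"raw_powerschool"},     # staging model built from the raw load
    "mart_enrollment": {"stg_students"},     # mart model built from staging
    "tableau_extract": {"mart_enrollment"},  # extract refresh for the BI layer
}

# A topological order of the graph is a valid execution order for the pipeline.
run_order = list(TopologicalSorter(assets).static_order())
print(run_order)
# -> ['raw_powerschool', 'stg_students', 'mart_enrollment', 'tableau_extract']
```

Because downstream assets only run once their dependencies have materialized, a slow or failed upstream pull stops its own branch instead of silently cascading stale data — the failure mode the old synchronous scheduling suffered from.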

KIPP's data infrastructure was previously a patchwork of Python scripts, cron jobs, stored procedures, Fivetran, and Selenium automation spread across multiple databases. Synchronous scheduling meant a slow pull from one system would cascade into downstream failures. A single data engineer spent more time firefighting than building.

Teamster replaced all of it with a unified, asset-based platform. The results:
- ⚡ Pipeline development time dropped from weeks to days
- 🎫 Data-related support tickets fell 30% year-over-year
- 🧑‍💻 Analysts gained Git, SQL, and DevOps skills through shared PR workflows
- 🔔 Real-time Slack alerts replaced reactive debugging
"The visibility into the pipelines is a game changer. We know as soon as something fails and why."
Read the full story in the Dagster case study.
New to the project? Start here:
- Guides — account setup and task-focused walkthroughs
- Architecture — how the code is organized
- Contributing — workflow and PR guidelines

| Topic | Description |
|---|---|
| Automations | All schedules and sensors across every code location |
| Automation Conditions | How asset auto-materialization works |
| Adding an Integration | Step-by-step guide for new data sources |
| dbt Conventions | Model naming, contracts, and testing standards |
| IO Managers | How intermediate data is stored in GCS |
| Fiscal Year & Partitioning | Partition strategy for historical loads |
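The fiscal-year partitioning referenced above can be illustrated with a small, self-contained sketch. This is an assumption for illustration, not Teamster's actual scheme: it assumes a July–June fiscal year (common for US school districts) and a hypothetical `partition_key` format; the real strategy is defined in the linked doc.

```python
from datetime import date

def fiscal_year(d: date, start_month: int = 7) -> int:
    """Return the fiscal year a date falls in, labeled by its ending calendar year.

    Assumes a July-June fiscal year by default, e.g. 2024-08-15 -> FY2025.
    """
    return d.year + 1 if d.month >= start_month else d.year

def partition_key(d: date) -> str:
    """Hypothetical partition key pairing the fiscal year with the calendar date."""
    return f"FY{fiscal_year(d)}|{d.isoformat()}"

print(partition_key(date(2024, 8, 15)))  # -> FY2025|2024-08-15
print(partition_key(date(2025, 3, 1)))   # -> FY2025|2025-03-01
```

Keying partitions by fiscal year rather than calendar year keeps a full school year's historical loads in one partition range, which makes year-scoped backfills a single contiguous slice.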

| Topic | Description |
|---|---|
| Dagster Guide | Tableau scheduling, backfills, branch deployments |
| Google Sheets & Forms | Adding and updating Google Sheets sources |
| Troubleshooting: Dagster | Pipeline failures, partitions, unsynced views |
| Troubleshooting: dbt | Contract violations, compilation errors, test failures |
| Troubleshooting: VS Code | Interpreter, secrets, Trunk, container issues |
