
teamster 🚛



Next-gen data orchestration for KIPP TEAM & Family Schools

Teamster is the data engineering platform powering analytics and reporting across KIPP Newark, Camden, Miami, and Paterson. It ingests data from 30+ source systems, transforms it through dbt, and delivers it to Tableau, Google Sheets, PowerSchool, and other consumers — all orchestrated by Dagster.

  • 🎻 Dagster — orchestrates every ETL step across five code locations, one per school network; observe and run pipelines in Dagster Cloud
  • 🔧 dbt — transforms raw source data into staging, intermediate, mart, and extract models in Google BigQuery
  • 🚿 dlt — loads data from API sources into BigQuery alongside dbt
  • 🔀 Airbyte — managed connector pipelines for select integrations
  • 🪣 Google Cloud Storage — intermediate storage layer between pipeline steps
  • ☸️ Google Kubernetes Engine — runs each code location in its own container in production
  • ⚙️ GitHub Actions — CI/CD for building and deploying code locations
  • 📊 Tableau — primary BI consumer; Dagster manages workbook extract refreshes

📖 Background

KIPP's data infrastructure was previously a patchwork of Python scripts, cron jobs, stored procedures, Fivetran, and Selenium automation spread across multiple databases. Synchronous scheduling meant a slow pull from one system would cascade into downstream failures. A single data engineer spent more time firefighting than building.

Teamster replaced all of it with a unified, asset-based platform. The results:

  • ⚡ Pipeline development time dropped from weeks to days
  • 🎫 Data-related support tickets fell 30% year-over-year
  • 🧑‍💻 Analysts gained Git, SQL, and DevOps skills through shared PR workflows
  • 🔔 Real-time Slack alerts replaced reactive debugging

"The visibility into the pipelines is a game changer. We know as soon as something fails and why."

Read the full story in the Dagster case study.

🚀 Get started

New to the project? Start here:

  1. Guides — account setup and task-focused walkthroughs
  2. Architecture — how the code is organized
  3. Contributing — workflow and PR guidelines

📚 Reference

| Topic | Description |
| --- | --- |
| Automations | All schedules and sensors across every code location |
| Automation Conditions | How asset auto-materialization works |
| Adding an Integration | Step-by-step guide for new data sources |
| dbt Conventions | Model naming, contracts, and testing standards |
| IO Managers | How intermediate data is stored in GCS |
| Fiscal Year & Partitioning | Partition strategy for historical loads |
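As a rough illustration of fiscal-year partitioning for historical loads, the helpers below derive a fiscal-year label from a calendar date and enumerate one partition key per year. The July–June fiscal year and the `FY{year}` key format are assumptions for this sketch, not conventions taken from the repository:

```python
from datetime import date


def fiscal_year(d: date, start_month: int = 7) -> int:
    """Return the fiscal year containing `d`, labeled by its ending
    calendar year (so July 2023 falls in FY2024).

    The July start and end-year labeling are assumptions for this
    example; school networks may use a different convention.
    """
    return d.year + 1 if d.month >= start_month else d.year


def fiscal_partition_keys(first_fy: int, last_fy: int) -> list[str]:
    """Hypothetical partition keys, one per fiscal year, suitable for
    backfilling historical loads year by year."""
    return [f"FY{fy}" for fy in range(first_fy, last_fy + 1)]
```

In Dagster terms, keys like these would back a static partitions definition so each fiscal year can be materialized or backfilled independently.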

🗺️ Guides & Troubleshooting

| Topic | Description |
| --- | --- |
| Dagster Guide | Tableau scheduling, backfills, branch deployments |
| Google Sheets & Forms | Adding and updating Google Sheets sources |
| Troubleshooting: Dagster | Pipeline failures, partitions, unsynced views |
| Troubleshooting: dbt | Contract violations, compilation errors, test failures |
| Troubleshooting: VS Code | Interpreter, secrets, Trunk, container issues |
