Simplify workflow and improve beginner experience #12
Merged
themightychris merged 4 commits into main on Jan 28, 2026
- Rename prefetch_data.py → download_data.py with --defaults flag for zero-config download
- Simplify macro to read all data from local data/ directory with glob patterns
- Remove base models layer (incremental models with date filtering)
- Update staging models to use macro directly, add feed_base64 column
- Remove feed/date variables from dbt_project.yml (no longer needed)
- Remove httpfs extension from profiles.yml (no longer reading from gs://)
- Auto-download sample data in devcontainer and CI workflow
- Rewrite README for new two-phase workflow (download → transform)
- Create comprehensive docs/downloading_data.md with examples
- Update schema documentation to reflect new architecture

This change improves the beginner experience by:
- Separating download time from dbt run time
- Removing complex configuration (feed variables, date ranges)
- Simplifying the dbt layer (no incremental materialization)
- Making it obvious that data must be downloaded before running dbt

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Add inventory.json integration for live feed discovery
- Add --list command to show available agencies with date ranges
- Add --agency flag for downloading all feeds by agency name
- Show estimated download sizes before downloading
- Remove read_gtfs_parquet macro (inline read_parquet in staging models)
- Delete seeds/available_feeds.csv and scripts/generate_feed_list.py
- Update README and docs with simplified workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Separates dbt documentation fragments from human-facing docs by moving them to docs/data/ and updating docs-paths configuration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update project terminology to better reflect its purpose as a sandbox environment for exploring transit operational data transformation patterns.
- Rename database file from workshop.duckdb to sandbox.duckdb
- Update README with new introduction describing the project context
- Add link to TIDES specification and mention Common Transit Operations Data Framework
- Update all references in CI, scripts, and documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Major refactoring to simplify the project workflow and improve the beginner experience:
- `data/` directory
- `--list` and `--agency` commands to discover and download data for any available agency

Key Changes
Data Download Script (`scripts/download_data.py`)
- `--defaults` flag for AC Transit sample data
- `--list` command to browse all available agencies from inventory.json
- `--agency` command to download all feeds for a specific agency

Simplified dbt Architecture
- Removed `macros/read_gtfs_parquet.sql` - logic inlined into staging models
- Removed base models layer (`models/staging/base/`)
- Removed feed/date variables from `dbt_project.yml`
- Removed `httpfs` extension from profiles (no longer needed for GCS)

Developer Experience
- `docs/downloading_data.md` with comprehensive usage examples
- `docs/` - dbt fragments moved to `docs/data/`

Terminology
- Renamed database file from `workshop.duckdb` to `sandbox.duckdb`

Testing
Generated with Claude Code
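
For readers unfamiliar with the three download modes listed under Data Download Script, the following is a minimal sketch of how such flags could be wired up with `argparse`. It is not the code from `scripts/download_data.py`: the function names `build_parser` and `describe` are hypothetical, and treating the three flags as mutually exclusive is an assumption made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three modes named in the PR description."""
    parser = argparse.ArgumentParser(prog="download_data.py")
    # Assumption: exactly one mode is chosen per run. The real script may
    # allow combinations or take additional arguments.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--defaults", action="store_true",
                      help="download the AC Transit sample data")
    mode.add_argument("--list", action="store_true",
                      help="list agencies found in inventory.json")
    mode.add_argument("--agency", metavar="NAME",
                      help="download all feeds for one agency")
    return parser


def describe(argv: list[str]) -> str:
    """Return a short description of the action the arguments select."""
    args = build_parser().parse_args(argv)
    if args.defaults:
        return "download sample data"
    if args.list:
        return "list agencies"
    return f"download all feeds for {args.agency}"
```

In this sketch, `describe(["--agency", "AC Transit"])` selects the per-agency branch, while passing two of the flags together makes `argparse` exit with a usage error.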