-Query real-time transit data using DuckDB and dbt.
+A sandbox environment for exploring transit operational data transformation patterns using DuckDB and dbt. Part of the **Common Transit Operations Data Framework**, this demo shows how raw operational data can be transformed into [TIDES](https://tides-transit.org/)-compliant analytics tables using architectural patterns that scale from a laptop to enterprise cloud infrastructure.
 
-## Overview
-
-This workshop demonstrates how to query GTFS Realtime parquet data from a public GCS bucket using DuckDB's httpfs extension and dbt for data transformation.
-
-**Data source**: `gs://parquet.gtfsrt.io/` (also available at <http://parquet.gtfsrt.io/>)
+This sandbox uses publicly available GTFS-RT feeds as source data. In production, you would typically use raw AVL system exports which contain richer data, but GTFS-RT provides an accessible starting point for learning the patterns.
 
 ## Quick Start
 
 ### Option 1: GitHub Codespaces (Recommended)
 
 1. Click the green "Code" button → "Open with Codespaces"
-2. Wait for the container to build (~2 minutes)
+2. Wait for setup (~3 minutes, includes sample data download)
 3. Run dbt:
 
 ```bash
@@ -29,7 +19,7 @@ Three feed types are available:
 4. Query your data:
 
 ```bash
-duckdb workshop.duckdb -ui
+duckdb sandbox.duckdb -ui
 ```
 
 ### Option 2: Local Setup
@@ -44,147 +34,88 @@ Three feed types are available:
 > **Note:** If you get a "Failed to download extension" error with `-ui`, see [DuckDB UI Extension Error](docs/troubleshooting.md#duckdb-ui-extension-error).
 
-## Choosing a Feed
-
-Available feeds are listed in `seeds/available_feeds.csv`. To use a different feed:
+## How It Works
 
-```bash
-# View available feeds
-duckdb -c "SELECT * FROM read_csv_auto('seeds/available_feeds.csv')"
-
-# Run dbt with specific feeds (one variable per feed type)
 | feed_timestamp | timestamp | When the feed was fetched |
 | header_text | string | Alert title |
 | description_text | string | Alert details |
@@ -226,8 +160,8 @@ duckdb workshop.duckdb
 
 ## Need Help?
 
-See [docs/troubleshooting.md](docs/troubleshooting.md) for common issues and solutions.
+See [docs/troubleshooting.md](docs/troubleshooting.md) for common issues and solutions, or [open an issue](https://github.com/JarvusInnovations/gtfsrt-sandbox/issues) if you're stuck.
 
 ## License
 
-Data sourced from public GTFS-RT feeds. Workshop materials are MIT licensed.
+Data sourced from public GTFS-RT feeds. Sandbox materials are MIT licensed.
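Note: the removed Overview text describes reading GTFS Realtime parquet files from `parquet.gtfsrt.io` with DuckDB's httpfs extension. For anyone who still wants to try that pattern by hand, a minimal sketch might look like the following. Only the host name comes from the README; the object path is a placeholder because this diff does not show the bucket layout, and the helper name `build_parquet_query` is hypothetical, introduced here purely for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: build a DuckDB statement that inspects a remote parquet file.
# BASE_URL is the host named in the README; the object path passed to the
# helper is an assumption and must be replaced with a real file in the bucket.
BASE_URL="http://parquet.gtfsrt.io"

# build_parquet_query OBJECT_PATH
# Prints SQL that installs and loads httpfs, then describes the file's columns.
build_parquet_query() {
  local object_path="$1"
  printf "INSTALL httpfs; LOAD httpfs; DESCRIBE SELECT * FROM read_parquet('%s/%s');\n" \
    "$BASE_URL" "$object_path"
}

# Usage (needs the duckdb CLI and network access; the path is a placeholder):
#   duckdb -c "$(build_parquet_query 'path/to/file.parquet')"
```

Building the statement in a small function keeps the URL handling separate from the `duckdb -c` call, so the SQL can be printed and checked before anything touches the network.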