Commit acf4fd8

Merge pull request #14 from JarvusInnovations/themightychris/dbt-inventory-model
Add stg_available_feeds model for inventory.json
2 parents 236f398 + d3618d7 commit acf4fd8

3 files changed: +60 -0 lines changed

models/staging/_staging.yml

Lines changed: 29 additions & 0 deletions
@@ -90,3 +90,32 @@ models:
       - name: effect
         description: Alert effect code
       - *_uuid
+
+  - name: stg_available_feeds
+    description: >
+      Feed inventory from parquet.gtfsrt.io. Lists all available GTFS-RT
+      feeds with metadata including agency info, feed type, date range,
+      and size statistics. Reads directly from remote JSON.
+    columns:
+      - name: feed_url
+        description: Original GTFS-RT feed URL
+      - name: feed_base64
+        description: Base64url-encoded feed URL (used in parquet file paths)
+      - name: agency_id
+        description: Transit agency identifier (e.g., "septa", "actransit")
+      - name: agency_name
+        description: Transit agency display name
+      - name: feed_type
+        description: Feed type - service_alerts, trip_updates, or vehicle_positions
+      - name: date_min
+        description: Earliest date with data available
+      - name: date_max
+        description: Latest date with data available
+      - name: total_records
+        description: Total number of records across all dates
+      - name: total_bytes
+        description: Total size of all parquet files in bytes
+      - name: days_available
+        description: Number of days with data (computed)
+      - name: avg_bytes_per_day
+        description: Average bytes per day (computed)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+{{
+    config(
+        materialized='table'
+    )
+}}
+
+/*
+    Staging model for parquet.gtfsrt.io feed inventory.
+    Reads directly from the public inventory JSON file.
+
+    This provides metadata about all available GTFS-RT feeds
+    including agency info, feed types, date ranges, and sizes.
+*/
+
+SELECT
+    url AS feed_url,
+    base64url AS feed_base64,
+    agency_id,
+    agency_name,
+    feed_type,
+    date_min,
+    date_max,
+    total_records,
+    total_bytes,
+
+    -- Computed fields
+    date_max::date - date_min::date + 1 AS days_available,
+    total_bytes / NULLIF(date_max::date - date_min::date + 1, 0) AS avg_bytes_per_day
+
+FROM read_json_auto('gs://parquet.gtfsrt.io/inventory.json')
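
Reviewer note: the two computed columns in this model can be sanity-checked outside the warehouse. The sketch below mirrors their arithmetic in plain Python. It assumes `date_min` and `date_max` arrive as ISO `YYYY-MM-DD` strings, which the diff does not confirm; the function name `inventory_stats` and the sample values are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def inventory_stats(date_min: str, date_max: str, total_bytes: int) -> Tuple[int, Optional[float]]:
    """Mirror days_available and avg_bytes_per_day for one inventory row.

    Hypothetical helper for illustration only; it is not part of the commit.
    """
    # Inclusive day count, like `date_max::date - date_min::date + 1`.
    days_available = (date.fromisoformat(date_max) - date.fromisoformat(date_min)).days + 1

    # NULLIF(days, 0) turns a zero divisor into NULL, so the average is NULL
    # (None here) rather than a division by zero.
    if days_available == 0:
        return days_available, None
    return days_available, total_bytes / days_available


# Hypothetical row: ten days of data totalling 5000 bytes.
print(inventory_stats("2024-01-01", "2024-01-10", 5000))  # (10, 500.0)
```

Because the day count is inclusive, a feed whose `date_min` equals `date_max` reports one day. The `NULLIF` guard only matters in the unusual case where `date_max` falls exactly one day before `date_min`.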

profiles.yml

Lines changed: 1 addition & 0 deletions
@@ -6,5 +6,6 @@ gtfsrt_sandbox:
       path: sandbox.duckdb
       extensions:
         - parquet
+        - httpfs
       settings:
         threads: 4
