40 changes: 40 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,40 @@
name: docs

on:
push:
branches:
- main
pull_request:
branches:
- main

permissions:
pages: write
id-token: write

jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.13"
- uses: astral-sh/setup-uv@v5
- run: uv sync --group docs
- run: uv run mkdocs build
- if: github.ref == 'refs/heads/main'
uses: actions/upload-pages-artifact@v3
with:
path: site

deploy:
if: github.ref == 'refs/heads/main'
needs: build
runs-on: ubuntu-latest
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
steps:
- id: deployment
uses: actions/deploy-pages@v4
3 changes: 3 additions & 0 deletions .gitignore
Expand Up @@ -16,6 +16,9 @@ __pycache__
config/local.json
.envrc

# docs
site

# logs
gobble.log
s3_upload.log
5 changes: 5 additions & 0 deletions docs/api/config.md
@@ -0,0 +1,5 @@
# config

Configuration module for Gobble.

::: config
7 changes: 7 additions & 0 deletions docs/api/constants.md
@@ -0,0 +1,7 @@
# constants

Constants for MBTA route and stop definitions.

::: constants
options:
show_source: false
5 changes: 5 additions & 0 deletions docs/api/disk.md
@@ -0,0 +1,5 @@
# disk

Disk I/O operations for writing transit events to CSV files.

::: disk
5 changes: 5 additions & 0 deletions docs/api/event.md
@@ -0,0 +1,5 @@
# event

Event processing logic for MBTA real-time vehicle updates.

::: event
5 changes: 5 additions & 0 deletions docs/api/gobble.md
@@ -0,0 +1,5 @@
# gobble

Main entry point for the Gobble streaming service.

::: gobble
5 changes: 5 additions & 0 deletions docs/api/gtfs.md
@@ -0,0 +1,5 @@
# gtfs

GTFS archive management and schedule data processing.

::: gtfs
5 changes: 5 additions & 0 deletions docs/api/logger.md
@@ -0,0 +1,5 @@
# logger

Logging configuration for Gobble.

::: logger
5 changes: 5 additions & 0 deletions docs/api/s3_upload.md
@@ -0,0 +1,5 @@
# s3_upload

S3 upload functionality for syncing event data to AWS.

::: s3_upload
5 changes: 5 additions & 0 deletions docs/api/timing.md
@@ -0,0 +1,5 @@
# timing

Performance timing utilities for debugging and profiling.

::: timing
5 changes: 5 additions & 0 deletions docs/api/trip_state.md
@@ -0,0 +1,5 @@
# trip_state

Trip state management for tracking vehicle positions across events.

::: trip_state
5 changes: 5 additions & 0 deletions docs/api/util.md
@@ -0,0 +1,5 @@
# util

Utility functions for date/time handling and path generation.

::: util
83 changes: 83 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,83 @@
# Architecture

## Overview

Gobble is a multi-threaded Python service that streams real-time vehicle data from the MBTA, detects meaningful transit events, and writes enriched data to disk for upload to S3.

```
MBTA V3 Streaming API
          │
          ▼
   ┌──────────────┐
   │  gobble.py   │  Main entry point — spawns threads per mode
   └──────┬───────┘
          │  SSE streams (one per thread)
          ▼
   ┌──────────────┐
   │  event.py    │  Detects arrivals/departures, enriches with GTFS
   └──────┬───────┘
     ┌────┴────┐
     ▼         ▼
┌─────────┐  ┌──────────┐
│ disk.py │  │ gtfs.py  │  Writes CSVs / Manages GTFS schedule data
└────┬────┘  └──────────┘
     │
     ▼
┌──────────────┐
│ s3_upload.py │  Cron-triggered upload to S3 (every 30 min)
└──────────────┘
```

## Threading model

Gobble spawns one thread per transit mode group:

| Thread | Routes |
|--------|--------|
| `rapid_routes` | All rapid transit lines (Red, Blue, Orange, Green-B/C/D/E, Mattapan) |
| `cr_routes` | All commuter rail lines |
| `routes_bus_chunk0`, `routes_bus_chunk10`, ... | Bus routes in chunks of 10 (MBTA API limitation) |

Each thread runs `client_thread()`, which maintains a persistent SSE connection that automatically reconnects on failure.

A separate daemon thread (`update_gtfs`) runs in the background to refresh GTFS schedule data when the service date changes (around 3 AM Eastern).
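The bus-route chunking and thread naming described above can be sketched as follows. This is an illustrative sketch, not the project's actual code: `client_thread` is stubbed out, and the route lists are placeholders.

```python
import threading


def chunk(routes, size=10):
    """Split a route list into chunks of at most `size` (the MBTA API filter limit)."""
    return [routes[i : i + size] for i in range(0, len(routes), size)]


def client_thread(routes):
    # Placeholder for the real SSE loop, which reconnects on failure.
    pass


def spawn_bus_threads(bus_routes):
    """Spawn one thread per chunk of 10 bus routes, named routes_bus_chunk0, chunk10, ..."""
    threads = []
    for i, routes in enumerate(chunk(bus_routes)):
        t = threading.Thread(
            target=client_thread,
            args=(routes,),
            name=f"routes_bus_chunk{i * 10}",
            daemon=True,
        )
        threads.append(t)
    return threads
```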

## Event detection

Gobble detects two types of events by comparing successive vehicle updates:

- **Arrival (ARR)**: The vehicle's status changes to `STOPPED_AT` and the previous event was a departure (`DEP`).
- **Departure (DEP)**: The vehicle's stop ID changes and the stop sequence has advanced.

Each event processing thread maintains its own `TripsStateManager` to track per-trip state. State is persisted to JSON files in `data/trip_states/` so it survives restarts.
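The two rules above can be condensed into a small classifier. The field names (`status`, `stop_id`, `stop_sequence`, `last_event`) are hypothetical stand-ins for whatever the real trip state stores:

```python
def detect_event(prev, curr):
    """Classify the transition between two successive vehicle updates.

    Returns "ARR", "DEP", or None. `prev` carries the trip's last
    emitted event type under "last_event" (field names assumed).
    """
    # Arrival: vehicle is now stopped and the last event was a departure.
    if curr["status"] == "STOPPED_AT" and prev.get("last_event") == "DEP":
        return "ARR"
    # Departure: the stop changed and the stop sequence advanced.
    if curr["stop_id"] != prev["stop_id"] and curr["stop_sequence"] > prev["stop_sequence"]:
        return "DEP"
    return None
```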

## GTFS enrichment

After detecting an event, Gobble enriches it with schedule data:

1. Looks up the nearest scheduled arrival at the same stop using `merge_asof` (time-based matching)
2. Calculates **scheduled headway** — time since the previous scheduled arrival at this stop
3. Matches the real-time trip to a scheduled trip and calculates **scheduled travel time** — time since the trip's first stop

GTFS archives are automatically downloaded from the [MBTA CDN](https://cdn.mbta.com/archive/) and cached locally in `data/gtfs_archives/`.
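Steps 1 and 2 can be sketched with toy data. The column names are illustrative, not the project's actual schema; `pd.merge_asof` with `direction="nearest"` does the time-based matching:

```python
import pandas as pd

# Observed events and scheduled arrivals at one stop (toy data).
events = pd.DataFrame(
    {"event_time": pd.to_datetime(["2024-05-01 08:03", "2024-05-01 08:17"])}
)
schedule = pd.DataFrame(
    {
        "scheduled_time": pd.to_datetime(
            ["2024-05-01 08:00", "2024-05-01 08:10", "2024-05-01 08:20"]
        )
    }
)
# Scheduled headway: seconds since the previous scheduled arrival at this stop.
schedule["headway_s"] = schedule["scheduled_time"].diff().dt.total_seconds()

# Match each event to the nearest scheduled arrival (both sides must be sorted).
enriched = pd.merge_asof(
    events.sort_values("event_time"),
    schedule.sort_values("scheduled_time"),
    left_on="event_time",
    right_on="scheduled_time",
    direction="nearest",
)
```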

## Service date concept

The MBTA defines a "service date" as running from 3:00 AM to 2:59 AM the next day, rather than midnight to midnight. Gobble uses this convention throughout — events between midnight and 3 AM are attributed to the previous day's service.
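The mapping from a timestamp to its service date reduces to a small rule. This sketch assumes a naive Eastern-time datetime as input; the real `util.py` may handle time zones differently:

```python
from datetime import date, datetime, timedelta


def service_date(ts: datetime) -> date:
    """Return the MBTA service date for a local Eastern timestamp.

    Times from midnight to 2:59 AM belong to the previous day's service.
    """
    if ts.hour < 3:
        return ts.date() - timedelta(days=1)
    return ts.date()
```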

## S3 upload

`s3_upload.py` runs as a separate process (triggered by cron every 30 minutes in production). It:

1. Finds all CSV files for today's service date
2. Compresses each file with gzip
3. Uploads to the `tm-mbta-performance` S3 bucket under the `Events-live/` prefix
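The three steps above can be sketched as follows. The key layout under `Events-live/` is an assumption for illustration; the actual upload call (commented out) would use boto3:

```python
import gzip
from pathlib import Path


def s3_key(csv_path: Path, service_date: str) -> str:
    """Build a destination key under the Events-live/ prefix (layout assumed)."""
    return f"Events-live/{service_date}/{csv_path.name}.gz"


def compress(csv_path: Path) -> bytes:
    """Gzip a CSV file's contents for upload."""
    return gzip.compress(csv_path.read_bytes())


# The upload itself would then be something like:
# boto3.client("s3").put_object(
#     Bucket="tm-mbta-performance",
#     Key=s3_key(path, "2024-05-01"),
#     Body=compress(path),
#     ContentEncoding="gzip",
# )
```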

## Monitoring

When `DATADOG_TRACE_ENABLED` is set to `true`, Gobble reports:

- APM traces for key functions (via `@tracer.wrap()` decorators)
- JSON-formatted logs for Datadog log aggregation
- Exception details with stack traces
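One way to make `@tracer.wrap()` conditional on the flag is a small wrapper decorator that defers the `ddtrace` import, so local installs without Datadog still work. This is a sketch of the pattern, not necessarily how Gobble wires it up:

```python
import os

DATADOG_TRACE_ENABLED = os.environ.get("DATADOG_TRACE_ENABLED") == "true"


def maybe_traced(func):
    """Apply Datadog's @tracer.wrap() only when tracing is enabled."""
    if DATADOG_TRACE_ENABLED:
        from ddtrace import tracer  # imported lazily so dev installs don't need it
        return tracer.wrap()(func)
    return func


@maybe_traced
def process_event(event):
    return event
```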
55 changes: 55 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,55 @@
# Configuration

Gobble is configured via a JSON file at `config/local.json`. A template is provided at `config/template.json`.

## Setup

```bash
cp config/template.json config/local.json
```

Then edit `config/local.json` with your values.

## Reference

```json
{
"mbta": {
"v3_api_key": null
},
"gtfs": {
"dir": null,
"refresh_interval_days": 7
},
"modes": ["rapid", "cr", "bus"],
"DATADOG_TRACE_ENABLED": false
}
```

### `mbta.v3_api_key`

**Required.** Your MBTA V3 API key. Get one at [api-v3.mbta.com](https://api-v3.mbta.com/).

### `gtfs.dir`

Optional override for the GTFS archives storage directory. Defaults to `data/gtfs_archives/` when `null`.

### `gtfs.refresh_interval_days`

How often (in days) to check for a newer GTFS archive. The MBTA publishes new GTFS feeds regularly as schedules change. Defaults to `7`.
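The null-means-default behavior described for `gtfs.dir` and `gtfs.refresh_interval_days` can be sketched like this. The real `config.py` may load and merge differently; this just illustrates the semantics:

```python
import json
from pathlib import Path

GTFS_DEFAULTS = {"dir": None, "refresh_interval_days": 7}


def load_config(path="config/local.json"):
    """Read the JSON config, filling in documented defaults for missing keys."""
    cfg = json.loads(Path(path).read_text())
    gtfs = {**GTFS_DEFAULTS, **cfg.get("gtfs", {})}
    if gtfs["dir"] is None:
        gtfs["dir"] = "data/gtfs_archives/"
    cfg["gtfs"] = gtfs
    return cfg
```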

### `modes`

Which transit modes to stream. Any combination of:

| Value | Description |
|-------|-------------|
| `"rapid"` | Rapid transit: Red, Blue, Orange, Green-B/C/D/E, Mattapan |
| `"cr"` | Commuter rail: all lines |
| `"bus"` | Bus: curated set of routes with monitored stops |

Useful for development — set `["rapid"]` to reduce API load and output volume.

### `DATADOG_TRACE_ENABLED`

Set to `true` to enable Datadog APM tracing and structured JSON logging. Should be `true` in production and `false` for local development.
90 changes: 90 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,90 @@
# Contributing

Thank you for your interest in contributing to Gobble! This guide will help you get set up and familiar with the development workflow.

## Getting started

1. Fork and clone the repository
2. Install [uv](https://docs.astral.sh/uv/) and Python 3.13
3. Set up your environment:

```bash
uv venv --python 3.13
uv sync --group dev
uv run pre-commit install
```

4. Copy `config/template.json` to `config/local.json` and add your [MBTA V3 API key](https://api-v3.mbta.com/)

## Running locally

```bash
uv run src/gobble.py
```

!!! tip
Set `"modes": ["rapid"]` in your `config/local.json` to reduce API load and output volume during development.

## Running tests

```bash
uv run pytest
```

To run with coverage:

```bash
uv run coverage run -m pytest
uv run coverage report
```

## Linting and formatting

Gobble uses [Ruff](https://docs.astral.sh/ruff/) for both linting and formatting. Pre-commit hooks run these automatically, but you can also run them manually:

```bash
uv run ruff check --fix src
uv run ruff format src
```

## Code style

- All modules and public functions should have Google-style docstrings
- Type hints are used throughout
- Ruff handles formatting and lint rules
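A Google-style docstring with type hints looks like this (the function itself is a made-up example, not project code):

```python
def parse_stop_id(raw: str) -> str:
    """Normalize a raw stop identifier.

    Args:
        raw: The stop ID as received from the API, possibly padded
            with whitespace.

    Returns:
        The stop ID stripped of surrounding whitespace.
    """
    return raw.strip()
```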

## Pull request workflow

1. Create a feature branch from `main`
2. Make your changes
3. Ensure tests pass (`uv run pytest`)
4. Ensure linting passes (`uv run ruff check src`)
5. Open a pull request against `main`
6. CI will run tests and linting automatically

## Building the docs

```bash
uv sync --group docs
uv run mkdocs serve
```

This starts a local server at `http://127.0.0.1:8000` with live reload.

## Project structure

```
src/
├── gobble.py # Main entry point, SSE client, threading
├── event.py # Event detection and enrichment
├── gtfs.py # GTFS archive management and schedule queries
├── trip_state.py # Trip state tracking and persistence
├── disk.py # CSV file writing
├── s3_upload.py # S3 upload
├── util.py # Date/time and path utilities
├── config.py # Configuration loading
├── constants.py # Route and stop definitions
├── logger.py # Logging setup
├── timing.py # Performance measurement
└── tests/ # Test suite
```