
jasonwidjajadev/sydney-bus-gtfs-api



Sydney Bus Network GTFS REST API

A production-style RESTful API that ingests real-world GTFS transit data from Transport for NSW and exposes it via a secure, role-based, documented backend service.

GTFS (General Transit Feed Specification): https://gtfs.org/getting-started/what-is-GTFS/

Built with Flask-RESTX, JWT authentication, and SQLite, the service supports transit planners, commuters, and administrators. It provides ingestion of real-world data, querying, visualisation, and export.


High-Level Architecture


Core components

  • Flask-RESTX API layer (Swagger-first design)
  • JWT-based authentication & role-based authorization
  • GTFS ingestion pipeline (ZIP → CSV → SQLite)
  • Read-optimized relational schema
  • Visualisation & export layer (PNG maps, CSV)

Tech Stack

  • Python 3.13
  • Flask-RESTX: REST API + Swagger documentation
  • SQLite: lightweight runtime persistence
  • GTFS Schedule API (Transport for NSW)
  • RapidFuzz: fuzzy string matching
  • pandas / matplotlib: data processing & visualisation

Setup & Running

Set up environment

  • python3.13 -m venv .venv
  • source .venv/bin/activate

Install dependencies and run

  • pip install -r requirements.txt
  • Create your own API key on the Transport for NSW Open Data Hub
    • Save it in transport_api_key.env as: TRANSPORT_API_KEY=your_key_here
  • Run the API: python z5494973.py
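The README does not say how transport_api_key.env is read at startup. The following is a minimal, hypothetical sketch that uses only the standard library. The function names `load_env_file` and `get_transport_api_key` are illustrative, and the project may use python-dotenv instead.

```python
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. TRANSPORT_API_KEY="abc"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_transport_api_key(path="transport_api_key.env"):
    """Hypothetical helper: fail fast if the key is missing."""
    key = load_env_file(path).get("TRANSPORT_API_KEY")
    if not key:
        raise RuntimeError(f"TRANSPORT_API_KEY missing from {path}")
    return key
```

If the key is missing, startup fails immediately rather than at the first GTFS import request.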

API Documentation (Swagger)

  • Swagger UI is served at the root path (/)
  • Fully interactive
  • Includes:
    • request/response schemas
    • role-based access notes
    • error cases
  • Generated from the Flask-RESTX resource definitions, so it stays in sync with the implementation


Role-Based Access Control (RBAC)

The system bootstraps with three users on first run:

  Username   Password   Role
  admin      admin      Admin
  planner    planner    Planner
  commuter   commuter   Commuter

  • Admin
    • Full system access
    • User management: create, delete, activate, and deactivate users
    • GTFS data imports
  • Planner
    • Import and manage GTFS data
    • Full read/write access to transit data
  • Commuter
    • Read-only access to transit data
    • Manage personal favourites and visualisations
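The role matrix above maps naturally onto a decorator that checks the caller's role before running a handler. The following is a framework-agnostic sketch. The `Forbidden` exception, `requires_roles`, and the example handlers are hypothetical names. In a Flask-RESTX resource, the wrapper would typically call `abort(403, ...)` instead of raising.

```python
from functools import wraps

ROLE_ADMIN, ROLE_PLANNER, ROLE_COMMUTER = "Admin", "Planner", "Commuter"


class Forbidden(Exception):
    """Caller's role is not permitted; an API layer would map this to HTTP 403."""


def requires_roles(*allowed):
    """Allow the wrapped handler only for users whose role is in `allowed`."""
    def decorator(func):
        @wraps(func)
        def wrapper(current_user, *args, **kwargs):
            if current_user["role"] not in allowed:
                raise Forbidden(f"{current_user['role']} may not call {func.__name__}")
            return func(current_user, *args, **kwargs)
        return wrapper
    return decorator


@requires_roles(ROLE_ADMIN, ROLE_PLANNER)  # both roles may import GTFS data
def import_dataset(current_user, dataset_id):
    return f"importing {dataset_id}"


@requires_roles(ROLE_ADMIN)  # user management is Admin-only
def delete_user(current_user, username):
    return f"deleted {username}"
```

Keeping the allowed roles next to each handler makes the access matrix easy to audit against the table above.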

Authentication & Security

JWT Authentication

  • All protected endpoints require a JWT
  • Tokens are issued by: POST /auth/login
  • The token must be sent in the AUTH-TOKEN request header: AUTH-TOKEN: <jwt_token>
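From the client side, this flow is a login call followed by requests that carry the AUTH-TOKEN header. The following is a standard-library sketch that only builds the requests. The base URL (Flask's default development address), the JSON login body, and the `token` response field are assumptions, not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumption: Flask's default dev-server address


def build_login_request(username, password):
    """POST /auth/login with an assumed JSON body of username and password."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_authed_request(path, token):
    """Attach the JWT in the custom AUTH-TOKEN header (not Authorization: Bearer)."""
    return urllib.request.Request(f"{BASE_URL}{path}", headers={"AUTH-TOKEN": token})


# Usage against a running server ("token" as the response field is an assumption):
# with urllib.request.urlopen(build_login_request("commuter", "commuter")) as resp:
#     token = json.load(resp)["token"]
# with urllib.request.urlopen(build_authed_request("/routes", token)) as resp:
#     print(json.load(resp))
```

urllib normalises header names to `Auth-token`. This is harmless because HTTP header names are case-insensitive.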

Token Properties

  • Signed using HS256
  • Contains:
    • username
    • role (Admin | Planner | Commuter)
    • exp (1 hour expiry)
  • Signature and expiry are verified on every request
  • User existence and enabled status are re-validated against the database

GTFS Data Ingestion

Import Dataset

POST /gtfs/datasets/{dataset_id}

  • Imports GTFS Schedule data directly from Transport for NSW
  • Supports Sydney metropolitan bus agencies (dataset IDs GSBC*, SBSC*)
  • Parses the GTFS .zip archive into structured SQLite tables
  • Re-importing fully replaces the existing data
  • Enforces that data must be imported before it can be queried
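The ZIP → CSV → SQLite step can be sketched with the standard library alone. Each `<table>.txt` member becomes a SQLite table, and every GTFS table is dropped first so that a re-import is a full replacement. This is an illustrative stand-in, since the project may load data with pandas. Columns are kept as untyped TEXT to mirror the CSV.

```python
import csv
import io
import sqlite3
import zipfile

GTFS_TABLES = ["agency", "routes", "calendar", "calendar_dates", "trips",
               "stops", "stop_times", "shapes", "notes"]


def import_gtfs_zip(zip_bytes, conn):
    """Load each <table>.txt in a GTFS zip into SQLite; return per-table row counts."""
    # Drop every GTFS table up front so stale tables never survive a re-import.
    for table in GTFS_TABLES:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    counts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        members = set(zf.namelist())
        for table in GTFS_TABLES:
            member = f"{table}.txt"
            if member not in members:
                continue  # optional GTFS files may be absent
            with zf.open(member) as raw:
                # utf-8-sig strips the byte-order mark some feeds include
                reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8-sig", newline=""))
                header = next(reader)
                columns = ", ".join(f'"{name}"' for name in header)
                placeholders = ", ".join("?" * len(header))
                conn.execute(f'CREATE TABLE "{table}" ({columns})')
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})',
                    (row for row in reader if row),  # skip blank lines
                )
            counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    conn.commit()
    return counts
```

Streaming rows through `executemany` avoids holding the whole of stop_times.txt, usually the largest file, in memory.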

GTFS Tables Stored in SQLite

  • agency
  • routes
  • calendar
  • calendar_dates
  • trips
  • stops
  • stop_times
  • shapes
  • notes

Dataset Metadata

GET /gtfs/datasets returns:

  • active dataset ID
  • agency name
  • import timestamp
  • per-table row counts
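A sketch of how this metadata could be stored and summarised follows. The `dataset_meta` table and both function names are hypothetical. A `None` summary is one way to implement the "import before use" rule: callers can refuse to serve transit data until a dataset exists.

```python
import sqlite3
from datetime import datetime, timezone


def record_import(conn, dataset_id, agency_name):
    """Remember which dataset is active and when it was imported (UTC)."""
    conn.execute("CREATE TABLE IF NOT EXISTS dataset_meta "
                 "(dataset_id TEXT, agency_name TEXT, imported_at TEXT)")
    conn.execute("DELETE FROM dataset_meta")  # only one active dataset at a time
    conn.execute("INSERT INTO dataset_meta VALUES (?, ?, ?)",
                 (dataset_id, agency_name, datetime.now(timezone.utc).isoformat()))
    conn.commit()


def dataset_summary(conn, gtfs_tables):
    """Return dataset metadata plus row counts, or None if nothing is imported."""
    existing = {name for (name,) in
                conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if "dataset_meta" not in existing:
        return None
    row = conn.execute(
        "SELECT dataset_id, agency_name, imported_at FROM dataset_meta").fetchone()
    if row is None:
        return None
    counts = {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
              for t in gtfs_tables if t in existing}
    return {"dataset_id": row[0], "agency_name": row[1],
            "imported_at": row[2], "row_counts": counts}
```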

Core API Endpoints

Transit Data Exploration

All users can:

  • Retrieve routes, trips, and stops by ID
  • Browse:
    • routes for an agency
    • trips for a route
    • stops for a trip
  • Navigate large datasets using REST-friendly pagination patterns

Routes

GET /routes
GET /routes/{route_id}

  • Paginated route listing
  • Page size is capped, so large datasets are never returned in a single response
  • Deterministic ordering
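Deterministic ordering matters because SQL does not guarantee row order without `ORDER BY`. Without it, LIMIT/OFFSET pages could repeat or skip rows between calls. The following is a minimal sketch that assumes standard GTFS column names in a `routes` table. The function name is hypothetical.

```python
import sqlite3


def list_routes(conn, page, page_size):
    """Return one page of routes; ORDER BY route_id keeps pages stable across calls."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT route_id, route_short_name, route_long_name FROM routes "
        "ORDER BY route_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    keys = ("route_id", "route_short_name", "route_long_name")
    return [dict(zip(keys, row)) for row in rows]
```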

Trips

GET /trips?route_id=...
GET /trips/{trip_id}

  • Lists all trips for a route
  • Pagination supported
  • Validates that the route exists before listing its trips
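Validating the route first lets the API distinguish "unknown route" from "route with no trips". The following is a hypothetical sketch over GTFS-style `routes` and `trips` tables. The caller could map `None` to a not-found response.

```python
import sqlite3


def list_trips_for_route(conn, route_id, limit=50, offset=0):
    """Return the route's trips, or None if the route does not exist."""
    if conn.execute("SELECT 1 FROM routes WHERE route_id = ?", (route_id,)).fetchone() is None:
        return None  # unknown route, distinct from a route with zero trips
    rows = conn.execute(
        "SELECT trip_id, trip_headsign, service_id FROM trips "
        "WHERE route_id = ? ORDER BY trip_id LIMIT ? OFFSET ?",
        (route_id, limit, offset),
    ).fetchall()
    return [{"trip_id": t, "trip_headsign": h, "service_id": s} for t, h, s in rows]
```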

Stops

GET /stops?trip_id=...
GET /stops/{stop_id}

  • Stops are returned in stop_sequence order
  • Joined with stop metadata (lat/lon, accessibility)
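If the CSV import stores stop_sequence as text, a plain `ORDER BY stop_sequence` sorts "10" before "2". The following is a sketch of the join with a numeric cast. It assumes standard GTFS column names, with `wheelchair_boarding` as the accessibility field.

```python
import sqlite3


def stops_for_trip(conn, trip_id):
    """Stops for a trip in travel order, joined with location and accessibility."""
    rows = conn.execute(
        """
        SELECT st.stop_sequence, s.stop_id, s.stop_name,
               s.stop_lat, s.stop_lon, s.wheelchair_boarding
        FROM stop_times AS st
        JOIN stops AS s ON s.stop_id = st.stop_id
        WHERE st.trip_id = ?
        ORDER BY CAST(st.stop_sequence AS INTEGER)  -- numeric, not lexical, order
        """,
        (trip_id,),
    ).fetchall()
    keys = ("stop_sequence", "stop_id", "stop_name", "stop_lat", "stop_lon",
            "wheelchair_boarding")
    return [dict(zip(keys, row)) for row in rows]
```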

Intelligent Stop Search

GET /stops/search?query=...

  • Case-insensitive, partial matching (e.g. "Quay", "circula")
  • Powered by RapidFuzz for fast fuzzy matching, so near-miss spellings can still match
  • Configurable:
    • match threshold
    • result limit
    • trips per stop
  • Returns:
    • matching stops
    • associated routes and trips
  • Handles zero-result queries gracefully
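The matching rules above can be illustrated with a dependency-free sketch. It uses the standard library's `difflib.SequenceMatcher` as a plainly swapped-in stand-in for RapidFuzz. The query is scored against every same-length window of the stop name, which approximates a partial-ratio scorer. The threshold and limit defaults are hypothetical. With RapidFuzz itself, the scoring loop would typically collapse into a single `process.extract` call with a partial-ratio scorer.

```python
from difflib import SequenceMatcher


def partial_score(query, text):
    """0-100 score: best similarity of the query to any same-length window of text."""
    q, t = query.strip().lower(), text.lower()  # case-insensitive
    if not q:
        return 0.0
    if q in t:
        return 100.0  # exact substring, e.g. "quay" in "circular quay"
    if len(q) >= len(t):
        return 100 * SequenceMatcher(None, q, t).ratio()
    return 100 * max(
        SequenceMatcher(None, q, t[i:i + len(q)]).ratio()
        for i in range(len(t) - len(q) + 1)
    )


def search_stops(query, stop_names, threshold=70.0, limit=10):
    """Return (name, score) pairs at or above threshold, best first."""
    scored = [(name, partial_score(query, name)) for name in stop_names]
    matches = [(name, score) for name, score in scored if score >= threshold]
    matches.sort(key=lambda m: (-m[1], m[0]))  # highest score first, ties by name
    return matches[:limit]  # an empty list, not an error, when nothing matches
```

A misspelling such as "circulr" still reaches "Circular Quay" because its best window ("circula") shares six of seven characters.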


Favourite Routes

Each user may store up to 2 favourite routes.

  • Each user can manage their own favourites
  • Supports add / update / delete operations

Manage Favourites

  • PUT /favourites/routes/{route_id}
  • DELETE /favourites/routes/{route_id}
  • POST /favourites/routes
  • GET /favourites/routes

Design Notes

  • Route data is denormalised
  • Favourites survive dataset re-imports
  • The two-favourite limit is enforced at both the API and database level
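The design notes above can be sketched in SQLite as follows. Route details are copied (denormalised) into each favourite row, so dropping and re-importing the GTFS tables does not orphan favourites. A trigger backs up the API-level two-favourite check. The table, trigger, and function names are hypothetical.

```python
import sqlite3

MAX_FAVOURITES = 2

SCHEMA = """
CREATE TABLE IF NOT EXISTS favourite_routes (
    username TEXT NOT NULL,
    route_id TEXT NOT NULL,
    route_short_name TEXT,  -- copied from routes so favourites survive re-imports
    route_long_name TEXT,
    PRIMARY KEY (username, route_id)
);
CREATE TRIGGER IF NOT EXISTS favourite_routes_limit
BEFORE INSERT ON favourite_routes
WHEN (SELECT COUNT(*) FROM favourite_routes WHERE username = NEW.username) >= 2
BEGIN
    SELECT RAISE(ABORT, 'favourite limit reached');
END;
"""


class FavouriteLimitError(Exception):
    """API-level rejection; an endpoint would turn this into a 4xx response."""


def add_favourite(conn, username, route_id, short_name, long_name):
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM favourite_routes WHERE username = ?", (username,)
    ).fetchone()
    if count >= MAX_FAVOURITES:
        raise FavouriteLimitError(f"{username} already has {count} favourites")
    # The trigger is the backstop if two concurrent requests both pass the check above.
    with conn:
        conn.execute("INSERT INTO favourite_routes VALUES (?, ?, ?, ?)",
                     (username, route_id, short_name, long_name))
```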

Route Visualisation (PNG)

GET /favourites/routes/maps

Behaviour

  • Generates a PNG map of the user's favourite routes entirely in memory (no files are written to disk)
  • A distinct colour for each route and headsign combination
  • Uses real GTFS geometry from shapes.txt
  • Rendered server-side using Matplotlib

Response

  • Returned inline as a PNG image that browsers render directly
  • Includes metadata headers:
    • trip count
    • shape count
    • headsign count
    • route IDs & names
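In-memory rendering can be sketched with Matplotlib's headless Agg backend and a `BytesIO` buffer. Each (route, headsign) pair gets its own colour from the default colour cycle. The input shape (a dict of latitude/longitude point lists) and the function name are assumptions. The metadata headers are left out here.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend: no display needed on a server
import matplotlib.pyplot as plt


def render_routes_png(shapes):
    """shapes: {(route_id, headsign): [(lat, lon), ...]} -> PNG bytes, no file writes."""
    fig, ax = plt.subplots(figsize=(8, 8))
    try:
        for i, ((route_id, headsign), points) in enumerate(sorted(shapes.items())):
            lats = [lat for lat, _ in points]
            lons = [lon for _, lon in points]
            # "C0".."C9" cycle through Matplotlib's default distinct colours
            ax.plot(lons, lats, color=f"C{i % 10}", linewidth=2,
                    label=f"{route_id}: {headsign}")
        ax.set_xlabel("Longitude")
        ax.set_ylabel("Latitude")
        if shapes:
            ax.legend(loc="best", fontsize="small")
        buf = io.BytesIO()
        fig.savefig(buf, format="png", bbox_inches="tight")
    finally:
        plt.close(fig)  # release figure memory on every request
    return buf.getvalue()
```

The bytes can then be returned with a `Content-Type: image/png` header so browsers display the map inline.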

CSV Export

GET /favourites/routes/csv

Export Includes

  • Dataset ID
  • Route metadata
  • Trip headsigns
  • Shape geometry (lat/lon sequence)
  • Rows sorted deterministically for reproducible downstream analysis

Designed for:

  • GIS tools
  • Data science pipelines
  • External analytics
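A deterministic export can be sketched with `csv.DictWriter` writing to an in-memory buffer. Rows are sorted by route, headsign, shape, and numeric point sequence before writing. Apart from `dataset_id`, the column names are standard GTFS field names chosen for illustration. The exact export schema is an assumption.

```python
import csv
import io

FIELDS = ["dataset_id", "route_id", "route_short_name", "trip_headsign",
          "shape_id", "shape_pt_sequence", "shape_pt_lat", "shape_pt_lon"]


def favourites_to_csv(rows):
    """Serialise export rows to CSV text in a stable, reproducible order."""
    ordered = sorted(rows, key=lambda r: (r["route_id"], r["trip_headsign"],
                                          r["shape_id"], int(r["shape_pt_sequence"])))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buf.getvalue()
```

Sorting on the integer sequence keeps each shape's points in drawing order, which GIS tools rely on when rebuilding the line.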

Pagination Strategy

  • A default page size with a hard maximum cap
  • Consistent pagination metadata: page, total, total_pages, has_prev / has_next
  • The cap prevents accidental large scans in a single request
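The following is a small helper that computes this metadata and clamps the page size to the hard cap. The default and cap values are hypothetical. Out-of-range page numbers are clamped rather than rejected. This is a design choice, and an API could equally return an error for them.

```python
import math

DEFAULT_PAGE_SIZE = 20  # hypothetical default
MAX_PAGE_SIZE = 100     # hypothetical hard cap


def paginate(total, page=1, page_size=DEFAULT_PAGE_SIZE):
    """Return pagination metadata plus the SQL offset for the requested page."""
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))  # enforce the hard cap
    total_pages = max(1, math.ceil(total / page_size))  # page 1 exists even if empty
    page = max(1, min(page, total_pages))
    return {
        "page": page,
        "page_size": page_size,
        "total": total,
        "total_pages": total_pages,
        "has_prev": page > 1,
        "has_next": page < total_pages,
        "offset": (page - 1) * page_size,
    }
```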

Reflection

This system mirrors real backend systems used in transport platforms, data infrastructure, and public-sector APIs:

  • Secure RESTful API design (JWT + RBAC)
    • Clean, resource-oriented endpoint design
    • Correct HTTP verbs and status codes
    • Clear separation of concerns
  • Real-world data ingestion pipelines
  • Designed to scale to large datasets
  • Defensive API engineering (input validation and error handling)
  • Non-trivial visualisation
  • Production-quality documentation
  • Automated endpoint tests: z5494973_tests.py
  • Future work includes rate limiting and a recommendation system that suggests the best time to leave

MIT License

For demonstration and educational purposes only
