Academic Project | Principles of Big Data Management Course
An automated data pipeline for analyzing Kansas City 311 service requests
This project implements an end-to-end data engineering pipeline that extracts, transforms, and loads (ETL) civic service request data from Kansas City's Open Data Portal into a cloud data warehouse for analysis and visualization.
Key Objectives:
- Demonstrate proficiency in cloud-based data engineering
- Automate data ingestion using CI/CD practices
- Enable real-time municipal service analytics
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     KC 311 DATA PIPELINE ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐              │
│   │   KC Open    │      │   Python/    │      │    Google    │              │
│   │   Data API   │ ───▶ │   Pandas     │ ───▶ │   BigQuery   │              │
│   │   (Socrata)  │      │   ETL        │      │    (DWH)     │              │
│   └──────────────┘      └──────────────┘      └──────────────┘              │
│                               │                      │                      │
│                               │                      │                      │
│                               ▼                      ▼                      │
│                        ┌──────────────┐      ┌──────────────┐               │
│                        │   GitHub     │      │   Tableau    │               │
│                        │   Actions    │      │   Public     │               │
│                        │  (Scheduler) │      │    (Viz)     │               │
│                        └──────────────┘      └──────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
| Feature | Description |
|---|---|
| 🔄 Automated Ingestion | GitHub Actions runs the pipeline on a configurable schedule (weekly by default) |
| 🧹 Data Quality | Automated deduplication prevents duplicate records |
| ☁️ Cloud-Native | Leverages Google BigQuery for scalable analytics |
| 📊 Visual Analytics | Interactive Tableau dashboard for insights |
| 🔐 Secure Credentials | GCP service account keys managed via GitHub Secrets |
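The deduplication step is not spelled out here; a minimal pandas sketch, assuming requests are uniquely keyed by `issue_id` and the most recently ingested record should win (the function name is illustrative, not the project's actual code):

```python
import pandas as pd

def deduplicate(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per issue_id, preferring the most recent ingest."""
    return (
        df.sort_values("ingest_timestamp")           # oldest first
          .drop_duplicates("issue_id", keep="last")  # newest record wins
          .reset_index(drop=True)
    )
```

In practice this runs after fetching from the API and before loading to BigQuery, so repeated pipeline runs never insert the same request twice.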
The interactive Tableau dashboard provides two views:
🔗 View Live Dashboard on Tableau Public
| Layer | Technology |
|---|---|
| Data Source | Kansas City Open Data (Socrata API) |
| Processing | Python 3.9+, Pandas |
| Data Warehouse | Google BigQuery |
| Orchestration | GitHub Actions |
| Visualization | Tableau Public |
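A `requirements.txt` matching this stack might look like the following; the package choices and version pins here are assumptions, so check the actual file in the repo:

```text
pandas>=1.5
pandas-gbq>=0.19
python-dotenv>=1.0
requests>=2.28
```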
- Python 3.9 or higher
- Google Cloud Platform account with BigQuery enabled
- GCP Service Account with BigQuery permissions
1. **Clone the repository**

   ```bash
   git clone https://github.com/YOUR_USERNAME/311-KC-Dashboard.git
   cd 311-KC-Dashboard
   ```

2. **Create a virtual environment**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure environment variables**

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your GCP project details.

5. **Add your GCP credentials**

   - Download your service account JSON key from GCP Console
   - Save it as `gbq_key.json` in the project root
   - ⚠️ Never commit this file (it's in `.gitignore`)
Run the pipeline:

```bash
cd src
python pipeline.py
```

Project structure:

```
311-KC-Dashboard/
├── README.md               # This file
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variable template
├── .gitignore              # Git ignore rules
│
├── src/                    # Source code
│   ├── __init__.py
│   ├── config.py           # Configuration loader
│   └── pipeline.py         # Main ETL script
│
├── docs/                   # Documentation & visuals
│   ├── architecture.png    # Architecture diagram
│   └── dashboard_preview.png
│
└── .github/
    └── workflows/
        └── hourly_scheduler.yml   # GitHub Actions workflow
```
The pipeline runs automatically via GitHub Actions. To configure:
1. **Add Repository Secret**

   - Go to: Settings → Secrets → Actions
   - Add a secret named `GCP_SA_KEY` containing your service account JSON

2. **Adjust Schedule (optional)**

   - Edit `.github/workflows/hourly_scheduler.yml`
   - Modify the cron expression as needed
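For reference, a scheduled trigger in a GitHub Actions workflow takes this shape; the cron value below is illustrative (weekly, Monday 06:00 UTC), not necessarily the project's actual setting:

```yaml
on:
  schedule:
    - cron: "0 6 * * 1"   # min hour day-of-month month day-of-week (UTC)
  workflow_dispatch:       # also allow manual runs from the Actions tab
```

Note that GitHub Actions cron schedules always run in UTC.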
| Column | Type | Description |
|---|---|---|
| `issue_id` | STRING | Unique identifier for each request |
| `current_status` | STRING | Status (open, closed, etc.) |
| `category` | STRING | Service category |
| `open_date_time_ymd` | DATE | Request creation date |
| `resolved_date_ymd` | DATE | Resolution date (if applicable) |
| `ingest_timestamp` | INTEGER | Pipeline run timestamp |
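The transform step that shapes raw API records into this schema could be sketched as below; the function name is hypothetical, and the raw field names are assumed to match the schema's column names (the Socrata dataset may use different ones):

```python
import time

# Columns of the BigQuery table described above.
SCHEMA_COLUMNS = [
    "issue_id", "current_status", "category",
    "open_date_time_ymd", "resolved_date_ymd", "ingest_timestamp",
]

def to_schema_rows(raw_records, run_ts=None):
    """Normalize raw records into schema rows, dropping duplicate issue_ids."""
    run_ts = int(run_ts if run_ts is not None else time.time())
    seen, rows = set(), []
    for rec in raw_records:
        issue_id = rec.get("issue_id")
        if not issue_id or issue_id in seen:
            continue  # skip records without an id, and duplicates
        seen.add(issue_id)
        rows.append({
            "issue_id": issue_id,
            "current_status": rec.get("current_status"),
            "category": rec.get("category"),
            "open_date_time_ymd": rec.get("open_date_time_ymd"),
            "resolved_date_ymd": rec.get("resolved_date_ymd"),
            "ingest_timestamp": run_ts,  # INTEGER: epoch seconds of this run
        })
    return rows
```

Stamping every row with the same `ingest_timestamp` makes it easy to audit which pipeline run produced which records.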
This project is for educational purposes as part of academic coursework.


