Build a full MLOps pipeline to forecast road traffic congestion and analyze traffic incidents using live data from the TomTom API and weather conditions from OpenWeather.
This repository includes:
- 📥 Live data collection from APIs (TomTom + OpenWeather)
- 🧼 Data cleaning and preparation
- 🤖 Model training for multiple incident-related targets
- 📈 Evaluation and metrics logging
- 📦 Model deployment via FastAPI
- 🔑 Authentication & Authorization with JWT tokens (role-based access: admin, user, guest)
- 📊 Strategy comparison
- 📊 Drift monitoring (optional)
- 🔁 Versioning with DVC
```text
.
├── main.py              # Main script entrypoint
├── requirements.txt     # All required packages
├── pytest.ini           # Pytest config
├── README.md            # This file
├── .env                 # API keys (not tracked)
│
├── data/                # All data (raw, processed, live)
│   ├── raw/
│   ├── processed/
│   └── live/
│
├── model/               # Trained models (.joblib)
├── metrics/             # Output metrics
├── logs/                # Log files
├── eda/                 # Exploratory Data Analysis
│
├── docker/              # (Optional) Docker structure
│
├── src/                 # Source code for the entire pipeline and API
│   ├── auth/            # JWT-based authentication & role-based authorization
│   ├── core/            # Core logic: API service, constants, centralized logging
│   ├── data/            # Data acquisition, synchronization and preprocessing workflows
│   ├── eda/             # Exploratory Data Analysis (EDA) scripts and visualizations
│   ├── model/           # Model training, evaluation and serialization
│   ├── monitoring/      # Drift detection and monitoring modules (optional)
│   └── utils/           # General-purpose helper functions shared across modules
│
└── tests/               # API and logic tests
```
```bash
# 1. Clone the repository
git clone <repo_url>
cd traffic_prediction

# 2. Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate.bat   # Windows
# or
source venv/bin/activate    # Linux/macOS

# 3. Install dependencies
pip install -r requirements.txt

# 4. Add your API keys to a `.env` file
echo TOMTOM_API_KEY=your_tomtom_key >> .env
echo WEATHER_KEY=your_weather_key >> .env

# 5. Run the full pipeline
python main.py
```

Data sources:
- Incidents and traffic: TomTom Traffic API
- Weather: OpenWeatherMap
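As a sketch, the two keys written to `.env` in step 4 can be read with the standard library alone; the pipeline itself may well use `python-dotenv` instead, and the function name below is hypothetical:

```python
def load_env(path: str = ".env") -> dict:
    """Minimal .env parser (sketch; the project may use python-dotenv instead)."""
    keys = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks and comments; split on the first '=' only.
            if line and not line.startswith("#") and "=" in line:
                name, _, value = line.partition("=")
                keys[name.strip()] = value.strip()
    return keys

# env = load_env()
# tomtom_key = env["TOMTOM_API_KEY"]
# weather_key = env["WEATHER_KEY"]
```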
We focus on live data extraction instead of historical data for the following reasons:
- TomTom's historical Traffic Stats API is paid, and only partial data is available during the free trial period.
- With live data, we can automate calls and build a consistent time-series dataset.
- Daily limits: 2,500 calls (TomTom) and 1,000 calls (OpenWeather).
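To see how those quotas constrain the polling schedule, here is a back-of-the-envelope helper (the function name is hypothetical; 20 is the actual number of Paris arrondissements, though the project's own schedule may differ):

```python
DAILY_LIMITS = {"tomtom": 2500, "openweather": 1000}

def calls_per_district_per_hour(daily_limit: int, districts: int = 20, hours: int = 24) -> int:
    """How many polls per district per hour fit inside a daily API quota.

    Sketch arithmetic only; the real collector may batch districts or
    poll less frequently at night.
    """
    return daily_limit // (districts * hours)
```

For example, the TomTom quota allows roughly 5 polls per district per hour, while the weather quota allows about 2.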
We use a `.csv` file with bounding boxes (BBOX) of Paris arrondissements, applied in full or split into sub-boxes.
- Multiple strategies can be tested and compared by changing configuration values.
- Models support several targets, such as `jam_factor`, `congestion_label`, and `incident_duration_min`.
- Results are stored in `metrics/metrics.csv` and exported as visual plots.
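A minimal sketch of training against one configured target. The target names come from this README; the feature columns, estimator choices, and the regression-vs-classification split are assumptions, and the real logic lives in `src/model/`:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Targets named in this README; which are regression vs. classification is an assumption.
TARGETS = {
    "jam_factor": RandomForestRegressor,
    "congestion_label": RandomForestClassifier,
    "incident_duration_min": RandomForestRegressor,
}

def train_for_target(df: pd.DataFrame, target: str):
    """Fit one model for the configured target, using all non-target columns as features."""
    model_cls = TARGETS[target]
    features = df.drop(columns=list(TARGETS))
    return model_cls(n_estimators=50, random_state=0).fit(features, df[target])
```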
The table below compares the two possible strategies for data extraction:
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Traffic Analysis | Record traffic at fixed points | Easy to set up, real-time, no need for incidents | Many empty calls; low incident yield unless an unrealistic number of calls is made |
| Incident Analysis | Get traffic where incidents occurred (via BBOX) | Targeted, efficient, good incident coverage | Misses pre-incident flow, no normal traffic context |
The API is secured with JWT tokens:
- Admin: full access (metrics, prediction, monitoring)
- User: limited access (prediction, test-token)
- Guest: minimal access (healthcheck only)
- Tokens are issued via the `/login` endpoint.
- Protected endpoints validate tokens with role-based dependencies.
- Invalid or expired tokens return `401 Unauthorized`.
- Access to forbidden routes returns `403 Forbidden`.
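The real service uses JWTs (see `src/auth/`); the stdlib sketch below only mirrors the issue/validate/role-check flow and the 401-vs-403 distinction described above. The secret, route table, and function names are illustrative, not the project's actual code:

```python
import base64, hashlib, hmac, json, time

SECRET = b"change-me"            # hypothetical; the real app loads its own secret
ROLE_ROUTES = {                  # role -> allowed routes, per the list above
    "admin": {"/metrics", "/predict", "/monitoring"},
    "user": {"/predict", "/test-token"},
    "guest": {"/healthcheck"},
}

def issue_token(username: str, role: str, ttl: int = 3600) -> str:
    """Sign a base64 payload with HMAC-SHA256 (JWT-like, not an actual JWT)."""
    payload = json.dumps({"sub": username, "role": role, "exp": time.time() + ttl})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def authorize(token: str, route: str) -> int:
    """Return an HTTP-like status: 200 OK, 401 invalid/expired, 403 forbidden."""
    try:
        body, sig = token.split(".")
        expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return 401               # tampered signature
        payload = json.loads(base64.urlsafe_b64decode(body))
        if payload["exp"] < time.time():
            return 401               # expired token
    except Exception:
        return 401                   # malformed token
    return 200 if route in ROLE_ROUTES.get(payload["role"], set()) else 403
```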
Example:

```bash
# Obtain a token
curl -X POST http://localhost:8000/login \
  -d "username=admin&password=adminpass" \
  -H "Content-Type: application/x-www-form-urlencoded"

# Use the token to call a protected endpoint
curl -X GET http://localhost:8000/metrics \
  -H "Authorization: Bearer <token>"
```

This repository uses DVC to track the data and models important to the project.
- data/raw/ → raw `.csv` files storing collected data for each Paris district
- data/processed/ → processed features for training
- model/ → model artifacts
Install and initialize DVC with S3 support:

```bash
pip install "dvc[s3]"
dvc init
```

Configure the DagsHub remote:

```bash
dvc remote add -d origin s3://dvc/mateovillaarias/traffic_prediction
dvc remote modify origin endpointurl https://dagshub.com
```

Set local-only credentials (never commit these):

```bash
export DAGSHUB_USER="your_username"
export DAGSHUB_TOKEN="your_personal_access_token"
dvc remote modify origin --local access_key_id $DAGSHUB_USER
dvc remote modify origin --local secret_access_key $DAGSHUB_TOKEN
```

Mandatory: once the remote is set, pull the latest data:

```bash
dvc pull
```
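After these commands, the tracked `.dvc/config` should look roughly like the fragment below. The credentials added with `--local` land in the untracked `.dvc/config.local` instead, which is how they stay out of Git:

```ini
[core]
    remote = origin
['remote "origin"']
    url = s3://dvc/mateovillaarias/traffic_prediction
    endpointurl = https://dagshub.com
```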
Collected live data is stored in `data/live/` (one `.csv` file per district).
- To synchronize `data/live` into the main dataset in `data/raw`, run option 2 in `main.py`:

```text
python main.py
=== TRAFFIC LIVE DATA MENU ===
...
2. Sync live files into raw directory
...
Select an option: 2
Delete live files after sync? (y/n): n
```
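Conceptually, the sync step appends each district's live file to its raw counterpart. The sketch below shows that idea with the standard library; the real logic sits behind option 2 of `main.py` (in `src/data/`) and may additionally deduplicate, validate, and optionally delete the live files:

```python
import csv
from pathlib import Path

def sync_live_into_raw(live_dir: str = "data/live", raw_dir: str = "data/raw") -> int:
    """Append rows from each per-district live .csv to the matching raw file.

    Returns the number of data rows synced. Sketch only; function name
    and behavior are assumptions about the project's sync step.
    """
    synced = 0
    raw = Path(raw_dir)
    raw.mkdir(parents=True, exist_ok=True)
    for live_file in sorted(Path(live_dir).glob("*.csv")):
        with open(live_file, newline="") as src:
            rows = list(csv.reader(src))
        if not rows:
            continue
        header, data = rows[0], rows[1:]
        target = raw / live_file.name
        write_header = not target.exists()   # only write the header once per raw file
        with open(target, "a", newline="") as dst:
            writer = csv.writer(dst)
            if write_header:
                writer.writerow(header)
            writer.writerows(data)
        synced += len(data)
    return synced
```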
- After synchronization, you can push data and models to DVC using option 8:

```text
python main.py
=== TRAFFIC LIVE DATA MENU ===
...
8. Push data to DVC remote
...
Select an option: 8
DVC remote name, leave blank for default:
Git commit message, leave blank for default:
Select what to push [raw/processed/train/all] (default: raw): raw
```
- This will:
  - Commit changes (`dvc commit`, `git add`, `git commit`)
  - Push to the configured DagsHub remote
- Option 8 works with:
  - `raw` → pushes data/raw
  - `processed` → pushes data/processed
  - `train` → pushes model/
  - `all` → pushes all of the above
- Python 3.11
- FastAPI for the prediction service
- TomTom and OpenWeatherMap APIs
- Pandas, scikit-learn, Joblib
- DVC for data and model versioning
- DagsHub for data storage and DVC integration with the GitHub repo
- Planned: uv, Airflow, monitoring
Email: georges.nassopoulos@gmail.com, ingmatvillaa@gmail.com, elqounss.karim@gmail.com
