The aim of this full-stack project is to predict and visualize crowdedness one week ahead at 3 metro stations in Amsterdam: Centraal Station, Station Zuid and Station Bijlmer ArenA. Besides the number of check-ins & check-outs at each station, the model considers external factors such as weather, events, holidays, vacations and the COVID-19 pandemic.
The project consists of the following components:
- `instagram-event-scraper` → scraper for events from Instagram using Instagram's public URLs
- `ticketmaster-event-fetcher` → fetcher for events from the Ticketmaster API
- `model` → back-end and front-end for making predictions
  - `data_utils.py` → helper functions for data manipulation and logging
  - `model_utils.py` → functions for the model pipeline
  - `predictions.ipynb` → notebook for running the model pipeline
  - `predictions_server.py` → Flask server for running the model pipeline
  - `UI` → front-end for running the model pipeline
The model pipeline performs the following steps:
- Read and preprocess the data
- Merge the data of external factors (e.g. weather) with the check-ins & check-outs per hour
- Impute missing check-ins & check-outs using a Random Forest algorithm
- Split the dataset into training, validation and test sets
- Create a separate Random Forest model for each of the 3 metro stations
- Train each model on historical data (X)
- Predict the check-ins & check-outs for each hour of the week ahead (Y)
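The merge and split steps above can be sketched in plain Python as follows. This is not the code in `model_utils.py`: it assumes hourly data keyed by `datetime`, uses a single hypothetical weather feature (`temperature`), and assumes a chronological split, a common choice for time series because the model is then never validated on hours earlier than the ones it was trained on. Hours missing from either source are simply dropped here; the imputation of missing check-ins and the Random Forest training are not shown.

```python
from datetime import datetime, timedelta


def merge_hourly(checkins, weather):
    """Join hourly check-in counts with hourly weather observations.

    checkins: dict mapping an hour (datetime) to a check-in count
    weather:  dict mapping an hour (datetime) to a temperature (hypothetical feature)
    Hours present in only one of the two sources are dropped.
    """
    rows = []
    for hour in sorted(checkins):
        if hour in weather:
            rows.append({"hour": hour,
                         "checkins": checkins[hour],
                         "temperature": weather[hour]})
    return rows


def chronological_split(rows, train_pct=70, val_pct=15):
    """Split time-ordered rows into training, validation and test sets.

    The earliest rows go to training, the next ones to validation and the
    most recent ones to testing. Percentages are integers so that the set
    sizes are computed exactly.
    """
    n = len(rows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


if __name__ == "__main__":
    start = datetime(2021, 1, 1)
    # Toy data: 100 hours of check-ins, weather missing for every 10th hour.
    checkins = {start + timedelta(hours=h): h for h in range(100)}
    weather = {start + timedelta(hours=h): 5.0 for h in range(100) if h % 10 != 0}
    rows = merge_hourly(checkins, weather)
    train, val, test = chronological_split(rows)
    print(len(rows), len(train), len(val), len(test))  # 90 63 13 14
```

In the project, such a split would be applied per station, since each of the 3 stations gets its own Random Forest model.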
Prerequisites:
- Python 3.7+
- All the libraries listed in `requirements.txt`
  - Run `pip install -r requirements.txt`
- Datasets for **check-ins & check-outs** (`model/data/gvb/` & `model/data/gvb-herkomst/`), **weather** (`model/data/knmi/`) and **events** (`model/data/events/`) are expected in `model/` according to the following directory structure:
```
model
└───data
    └───gvb
    │   └───<year>
    │   │   └───<month_number>
    │   │   │   └───<day_number>
    │   │   │   │   <csv_or_json.gz>
    │   │   │   │   ...
    │   │   │
    │   │   └───...
    │   └───...
    └───gvb-herkomst
    │   └───<year>
    │   │   └───<month_number>
    │   │   │   └───<day_number>
    │   │   │   │   <csv_or_json.gz>
    │   │   │   │   ...
    │   │   │
    │   │   └───...
    │   └───...
    └───knmi
    │   └───knmi
    │   │   └───<year>
    │   │   │   └───<month_number>
    │   │   │   │   └───<day_number>
    │   │   │   │   │   <json>
    │   │   │   │   │   ...
    │   │   │   │
    │   │   │   └───...
    │   │   └───...
    │   └───knmi-observations
    │       └───<year>
    │       │   └───<month_number>
    │       │   │   └───<day_number>
    │       │   │   │   <json>
    │       │   │   │   ...
    │       │   │
    │       │   └───...
    │       └───...
    └───events
        │   events_zuidoost.xlsx
        │
        └───instagram
        │   │   <csv>
        │   │   ...
        │
        └───ticketmaster
            │   <csv>
            │   ...
```
- **WARNING**: For the model to produce valid predictions, the check-in & check-out data (`model/data/gvb/` & `model/data/gvb-herkomst/`) and the weather data (`model/data/knmi/`) must be kept up to date manually
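Because stale check-in or weather data degrades the predictions without any visible error, a quick freshness check over the directory layout can help. The sketch below is not part of the project: it assumes that the `<year>`, `<month_number>` and `<day_number>` folders are named with plain integers (e.g. `2021/3/5` or `2021/03/05`), and the function names `latest_day` and `is_stale` as well as the one-day threshold are hypothetical.

```python
from datetime import date
from pathlib import Path


def latest_day(root):
    """Return the most recent date found in a <year>/<month_number>/<day_number>
    folder tree under root, or None if there is none."""
    root = Path(root)
    latest = None
    for day_dir in root.glob("*/*/*"):
        if not day_dir.is_dir():
            continue
        try:
            year, month, day = (int(part) for part in day_dir.relative_to(root).parts)
            found = date(year, month, day)
        except ValueError:
            continue  # skip folders that are not a valid year/month/day
        if latest is None or found > latest:
            latest = found
    return latest


def is_stale(root, today, max_age_days=1):
    """True if root has no day folder or its newest one is older than max_age_days."""
    last = latest_day(root)
    return last is None or (today - last).days > max_age_days


if __name__ == "__main__":
    # Folders that follow the <year>/<month_number>/<day_number> layout.
    roots = ["model/data/gvb", "model/data/gvb-herkomst",
             "model/data/knmi/knmi", "model/data/knmi/knmi-observations"]
    for root in roots:
        if is_stale(root, date.today()):
            print(f"WARNING: {root} is not up to date (latest day: {latest_day(root)})")
```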
To scrape events from Instagram:
- Modify the `usernames` array in `scraper.py` to include the usernames of the accounts you want to scrape
- Go to `instagram-event-scraper/` and run `python scraper.py`
- After execution, `instagram-event-scraper/events.csv` will be updated with the scraped events
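To illustrate the `usernames` edit, the sketch below assumes that `usernames` is a plain Python list of account names and that each account is reached through its public profile URL of the form `https://www.instagram.com/<username>/`. The account names and the `profile_url` helper are hypothetical placeholders, and `scraper.py` may build its URLs differently.

```python
# Hypothetical placeholder accounts; replace with the accounts to be scraped.
usernames = ["example_venue", "example_club"]


def profile_url(username):
    """Public Instagram profile URL for an account name (leading '@' removed)."""
    return f"https://www.instagram.com/{username.strip().lstrip('@')}/"


profile_urls = [profile_url(name) for name in usernames]
```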
To fetch events from Ticketmaster:
- Create `ticketmaster-event-fetcher/config.py` containing `api_key = "EXAMPLE"`, where `EXAMPLE` is a placeholder for your Ticketmaster API key
- Modify the `year_to_fetch` variable in `fetcher.py` to fetch events for the year of your choice
- Go to `ticketmaster-event-fetcher/` and run `python fetcher.py`
- After execution, a file named `ticketmaster-event-fetcher/events_amsterdam_center_DATE_TIME_UTC.csv` will be created with the fetched events
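The sketch below shows how a request for one year of Amsterdam events could be built for the Ticketmaster Discovery API (v2) event search endpoint. It is an assumption about the kind of query `fetcher.py` sends, not its actual code: `build_events_url` is a hypothetical helper, and it only builds the URL, so no request is made.

```python
from urllib.parse import urlencode

# Ticketmaster Discovery API (v2) event search endpoint.
EVENTS_ENDPOINT = "https://app.ticketmaster.com/discovery/v2/events.json"


def build_events_url(api_key, year, page=0, size=100):
    """Build the URL for one page of Amsterdam events in the given year."""
    params = {
        "apikey": api_key,
        "city": "Amsterdam",
        "countryCode": "NL",
        # ISO 8601 timestamps in UTC covering the whole year
        "startDateTime": f"{year}-01-01T00:00:00Z",
        "endDateTime": f"{year}-12-31T23:59:59Z",
        "size": size,
        "page": page,
    }
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"
```

With `config.py` in place, `from config import api_key` followed by `build_events_url(api_key, 2022)` gives the URL of the first page; further pages are requested by incrementing `page`.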
To run the model, use one of the following options:
- Using `model/predictions.ipynb`:
  - Modify `config.ini` so that the model uses the feature configuration of your choice
  - Run `model/predictions.ipynb`
  - See the "After execution" bullet point below
- Using the front-end and back-end servers:
  - Go to `model/`, run `python predictions_server.py` and wait until the server output shows "Preprocessing finished" and the server is up
  - Go to `model/UI/`, run `python test.py` and wait for the front-end server to be up
  - Open the URL of the front-end server in a browser
  - Choose your desired parameters for the model and press "Submit"
- After execution:
  - If you click on any of the 3 available metro stations on the map, the graph should update with the current predictions
  - Each station's folder in `model/output/` will be updated with a new file named `prediction_next_week_CURRENT-DATE.csv` containing the current predictions
  - **NOTE**: `model/output/models_log.csv` is updated with the model's parameters and metrics only if you ran the model using the `model/predictions.ipynb` notebook
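To reuse the predictions outside the UI, the newest file in each station's folder can be picked by modification time. This is a sketch under assumptions: the station folder names and the CSV columns are not documented here, so the code reads rows generically with `csv.DictReader`, and it does not parse `CURRENT-DATE` because its exact format is not stated. The helper names are hypothetical.

```python
import csv
from pathlib import Path


def latest_prediction(station_dir):
    """Return the newest prediction_next_week_*.csv in station_dir, or None."""
    files = list(Path(station_dir).glob("prediction_next_week_*.csv"))
    if not files:
        return None
    # Pick by modification time rather than by the date in the file name.
    return max(files, key=lambda p: p.stat().st_mtime)


def read_predictions(path):
    """Read a predictions CSV into a list of dicts keyed by its header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    output_dir = Path("model/output")
    station_dirs = sorted(output_dir.iterdir()) if output_dir.is_dir() else []
    for station_dir in station_dirs:
        if station_dir.is_dir():
            path = latest_prediction(station_dir)
            if path is not None:
                print(station_dir.name, path.name, len(read_predictions(path)), "rows")
```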
- Go to