This repository contains a data science project for the Inter-University Datathon 2025, focusing on predicting ski resort visitor numbers using historical visitation data and climate factors.
The project aims to forecast weekly visitor numbers for Australian ski resorts for the 2026 ski season using:
- Historical visitation data (2021-2025) across multiple ski resorts
- Climate data including temperature, rainfall/snowfall, freeze days, and sub-zero nights
- Machine learning techniques including time series forecasting with Prophet, DARTS, and XGBoost
- Datathon2025.ipynb — Main analysis notebook featuring data preprocessing, feature engineering, Prophet time series forecasting, and visitor prediction models
- EDA.ipynb — Exploratory data analysis with comprehensive visualizations and statistical insights
- data/2025 Allianz Datathon Dataset.xlsx — Primary dataset containing visitation and climate data
- notebooks/Copy_of_DS3_EDA.ipynb — Educational EDA notebook from the workshop series
- workshops/DataSoc_2025_Interuni_Workshop_DARTS — Time series forecasting techniques using the DARTS library
- workshops/DataSoc_2025_Interuni_Workshop_XGBoost — XGBoost gradient-boosted ensemble learning methods
- docs/ — Competition case brief, information packs, and workshop slides
- Workshop 1 recording
- Workshop 2 recording
- Data Loading — Import visitation and climate data from Excel sheets
- Data Cleaning — Handle missing values, outliers, and data type conversions
- Feature Engineering — Create climate-based indicators (freeze days, temperature comfort scores)
- Resort Mapping — Match climate stations to ski resort locations
- Time Series Preparation — Structure data for weekly forecasting
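The Data Cleaning, Feature Engineering, Resort Mapping, and Time Series Preparation steps above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code. Several details are assumptions:

- the DailyClimate record layout
- the placeholder station-to-resort mapping
- the thresholds for freeze days and sub-zero nights
- skipping any day with a missing reading
- keying weeks by ISO calendar week rather than the project's own season-week numbering

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class DailyClimate:
    station: str                  # hypothetical station identifier
    day: date
    max_temp_c: Optional[float]   # None marks a missing reading
    min_temp_c: Optional[float]
    rainfall_mm: Optional[float]


# Hypothetical station-to-resort lookup (placeholder names, not real stations).
STATION_TO_RESORT: Dict[str, str] = {
    "station_a": "Resort A",
    "station_b": "Resort B",
}

WeekKey = Tuple[str, int, int]  # (resort, ISO year, ISO week)


def weekly_climate_features(
    records: List[DailyClimate],
    station_to_resort: Dict[str, str] = STATION_TO_RESORT,
) -> Dict[WeekKey, Dict[str, float]]:
    """Map stations to resorts, drop incomplete days, and aggregate by ISO week.

    Assumed thresholds: a freeze day has max temp <= 0 °C and a
    sub-zero night has min temp < 0 °C.
    """
    buckets: Dict[WeekKey, List[DailyClimate]] = defaultdict(list)
    for rec in records:
        resort = station_to_resort.get(rec.station)
        if resort is None:
            continue  # station not matched to any resort
        if None in (rec.max_temp_c, rec.min_temp_c, rec.rainfall_mm):
            continue  # simple cleaning choice: skip days with missing values
        iso_year, iso_week, _ = rec.day.isocalendar()
        buckets[(resort, iso_year, iso_week)].append(rec)

    features: Dict[WeekKey, Dict[str, float]] = {}
    for key, recs in buckets.items():
        features[key] = {
            "freeze_days": sum(r.max_temp_c <= 0 for r in recs),
            "subzero_nights": sum(r.min_temp_c < 0 for r in recs),
            "rainfall_mm": sum(r.rainfall_mm for r in recs),
            "mean_max_temp_c": sum(r.max_temp_c for r in recs) / len(recs),
        }
    return features


# Usage with made-up readings for one week in June 2024.
sample = [
    DailyClimate("station_a", date(2024, 6, 10), -1.0, -5.0, 2.0),
    DailyClimate("station_a", date(2024, 6, 11), 3.0, -2.0, 0.0),
    DailyClimate("station_a", date(2024, 6, 12), None, -1.0, 1.0),  # dropped: missing max temp
    DailyClimate("station_x", date(2024, 6, 10), -3.0, -6.0, 0.0),  # dropped: unmapped station
]
features = weekly_climate_features(sample)
```

The weekly features produced this way could then be joined to the visitation table on resort and week.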
- Prophet — Facebook's time series forecasting tool for seasonal visitor patterns
- DARTS — Python forecasting library offering both classical statistical models and deep learning models through a common interface
- XGBoost — Gradient boosting for capturing complex feature interactions
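Prophet and DARTS work on the time series directly. A gradient boosting model such as XGBoost instead needs the series recast as a table of lagged features. The sketch below shows that framing in plain Python. The two lag features, the (season year, season week) keying, and the sample counts are illustrative assumptions rather than the project's feature set.

```python
from typing import Dict, List, Tuple

SeasonWeek = Tuple[int, int]  # (season year, week number within the ski season)


def make_supervised_rows(visits: Dict[SeasonWeek, float]) -> List[dict]:
    """Recast a weekly visitor series into lagged feature rows for a tabular model.

    Features (illustrative choice): the previous week of the same season and
    the same week of the previous season. Rows missing either lag are skipped.
    """
    rows = []
    for (year, week), target in sorted(visits.items()):
        prev_week = visits.get((year, week - 1))
        last_season = visits.get((year - 1, week))
        if prev_week is None or last_season is None:
            continue
        rows.append({
            "year": year,
            "week": week,
            "lag_prev_week": prev_week,
            "lag_same_week_last_season": last_season,
            "target": target,
        })
    return rows


def to_xy(rows: List[dict]) -> Tuple[List[List[float]], List[float]]:
    """Split rows into a feature matrix X and a target vector y."""
    X = [[r["week"], r["lag_prev_week"], r["lag_same_week_last_season"]] for r in rows]
    y = [r["target"] for r in rows]
    return X, y


# Made-up counts for one resort over two seasons, weeks 1-3.
visits = {
    (2023, 1): 100, (2023, 2): 150, (2023, 3): 200,
    (2024, 1): 110, (2024, 2): 160, (2024, 3): 230,
}
rows = make_supervised_rows(visits)
X, y = to_xy(rows)
# With xgboost installed, X and y (converted to NumPy arrays) could be passed to
# xgboost.XGBRegressor().fit(X, y); hold out the latest season for validation
# rather than splitting rows randomly.
```

One design caveat applies to the previous-week lag. It is only known one step ahead, so forecasting a whole 2026 season from it would require feeding predictions back in recursively. The same-week-last-season lag, by contrast, is available for every 2026 week from 2025 data.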
- Snow Reliability Score (SRS) — Composite metric combining snowfall, freeze days, and thaw risk
- Weather Comfort Score (WCS) — Temperature-based visitor comfort index
- Historical Patterns — Seasonal trends and year-over-year visitor growth
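The composite indicators above are named but not given as formulas here. The sketch below shows one plausible way to compute them, plus year-over-year growth. All of the following are hypothetical choices, not the project's calibrated definitions:

- the SRS weights
- scaling snowfall by a reference maximum
- treating thaw risk as the fraction of thaw days in a week
- the WCS comfort band of -6 to 2 °C and its linear falloff

```python
def snow_reliability_score(
    snowfall_cm: float,
    freeze_days: int,
    thaw_days: int,
    max_snowfall_cm: float,
    weights: tuple = (0.5, 0.3, 0.2),
) -> float:
    """Hypothetical weekly SRS in [0, 1] when the weights sum to 1.

    Snowfall is scaled by a reference maximum; freeze days raise the score and
    thaw days (counted upstream with an assumed warm-day threshold) lower it.
    """
    w_snow, w_freeze, w_thaw = weights
    snow = min(snowfall_cm / max_snowfall_cm, 1.0) if max_snowfall_cm > 0 else 0.0
    freeze = freeze_days / 7
    thaw_risk = thaw_days / 7
    return w_snow * snow + w_freeze * freeze + w_thaw * (1 - thaw_risk)


def weather_comfort_score(
    mean_temp_c: float,
    comfort_low: float = -6.0,
    comfort_high: float = 2.0,
    falloff_c: float = 10.0,
) -> float:
    """Hypothetical WCS: 1.0 inside an assumed comfort band, falling linearly to 0."""
    if comfort_low <= mean_temp_c <= comfort_high:
        return 1.0
    if mean_temp_c < comfort_low:
        gap = comfort_low - mean_temp_c
    else:
        gap = mean_temp_c - comfort_high
    return max(0.0, 1.0 - gap / falloff_c)


def year_over_year_growth(current: float, previous: float) -> float:
    """Relative change in visitors versus the same week of the previous season."""
    if previous <= 0:
        raise ValueError("previous season count must be positive")
    return (current - previous) / previous


srs = snow_reliability_score(snowfall_cm=20, freeze_days=3, thaw_days=1, max_snowfall_cm=40)
wcs_mild = weather_comfort_score(7.0)
growth = year_over_year_growth(110, 100)
```

Keeping both scores on a 0 to 1 scale makes them straightforward to use side by side as model features. The weights would need tuning against the actual visitation data.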
The analysis reveals strong correlations between climate factors and ski resort visitation, with models achieving robust forecasting performance for the 2026 ski season across multiple Australian ski resorts.
- Start with the dataset (data/2025 Allianz Datathon Dataset.xlsx) and the case reveal document to understand the problem scope.
- Explore the data using EDA.ipynb for comprehensive visualizations and statistical insights.
- Run the main analysis with Datathon2025.ipynb to see the complete modeling pipeline.
- Learn techniques from the workshop materials covering DARTS and XGBoost implementations.
- Refer to the documentation in docs/ for competition guidelines and methodology.
This project was developed for the Inter-University Datathon 2025, sponsored by Allianz, focusing on practical applications of data science in the tourism and recreation industry.
Repository maintained by: keyurgohel1