|
| 1 | +--- |
| 2 | +layout: single |
| 3 | +title: "Singapore Retail Sales Forecasting Project (Python, ML, Tableau)" |
| 4 | +date: 2025-06-03 |
| 5 | +category: analysis |
| 6 | +author_profile: true |
| 7 | +toc: true |
| 8 | +toc_label: "Table of Contents" |
| 9 | +toc_icon: "file" |
| 10 | +toc_sticky: true |
| 11 | +order: 5 |
| 12 | +#classes: wide |
| 13 | +--- |
| 14 | + |
| 15 | +Date Posted: 2025-06-03 |
| 16 | + |
| 17 | +Category: [Data Projects](https://meng-kiat.github.io/analysis/){: .btn .btn--info .btn--small} |
| 18 | + |
| 19 | +# Project Objectives |
| 20 | + |
| 21 | +This project aims to predict monthly retail sales (in millions of SGD) in Singapore using machine learning. The goal is to provide accurate and interpretable forecasts that help businesses and policymakers anticipate market demand, allocate inventory more effectively, and make data-driven decisions. |
| 22 | + |
| 23 | +Specific objectives include: |
| 24 | +1. Feature Engineering: Explore time-step features and exogenous variables as inputs to the machine learning model |
| 25 | +2. Prediction & Accuracy: Evaluate how well machine learning forecasts retail sales in Singapore, and produce predictions with visualizations that are simple and effective to present |
| 26 | +3. Visualization & Communication: Present insights and forecasts through dashboards using Tableau |
| 27 | + |
| 28 | +The full code can be found below: |
| 29 | + |
| 30 | +[View Notebook](){: .btn .btn--info .btn--small} |
| 31 | + |
| 32 | +# Dataset |
| 33 | + |
| 34 | +In this analysis, I used the [SingStat Table Builder](https://tablebuilder.singstat.gov.sg/table/TS/M601741) from the Department of Statistics Singapore (DOS) to retrieve 29 years of monthly retail sales data for Singapore, from 1997 to 2025. |
| 35 | + |
| 36 | +## Preprocessing the Dataset |
| 37 | +### Outlier Analysis |
| 38 | + |
| 39 | +COVID-19 caused a sharp drop in retail sales. We created an indicator feature, 'is_covid', that flags the COVID-19 period so the model can treat those months as outliers instead of being misled into learning them as part of the normal sales pattern. |
| 40 | + |
| 41 | +{% highlight python %} |
| 42 | +{% endhighlight %} |
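A minimal sketch of how such an indicator could be built with pandas. The column names `date` and `sales`, the sample values, and the COVID-19 window below are all assumptions for illustration; the post does not state the exact dates that were flagged.

```python
import pandas as pd

# Hypothetical monthly data; the values are made up and are not SingStat figures.
df = pd.DataFrame({
    "date": pd.date_range("2019-10-01", periods=8, freq="MS"),
    "sales": [100.0, 102.0, 110.0, 101.0, 95.0, 90.0, 60.0, 55.0],
})

# Assumed COVID-19 window (inclusive); adjust to the period actually used.
COVID_START = pd.Timestamp("2020-02-01")
COVID_END = pd.Timestamp("2021-12-01")

# 1 for months inside the assumed window, 0 otherwise.
df["is_covid"] = df["date"].between(COVID_START, COVID_END).astype(int)

print(df[["date", "is_covid"]])
```

Keeping the flagged months in the training data, instead of dropping them, preserves the continuity of the series for the lag features created later.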
| 43 | + |
| 44 | + |
| 45 | + |
| 46 | +### Feature Engineering |
| 47 | + |
| 48 | +We created three time-step features from the date of each observation: 'month', 'quarter' and 'year'. |
| 49 | + |
| 50 | +{% highlight python %} |
| 51 | +{% endhighlight %} |
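As an illustration, the time-step features can be derived from a datetime column with the pandas `.dt` accessor. The column name `date` and the sample dates are assumptions, not taken from the notebook.

```python
import pandas as pd

# Hypothetical frame with one row per month.
df = pd.DataFrame({"date": pd.date_range("2024-11-01", periods=4, freq="MS")})

# Calendar components extracted from the datetime column.
df["month"] = df["date"].dt.month
df["quarter"] = df["date"].dt.quarter
df["year"] = df["date"].dt.year

print(df)
```

'month' and 'quarter' let a tree-based model pick up seasonality, while 'year' gives it a coarse handle on the long-run trend.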
| 52 | + |
| 53 | +We also created lag features, 'sales_lag_1' and 'sales_lag_2', which hold the sales values from one and two months earlier. |
| 54 | + |
| 55 | +{% highlight python %} |
| 56 | +{% endhighlight %} |
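One way to build these lag features is `Series.shift` on a chronologically sorted frame. This is a sketch under assumed column names (`date`, `sales`) and made-up values, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical monthly sales; values are illustrative only.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="MS"),
    "sales": [10.0, 12.0, 11.0, 15.0, 14.0],
})

# Lags are only meaningful if rows are in chronological order.
df = df.sort_values("date").reset_index(drop=True)

# shift(k) moves values down k rows, so each row sees sales from k months earlier.
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_2"] = df["sales"].shift(2)

# The first two rows have no complete history, so they are dropped before modelling.
model_df = df.dropna(subset=["sales_lag_1", "sales_lag_2"]).reset_index(drop=True)

print(model_df)
```

Because each lag uses only past values, these features do not leak future information into the row being predicted.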
| 57 | + |
| 58 | +# Model Building |
| 59 | + |
| 60 | +## Time Series Cross Validation |
| 61 | +{% highlight python %} |
| 62 | +{% endhighlight %} |
| 63 | + |
| 64 | + |
| 65 | + |
| 66 | +{% highlight python %} |
| 67 | +{% endhighlight %} |
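The idea behind time series cross validation is that every fold trains only on observations that precede its test window, so the model is never evaluated on data older than what it was trained on. The sketch below is a hand-written expanding-window splitter for illustration; the function name `expanding_window_splits` and its parameters are hypothetical, and the notebook may instead rely on a library utility such as scikit-learn's `TimeSeriesSplit`, which offers a similar scheme.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each fold tests on the next `test_size` observations and trains on
    everything before them, so the training window grows with every fold.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("Not enough samples for the requested splits.")
    for fold in range(n_splits):
        test_start = first_test_start + fold * test_size
        test_end = test_start + test_size
        yield list(range(test_start)), list(range(test_start, test_end))


# Example: 24 monthly observations, 3 folds, 4 months tested per fold.
splits = list(expanding_window_splits(n_samples=24, n_splits=3, test_size=4))

for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Unlike a shuffled k-fold split, this keeps the temporal order intact, which matters here because the lag features are built from past sales.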
| 68 | + |
| 69 | +## XGBoost Regression |
| 70 | +We used XGBoost's regressor (`XGBRegressor`) to build our model. We used parameters |
| 71 | + |
| 72 | +# Overall Evaluation |
| 73 | + |
| 74 | + |
| 75 | +# Conclusion |