---
layout: single
title: "Singapore Retail Sales Forecasting Project (Python, ML, Tableau)"
date: 2025-06-03
category: analysis
author_profile: true
toc: true
toc_label: "Table of Contents"
toc_icon: "file"
toc_sticky: true
order: 5
#classes: wide
---
Date Posted: 2025-06-03

Category: [Data Projects](https://meng-kiat.github.io/analysis/){: .btn .btn--info .btn--small}

# Project Objectives

This project aims to predict monthly retail sales (in million SGD) in Singapore using machine learning. The goal is to provide accurate and interpretable forecasts that could help businesses and policymakers anticipate market demand, allocate inventory more effectively, and make data-driven decisions.

Specific objectives include:

1. Feature Engineering: Explore adding time-step features and exogenous variables to the machine learning model.
2. Prediction & Accuracy: Evaluate how effective machine learning is at forecasting retail sales in Singapore, and create predictions with visualizations for simple, effective presentation.
3. Visualization & Communication: Present insights and forecasts through Tableau dashboards.

The full code can be found below:

[View Notebook](){: .btn .btn--info .btn--small}

# Dataset

In this analysis, I used the [SingStat Table Builder](https://tablebuilder.singstat.gov.sg/table/TS/M601741) from the Department of Statistics, Singapore (DOS) to retrieve 29 years of monthly retail sales data for Singapore, from 1997 to 2025.

## Preprocessing the Dataset
### Outlier Analysis

COVID-19 caused a sharp drop in retail sales. We created an indicator feature, 'is_covid', that flags the COVID-19 months so the machine learning algorithm can treat them as outliers instead of mistaking the temporary drop for part of the normal trend.

{% highlight python %}
{% endhighlight %}
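A minimal pandas sketch of such an indicator is shown below. The post does not state which months were flagged, so the window used here (April to December 2020) and the placeholder sales values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical monthly series (month-start dates); the sales values are
# placeholders, not SingStat figures.
dates = pd.date_range("2019-10-01", "2021-03-01", freq="MS")
df = pd.DataFrame({"sales": [100.0] * len(dates)}, index=dates)

# ASSUMED COVID-19 window; the post does not state which months were flagged.
covid_start = pd.Timestamp("2020-04-01")
covid_end = pd.Timestamp("2020-12-01")

# 1 for months inside the window (inclusive), 0 otherwise.
in_window = (df.index >= covid_start) & (df.index <= covid_end)
df["is_covid"] = in_window.astype(int)

print(df["is_covid"].sum(), "months flagged")
```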

![](/assets/images/wisconsin/corrplot.png)

### Feature Engineering

We created three time-step features from each observation's date: 'month', 'quarter' and 'year'.

{% highlight python %}
{% endhighlight %}
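One way to derive these calendar features, assuming the series is held in a pandas DataFrame with a monthly `DatetimeIndex` (the four-row frame and its values below are hypothetical):

```python
import pandas as pd

# Hypothetical monthly frame indexed by month start; in the project this
# would be the retail sales series retrieved from SingStat.
df = pd.DataFrame(
    {"sales": [100.0, 102.5, 98.0, 105.0]},
    index=pd.date_range("2024-11-01", periods=4, freq="MS"),
)

# Calendar features read directly off the DatetimeIndex.
df["month"] = df.index.month
df["quarter"] = df.index.quarter
df["year"] = df.index.year

print(df)
```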

We also created two lag features, 'sales_lag_1' and 'sales_lag_2', which hold the sales values from one and two months earlier.

{% highlight python %}
{% endhighlight %}
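A sketch of how the two lag columns could be built with pandas `shift`, using a hypothetical four-month frame. Dropping the first rows, which have no lag history, is one common choice and is an assumption here, not something the post confirms:

```python
import pandas as pd

# Hypothetical monthly frame; values are placeholders.
df = pd.DataFrame(
    {"sales": [100.0, 102.5, 98.0, 105.0]},
    index=pd.date_range("2024-11-01", periods=4, freq="MS"),
)

# shift(k) moves values down k rows, so each row sees sales from k months earlier.
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_2"] = df["sales"].shift(2)

# The first two rows lack a full lag history (NaN); drop them before training.
df_model = df.dropna(subset=["sales_lag_1", "sales_lag_2"])
print(df_model)
```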

# Model Building

## Time Series Cross Validation

{% highlight python %}
{% endhighlight %}
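Ordinary k-fold cross validation shuffles observations, which would let the model train on months that come after the ones it is validated on. The sketch below uses scikit-learn's `TimeSeriesSplit` to show the expanding-window alternative; the 12-row stand-in data and `n_splits=3` are assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for 12 months of features and targets, ordered oldest to newest.
X = np.arange(12).reshape(-1, 1)
y = np.arange(12, dtype=float)

# Each fold trains on an expanding window of earlier months and validates on
# the block of months that immediately follows it; nothing is shuffled.
tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X, y))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train rows {train_idx.min()}-{train_idx.max()}, "
          f"validate rows {test_idx.min()}-{test_idx.max()}")
```

With 12 rows and three splits, the folds validate on rows 3-5, 6-8 and 9-11, each time training on every earlier row, which mirrors how the model would be used to forecast future months.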

![](/assets/images/wisconsin/rf_parameter.png)

{% highlight python %}
{% endhighlight %}

## XGBoost Regression

We used the `XGBRegressor` class from the `xgboost` library to build our model. We used parameters

# Overall Evaluation

![](/assets/images/wisconsin/accuracy.png)
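The post does not say which metrics the accuracy figure reports. As an assumption, the sketch below computes three common forecast-error measures (MAE, RMSE and MAPE) with NumPy; `forecast_metrics` is a hypothetical helper, not part of the project code.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return MAE, RMSE and MAPE (in %) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    # MAPE assumes no actual value is zero (reasonable for monthly retail sales).
    mape = np.mean(np.abs(errors / actual)) * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


m = forecast_metrics([100, 200, 400], [110, 190, 400])
print(m)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the size of each month's sales, which makes it easier to communicate on a dashboard.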

# Conclusion
