This project analyzes the weekly sales data of a UK-based online retailer. With rising inventory costs and customer demand for fast, reliable service, it's crucial to anticipate revenue fluctuations in advance.
By forecasting weekly total sales, this model helps the business:
- 📉 Minimize overstock and understock risks
- 🚚 Improve inventory and fulfillment efficiency
- 📊 Support strategic planning with data-driven insights
Can we accurately forecast weekly sales revenue based on historical sales patterns, calendar seasonality, and recent performance trends?
We treat this as a supervised regression problem, using XGBoost to forecast weekly revenue. The model incorporates both short-term memory (lags, rolling means) and long-term seasonal indicators (month, week of year, spike-week flags).
- Filtered for valid UK transactions (removed cancellations and returns)
- Ensured accurate parsing of timestamps and numeric fields
- Ranked top products by quantity sold
- Analyzed category-level revenue and sales
- Identified and investigated a strong December spike in 2011
- Time-based fields:
Month,Quarter,WeekOfYear - Lag and rolling statistics:
Sales_Lag_1,Sales_Lag_4,RollingMean_4,RollingSTD_4 - Seasonal flags:
IsDecember,IsQ4,IsSpikeWeek - Cyclical encodings:
Month_sin,Month_cos
- Trained
XGBoostRegressoron weekly sales data - Held out last 6 weeks for evaluation (time-aware split)
- MAE: £41,024
- RMSE: £64,712
- Captured the year-end spike trend, though slightly underestimated its magnitude
WeekOfYearwas the most important feature- Seasonal indicators provided useful signal
- Spike underprediction was attributed to unmodeled holiday events (e.g., promotions)
📉 The model captured trend and seasonality well, though it slightly underpredicted the final December spike.
Actual vs Predicted Weekly Sales (Final 6 Weeks)
- Handles non-linear relationships and feature interactions
- Supports calendar-based features, lags, and rolling statistics
- Interpretable via feature importance
| File | Description |
|---|---|
E-Commerce Sales Forecast.ipynb |
Full modeling notebook |
train_df_ready.csv / test_df_ready.csv |
Cleaned datasets (linked externally if large) |
forecast_plot.png |
Visualization of model predictions |
README.md |
This file |
- Add holiday and promotion flags (e.g., Black Friday, Cyber Monday)
- Test alternative models like Prophet or hybrid LSTM
- Extend forecasting horizon beyond 6 weeks
Shawn Waringu
Data Scientist & Analyst