# RetailForecast: Production-Ready Sales Forecasting with ZenML and Prophet

A robust MLOps pipeline for retail sales forecasting, designed for retail data scientists and ML engineers.

## 📊 Business Context

In retail, accurate demand forecasting is critical for optimizing inventory, staff scheduling, and financial planning. This project provides a production-ready sales forecasting solution that can be deployed in retail environments to:

- Predict future sales volumes across multiple stores and products
- Capture seasonal patterns and trends in customer purchasing behavior
- Support data-driven inventory management and purchasing decisions
- Provide actionable insights through visual forecasting dashboards

<div align="center">
  <br/>
  <img alt="Forecast Dashboard" src="assets/forecast_dashboard.png" width="70%">
  <br/>
  <p><em>HTML dashboard visualization showing forecasts with uncertainty intervals</em></p>
</div>

## 🔍 Data Overview

The pipeline works with time-series retail sales data structured as follows:

| Field | Description |
|-------|-------------|
| date | Date of sales record (YYYY-MM-DD) |
| store | Store identifier (e.g., Store_1, Store_2) |
| item | Product identifier (e.g., Item_A, Item_B) |
| sales | Number of units sold |
| price | Unit price |

The system automatically handles:
- Multiple store/item combinations as separate time series
- Train/test splitting for model validation
- Conversion to the columns Prophet requires (`ds` for the date, `y` for the target)
- Missing value imputation and outlier detection

<div align="center">
  <br/>
  <img alt="Data Visualization" src="assets/data_visualization.gif" width="70%">
  <br/>
  <p><em>Interactive visualization of historical sales patterns</em></p>
</div>
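
The reshaping described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual preprocessing step: the helper names `to_prophet_series` and `chronological_split` are hypothetical, and the code assumes the CSV columns listed in the table. Each store-item pair becomes its own date-sorted series using Prophet's `ds`/`y` column names, and the held-out rows are the most recent ones, because a random split would leak future observations into training.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def to_prophet_series(csv_text):
    """Group raw CSV rows into one Prophet-style series per (store, item) pair.

    Hypothetical helper: every row becomes {"ds", "y", "price"}, where "ds"
    and "y" are the column names Prophet expects for the date and the target.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[(row["store"], row["item"])].append(
            {
                "ds": date.fromisoformat(row["date"]),
                "y": float(row["sales"]),
                "price": float(row["price"]),
            }
        )
    # Sort each series by date so that later splits are chronological.
    return {key: sorted(rows, key=lambda r: r["ds"]) for key, rows in series.items()}


def chronological_split(rows, test_size=0.2):
    """Hold out the most recent `test_size` fraction of a date-sorted series."""
    n_test = max(1, round(len(rows) * test_size))
    return rows[:-n_test], rows[-n_test:]


sample = """date,store,item,sales,price
2024-01-02,Store_1,Item_A,12,2.5
2024-01-01,Store_1,Item_A,10,2.5
2024-01-01,Store_2,Item_A,7,2.4
2024-01-03,Store_1,Item_A,15,2.6
2024-01-04,Store_1,Item_A,11,2.6
2024-01-05,Store_1,Item_A,9,2.6
"""

series = to_prophet_series(sample)
train, test = chronological_split(series[("Store_1", "Item_A")], test_size=0.4)
```

In a real pipeline each series would more likely be a pandas DataFrame with the same `ds`, `y`, and `price` columns, ready to pass to Prophet's `fit`.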

## 🚀 Pipeline Architecture

The project includes two primary pipelines:

### 1. Training Pipeline

The training pipeline performs the following steps:

1. **Data Loading**: Imports historical sales data from CSV files
2. **Data Preprocessing**:
   - Transforms data into Prophet-compatible format
   - Creates separate time series for each store-item combination
   - Performs train/test splitting based on a configurable ratio
3. **Model Training**:
   - Trains a separate Facebook Prophet model for each store-item combination
   - Configures seasonality parameters based on domain knowledge
   - Uses price as an additional regressor when available
4. **Model Evaluation**:
   - Calculates MAPE, RMSE, and MAE on the test data
   - Generates visual diagnostics for model performance
5. **Forecasting**:
   - Produces forecasts with uncertainty intervals
   - Creates interactive HTML visualizations

<div align="center">
  <br/>
  <img alt="Training Pipeline DAG" src="assets/training_pipeline.png" width="70%">
  <br/>
  <p><em>ZenML visualization of the training pipeline DAG</em></p>
</div>
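
For reference, the three evaluation metrics can be written out directly. This is a plain-Python sketch, not the project's evaluation step. Skipping days with zero actual sales in MAPE is an assumption made here: MAPE divides by the actual value, and zero-sales days are common for slow-moving items.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent, skipping zero-sales days."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float("nan")  # undefined when every actual value is zero
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root mean squared error, in units sold."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error, in units sold."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [10, 20, 0, 40]
predicted = [12, 18, 3, 40]
print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes occasional large misses more heavily than MAE, which matters when a single badly under-forecast day causes a stockout.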

### 2. Inference Pipeline

The inference pipeline enables fast forecasting with pre-trained models:

1. **Data Loading**: Imports the most recent sales data
2. **Data Preprocessing**: Transforms data into Prophet format
3. **Forecasting**: Generates predictions using production models
4. **Visualization**: Creates interactive dashboards with forecasts

<div align="center">
  <br/>
  <img alt="Inference Pipeline DAG" src="assets/inference_pipeline.png" width="70%">
  <br/>
  <p><em>ZenML visualization of the inference pipeline DAG</em></p>
</div>
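
When price is added as a regressor, Prophet requires that column in every frame passed to `predict`, including the future dates, so the forecasting step has to supply future prices. The sketch below shows one way to build those future rows; the helper name `build_future_rows`, the assumption of daily data, and the "carry the last observed price forward unless a planned price is known" policy are all illustrative assumptions. (Prophet's own `make_future_dataframe(periods=...)` builds the date column; this stdlib version makes the price handling explicit.)

```python
from datetime import date, timedelta


def build_future_rows(history, periods, planned_prices=None):
    """Return `periods` daily rows following the last observed date.

    `history` is a date-sorted list of {"ds": date, "price": float} rows.
    Prices come from `planned_prices` (date -> price) when available;
    otherwise the last observed price is carried forward (a naive assumption).
    """
    planned_prices = planned_prices or {}
    last_day = history[-1]["ds"]
    last_price = history[-1]["price"]
    rows = []
    for step in range(1, periods + 1):
        day = last_day + timedelta(days=step)
        rows.append({"ds": day, "price": planned_prices.get(day, last_price)})
    return rows


history = [
    {"ds": date(2024, 1, 30), "price": 2.5},
    {"ds": date(2024, 1, 31), "price": 2.6},
]
future = build_future_rows(history, periods=3, planned_prices={date(2024, 2, 2): 2.2})
```

Carrying prices forward is the simplest policy; if planned promotions or price changes are known, passing them in through `planned_prices` lets the forecast reflect them.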

## 📈 Model Details

The forecasting solution uses [Facebook Prophet](https://github.com/facebook/prophet), chosen for its combination of accuracy and simplicity in retail forecasting scenarios:

- **Multiple Models Approach**: Rather than a one-size-fits-all model, we train an individual Prophet model for each store-item combination, so each forecast captures the patterns of one product in one location
- **Components**: Prophet automatically decomposes each time series into trend, seasonality, and holiday components
- **Seasonality**: Captures weekly and yearly patterns out of the box; other cycles, such as monthly, can be added as custom seasonalities
- **Special Events**: Handles holidays and promotions through Prophet's holiday (special-event) effects
- **Uncertainty Estimation**: Provides prediction intervals for better inventory planning
- **Extensibility**: Supports additional regressors such as price and marketing spend
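
These settings map onto a handful of Prophet constructor arguments and methods (`Prophet(...)`, `add_seasonality`, `add_country_holidays`, `add_regressor`). The sketch below keeps the configuration as plain data and builds the model lazily, so the settings can be logged as pipeline parameters without importing Prophet. The names `ProphetSpec` and `build_model`, the 95% interval width, the US holiday calendar, and the monthly Fourier order are illustrative assumptions, not this project's actual values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProphetSpec:
    """Hypothetical, serializable settings for one store-item Prophet model."""

    yearly_seasonality: bool = True
    weekly_seasonality: bool = True
    daily_seasonality: bool = False  # daily sales data has no intra-day cycle
    interval_width: float = 0.95  # coverage of the uncertainty interval
    # Prophet has no built-in monthly cycle, so it is added as a custom seasonality.
    extra_seasonalities: List[dict] = field(
        default_factory=lambda: [{"name": "monthly", "period": 30.5, "fourier_order": 5}]
    )
    country_holidays: Optional[str] = "US"
    regressors: List[str] = field(default_factory=lambda: ["price"])

    def constructor_kwargs(self):
        """Arguments passed straight to the Prophet constructor."""
        return {
            "yearly_seasonality": self.yearly_seasonality,
            "weekly_seasonality": self.weekly_seasonality,
            "daily_seasonality": self.daily_seasonality,
            "interval_width": self.interval_width,
        }


def build_model(spec):
    """Create an unfitted Prophet model from a spec (requires `prophet`)."""
    from prophet import Prophet  # lazy import: the spec itself needs no Prophet

    model = Prophet(**spec.constructor_kwargs())
    for seasonality in spec.extra_seasonalities:
        model.add_seasonality(**seasonality)
    if spec.country_holidays:
        model.add_country_holidays(country_name=spec.country_holidays)
    for name in spec.regressors:
        model.add_regressor(name)
    return model
```

Because `price` is registered with `add_regressor`, both the training frame and every frame passed to `predict` must contain a `price` column.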

Prophet was selected for this solution because it excels at:
- Handling missing data and outliers common in retail sales data
- Automatically detecting seasonal patterns without extensive feature engineering
- Providing intuitive parameters that business users can understand
- Scaling to thousands of individual time series, since each small, independent model can be fitted in parallel
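
Because the per-series models are independent, the multiple-models approach parallelizes naturally. In this minimal sketch, `fit_all` is a hypothetical helper and `sum` stands in for a real fitting function so the example runs without Prophet. Threads are used on the assumption that most fitting time is spent in Stan's optimizer, which recent Prophet releases run as a separate compiled program through cmdstanpy; if profiling shows that Python-side work dominates, `ProcessPoolExecutor` is a drop-in replacement provided `fit_one` is a picklable, module-level function.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_all(series_by_key, fit_one, max_workers=4):
    """Fit one independent model per series and return {key: fitted model}.

    `fit_one` is any callable that takes one series and returns a fitted
    model, for example a function that builds a Prophet model and calls
    `.fit(df)` on it.
    """
    keys = list(series_by_key)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `keys`.
        fitted = pool.map(fit_one, (series_by_key[k] for k in keys))
        return dict(zip(keys, fitted))


toy_series = {("Store_1", "Item_A"): [1, 2, 3], ("Store_2", "Item_B"): [4, 5]}
models = fit_all(toy_series, fit_one=sum)
```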

## 💻 Technical Implementation

The project leverages ZenML's MLOps framework to provide:

- **Model Versioning**: Track all model versions and their performance metrics
- **Reproducibility**: All experiments are fully reproducible with tracked parameters
- **Pipeline Caching**: Speed up experimentation by reusing the outputs of steps whose code and inputs have not changed
- **Artifact Tracking**: All data and models are versioned and stored
- **Deployment Ready**: Models can be directly deployed to production environments

A key component of this project is the custom `ProphetMaterializer`, which serializes and deserializes Prophet models so that ZenML can store them as artifacts.
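
The core of such a materializer is a lossless round trip between in-memory models and text in the artifact store. Prophet's documentation recommends its JSON serializers (`prophet.serialize.model_to_json` and `model_from_json`) over pickle. The sketch below applies that round trip to a dictionary of per-series models; the helper names `dump_models` and `load_models` and the `(store, item)` tuple keys are assumptions, and `str`/`int` stand in for Prophet's serializers so the example runs anywhere. In a ZenML `BaseMaterializer` subclass, `save` and `load` could write and read this string under the artifact's URI; the project's own `ProphetMaterializer` may be structured differently.

```python
import json


def dump_models(models, to_json):
    """Serialize {(store, item): model} into a single JSON string.

    JSON object keys must be strings, so each (store, item) key is stored
    as explicit fields next to the model's own JSON representation.
    """
    return json.dumps(
        [
            {"store": store, "item": item, "model": to_json(model)}
            for (store, item), model in models.items()
        ]
    )


def load_models(text, from_json):
    """Inverse of dump_models: rebuild {(store, item): model}."""
    return {
        (entry["store"], entry["item"]): from_json(entry["model"])
        for entry in json.loads(text)
    }


# With Prophet installed, pass its serializers instead of the stand-ins:
#   from prophet.serialize import model_from_json, model_to_json
#   text = dump_models(models, to_json=model_to_json)
#   restored = load_models(text, from_json=model_from_json)
toy = {("Store_1", "Item_A"): 1, ("Store_2", "Item_B"): 2}
restored = load_models(dump_models(toy, to_json=str), from_json=int)
```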

<div align="center">
  <br/>
  <img alt="ZenML Dashboard" src="assets/zenml_dashboard.png" width="70%">
  <br/>
  <p><em>ZenML model registry tracking model versions and performance</em></p>
</div>

## 🛠️ Getting Started

### Prerequisites

- Python 3.9+
- ZenML installed and configured

### Installation

```bash
# Clone the repository
git clone https://github.com/zenml-io/zenml-projects.git
cd zenml-projects/retail-forecast

# Install dependencies
pip install -r requirements.txt

# Initialize ZenML (if needed)
zenml init
```

### Running the Pipelines

To train models and generate forecasts:

```bash
# Run the training pipeline (default)
python run.py

# Run with custom parameters
python run.py --forecast-periods 60 --test-size 0.3 --weekly-seasonality True
```

To make predictions using existing models:

```bash
# Run the inference pipeline
python run.py --inference
```

### Viewing Results

Open the ZenML dashboard, either on a deployed server or locally:

```bash
# Connect to a deployed ZenML server or ZenML Pro workspace
zenml login

# Or start a local ZenML server with its dashboard
zenml login --local
```

In the dashboard, you can explore:
- Pipeline runs and their status
- Model performance metrics
- Interactive forecast visualizations
- Version history of all models

## 🔄 Integration with Retail Systems

This solution can be integrated with existing retail systems:

- **Inventory Management**: Connect forecasts to automatic reordering systems
- **ERP Systems**: Feed forecasts into financial planning modules
- **BI Dashboards**: Export forecasts to Tableau, Power BI, or similar tools
- **Supply Chain**: Share forecasts with suppliers via API endpoints
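
For BI tools, a long-format table with one row per store, item, and date is usually the easiest export to import as a flat file. A minimal sketch, assuming the forecasts are held as `{(store, item): rows}` and that each row carries Prophet's output columns `ds`, `yhat`, `yhat_lower`, and `yhat_upper`; the helper name `forecasts_to_csv` is hypothetical.

```python
import csv
import io

FIELDS = ["store", "item", "ds", "yhat", "yhat_lower", "yhat_upper"]


def forecasts_to_csv(forecasts):
    """Flatten {(store, item): [forecast rows]} into one long-format CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for (store, item), rows in forecasts.items():
        for row in rows:
            # Keep only Prophet's point forecast and interval columns.
            writer.writerow({"store": store, "item": item, **{k: row[k] for k in FIELDS[2:]}})
    return buffer.getvalue()


example = {
    ("Store_1", "Item_A"): [
        {"ds": "2024-02-01", "yhat": 12.5, "yhat_lower": 9.0, "yhat_upper": 16.0},
    ],
}
print(forecasts_to_csv(example))
```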

## 📊 Example Use Case: Store-Level Demand Planning

A retail chain with 50 stores and 500 products uses this pipeline to:

1. Train models on 2 years of historical sales data
2. Generate daily forecasts for the next 30 days for each store-item combination
3. Aggregate forecasts to support central purchasing decisions
4. Update models weekly with new sales data
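
At 50 stores and 500 products, this scenario involves 50 × 500 = 25,000 store-item series, and therefore 25,000 Prophet models. Step 3 can be sketched as summing store-level point forecasts into one chain-level figure per item and day; the helper name `aggregate_by_item` is hypothetical. Only `yhat` is summed here, because prediction-interval bounds do not add across series: chain-level intervals need a separate method, such as simulation or a model fitted on the aggregated series.

```python
from collections import defaultdict


def aggregate_by_item(forecasts):
    """Sum store-level point forecasts into chain-level totals per (item, ds)."""
    totals = defaultdict(float)
    for (_store, item), rows in forecasts.items():
        for row in rows:
            totals[(item, row["ds"])] += row["yhat"]
    return dict(totals)


store_forecasts = {
    ("Store_1", "Item_A"): [{"ds": "2024-02-01", "yhat": 10.0}],
    ("Store_2", "Item_A"): [{"ds": "2024-02-01", "yhat": 5.5}],
    ("Store_1", "Item_B"): [{"ds": "2024-02-01", "yhat": 3.0}],
}
chain_totals = aggregate_by_item(store_forecasts)
```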

The result: 15% reduction in stockouts and 20% decrease in excess inventory.

## 📄 License

This project is licensed under the Apache License 2.0.