This project predicts household electric power consumption using the UCI Individual Household Electric Power Consumption Dataset.
The goal is to accurately forecast Global Active Power based on features such as voltage, reactive power, current, and sub-meter readings.
This notebook demonstrates the full pipeline of a Machine Learning regression project:
- Data Loading: Imported directly from the UCI Machine Learning Repository
- Data Cleaning: Handled missing values, replaced `'?'` with NaN, interpolated time-series data
- Feature Engineering: Combined date and time columns, removed outliers, and converted all data to numeric types
- Model Training: Used Random Forest Regressor for prediction
- Model Evaluation: Evaluated using metrics like R², MAE, MSE, and MAPE
- Visualization: Compared actual vs predicted power consumption using scatter plots and line charts
- Cleaned and preprocessed 1.9 million energy data points
- Used Random Forest for robust and high-accuracy regression
- Achieved R² = 0.998 and Average Accuracy ≈ 96.7%
- Generated performance visualizations:
  - Actual vs Predicted Scatter Plot
  - Residual Distribution
  - Feature Importance Plot
Dataset Name: Individual Household Electric Power Consumption
Source: UCI Machine Learning Repository
Rows: ~2 million
Columns:
- `Global_active_power`: Total active power consumed (Target)
- `Global_reactive_power`: Reactive power
- `Voltage`: Average voltage
- `Global_intensity`: Average current
- `Sub_metering_1`, `Sub_metering_2`, `Sub_metering_3`: Energy consumption in different household areas
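The cleaning steps named in the pipeline (marking `'?'` as NaN, combining date and time, converting to numeric types, interpolating gaps) might look roughly like the sketch below. It is not the notebook's actual code: the `Date` and `Time` column names, the `dd/mm/yyyy` date format, the helper name `clean_power_frame`, and the four sample rows are all assumptions made for illustration, and outlier removal is left out because the method used is not described here.

```python
import numpy as np
import pandas as pd

# Tiny hand-made sample (illustrative values only, not real measurements).
# "Date" and "Time" are assumed column names for the raw file.
raw = pd.DataFrame({
    "Date": ["16/12/2006"] * 4,
    "Time": ["17:24:00", "17:25:00", "17:26:00", "17:27:00"],
    "Global_active_power": ["4.216", "?", "5.374", "5.388"],
    "Voltage": ["234.84", "?", "233.29", "233.74"],
})


def clean_power_frame(df):
    """Hypothetical helper: timestamp index, numeric columns, gaps interpolated."""
    df = df.copy()
    # Combine the date and time columns into a single timestamp index.
    df["datetime"] = pd.to_datetime(
        df["Date"] + " " + df["Time"], format="%d/%m/%Y %H:%M:%S"
    )
    df = df.drop(columns=["Date", "Time"]).set_index("datetime").sort_index()
    # Mark the '?' placeholders as missing, then convert every column to numbers.
    df = df.replace("?", np.nan)
    df = df.apply(pd.to_numeric, errors="coerce")
    # Fill gaps by interpolating along the time axis.
    return df.interpolate(method="time")


cleaned = clean_power_frame(raw)
print(cleaned)
```

Time-based interpolation is one reasonable choice for evenly sampled readings like these; a forward fill or simply dropping the rows would be alternatives, depending on how long the gaps are.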
- Python 3
- Pandas, NumPy: Data manipulation
- Scikit-learn: Model training and evaluation
- Matplotlib, Seaborn: Visualization
- ucimlrepo: Fetch dataset from UCI repository
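As a rough illustration of the model-training step with scikit-learn, the sketch below fits a `RandomForestRegressor` on synthetic data. The synthetic relationship (power roughly proportional to voltage times current), the 80/20 split, and the hyperparameters are assumptions chosen for the example, not the notebook's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption for illustration only).
rng = np.random.default_rng(0)
n_samples = 500
voltage = rng.uniform(230.0, 250.0, n_samples)    # volts
intensity = rng.uniform(1.0, 20.0, n_samples)     # amperes
power = voltage * intensity / 1000.0 + rng.normal(0.0, 0.02, n_samples)  # kW

X = np.column_stack([voltage, intensity])
feature_names = ["Voltage", "Global_intensity"]

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, power, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# score() returns R² on the held-out rows.
r2 = model.score(X_test, y_test)
print(f"R² on held-out data: {r2:.3f}")
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

With the real dataset, `X` would hold the cleaned feature columns and `power` the `Global_active_power` column.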
| Metric | Value |
|---|---|
| R² Score | 0.998 |
| Mean Absolute Error (MAE) | 0.0176 |
| Mean Squared Error (MSE) | 0.00098 |
| Average Accuracy | 96.67% |
| Mean Absolute Percentage Error (MAPE) | 3.32% |
These results indicate that the model's predictions closely track the measured energy usage on the evaluation data.
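For reference, the metrics in the table can be computed as in the following sketch. The definition of "Average Accuracy" as 100% minus MAPE is an assumption, chosen because it is consistent with the reported values (96.67% + 3.32% ≈ 100%). The function name `regression_report` and the three sample values are hypothetical.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Hypothetical helper returning R², MAE, MSE, MAPE and accuracy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    # R² = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # MAPE in percent; assumes no true value is zero.
    mape = np.mean(np.abs(errors / y_true)) * 100.0

    return {
        "r2": r2,
        "mae": mae,
        "mse": mse,
        "mape_pct": mape,
        # Assumed definition: accuracy = 100% - MAPE.
        "accuracy_pct": 100.0 - mape,
    }


# Made-up values, only to show the call.
report = regression_report([1.0, 2.0, 4.0], [1.1, 1.9, 4.2])
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

MAPE is undefined when a true value is zero, so it is worth checking the minimum of the target column before relying on it.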
- Actual vs Predicted Scatter Plot
- Residual Distribution
- Feature Importance Bar Chart
These plots help visualize how close predictions are to actual values and which features influence power usage the most.
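The three plots could be produced along the lines of the sketch below, using Matplotlib only. The function name `plot_diagnostics`, the layout, and the demo numbers (including the importance values) are assumptions for illustration and are not results from the notebook.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_diagnostics(y_true, y_pred, importances, feature_names):
    """Hypothetical helper drawing the three diagnostic plots side by side."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    importances = np.asarray(importances, dtype=float)
    residuals = y_true - y_pred

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # 1. Actual vs predicted, with the ideal y = x line for reference.
    axes[0].scatter(y_true, y_pred, s=8, alpha=0.5)
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    axes[0].plot(lims, lims, color="red", linestyle="--")
    axes[0].set_xlabel("Actual")
    axes[0].set_ylabel("Predicted")
    axes[0].set_title("Actual vs Predicted")

    # 2. Residual distribution.
    axes[1].hist(residuals, bins=30)
    axes[1].set_xlabel("Residual (actual - predicted)")
    axes[1].set_title("Residual Distribution")

    # 3. Feature importances, sorted so the largest bar is on top.
    order = np.argsort(importances)
    axes[2].barh(np.asarray(feature_names)[order], importances[order])
    axes[2].set_title("Feature Importance")

    fig.tight_layout()
    return fig


# Demo with made-up numbers (assumption for illustration only).
rng = np.random.default_rng(1)
demo_true = rng.uniform(0.2, 5.0, 200)
demo_pred = demo_true + rng.normal(0.0, 0.05, 200)
fig = plot_diagnostics(
    demo_true, demo_pred, [0.7, 0.2, 0.1],
    ["Global_intensity", "Voltage", "Sub_metering_3"],
)
```

In a notebook, the figure is displayed automatically at the end of the cell; in a script, `plt.show()` or `fig.savefig(...)` would be needed.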
- Clone this repository: `git clone https://github.com/your-username/your-repo-name.git`
- Install the required dependencies listed above with `pip install`
- Open the notebook: `jupyter notebook Electric_Power_Prediction.ipynb`
- Run all cells sequentially to reproduce results.
The Random Forest model provides near-perfect predictions for household energy consumption. It can be extended for real-time power monitoring, energy efficiency analysis, or smart grid applications.
UCI Machine Learning Repository for the dataset
Scikit-learn Team for ML tools
Google Colab for providing a free compute environment