Detect potential electricity theft using Random Forest and yearly average consumption trends. Includes a dashboard to classify customer risk levels and highlight high-risk cases.

- Generate or load a balanced synthetic dataset for Normal, Medium, and High-risk customers.
- Compute yearly average consumption per customer for improved model accuracy.
- Random Forest classifier to detect potential theft.
- Vacation adjustment to reduce false positives.
- Dashboard visualization including:
  - Customer risk distribution
  - Top high-risk customers
  - Feature importance

- File: `energy_balanced_data.csv`
- Columns:

| Column | Description |
|---|---|
| `customer_id` | Unique customer ID |
| `month` | Month (1–12) |
| `consumption_kwh` | Monthly electricity consumption |
| `is_theft` | Label: 0 = Normal, 1 = Potential Theft |
| `yearly_avg` | Computed yearly average consumption |

Dataset is synthetic but modeled to mimic real-world electricity usage.
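
To make the column layout concrete, here is a minimal standard-library sketch that builds rows with the columns listed above and attaches `yearly_avg` to each one. The generator `make_synthetic_rows`, its usage ranges, and the rule that theft shows up as halved readings in the second half of the year are illustrative assumptions, not the project's actual generation logic.

```python
import random
from collections import defaultdict
from statistics import mean


def make_synthetic_rows(n_customers=6, seed=0):
    """Build one row per customer per month (hypothetical generator)."""
    rng = random.Random(seed)
    rows = []
    for customer_id in range(1, n_customers + 1):
        is_theft = customer_id % 2  # alternate labels (balanced when n_customers is even)
        base = rng.uniform(250, 400)  # assumed typical monthly usage in kWh
        for month in range(1, 13):
            usage = base * rng.uniform(0.9, 1.1)
            # Assumption: theft appears as under-reported usage after June.
            if is_theft and month > 6:
                usage *= 0.5
            rows.append({
                "customer_id": customer_id,
                "month": month,
                "consumption_kwh": round(usage, 1),
                "is_theft": is_theft,
            })
    return rows


def add_yearly_avg(rows):
    """Attach each customer's mean monthly consumption to all of their rows."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row["customer_id"]].append(row["consumption_kwh"])
    averages = {cid: mean(values) for cid, values in by_customer.items()}
    for row in rows:
        row["yearly_avg"] = round(averages[row["customer_id"]], 1)
    return rows


rows = add_yearly_avg(make_synthetic_rows())
```

Each customer's twelve rows end up sharing the same `yearly_avg`, so a model can compare any single month against that customer's own baseline.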

- Clone the repository:

      git clone https://github.com/<your-username>/energy-theft-detection.git
      cd energy-theft-detection

- Install dependencies:

      pip install -r requirements.txt

- Run the main script:

      python energy_theft_detection.py

- Load or generate dataset – 1000+ customers, 12 months each.
- Feature Engineering – Compute `yearly_avg` per customer.
- Train Model – Random Forest (300 trees).
- Evaluate Model – Accuracy, confusion matrix, classification report, feature importance.
- Predict New Customer – Input monthly consumption, yearly average, vacation status.
- Dashboard – Visualize risk levels and top high-risk customers.
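
The training and evaluation steps above could look roughly like the following sketch with scikit-learn. The 300 trees and the three features come from this README. The inline toy data, the 80/20 split, the random seeds, and the assumption that theft rows report about half of normal usage are illustrative choices and may differ from `energy_theft_detection.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for the real CSV (all values are assumptions).
rng = np.random.default_rng(0)
n = 600
month = rng.integers(1, 13, size=n)
yearly_avg = rng.uniform(200, 400, size=n)
is_theft = rng.integers(0, 2, size=n)
# Assumption: theft rows report 30-70% of the customer's usual usage.
ratio = np.where(
    is_theft == 1,
    rng.uniform(0.3, 0.7, size=n),
    rng.uniform(0.85, 1.15, size=n),
)
consumption_kwh = yearly_avg * ratio

feature_names = ["month", "consumption_kwh", "yearly_avg"]
X = np.column_stack([month, consumption_kwh, yearly_avg])
y = is_theft

# Hold out 20% of the rows for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 300 trees, as stated in the workflow above.
model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
theft_probability = model.predict_proba(X_test)[:, 1]  # P(is_theft == 1)
importances = dict(zip(feature_names, model.feature_importances_))
```

`theft_probability` is the per-row score that the risk levels and the dashboard would be built on, and `importances` gives the per-feature contribution.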
- Risk Distribution: Normal, Medium, High Risk
- Top High-Risk Customers: Sorted by theft probability
- Feature Importance: Contribution of month, consumption, yearly average
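
As a rough illustration of the first two dashboard views, the sketch below counts customers per risk level and lists the high-risk ones by descending theft probability. The 0.7 High Risk threshold is the one given in this README. The 0.4 Medium threshold, the function names, and the sample scores are assumptions made for the example.

```python
from collections import Counter

HIGH_THRESHOLD = 0.7    # High Risk threshold stated in this README
MEDIUM_THRESHOLD = 0.4  # assumption: the README does not give a Medium cutoff


def risk_level(probability):
    """Map a theft probability to one of the three dashboard levels."""
    if probability >= HIGH_THRESHOLD:
        return "High Risk"
    if probability >= MEDIUM_THRESHOLD:
        return "Medium"
    return "Normal"


def summarize(scores, top_n=3):
    """Return (risk distribution, top high-risk customers) for the dashboard."""
    distribution = Counter(risk_level(p) for p in scores.values())
    high = [(cid, p) for cid, p in scores.items() if p >= HIGH_THRESHOLD]
    top = sorted(high, key=lambda item: item[1], reverse=True)[:top_n]
    return distribution, top


# Hypothetical per-customer theft probabilities.
scores = {"C001": 0.91, "C002": 0.15, "C003": 0.55, "C004": 0.78, "C005": 0.05}
distribution, top_high_risk = summarize(scores)
```

With these sample scores, two customers fall in each of Normal and High Risk and one in Medium, and `top_high_risk` lists `C001` before `C004`.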

Input:
- Month: 5
- Consumption: 180 kWh
- Yearly Average: 300 kWh
- Vacation: yes

Output:
- Threshold for High Risk: 0.7
- Vacation adjustment reduces probability to avoid false positives
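
One way the vacation adjustment could work is to scale the model's theft probability down before comparing it with the 0.7 threshold. The README does not say how large the reduction is, so `VACATION_FACTOR = 0.5` and the inclusive `>=` comparison are assumptions for illustration only.

```python
HIGH_RISK_THRESHOLD = 0.7  # from the notes above
VACATION_FACTOR = 0.5      # assumption: size of the reduction is not documented


def adjust_for_vacation(theft_probability, on_vacation):
    """Scale the probability down when low usage is explained by a vacation."""
    if not 0.0 <= theft_probability <= 1.0:
        raise ValueError("theft_probability must be between 0 and 1")
    if on_vacation:
        return theft_probability * VACATION_FACTOR
    return theft_probability


def is_high_risk(theft_probability, on_vacation):
    """Flag a customer only if the adjusted probability reaches the threshold."""
    adjusted = adjust_for_vacation(theft_probability, on_vacation)
    return adjusted >= HIGH_RISK_THRESHOLD
```

Under these assumptions a raw score of 0.8 is flagged for a customer at home but not for one on vacation, which is the false-positive reduction the adjustment is meant to provide.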
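
Save and reload the trained model with `joblib`:

    import joblib
    joblib.dump(model, "energy_theft_yearlyavg_model.pkl")   # save the trained model
    model = joblib.load("energy_theft_yearlyavg_model.pkl")  # reload it for later predictions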
- Accuracy: Replace with your model score
- Classification Report: Precision, Recall, F1-score for Normal vs Theft
- Feature Importance Ranking: `consumption_kwh` > `yearly_avg` > `month`
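
For reference, the metrics named above can be derived from the four confusion-matrix counts. This is a plain-Python sketch for the binary Normal (0) vs Theft (1) case. The function names and sample labels are hypothetical, and a real run would more likely use scikit-learn's `classification_report`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating label 1 (theft) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def theft_metrics(y_true, y_pred):
    """Compute accuracy plus precision, recall, and F1 for the theft class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for six customer-months.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
metrics = theft_metrics(y_true, y_pred)
```

Precision answers how many flagged cases were real theft, and recall answers how much of the real theft was flagged. Both matter more than raw accuracy when inspections are costly.
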
- Integrate real electricity consumption datasets.
- Add seasonal, regional, and holiday adjustments.
- Build an interactive web dashboard using Streamlit or Dash.
- Include unsupervised anomaly detection for rare theft patterns.

| Screenshot | Description |
|---|---|
| ![]() | Model performance evaluation |
| ![]() | Feature impact on prediction |
| ![]() | Customer risk classification |

MIT License – See LICENSE

Author: TejaHussain💖


