Industrial Asset Health Monitoring with ML-Powered Anomaly Detection
Real-time sensor monitoring • Dual Isolation Forest anomaly detection • 100Hz batch feature ML • Health scoring • PDF/Excel reporting
Live Demo | API Documentation | Health Check
An end-to-end predictive maintenance system that monitors industrial assets (motors, pumps, compressors) in real time and predicts maintenance needs before failures occur.
| Feature | Description |
|---|---|
| Sensor Ingestion | Real-time voltage, current, power factor, vibration data via REST API |
| Feature Engineering | Rolling means, spike detection, efficiency scores, RMS calculations |
| Anomaly Detection | Isolation Forest model trained on healthy baseline data |
| Health Assessment | 0-100 health score with risk classification (LOW → CRITICAL) |
| Fault Simulation | Configurable severity levels (MILD/MEDIUM/SEVERE) for targeted testing |
| Explainability | Human-readable explanations: "Vibration 3.2σ above normal" |
| Dashboard | React + Recharts real-time visualization with glassmorphism UI |
| Reporting | Role-specialized reports: Executive PDF (Plant Managers), Multi-sheet Excel (Analysts), 5-page Industrial Certificate (Engineers) |
| Operator Logs | Ground-truth maintenance event logging with InfluxDB persistence for supervised ML training |
| Baseline Benchmarking | Live status cards display baseline target values for instant comparison |
| Purge & Re-Calibrate | One-click system reset: wipes InfluxDB data + DI state, returns to IDLE |
| Keep-Alive Heartbeat | 10-minute `/ping` heartbeat prevents Render free-tier cold starts |
| Cumulative Prognostics | Monotonic Degradation Index (DI), Damage Rate, and Remaining Useful Life (RUL) |
┌────────────────────────────────────────────────────────────────┐
│                    Frontend (React + Vite)                     │
│                             Vercel                             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐   │
│  │ Metrics  │ │  Chart   │ │  Health  │ │   Explanations   │   │
│  │  Cards   │ │ Recharts │ │ Summary  │ │      Panel       │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘   │
└───────────────────────────────┬────────────────────────────────┘
                                │ HTTPS/JSON (Vercel Rewrites)
┌───────────────────────────────▼────────────────────────────────┐
│                   Backend (FastAPI + Docker)                   │
│                             Render                             │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐    │
│  │    Ingest    │ │   Features   │ │     ML Pipeline      │    │
│  │   /ingest    │ │    Engine    │ │ Baseline → Detector  │    │
│  └──────────────┘ └──────────────┘ └──────────────────────┘    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐    │
│  │    Health    │ │  Explainer   │ │        Report        │    │
│  │   Assessor   │ │    Engine    │ │      Generator       │    │
│  └──────────────┘ └──────────────┘ └──────────────────────┘    │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│                  InfluxDB Cloud (Time-Series)                  │
│               sensor_data • features • anomalies               │
└────────────────────────────────────────────────────────────────┘
| Component | Technology | Hosting | URL |
|---|---|---|---|
| Frontend | React 18 + Vite | Vercel | predictive-maintenance-ten.vercel.app |
| Backend | FastAPI + Docker | Render | predictive-maintenance-uhlb.onrender.com |
| Database | InfluxDB 2.x | InfluxDB Cloud | AWS us-east-1 |
# Clone the repository
git clone https://github.com/BhaveshBytess/PREDICTIVE-MAINTENANCE.git
cd PREDICTIVE-MAINTENANCE
# Start all services (backend + frontend)
docker-compose up --build
# Access the application
# Frontend: http://localhost:5173
# Backend: http://localhost:8000
# API Docs: http://localhost:8000/docs
⚠️ Windows Users: Never commit `node_modules/` to Git. Windows binaries cause permission errors on Linux servers (Vercel Error 126).
cd backend
python -m venv venv
.\venv\Scripts\activate # Windows
source venv/bin/activate # Linux/Mac
pip install -r requirements.txt
uvicorn backend.api.main:app --reload
cd frontend
npm install
npm run dev
See DEPLOY.md for detailed instructions on deploying to:
- Render (Backend)
- Vercel (Frontend)
- InfluxDB Cloud (Database)
predictive-maintenance/
├── backend/
│   ├── api/                       # FastAPI routes & schemas
│   │   ├── main.py                # Application instance
│   │   ├── routes.py              # /ingest, /health endpoints
│   │   ├── system_routes.py       # Calibration, fault injection, monitoring, purge
│   │   ├── integration_routes.py  # Health scoring, data history, events
│   │   ├── operator_routes.py     # Operator log endpoints
│   │   ├── sandbox_routes.py      # What-If analysis
│   │   ├── services.py            # Business logic helpers
│   │   └── schemas.py             # Pydantic models
│   ├── database.py                # InfluxDB client wrapper
│   ├── config.py                  # Settings & environment loader
│   ├── storage/                   # Blob/file storage abstraction
│   ├── models/                    # Saved ML model artifacts
│   ├── features/                  # Feature engineering
│   │   ├── calculator.py          # 1Hz rolling means, spikes, RMS
│   │   └── engine.py              # Feature extraction orchestrator
│   ├── ml/                        # Machine Learning (Dual Model)
│   │   ├── baseline.py            # Healthy data profiling
│   │   ├── detector.py            # Legacy Isolation Forest (6 features, 1Hz)
│   │   ├── batch_features.py      # 16-D batch feature extraction (100Hz)
│   │   ├── batch_detector.py      # Batch Isolation Forest (16 features)
│   │   └── validation.py          # 3-Sigma baseline validation
│   ├── events/                    # Event Engine
│   │   └── engine.py              # State machine (HEALTHY → ANOMALY_DETECTED)
│   ├── rules/                     # Business logic
│   │   ├── assessor.py            # Health scoring, risk & cumulative degradation (DI)
│   │   └── explainer.py           # Human-readable explanations
│   ├── reports/                   # PDF/Excel generation
│   │   ├── generator.py           # Basic PDF/Excel reports
│   │   ├── industrial_report.py   # 5-page Industrial Health Certificate
│   │   ├── constants.py           # Colors, costs, thresholds
│   │   ├── mock_data.py           # Simulated historical data
│   │   └── components/            # Gauge, charts, audit components
│   └── generator/                 # Digital Twin data generator
│       ├── generator.py           # 100Hz hybrid data generator
│       └── config.py              # NASA/IMS fault patterns
├── frontend/
│   ├── src/
│   │   ├── components/            # React components
│   │   │   ├── Header/
│   │   │   ├── MetricCard/
│   │   │   ├── SignalChart/
│   │   │   ├── HealthSummary/
│   │   │   ├── InsightPanel/
│   │   │   ├── OperatorLog/
│   │   │   ├── LogWatcher/        # Real-time event feed
│   │   │   ├── SystemControlPanel/
│   │   │   ├── StatusCard/
│   │   │   ├── PerformanceCard/
│   │   │   └── SandboxModal/
│   │   ├── hooks/                 # usePolling
│   │   └── api/                   # API client
│   └── Dockerfile                 # Multi-stage nginx build
├── scripts/
│   ├── generate_data.py           # CLI data generator (healthy/faulty)
│   ├── demo_pipeline.py           # End-to-end demo automation
│   ├── benchmark_model.py         # Model performance benchmarking
│   ├── retrain_batch_model.py     # Standalone batch model retraining
│   ├── setup_linux.sh             # Bare-metal Linux setup
│   └── backend.service            # Systemd unit file
├── tests/                         # 182 unit tests
├── docker-compose.yml             # Full stack deployment
├── Dockerfile                     # Backend container
└── ENGINEERING_LOG.md             # Decision journal
POST /ingest
Content-Type: application/json
{
  "event_id": "uuid-v4",
  "timestamp": "2026-01-12T00:00:00Z",
  "asset_id": "Motor-01",
  "sensor_data": {
    "voltage_v": 230.5,
    "current_a": 12.3,
    "power_factor": 0.92,
    "vibration_g": 0.15
  }
}

GET /health
Response: { "status": "healthy", "db_connected": true }

GET /ping
Response: { "status": "ok" }

Used by the frontend's 10-minute heartbeat to keep the Render free-tier backend warm.

POST /system/purge
Response: { "status": "purged", "message": "All data and models cleared. System reset to IDLE." }

Writes DI=0.0 to InfluxDB, clears in-memory baselines/detectors/history, and resets state to IDLE.
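
As a rough illustration of the `/ingest` request shape, the sketch below builds a matching JSON body in Python using only the standard library. The helper name `build_ingest_payload` is hypothetical and not part of the project; the local URL in the commented-out section assumes the backend is running on `http://localhost:8000` as in the Docker quick start.

```python
import json
import uuid
from datetime import datetime, timezone


def build_ingest_payload(asset_id, voltage_v, current_a, power_factor, vibration_g):
    """Build a dict shaped like the POST /ingest example (hypothetical helper)."""
    return {
        "event_id": str(uuid.uuid4()),  # random UUID v4
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "asset_id": asset_id,
        "sensor_data": {
            "voltage_v": voltage_v,
            "current_a": current_a,
            "power_factor": power_factor,
            "vibration_g": vibration_g,
        },
    }


payload = build_ingest_payload("Motor-01", 230.5, 12.3, 0.92, 0.15)
body = json.dumps(payload).encode("utf-8")

# To send it to a locally running backend (assumed URL), uncomment:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/ingest",
#     data=body,
#     headers={"Content-Type": "application/json"},
#     method="POST",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

The network call is left commented out so the snippet runs without a live backend.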
The system runs two Isolation Forest models trained during calibration:
| Model | Features | Input | F1 @ 0.5 | AUC-ROC | Jitter Detection |
|---|---|---|---|---|---|
| Legacy (v2) | 6 | 1Hz averages | 78.1% | 1.000 | No |
| Batch (v3) | 16 | 100Hz windows | 99.6% | 1.000 | Yes |
The batch model is primary for inference; the legacy model is retained for backward compatibility.
Each 1-second window of 100 raw sensor points is reduced to a 16-D statistical feature vector:
| Signal | mean | std | peak_to_peak | rms |
|---|---|---|---|---|
| `voltage_v` | ✓ | ✓ | ✓ | ✓ |
| `current_a` | ✓ | ✓ | ✓ | ✓ |
| `power_factor` | ✓ | ✓ | ✓ | ✓ |
| `vibration_g` | ✓ | ✓ | ✓ | ✓ |
Why it matters: A "Jitter Fault" where average vibration is 0.15g (normal) but σ=0.17g (5x healthy) is invisible to 1Hz models. The batch model catches it because std and peak_to_peak are explicit features.
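
The reduction from a 100-point window to 16 statistics can be sketched as follows. This is a minimal standard-library illustration, not the project's `batch_features.py`: the function names, the feature naming scheme, and the use of population standard deviation are assumptions, and the synthetic windows use illustrative amplitudes (0.03g vs 0.15g) chosen only to reproduce a 5x spread at an identical mean.

```python
import math
import statistics

SIGNALS = ("voltage_v", "current_a", "power_factor", "vibration_g")


def extract_batch_features(window):
    """Reduce a list of sensor dicts to 4 signals x 4 statistics = 16 features."""
    features = {}
    for signal in SIGNALS:
        values = [point[signal] for point in window]
        features[f"{signal}_mean"] = statistics.fmean(values)
        features[f"{signal}_std"] = statistics.pstdev(values)  # population std (assumed)
        features[f"{signal}_peak_to_peak"] = max(values) - min(values)
        features[f"{signal}_rms"] = math.sqrt(statistics.fmean([v * v for v in values]))
    return features


def make_window(vibration_amplitude, n=100):
    """Synthetic 1-second window: vibration alternates around 0.15 g."""
    return [
        {
            "voltage_v": 230.0,
            "current_a": 12.0,
            "power_factor": 0.92,
            "vibration_g": 0.15 + vibration_amplitude * (1 if i % 2 == 0 else -1),
        }
        for i in range(n)
    ]


healthy = extract_batch_features(make_window(0.03))
jitter = extract_batch_features(make_window(0.15))

# Both windows share the same mean vibration, so a 1Hz average cannot separate them,
# while std and peak_to_peak differ by a factor of five.
print(healthy["vibration_g_mean"], jitter["vibration_g_mean"])
print(healthy["vibration_g_std"], jitter["vibration_g_std"])
```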
| Feature | Formula | Window |
|---|---|---|
| `voltage_rolling_mean_1h` | Mean of voltage over 1 hour | Past-only |
| `current_spike_count` | Points > 3σ from local mean | 10-point window |
| `power_factor_efficiency_score` | `(PF - 0.8) / 0.2 * 100` | Instantaneous |
| `vibration_intensity_rms` | √(mean(vibration²)) | Past-only |
| `voltage_stability` | `\|V - 230.0\|` | |
| `power_vibration_ratio` | `vibration / (PF + 0.01)` | Instantaneous |
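
The instantaneous formulas and the spike count from the table can be written out directly. The sketch below is illustrative only: the function signatures are hypothetical, `voltage_stability` follows the absolute-deviation reading of the formula, and `current_spike_count` assumes the local mean and standard deviation come from the 10 points preceding each sample.

```python
import statistics

NOMINAL_VOLTAGE = 230.0  # nominal value taken from the voltage_stability formula


def power_factor_efficiency_score(pf):
    """(PF - 0.8) / 0.2 * 100, as listed in the table."""
    return (pf - 0.8) / 0.2 * 100


def voltage_stability(voltage):
    """Absolute deviation from the nominal voltage."""
    return abs(voltage - NOMINAL_VOLTAGE)


def power_vibration_ratio(vibration, pf):
    """vibration / (PF + 0.01); the 0.01 offset avoids division by zero."""
    return vibration / (pf + 0.01)


def current_spike_count(currents, window=10, n_sigma=3.0):
    """Count points more than n_sigma local std devs from the local mean.

    Assumption: 'local' means the `window` points immediately before each sample.
    """
    spikes = 0
    for i in range(window, len(currents)):
        local = currents[i - window:i]
        mu = statistics.fmean(local)
        sigma = statistics.pstdev(local)
        if sigma > 0 and abs(currents[i] - mu) > n_sigma * sigma:
            spikes += 1
    return spikes


# Ten quiet samples, one surge to 20 A, then quiet samples again.
currents = [12.0, 12.1] * 5 + [20.0] + [12.0, 12.1] * 2
print(current_spike_count(currents))
print(power_factor_efficiency_score(0.92), voltage_stability(230.5))
```

Only the surge is counted here: once it enters the trailing window it inflates the local standard deviation, so the quiet samples that follow stay within 3σ.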
| Type | Description | Detectable By |
|---|---|---|
| SPIKE | Voltage/current surges | Both models |
| DRIFT | Gradual degradation | Both models |
| JITTER | Normal means, high variance | Batch model only |
| DEFAULT | General fault pattern | Both models |
Health is derived from the Cumulative Degradation Index (DI), a monotonically increasing damage accumulator:
# Dead-zone: healthy noise produces zero damage
HEALTHY_FLOOR = 0.65
if batch_score < HEALTHY_FLOOR:
    effective_severity = 0.0
else:
    effective_severity = (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)

# Cumulative damage increment
SENSITIVITY_CONSTANT = 0.005
DI_increment = (effective_severity ** 2) * SENSITIVITY_CONSTANT * dt
DI = min(DI + DI_increment, 1.0)  # monotonic, capped at 1.0

# Health & RUL derived from DI
health_score = round(100 * (1.0 - DI))
RUL_hours = (1.0 - DI) / max(damage_rate, 1e-9)

# Risk classification
if health_score < 25:
    risk = "CRITICAL"
elif health_score < 50:
    risk = "HIGH"
elif health_score < 75:
    risk = "MODERATE"
else:
    risk = "LOW"

Key properties:
- Monotonic: DI never decreases (except on explicit purge). A quiet minute doesn't erase past damage.
- Dead-Zone: Batch scores below 0.65 (healthy noise) accumulate zero damage.
- Hydration: On restart, DI is recovered from InfluxDB (`|> last()`), so state survives process restarts.
- Purge Reset: `POST /system/purge` writes DI=0.0 to InfluxDB and clears in-memory state.
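
To see the dead-zone, monotonicity, and cap working together, the formulas above can be wrapped in a few small functions. This is a sketch under stated assumptions rather than the project's `assessor.py`: the function names are hypothetical, `dt` is treated as a plain elapsed-time value in whatever unit the caller uses, and the damage rate passed to `rul_hours` is assumed to be expressed in DI per hour.

```python
HEALTHY_FLOOR = 0.65
SENSITIVITY_CONSTANT = 0.005


def effective_severity(batch_score):
    """Map a batch anomaly score to 0..1, with a dead-zone below HEALTHY_FLOOR."""
    if batch_score < HEALTHY_FLOOR:
        return 0.0
    return (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)


def update_di(di, batch_score, dt=1.0):
    """Return the new Degradation Index: never lower than `di`, capped at 1.0."""
    increment = effective_severity(batch_score) ** 2 * SENSITIVITY_CONSTANT * dt
    return min(di + increment, 1.0)


def health_score(di):
    return round(100 * (1.0 - di))


def rul_hours(di, damage_rate_per_hour):
    """Remaining useful life; the rate unit (DI per hour) is an assumption."""
    return (1.0 - di) / max(damage_rate_per_hour, 1e-9)


def classify_risk(score):
    if score < 25:
        return "CRITICAL"
    if score < 50:
        return "HIGH"
    if score < 75:
        return "MODERATE"
    return "LOW"


di = 0.0
for score in [0.30, 0.60, 0.64]:   # healthy noise: inside the dead-zone
    di = update_di(di, score)
quiet_di = di

for score in [1.0] * 60:           # sustained worst-case anomaly
    di = update_di(di, score)
damaged_di = di

di = update_di(di, 0.30)           # a quiet sample does not undo damage
print(quiet_di, damaged_di, di, health_score(di), classify_risk(health_score(di)))
```

Squaring the severity means mild excursions just above the floor add very little damage, while scores near 1.0 dominate the accumulation.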
Dark theme with glassmorphism • Real-time charts • Color-coded risk levels
Core Features:
- STATUS: LIVE badge with real-time connection indicator
- Real-time Power Signature chart with Recharts
- Multi-signal streaming chart: Voltage (V), Current (A), Vibration (g) with fixed Y-axis domains and 60s right-anchored sliding window
- Red shaded regions for anomaly spans (noise-suppressed: majority-rules aggregation)
- Health Score ring (0-100) with color coding:
  - Green (75-100): LOW risk
  - Yellow/Orange (50-74): MODERATE risk
  - Orange (25-49): HIGH risk
  - Red (0-24): CRITICAL risk
- Maintenance Window estimation (days until recommended service)
- Insight panel with batch-feature explanations (e.g., "High vibration variance: σ=0.17g")
- Log Watcher: real-time event feed with transition-based state machine events
- Download options:
  - Executive PDF: 1-page summary with Health Grade (A/B/C/D/F) for Plant Managers
  - Multi-sheet Excel: Summary, Operator Logs, Raw Sensor Data for Data Analysts
  - Industrial PDF: 5-page technical report with Maintenance Correlation Analysis for Engineers
- Operator Log Panel: Real-time maintenance event logging with severity levels
- Baseline Target Display: Status cards show calibrated baseline targets alongside live readings
- Purge & Re-Calibrate: Purple button to wipe all data and restart calibration from scratch
- Keep-Alive Heartbeat: Automatic 10-minute `/ping` to prevent Render free-tier cold starts
Anomaly Visualization Logic:
- Red dashed lines appear only when risk ≠ LOW
- When the system is healthy, no anomaly markers are shown
Fault Injection Controls:
- Fault Type: Spike, Drift, Jitter, or Default patterns
- Severity Levels:
  - MILD: Targets MODERATE risk (health 50-74)
  - MEDIUM: Targets HIGH risk (health 25-49)
  - SEVERE: Targets CRITICAL risk (health 0-24)
- Jitter fault: Normal means, abnormal variance; specifically tests the batch model advantage
All risk levels have been tested with real sensor data:
| Risk Level | Health Score | Red Lines | Maintenance Window | Test Status |
|---|---|---|---|---|
| LOW | 75+ | None | ~60 days | Pass |
| MODERATE | 50-74 | Yes | ~19 days | Pass |
| HIGH | 25-49 | Yes | ~4 days | Pass |
| CRITICAL | 0-24 | Yes | < 1 day | Pass |
# Run all tests
pytest tests/ -v
# Run specific test module
pytest tests/test_features.py -v
pytest tests/test_detector.py -v
pytest tests/test_assessor.py -v
pytest tests/test_degradation.py -v
pytest tests/test_reports.py -v
# Coverage report
pytest tests/ --cov=backend --cov-report=html

Test coverage by module (182 total):
| Module | Tests | Coverage |
|---|---|---|
| Cumulative Degradation | 37 | ✓ |
| Health Assessment | 21 | ✓ |
| Data Generator | 21 | ✓ |
| Feature Engineering | 20 | ✓ |
| API Validation | 17 | ✓ |
| Reporting (PDF/Excel) | 15 | ✓ |
| Baseline Construction | 14 | ✓ |
| Anomaly Detection | 14 | ✓ |
| Explainability | 13 | ✓ |
| Storage | 10 | ✓ |
Backend (backend/.env):
ENVIRONMENT=production
PORT=8000
INFLUX_URL=https://us-east-1-1.aws.cloud2.influxdata.com
INFLUX_TOKEN=<your-influxdb-token>
INFLUX_ORG=<your-org-id>
INFLUX_BUCKET=sensor_data

Frontend (Vercel Dashboard or local .env):
VITE_API_URL=https://predictive-maintenance-uhlb.onrender.com

| Service | Port | Description |
|---|---|---|
| `backend` | 8000 | FastAPI application |
| `frontend` | 5173 | React dashboard (nginx, production build) |
All services have restart: unless-stopped for resilience.
Key architectural decisions are documented in ENGINEERING_LOG.md:
- Phase 4: NaN for cold-start windows (prevents false zeros)
- Phase 6: Inverted sigmoid for anomaly score semantics
- Phase 7: Deterministic health formula with named thresholds
- Phase 8: Epsilon rule for practical significance
- Phase 9: Pure renderer pattern (frontend displays, backend computes)
- Phase 10: Snapshot rule for auditable reports; 5-page Industrial Certificate
- Phase 11: Dual deployment (Docker + systemd)
- Phase 13: Operator Log feature with InfluxDB persistence; role-specialized reports
- Phase 14: 100Hz high-frequency pipeline with server-side aggregation; event engine state machine
- Phase 15: Batch ML retraining (16-D features from 100Hz windows; JITTER fault type; F1=99.6%)
- Phase 16: Temporal anchoring (60s right-anchored sliding window, fixed Y-axis domains, multi-signal chart)
- Phase 17: Noise suppression (25% tolerance, majority-rules aggregation (≥15/100), 2s event debounce)
- Phase 18: Cloud recovery (lazy-loaded ML imports to prevent Render 503, `/ping` endpoint, `from __future__ import annotations` for deferred type evaluation)
- Phase 19: Baseline benchmarking on status cards, deep system purge (`/system/purge`), report refinement (real anomaly scores, sanitized operator logs)
- Phase 20: Cumulative Degradation Index (DI) engine with dead-zone (`HEALTHY_FLOOR=0.65`), sensitivity tuning (`SENSITIVITY_CONSTANT=0.005`), DI hydration from InfluxDB, purge DI-reset, CORS hardening, report enrichment with DI/Damage-Rate/RUL
- Scoring: Batch-feature inference (primary) with legacy model fallback
docker-compose up -d

sudo ./scripts/setup_linux.sh
sudo systemctl status predictive-maintenance

Resilience features:
- Docker: `restart: unless-stopped`
- Systemd: `Restart=always`, `RestartSec=5`
- Health checks on all services
This project is for educational and demonstration purposes.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit changes (`git commit -m 'feat: add amazing feature'`)
- Push to branch (`git push origin feature/amazing`)
- Open a Pull Request
Built with ❤️ for Industrial IoT