This guide compares different statistical methods for setting thresholds in manufacturing anomaly detection systems.
When analyzing sensor data (Speed, Humidity, Temperature, Vibration, etc.) to establish alert thresholds, you have two primary approaches:
- Percentile-Based Thresholds (P5/P95)
- Z-Score Based Thresholds (Standard Deviation)
| Aspect | P5 & P95 Percentiles | Z-Score ±2.0 |
|---|---|---|
| Data Coverage | 90% (excludes 5% each side) | 95.4% (excludes 2.3% each side) |
| Sensitivity | More sensitive (tighter bounds) | Less sensitive (looser bounds) |
| Distribution Assumption | None - works with any data shape | Assumes normal distribution |
| Outlier Handling | Robust to extreme outliers | Affected by outliers |
| Symmetry | Adapts to data asymmetry | Always symmetric around mean |
| False Alarm Rate | Higher (more alerts) | Lower (fewer alerts) |
| Industry Standard | Data-driven approach | Statistical process control |
- Manufacturing sensor data with unknown distribution
- Need high sensitivity to process changes
- Data contains occasional sensor spikes/errors
- Want distribution-free approach
- Starting point for new monitoring systems
- Data follows roughly normal distribution
- Want fewer false alarms
- Following statistical process control standards
- Need theoretical foundation for thresholds
- Percentile method generates too many alerts
events
| where Timestamp >= startTime and Timestamp <= endTime
| summarize
LowerThreshold_P5 = percentile(Humidity, 5),
UpperThreshold_P95 = percentile(Humidity, 95),
MedianValue = percentile(Humidity, 50)events
| where Timestamp >= startTime and Timestamp <= endTime
| summarize
AvgValue = avg(Humidity),
StdDevValue = stdev(Humidity),
LowerThreshold_ZScore2 = avg(Humidity) - (2.0 * stdev(Humidity)),
UpperThreshold_ZScore2 = avg(Humidity) + (2.0 * stdev(Humidity))- Start with Percentile Method (P5/P95) for initial threshold setting
- Monitor alert frequency and accuracy over 1-2 weeks
- Switch to Z-Score ±2.0 if experiencing too many false alarms
- Consider Z-Score ±1.5 for even higher sensitivity if needed
- Historical Analysis: Use 30 days of data excluding last 24 hours
- Exclude Recent Data: Avoid contaminating baseline with current anomalies
- Regular Updates: Recalculate thresholds monthly or quarterly
find_humidity_range_z_score.kql- Z-score implementation for humidityspeed_range.kql- Percentile implementation for speed analysisDefectProbability_range.kql- Percentile analysis for defect prediction
Both methods are valid. Percentiles are often better for real-world manufacturing data due to their robustness and sensitivity, while Z-scores provide theoretical grounding and are standard in statistical process control.