feat(isolationforestprocessor): add adaptive window sizing for dynamic resource optimization (open-telemetry#42752)
#### Description
This PR adds **adaptive window sizing** functionality to the Isolation
Forest processor, enabling dynamic adjustment of sliding window sizes
based on traffic patterns, memory usage, and model stability. This
enhancement optimizes resource utilization and improves anomaly
detection accuracy for varying workload conditions.
**Key Features Added:**
- **Dynamic window scaling** that adapts to traffic velocity
- **Memory-aware shrinkage** to prevent resource exhaustion
- **Stability-based expansion** for optimal model performance
- **Configurable adaptation parameters** with sensible defaults
- **Enhanced logging** with adaptive statistics and metrics
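
The three adjustment rules listed above can be sketched roughly as follows. This is not the processor's actual code: the `AdaptiveParams` type, the `NextWindowSize` function, the growth/shrink factors, and the rule that memory pressure takes priority over growth are all assumptions made for illustration. Only the idea of clamping between a minimum and maximum window size comes from the PR itself.

```go
package main

import "fmt"

// AdaptiveParams holds hypothetical tuning knobs. Only the min/max bounds
// mirror names used in the README (min_window_size, max_window_size).
type AdaptiveParams struct {
	MinWindow      int
	MaxWindow      int
	GrowthFactor   float64 // multiplier applied when traffic is fast or the model is stable
	ShrinkFactor   float64 // multiplier applied under memory pressure
	VelocityThresh float64 // points/sec above which the window grows
	MemoryLimitMB  float64 // memory usage above which the window shrinks
}

// NextWindowSize returns the window size for the next training cycle.
// Assumption: memory pressure is checked first, so it overrides growth.
func NextWindowSize(cur int, velocity, memMB float64, stable bool, p AdaptiveParams) int {
	next := float64(cur)
	switch {
	case memMB > p.MemoryLimitMB:
		next *= p.ShrinkFactor
	case velocity > p.VelocityThresh || stable:
		next *= p.GrowthFactor
	}
	n := int(next)
	if n < p.MinWindow {
		n = p.MinWindow
	}
	if n > p.MaxWindow {
		n = p.MaxWindow
	}
	return n
}

func main() {
	p := AdaptiveParams{MinWindow: 500, MaxWindow: 5000, GrowthFactor: 1.5,
		ShrinkFactor: 0.5, VelocityThresh: 100, MemoryLimitMB: 512}
	fmt.Println(NextWindowSize(1000, 250, 128, false, p)) // fast traffic: window grows
	fmt.Println(NextWindowSize(1000, 250, 900, true, p))  // memory pressure wins: window shrinks
	fmt.Println(NextWindowSize(4000, 250, 128, false, p)) // growth is clamped to MaxWindow
}
```

Checking memory first is a deliberately conservative choice for the sketch: it keeps a traffic burst from growing the window while the process is already near its limit.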
**Backward Compatibility:**
- **100% backward compatible** - all existing configurations continue to
work unchanged
- **No breaking changes** - adaptive window is disabled by default
(`enabled: false`)
- **Preserves existing behavior** - when adaptive window is disabled,
processor functions identically to before
- **Optional feature** - users can opt in to adaptive functionality via
configuration
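
A minimal sketch of how an opt-in setting like this can stay backward compatible is shown below. The `enabled`, `min_window_size`, and `max_window_size` keys appear in this PR; the `AdaptiveWindowConfig` struct, its `Validate` method, and the `EffectiveWindow` helper are hypothetical names used only for illustration and may not match the real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// AdaptiveWindowConfig is a hypothetical shape for the adaptive window settings.
// The struct name and tags are assumptions; only the key names come from the PR.
type AdaptiveWindowConfig struct {
	Enabled       bool `mapstructure:"enabled"`
	MinWindowSize int  `mapstructure:"min_window_size"`
	MaxWindowSize int  `mapstructure:"max_window_size"`
}

// Validate checks bounds only when the feature is turned on, so a zero-value
// (disabled) config always passes and existing configurations keep working.
func (c AdaptiveWindowConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if c.MinWindowSize <= 0 {
		return errors.New("min_window_size must be positive")
	}
	if c.MaxWindowSize < c.MinWindowSize {
		return fmt.Errorf("max_window_size (%d) must be >= min_window_size (%d)",
			c.MaxWindowSize, c.MinWindowSize)
	}
	return nil
}

// EffectiveWindow returns the fixed window_size when adaptation is disabled,
// and the adaptively chosen size otherwise.
func EffectiveWindow(c AdaptiveWindowConfig, fixed, adaptive int) int {
	if !c.Enabled {
		return fixed
	}
	return adaptive
}

func main() {
	var legacy AdaptiveWindowConfig // zero value: enabled is false
	fmt.Println(legacy.Validate(), EffectiveWindow(legacy, 1000, 4000))

	bad := AdaptiveWindowConfig{Enabled: true, MinWindowSize: 2000, MaxWindowSize: 1000}
	fmt.Println(bad.Validate())
}
```

Because the zero value of `Enabled` is `false`, a configuration that never mentions the adaptive block takes the fixed-window path in this sketch.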
#### Link to tracking issue
Fixes open-telemetry#42751
#### Testing
**Comprehensive test coverage added:**
- **Unit tests** for all adaptive window algorithms (velocity tracking,
memory monitoring, stability checking)
- **Configuration validation tests** for all adaptive window parameters
and edge cases
- **Integration tests** verifying adaptive behavior with processor
lifecycle
- **Boundary condition tests** for min/max window sizes and adaptation
rates
- **Multi-model adaptive tests** ensuring compatibility with existing
multi-model functionality
- **Error handling tests** for invalid configurations and runtime edge
cases
**Test Results:**
- All existing tests continue to pass (backward compatibility verified)
- New adaptive functionality achieves >90% code coverage
- Integration tests confirm proper adaptive behavior under various load
patterns
#### Documentation
**Documentation updated:**
- README.md enhanced with adaptive window sizing feature, configuration
parameters, and usage examples
- Configuration tables expanded with adaptive window parameters and best
practices
- Performance characteristics updated to reflect dynamic training costs
with adaptive sizing
---------
Co-authored-by: Antoine Toulme <[email protected]>
processor/isolationforestprocessor/README.md: 28 additions & 5 deletions
@@ -24,6 +24,7 @@ The **Isolation Forest processor** adds inline, unsupervised anomaly detection t
 | **Realtime Isolation Forest** | Builds an ensemble of random trees over a sliding window of recent data and assigns a 0–1 anomaly score on ingestion (≈ *O(log n)* per point). |
 | **Multi‑signal support** | Can be inserted into **traces**, **metrics**, **logs** pipelines – one config powers all three. |
 | **Per‑entity modelling** | `features` config lets you maintain a separate model per unique combination of resource / attribute keys (e.g. per‑pod, per‑service). |
+| **Adaptive Window Sizing** | Automatically adjusts window size based on traffic patterns, memory usage, and model stability for optimal performance and resource utilization. |
 | **Flexible output** | • Add an attribute `iforest.is_anomaly=true` <br>• Emit a gauge metric `iforest.anomaly_score` <br>• Drop anomalous telemetry entirely. |
 | **Config‑driven** | Tune tree count, subsample size, contamination rate, sliding‑window length, retraining interval, target metrics, and more – all in `collector.yml`. |
 | **Zero external deps** | Pure Go implementation; runs wherever the Collector does (edge, gateway, or backend). |
@@ -35,7 +36,8 @@ The **Isolation Forest processor** adds inline, unsupervised anomaly detection t
 1. **Training window** – The processor keeps up to `window_size` of the most recent data points for every feature‑group.
 2. **Periodic (re‑)training** – Every `training_interval`, it draws `subsample_size` points from that window and grows `forest_size` random isolation trees.
 3. **Scoring** – Each new point is pushed through the forest. Shorter average path length ⇒ higher anomaly score.
-4. **Post‑processing** –
+4. **Adaptive sizing** – When enabled, window size automatically adjusts based on traffic velocity, memory usage, and model stability.
+5. **Post‑processing** –
 
    * If `add_anomaly_score: true`, a gauge metric `iforest.anomaly_score` is emitted with identical attributes/timestamp.
    * If the score ≥ `anomaly_threshold`, the original span/metric/log is flagged with `iforest.is_anomaly=true`.
@@ -61,6 +63,21 @@ Performance is linear in `forest_size` and logarithmic in `window_size`; a defau
 | `metrics_to_analyze` | \[]string | `[]` | Only these metric names are scored (metrics pipeline only). Blank ⇒ all. |
 | **Traces** | Span **duration** (ns) | `service.name`, `k8s.pod.name` | `iforest.is_anomaly` attr + optional drop | Use a span/trace exporter to route anomalies. |
 | **Metrics** | Only `system.cpu.utilization`, `system.memory.utilization` | Same | Attribute + score metric | The score appears as `iforest.anomaly_score` gauge. |
@@ -133,6 +150,7 @@ service:
 * **Tune `forest_size` vs. latency** – start with 100 trees; raise to 200–300 if scores look noisy.
 * **Use per‑entity models** – add `features` (service, pod, host) to avoid global comparisons across very different series.
 * **Let contamination drive threshold** – set `contamination_rate` to the % of traffic you’re comfortable labelling outlier; avoid hand‑tuning `anomaly_threshold`.
+* **Use adaptive window sizing** – enable for dynamic workloads; the processor will automatically grow windows during high traffic and shrink under memory pressure.
 * **Route anomalies** – keep `drop_anomalous_data=false` and add a simple \[routing‑processor] downstream to ship anomalies to a dedicated exporter or topic.
 * **Monitor model health** – the emitted `iforest.anomaly_score` metric is perfect for a Grafana panel; watch its distribution and adapt window / contamination accordingly.
-*Training cost*: **O(window\_size × forest\_size × log subsample\_size)** every `training_interval`
-*Scoring cost*: **O(forest\_size × log subsample\_size)** per item
+
+*Training cost*: **O(current_window_size × forest_size × log subsample_size)** every `training_interval`
+*Scoring cost*: **O(forest_size × log subsample_size)** per item
+
+**Note:** With adaptive window sizing enabled, `current_window_size` dynamically adjusts between `min_window_size` and `max_window_size` based on traffic patterns and memory constraints, making training costs adaptive to workload conditions.
+
 
 ---
 
 ## 🤝 Contributing
 
 * **Bugs / Questions** – please open an issue in the fork first.
+* **Recently added**: Adaptive window sizing for dynamic traffic patterns.
 * **Planned enhancements**
 
   * Multivariate scoring (multiple numeric attributes per point).
-  * Adaptive window size.
   * Expose Prometheus counters for training time / CPU cost.
 
 PRs welcome – please include unit tests and doc updates.
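
To make the README's updated cost note concrete, the sketch below evaluates the stated expressions at two window sizes. It is only a relative estimate: the big-O expressions fix neither the logarithm base nor any constant factors, so base 2 is an assumption here, and the bounds of 1,000 and 10,000 points are hypothetical values chosen for the example.

```go
package main

import (
	"fmt"
	"math"
)

// TrainingCost evaluates window × forest × log2(subsample) as a unitless
// relative cost. The log base and the absence of constants are assumptions.
func TrainingCost(window, forest, subsample int) float64 {
	return float64(window) * float64(forest) * math.Log2(float64(subsample))
}

// ScoringCost evaluates forest × log2(subsample); it does not depend on the window.
func ScoringCost(forest, subsample int) float64 {
	return float64(forest) * math.Log2(float64(subsample))
}

func main() {
	const forest, subsample = 100, 256
	minW, maxW := 1000, 10000 // hypothetical min_window_size / max_window_size

	lo := TrainingCost(minW, forest, subsample)
	hi := TrainingCost(maxW, forest, subsample)
	fmt.Printf("training cost at min window: %.0f\n", lo)
	fmt.Printf("training cost at max window: %.0f\n", hi)
	fmt.Printf("ratio max/min: %.1f\n", hi/lo)
	fmt.Printf("scoring cost per item: %.0f\n", ScoringCost(forest, subsample))
}
```

Under these expressions the training cost scales linearly with the current window, while the per-item scoring cost is unaffected by window changes.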