
Commit 5f4333a

Sec -> second

1 parent e2a3e52


articles/stream-analytics/stream-analytics-machine-learning-anomaly-detection.md

Lines changed: 10 additions & 10 deletions
@@ -10,9 +10,9 @@ ms.date: 10/05/2022
Available in both the cloud and Azure IoT Edge, Azure Stream Analytics offers built-in machine learning based anomaly detection capabilities that can be used to monitor the two most commonly occurring anomalies: temporary and persistent. With the **AnomalyDetection_SpikeAndDip** and **AnomalyDetection_ChangePoint** functions, you can perform anomaly detection directly in your Stream Analytics job.

-The machine learning models assume a uniformly sampled time series. If the time series is not uniform, you may insert an aggregation step with a tumbling window prior to calling anomaly detection.
+The machine learning models assume a uniformly sampled time series. If the time series isn't uniform, you can insert an aggregation step with a tumbling window before calling anomaly detection.

-The machine learning operations do not support seasonality trends or multi-variate correlations at this time.
+The machine learning operations don't support seasonality trends or multivariate correlations at this time.

## Anomaly detection using machine learning in Azure Stream Analytics
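As a sketch of the tumbling-window aggregation step mentioned above — assuming a hypothetical input named `input` with `deviceId`, `temperature`, and `eventTime` fields — a query like this could resample a non-uniform stream before anomaly detection is applied:

```sql
-- Hypothetical names throughout: resample a non-uniform stream into
-- one averaged temperature value per device per second.
WITH UniformSeries AS
(
    SELECT
        deviceId,
        System.Timestamp() AS windowEnd,
        AVG(CAST(temperature AS float)) AS avgTemperature
    FROM input TIMESTAMP BY eventTime
    GROUP BY deviceId, TumblingWindow(second, 1)
)
SELECT deviceId, windowEnd, avgTemperature
INTO output
FROM UniformSeries
```

The anomaly detection function would then be called on the aggregated step rather than on the raw input.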

@@ -24,9 +24,9 @@ The following video demonstrates how to detect an anomaly in real time using mac
Generally, the model's accuracy improves with more data in the sliding window. The data in the specified sliding window is treated as part of its normal range of values for that time frame. The model only considers event history over the sliding window to check if the current event is anomalous. As the sliding window moves, old values are evicted from the model's training.

-The functions operate by establishing a certain normal based on what they have seen so far. Outliers are identified by comparing against the established normal, within the confidence level. The window size should be based on the minimum events required to train the model for normal behavior so that when an anomaly occurs, it would be able to recognize it.
+The functions operate by establishing a certain normal based on what they've seen so far. Outliers are identified by comparing against the established normal, within the confidence level. Base the window size on the minimum number of events required to train the model for normal behavior, so that it can recognize an anomaly when one occurs.

-The model's response time increases with history size because it needs to compare against a higher number of past events. It is recommended to only include the necessary number of events for better performance.
+The model's response time increases with history size because it needs to compare against a higher number of past events. For better performance, include only the necessary number of events.
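To make the window-size and history-size guidance concrete, here's a minimal sketch in the style of the article's **AnomalyDetection_SpikeAndDip** example (the input and field names are assumptions): the third argument is the history size in events, and `LIMIT DURATION` bounds the sliding window the model trains on.

```sql
SELECT
    System.Timestamp() AS time,
    CAST(temperature AS float) AS temp,
    -- 95% confidence, 120 events of history, watch for both spikes and dips
    AnomalyDetection_SpikeAndDip(CAST(temperature AS float), 95, 120, 'spikesanddips')
        OVER (LIMIT DURATION(second, 120)) AS spikeAndDipScores
FROM input
```

The returned record carries `Score` and `IsAnomaly` fields, which can be extracted with `GetRecordPropertyValue`.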

Gaps in the time series can be a result of the model not receiving events at certain points in time. This situation is handled by Stream Analytics using imputation logic. The history size, as well as a time duration, for the same sliding window is used to calculate the average rate at which events are expected to arrive.

@@ -68,7 +68,7 @@ FROM AnomalyDetectionStep
Persistent anomalies in a time series event stream are changes in the distribution of values in the event stream, like level changes and trends. In Stream Analytics, such anomalies are detected using the Machine Learning based [AnomalyDetection_ChangePoint](/stream-analytics-query/anomalydetection-changepoint-azure-stream-analytics) operator.

-Persistent changes last much longer than spikes and dips and could indicate catastrophic event(s). Persistent changes are not usually visible to the naked eye, but can be detected with the **AnomalyDetection_ChangePoint** operator.
+Persistent changes last much longer than spikes and dips and could indicate catastrophic events. Persistent changes aren't usually visible to the naked eye, but can be detected with the **AnomalyDetection_ChangePoint** operator.
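A minimal **AnomalyDetection_ChangePoint** call has the same shape (again with assumed input and field names); it takes a confidence level and a history size, but no mode argument:

```sql
SELECT
    System.Timestamp() AS time,
    CAST(temperature AS float) AS temp,
    -- 80% confidence over 1,200 events of history
    AnomalyDetection_ChangePoint(CAST(temperature AS float), 80, 1200)
        OVER (LIMIT DURATION(minute, 20)) AS changePointScores
FROM input
```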

The following image is an example of a level change:

@@ -114,22 +114,22 @@ The performance of these models depends on the history size, window duration, ev
### Relationship

The history size, window duration, and total event load are related in the following way:

-windowDuration (in ms) = 1000 * historySize / (Total Input Events Per Sec / Input Partition Count)
+windowDuration (in ms) = 1000 * historySize / (total input events per second / input partition count)
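For example, using the test setup described later in this section (load spread over 2 input partitions), the first row of the non-partitioned table below works out as:

windowDuration = 1000 * 60 / (2,200 / 2) ≈ 55 ms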

When partitioning the function by deviceId, add "PARTITION BY deviceId" to the anomaly detection function call.
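As a sketch (with assumed input and field names), partitioning trains an independent model per device:

```sql
SELECT
    deviceId,
    System.Timestamp() AS time,
    -- each deviceId gets its own independently trained model
    AnomalyDetection_SpikeAndDip(CAST(temperature AS float), 95, 120, 'spikesanddips')
        OVER (PARTITION BY deviceId LIMIT DURATION(second, 120)) AS scores
FROM input
```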

### Observations

The following table includes the throughput observations for a single node (6 SU) for the non-partitioned case:

-| History size (events) | Window duration (ms) | Total input events per sec |
+| History size (events) | Window duration (ms) | Total input events per second |
| --------------------- | -------------------- | ----------------------------- |
| 60 | 55 | 2,200 |
| 600 | 728 | 1,650 |
| 6,000 | 10,910 | 1,100 |

The following table includes the throughput observations for a single node (6 SU) for the partitioned case:

-| History size (events) | Window duration (ms) | Total input events per sec | Device count |
+| History size (events) | Window duration (ms) | Total input events per second | Device count |
| --------------------- | -------------------- | ----------------------------- | ------------ |
| 60 | 1,091 | 1,100 | 10 |
| 600 | 10,910 | 1,100 | 10 |
@@ -138,13 +138,13 @@ The following table includes the throughput observations for a single node (6 SU
| 600 | 218,182 | 550 | 100 |
| 6,000 | 2,181,819 | <550 | 100 |

-Sample code to run the non-partitioned configurations above is located in the [Streaming At Scale repo](https://github.com/Azure-Samples/streaming-at-scale/blob/f3e66fa9d8c344df77a222812f89a99b7c27ef22/eventhubs-streamanalytics-eventhubs/anomalydetection/create-solution.sh) of Azure Samples. The code creates a stream analytics job with no function level partitioning, which uses Event Hub as input and output. The input load is generated using test clients. Each input event is a 1KB json document. Events simulate an IoT device sending JSON data (for up to 1K devices). The history size, window duration, and total event load are varied over 2 input partitions.
+Sample code to run the non-partitioned configurations above is located in the [Streaming At Scale repo](https://github.com/Azure-Samples/streaming-at-scale/blob/f3e66fa9d8c344df77a222812f89a99b7c27ef22/eventhubs-streamanalytics-eventhubs/anomalydetection/create-solution.sh) of Azure Samples. The code creates a Stream Analytics job with no function-level partitioning, which uses Event Hubs as input and output. The input load is generated using test clients. Each input event is a 1-KB JSON document. Events simulate an IoT device sending JSON data (for up to 1,000 devices). The history size, window duration, and total event load are varied over two input partitions.

> [!Note]
> For a more accurate estimate, customize the samples to fit your scenario.

### Identifying bottlenecks

-Use the Metrics pane in your Azure Stream Analytics job to identify bottlenecks in your pipeline. Review **Input/Output Events** for throughput and ["Watermark Delay"](https://azure.microsoft.com/blog/new-metric-in-azure-stream-analytics-tracks-latency-of-your-streaming-pipeline/) or **Backlogged Events** to see if the job is keeping up with the input rate. For Event Hub metrics, look for **Throttled Requests** and adjust the Threshold Units accordingly. For Cosmos DB metrics, review **Max consumed RU/s per partition key range** under Throughput to ensure your partition key ranges are uniformly consumed. For Azure SQL DB, monitor **Log IO** and **CPU**.
+Use the Metrics pane in your Azure Stream Analytics job to identify bottlenecks in your pipeline. Review **Input/Output Events** for throughput and ["Watermark Delay"](https://azure.microsoft.com/blog/new-metric-in-azure-stream-analytics-tracks-latency-of-your-streaming-pipeline/) or **Backlogged Events** to see if the job is keeping up with the input rate. For Event Hubs metrics, look for **Throttled Requests** and adjust the throughput units accordingly. For Cosmos DB metrics, review **Max consumed RU/s per partition key range** under Throughput to ensure your partition key ranges are uniformly consumed. For Azure SQL DB, monitor **Log IO** and **CPU**.

## Next steps
