
Commit 7aa2a93

Merge pull request #97617 from orspod/links-to-queries
Add link to queries in 3 TS and ML docs
2 parents d73f57e + 197a970 commit 7aa2a93
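The "Click to run query" links added in this commit all carry their query in the `query=` parameter, and every payload starts with `H4sI`, which is what a gzip header looks like after base64 encoding. The sketch below assumes the payload is the UTF-8 query text, gzip-compressed and then base64-encoded; the helper names (`encode_query`, `decode_query`, `query_from_link`) are hypothetical and only illustrate how such a link could be built or inspected.

```python
import base64
import gzip
from urllib.parse import unquote

# Assumption: prefix copied from the links in this commit (help cluster, Samples database).
BASE_URL = "https://dataexplorer.azure.com/clusters/help/databases/Samples?query="


def encode_query(kql: str) -> str:
    """Gzip-compress a query and base64-encode it for the query= parameter."""
    compressed = gzip.compress(kql.encode("utf-8"), mtime=0)
    return base64.b64encode(compressed).decode("ascii")


def decode_query(payload: str) -> str:
    """Reverse encode_query: base64-decode, then gunzip."""
    # unquote (not unquote_plus) so a literal '+' in the base64 text survives.
    raw = base64.b64decode(unquote(payload))
    return gzip.decompress(raw).decode("utf-8")


def query_from_link(url: str) -> str:
    """Extract and decode the query carried by a 'Click to run query' link."""
    _, sep, payload = url.partition("query=")
    if not sep:
        raise ValueError("link has no query= parameter")
    return decode_query(payload)


# Round trip with an illustrative query (not one of the queries in this commit).
sample = "demo_make_series2 | take 10"
link = BASE_URL + encode_query(sample)
print(query_from_link(link))
```

Decoding one of the links from the diff this way is a quick check that the linked query matches the `kusto` block printed below it.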

3 files changed: 56 additions and 6 deletions

articles/data-explorer/anomaly-detection.md

Lines changed: 10 additions & 2 deletions
@@ -3,7 +3,7 @@ title: Time series anomaly detection and forecasting in Azure Data Explorer
description: Learn how to analyze time series data for anomaly detection and forecasting using Azure Data Explorer.
author: orspod
ms.author: orspodek
6-
ms.reviewer: jasonh
6+
ms.reviewer: adieldar
ms.service: data-explorer
ms.topic: conceptual
ms.date: 04/24/2019
@@ -26,6 +26,8 @@ To create a decomposition model, use the function [`series_decompose()`](/azure/
For example, you can decompose traffic of an internal web service by using the following query:

29+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA3WQ3WrDMAyF7/sUukvCnDXJGIOVPEULuwxqoixm/gm2+jf28JObFjbYrmyho3M+yRCD1a5jaGFAJtaW8qaqX8qqLqvnYrMySYHnvxRNWT1B07xW1U03JFEzbVYDWd9Z/KAuUtAUm9UXpLJcSnAH2+LxPZe3AO9gJ6ZbRjvDGLy9EbG/BUemOXnvLxD1AOJ1mijQtWhbyHbbOgOA9RogkqGeAaXn3g1BooVb6OiDNHpD6CjAUccDGv2JrL0TSzozuQHyPYqHdqRkDKN3aBRwkJaCQJIoQ4VsuXh2A/Xezj5SWkVBWSvI0vSoOSsWpLtEpyDwY4KTW8nnJ5ws+2+eAhSyOxjkd+HDVVcIfHplp2TYTxgYTpqnnDUbarM32gPO86PY4jjqfmGw3vGkftNlCi5xNprbWW5kYvENQQnqDh8CAAA=)**\]**
30+
```kusto
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
@@ -51,6 +53,8 @@ The function [`series_decompose_anomalies()`](/azure/kusto/query/series-decompos
The following query allows you to detect anomalies in internal web service traffic:

56+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA3WR3W7CMAyF73mKI25KpRbaTmjSUJ8CpF1WoXVptPxUifmb9vBLoGO7GFeR7ePv2I4ihpamYdToBBNLTYuqKF/zosyLdbqZqagQl/8UVV68oKreimLSdVFUDZtZR9o2WnxQ48lJ8tXsCzHM7yHMUdfidFiEN4U12AXoloUe0Turp4nYTsaeaYzs/RVedgis80CObkFdI9ltywTAagV4UtQyRKiZgyLEaTGZ9taFQqtIGHI4SX8USn4KltYEJF2YTIeFMFaHPPkMvrWOMuxFoEpDaVjujmo6aq0erafmIY+7ZCiX6wx5mSGJHb3kJA1sF8jB8q69toNwjLPkYfGTseqoja//eLNkRXXyTnuIcVyCneh72cL2YQdtDQ8ZHvIkDcsfPWH+3AvPvObx0FMXD/RLhfDYW9VhtNKwj/8U69M1b2S//AbRUQMWQQIAAA==)**\]**
57+
```kusto
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
@@ -74,6 +78,8 @@ The function [`series_decompose_forecast()`](/azure/kusto/query/series-decompose
The following query allows you to predict next week's web service traffic:

81+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA22QzW6DMBCE73mKuQFqKISqitSIW98gkXpEDl5iK9hG9uanUR++dqE99YRGO8x845EYRtuO0UIKJtaG8qbebMt6U9avxW41Joe4/+doyvoFTfNW14tPJlOjZqGc1w9n263crSQZ1xlxpi6Q1xSa1ReSLGcJezGtuJ7y+C3gLA6xZM/CTBi8MwshuxnkaUlGYJpS5/ETQUvEzJsiTz+ibZEd9psMQFUBgUbqGSLe7GkkpBVYygfn46EfSVjyuOpwEaN+CNbOxki6M1mZTNSLkAbOv3WSemcmF6j7vSX8dcTUlvOFsZJcFDHFx4wYnmp7JTzjplnlrHmkNvugI8Q0PYO9GAbdww0RyDjLav1XHLnBimAjEG5E5zQ7vRP284x36hOOTtxZ8Q3The8P2QEAAA==)**\]**
82+
```kusto
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
@@ -83,7 +89,7 @@ demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
86-
| render timechart with(title='Web app. traffic of a month, forecasting the next week by Time Series Decmposition')
92+
| render timechart with(title='Web app. traffic of a month, forecasting the next week by Time Series Decomposition')
```

![Time series forecasting](media/anomaly-detection/series-forecasting.png)
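The forecasting hunk above passes `toint(horizon/dt)` to `series_decompose_forecast()` as the number of trailing points to predict, and builds the series up to `max_t+horizon`. The sketch below works through that arithmetic. The definitions of `dt` and `horizon` fall outside the lines shown in this diff, so the two-hour step and seven-day horizon used here are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta

# From the diff: the observed window.
min_t = datetime(2017, 1, 5)
max_t = datetime(2017, 2, 3, 22, 0)

# Assumptions: dt and horizon are defined in lines the diff does not show;
# these values are illustrative only.
dt = timedelta(hours=2)
horizon = timedelta(days=7)

# Equivalent of toint(horizon/dt): how many trailing bins are forecast.
forecast_points = horizon // dt

# The series extends to max_t + horizon, so the last forecast_points bins
# lie after the observed data and are the ones being predicted.
forecast_end = max_t + horizon

print(forecast_points, forecast_end)
```

With these assumed values the forecast covers 84 two-hour bins after the last observed timestamp.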
@@ -97,6 +103,8 @@ Azure Data Explorer query language syntax enables a single call to process multi
The following query shows the processing of three time series simultaneously:

106+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA21Qy26DMBC85yvmFlChcUirSI34ikTqETl4KVawjfDmqX587UCaHuqLtePxPLYjhtG2YpRQkom1oaQQy3Uulrl4TzezLjLk5T9GkYsViuJDiImnIqlox6F1g745W67VZqbIuMrIA1WeBk2+mH0jjvk4wh5NKU9fSbhTOItdMNmyND2awZkpIbsxyMukDM/UR8/9FV6rIEkXJqvgmsYTl7X0lISHspzvtqt5hjdxPxkeYBHA4gGKFMBiAUilIAfWja617CY1NG4ASX/FSfuj7PRNsg4ZXANz7Fj3HSGuBmOjZ5hYbcSqIBwbZpNk+iQFcQpx4/omrqLamd55qh5v41d22nIybWChOI0qQ9Cg4e5ftyE6zprbhDV3VM4/aQ/Z96/gQTahU4wsYZzlNvs11vYL3BJsCIQz0eHed/W30jz9AUEBI0ktAgAA)**\]**
107+
```kusto
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);

articles/data-explorer/machine-learning-clustering.md

Lines changed: 19 additions & 3 deletions
@@ -3,7 +3,7 @@ title: Machine learning capability in Azure Data Explorer
description: Use machine learning clustering for Root Cause Analysis in Azure Data Explorer.
author: orspod
ms.author: orspodek
6-
ms.reviewer: jasonh
6+
ms.reviewer: adieldar
ms.service: data-explorer
ms.topic: conceptual
ms.date: 04/29/2019
@@ -19,7 +19,9 @@ Azure Data Explorer has three Machine Learning plugins: [`autocluster`](/azure/k
## Clustering a single record set

22-
A common scenario includes a data set selected by a specific criteria such as time window that exhibits anomalous behavior, high temperature device readings, long duration commands, and top spending users. We would like a simple and fast way to find common patterns (segments) in the data. Patterns are a subset of the data set whose records share the same values over multiple dimensions (categorical columns). The following query builds and shows a time series of service exceptions over a week in ten-minute bins:
22+
A common scenario includes a data set selected by a specific criteria such as time window that exhibits anomalous behavior, high temperature device readings, long duration commands, and top spending users. We would like an easy and fast way to find common patterns (segments) in the data. Patterns are a subset of the data set whose records share the same values over multiple dimensions (categorical columns). The following query builds and shows a time series of service exceptions over a week in ten-minute bins:
23+
24+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA5XPsaoCQQyF4d6nCFa7oHCtZd9B0F6G8ajByWTJZHS5+PDOgpVgYRn485EkOAnno9NAriWGFKw7QfQYUy0O43zZ0JNKFQnG/5jrbmeIXHBgwd6DjH2/JVqk2QrTL1aYvlifa4tni29YlzaiUK4yRK3Zu54006dBZ1N5/+X6PqpRI23+pFGGfIKRtz5egzk92K+dsycMyz3szhGEKWJ01lxI760O9ABuq0bMcvV2hqFoqnOz7F9BdSHlSgEAAA==)**\]**
```kusto
let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));
@@ -35,6 +37,8 @@ The service exception count correlates with the overall service traffic. You can
The second spike in the data occurs on Tuesday afternoon. The following query is used to further diagnose this spike. Use the query to redraw the chart around the spike in higher resolution (eight hours in one-minute bins) to verify whether it’s a sharp spike, and view its borders.

40+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAAyXNwQrCMBAE0Hu/YvHUooWkghSl/yDoyUsJyWpCk2xJNnjx403pbeYwbzwyBBdnnoxiZBewHYS89GLshzNIeRWiuzUGA83al8yYXPzI5gdBLdjnWjFDLGHSVCK3HVCEe0LtMj4r9mAVVngnCvsLMO3hOFqo2goyVCxhNJhgu9dWJYavY9uyY4/T4UV1XVm2CEM0kFe34AnkBhXGOs7kCzuKh+4P3/XM5M8AAAA=)**\]**
41+
```kusto
let min_t=datetime(2016-08-23 11:00);
demo_clustering1
@@ -46,6 +50,8 @@ demo_clustering1
We see a narrow two-minute spike from 15:00 to 15:02. In the following query, count the exceptions in this two-minute window:

53+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA8tJLVHIzcyLL0hNzI4vsU1JLEktycxN1TAyMDTTNbDQNTJWMDS1MjDQtObKASlNrCCk1AioNCU1Nz8+Oae0uCS1KDMv3ZCrRqE8I7UoVSGgKDU5szg1BKgvuCQxt0AhKbWkPDU1TwPhBj09hCWaQI3J+aV5JQACnQoRpwAAAA==)**\]**
54+
```kusto
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
@@ -60,6 +66,8 @@ demo_clustering1
In the following query, sample 20 exceptions out of 972:

69+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA4XOsQrCMBSF4b1Pccd2aLmJKKL4DoLu4doeNDSJJb1SBx/eOHV0/37OCVCKPrkJMjo9DaJQH1FbNruW963dkNkemJtjFX5U3v+oLXRAfLo+vGZF9uluqg8tD2TQOaP3M66lu6jEiW7QBUj1+qHr1pGmhCojyPIX7QHvzakAAAA=)**\]**
70+
```kusto
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
@@ -95,6 +103,8 @@ demo_clustering1
Even though there are less than a thousand exceptions, it’s still hard to find common segments, as there are multiple values in each column. You can use [`autocluster()`](/azure/kusto/query/autoclusterplugin) plugin to instantly extract a small list of common segments and find the interesting clusters within the spike's two minutes as seen in the following query:

106+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA4WOsQrCMBRF937FG5OhJYkoovQfBN1DbC8aTNqSvlgHP94IQkf3c+65AUzRD3aCe1hue8dgHyGM0rta7WuzIb09KCWPVfii7vUPNQXtEUfbhTwzkh9uunrTckcCnRI6P+NSvDO7ONEVvACDWD80zRqRRcTThVxa5DKPv00hP81KL1+4AAAA)**\]**
107+
```kusto
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
@@ -119,6 +129,8 @@ Autocluster uses a proprietary algorithm for mining multiple dimensions and extr
You can also use the [`basket()`](/azure/kusto/query/basketplugin) plugin as seen in the following query:

132+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA4WOsQ6CMBgGd57iH9sB0tZojMZ3MNG9KfBFG1og7Y84+PDWidH9LncBTNGPdoYbLF96x2AfIYzSh1oda7MjvT8pJc9V+KHu/Q81Be0RJ9uFJTOSHx+6+tD6RAJdEzqfcS/ejV2cqQWvwCi2h6bZIrKIeLmwlBa1Lg9gIb9KJv2TswAAAA==)**\]**
133+
```kusto
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
@@ -145,14 +157,16 @@ demo_clustering1
Basket implements the Apriori algorithm for item set mining and extracts all segments whose coverage of the record set is above a threshold (default 5%). You can see that more segments were extracted with similar ones (for example, segments 0,1 or 2,3).

148-
Both plugins are powerful and easy to use, but their significant limitation is due to the fact that they cluster a single record set in an unsupervised manner (with no labels). It's therefore unclear whether the extracted patterns characterize the selected record set (the anomalous records) or the global record set.
160+
Both plugins are powerful and easy to use, but their significant limitation is that they cluster a single record set in an unsupervised manner (with no labels). It's therefore unclear whether the extracted patterns characterize the selected record set (the anomalous records) or the global record set.
## Clustering the difference between two records sets

The [`diffpatterns()`](/azure/kusto/query/diffpatternsplugin) plugin overcomes the limitation of `autocluster` and `basket`. `Diffpatterns` takes two record sets and extracts the main segments that are different between them. One set usually contains the anomalous record set being investigated (one analyzed by `autocluster` and `basket`). The other set contains the reference record set (baseline).

In the query below, we use `diffpatterns` to find interesting clusters within the spike's two minutes, which are different than clusters within the baseline. We define the baseline window as the eight minutes before 15:00 (when the spike started). We also need to extend by a binary column (AB) specifying whether a specific record belongs to the baseline or to the anomalous set. `Diffpatterns` implements a supervised learning algorithm, where the two class labels were generated by the anomalous versus the baseline flag (AB).

168+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA42QzU+DQBDF7/wVcwOi5UtrmhJM4OzBRO9kWqbtpssuYacfGv94t0CrxFTd02by5jfvPUkMtVBlQ7gtOauQiUVNXhLFD5NoNknuIJ7Oo8hPHXmS4vEvaXKWWuoCDUmh6Jr8fj79Tv6HfOanEIbwRLgnQFhjAwviA5EC3hCcCYCq6gamEVsC1oB7LfoRt6iMYKEVvGtFQXfeNFKc7mXe2MjNVzl+mARR6lRU63Ipd4apFWodOx9w2FBL4D23tBSGXi3mhbG+OPPGVQTB+ITvg24dGN7vlN5JTxhc+dYAHZls4LzIxGr1k/B4iXcLbq50jfLNtd9i8OB2jD3KnW0dKstokG08Zby8uLbyCfX/tG46AgAA)**\]**
169+
```kusto
let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
@@ -178,6 +192,8 @@ demo_clustering1
The most dominant segment is the same segment that was extracted by `autocluster`, its coverage on the two-minute anomalous window is also 65.74%. But its coverage on the eight-minute baseline window is only 1.7%. The difference is 64.04%. This difference seems to be related to the anomalous spike. You can verify this assumption by splitting the original chart into the records belonging to this problematic segment versus the other segments as seen in the query below:

195+
**\[**[**Click to run query**](https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA5WRsWrDMBCG9zzF4cmGGuJUjh2Ktw7tUkLTzuEsnRNRnRQkuSQlD185yRTo0EWIO913/J8MRWBttxE6iC5INOhzRey20owhktd2V8EZwsiMXv/Q9Dpfe5I60Idm2kTkQ1E8AczMxMLjf1h4/IN1PzY7Ax0jWQWBdomvhyF/p512FroOMsIxA0zdTdpKn1bHSzmMzbX8TAfjTkw2vqpLp69VpYQaatEogXOBsqrbtl5WDake6yabXWjkv7WkFxeuPGqG5VzWqhQrIUqx6B/L1WKB6aBViy01imT2ANnau94QT9c35xlNVqQAjF9UhpSHAtiRO+lGG/MCUoZ7CTB4x7ePie5mNbk4QDVn6E+ThUT0SQh5iGlM7tHHX4WFgLHOAQAA)**\]**
196+
```kusto
let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));
let max_t = toscalar(demo_clustering1 | summarize max(PreciseTimeStamp));
