| 1 | +--- |
| 2 | +title: "Tutorial: Use Univariate Anomaly Detector in Azure Data Explorer" |
| 3 | +titleSuffix: Azure Cognitive Services |
| 4 | +description: Learn how to use the Univariate Anomaly Detector with Azure Data Explorer. |
| 5 | +services: cognitive-services |
| 6 | +author: jr-MS |
| 7 | +manager: nitinme |
| 8 | +ms.service: cognitive-services |
| 9 | +ms.subservice: anomaly-detector |
| 10 | +ms.topic: tutorial |
| 11 | +ms.date: 12/19/2022 |
| 12 | +ms.author: mbullwin |
| 13 | +--- |
| 14 | + |
| 15 | +# Tutorial: Use Univariate Anomaly Detector in Azure Data Explorer |
| 16 | + |
| 17 | +## Introduction |
| 18 | + |
| 19 | +The [Anomaly Detector API](/azure/cognitive-services/anomaly-detector/overview-multivariate) enables you to check for and detect abnormalities in your time series data without needing machine learning expertise. The Anomaly Detector API's algorithms adapt by automatically finding and applying the best-fitting models to your data, regardless of industry, scenario, or data volume. Using your time series data, the API determines the boundaries for anomaly detection, the expected values, and which data points are anomalies. |
| 20 | + |
| 21 | +[Azure Data Explorer](/azure/data-explorer/data-explorer-overview) is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real-time. The Azure Data Explorer toolbox gives you an end-to-end solution for data ingestion, query, visualization, and management. |
| 22 | + |
| 23 | +## Anomaly Detection functions in Azure Data Explorer |
| 24 | + |
| 25 | +### Function 1: series_uv_anomalies_fl() |
| 26 | + |
| 27 | +The function **[series_uv_anomalies_fl()](/azure/data-explorer/kusto/functions-library/series-uv-anomalies-fl?tabs=adhoc)** detects anomalies in time series by calling the [Univariate Anomaly Detection API](/azure/cognitive-services/anomaly-detector/overview). The function accepts a limited set of time series as numerical dynamic arrays and the required anomaly detection sensitivity level. It converts each time series into the required JSON (JavaScript Object Notation) format and posts it to the Anomaly Detector service endpoint. The service response contains dynamic arrays of high/low/all anomalies, the modeled baseline time series, its normal high/low boundaries (a value above the high boundary or below the low boundary is an anomaly), and the detected seasonality. |
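
The conversion step can be sketched in plain Python. This is a minimal illustration that mirrors the inline Python used in this tutorial's query, not the service's client library. The helper name `build_detect_payload` is hypothetical, and only the `series` and `sensitivity` fields that appear in the query are included.

```python
import json


def build_detect_payload(values, sensitivity=85):
    """Build the JSON request body for one time series (hypothetical helper).

    Each numeric value becomes a {"value": v} point, and the sensitivity
    level is sent alongside the series, as in the tutorial's inline Python.
    """
    series = [{"value": v} for v in values]
    return json.dumps({"series": series, "sensitivity": sensitivity})


# Example: three data points with a non-default sensitivity.
payload = build_detect_payload([1, 2, 3], sensitivity=90)
print(payload)
```

One JSON string is produced per time series, which is why the query posts each series in its own partition.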
| 28 | + |
| 29 | +### Function 2: series_uv_change_points_fl() |
| 30 | + |
| 31 | +The function **[series_uv_change_points_fl()](/azure/data-explorer/kusto/functions-library/series-uv-change-points-fl?tabs=adhoc)** finds change points in time series by calling the Univariate Anomaly Detection API. The function accepts a limited set of time series as numerical dynamic arrays, the change point detection threshold, and the minimum size of the stable trend window. It converts each time series into the required JSON format and posts it to the Anomaly Detector service endpoint. The service response contains dynamic arrays of change points, their respective confidence scores, and the detected seasonality. |
| 32 | + |
| 33 | +These two functions are user-defined [tabular functions](/azure/data-explorer/kusto/query/functions/user-defined-functions#tabular-function) applied using the [invoke operator](/azure/data-explorer/kusto/query/invokeoperator). You can either embed their code in your query or define them as stored functions in your database. |
| 34 | + |
| 35 | +## Where can you use these new capabilities? |
| 36 | + |
| 37 | +You can use these two functions either on the Azure Data Explorer website or in the Kusto Explorer application. |
| 38 | + |
| 39 | + |
| 40 | + |
| 41 | +## Create resources |
| 42 | + |
| 43 | +1. [Create an Azure Data Explorer Cluster](https://portal.azure.com/#create/Microsoft.AzureKusto) in the Azure portal. After the resource is created successfully, go to the resource and create a database. |
| 44 | +2. [Create an Anomaly Detector](https://portal.azure.com/#create/Microsoft.CognitiveServicesAnomalyDetector) resource in the Azure portal and note the key and endpoint, which you’ll need later. |
| 45 | +3. Enable plugins in Azure Data Explorer: |
| 46 | + * These new functions have inline Python and require [enabling the python() plugin](/azure/data-explorer/kusto/query/pythonplugin#enable-the-plugin) on the cluster. |
| 47 | + * These new functions call the anomaly detection service endpoint, which requires you to: |
| 48 | + * Enable the [http_request plugin / http_request_post plugin](/azure/data-explorer/kusto/query/http-request-plugin) on the cluster. |
| 49 | + * Modify the [callout policy](/azure/data-explorer/kusto/management/calloutpolicy) for type `webapi` to allow accessing the service endpoint. |
| 50 | + |
| 51 | +## Download sample data |
| 52 | + |
| 53 | +This tutorial uses the `request-data.csv` file, which can be downloaded from our [GitHub sample data](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/anomalydetector/azure-ai-anomalydetector/samples/sample_data/request-data.csv). |
| 54 | + |
| 55 | +You can also download the sample data by running: |
| 56 | + |
| 57 | +```cmd |
| 58 | +curl "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/anomalydetector/azure-ai-anomalydetector/samples/sample_data/request-data.csv" --output request-data.csv |
| 59 | +``` |
| 60 | + |
| 61 | +Then ingest the sample data into Azure Data Explorer by following the [ingestion guide](/azure/data-explorer/ingest-sample-data?tabs=ingestion-wizard). Name the new table for the ingested data **univariate**. |
| 62 | + |
| 63 | +Once ingested, your data should look as follows: |
| 64 | + |
| 65 | +:::image type="content" source="../media/data-explorer/project.png" alt-text="Screenshot of Kusto query with sample data." lightbox="../media/data-explorer/project.png"::: |
| 66 | + |
| 67 | +## Detect anomalies in an entire time series |
| 68 | + |
| 69 | +In Azure Data Explorer, run the following query to create an anomaly detection chart from your ingested data. You can also [create a function](/azure/data-explorer/kusto/functions-library/series-uv-change-points-fl?tabs=persistent) to save the code as a stored function for persistent use. |
| 70 | + |
| 71 | +```kusto |
| 72 | +let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid') |
| 73 | +{ |
| 74 | + let uri = '[Your-Endpoint]anomalydetector/v1.0/timeseries/entire/detect'; |
| 75 | + let headers=dynamic({'Ocp-Apim-Subscription-Key': h'[Your-key]'}); |
| 76 | + let kwargs = pack('y_series', y_series, 'sensitivity', sensitivity); |
| 77 | + let code = ```if 1: |
| 78 | + import json |
| 79 | + y_series = kargs["y_series"] |
| 80 | + sensitivity = kargs["sensitivity"] |
| 81 | + json_str = [] |
| 82 | + for i in range(len(df)): |
| 83 | + row = df.iloc[i, :] |
| 84 | + ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))] |
| 85 | + json_data = {'series': ts, "sensitivity":sensitivity} # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies |
| 86 | + json_str = json_str + [json.dumps(json_data)] |
| 87 | + result = df |
| 88 | + result['json_str'] = json_str |
| 89 | + ```; |
| 90 | + tbl |
| 91 | + | evaluate python(typeof(*, json_str:string), code, kwargs) |
| 92 | + | extend _tsid = column_ifexists(tsid, 1) |
| 93 | + | partition by _tsid ( |
| 94 | + project json_str |
| 95 | + | evaluate http_request_post(uri, headers, dynamic(null)) |
| 96 | + | project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly) |
| 97 | + , neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins) |
| 98 | + | extend _tsid=toscalar(_tsid) |
| 99 | + ) |
| 100 | +} |
| 101 | +; |
| 102 | +let stime=datetime(2018-03-01); |
| 103 | +let etime=datetime(2018-04-16); |
| 104 | +let dt=1d; |
| 105 | +let ts = univariate |
| 106 | +| make-series value=avg(Column2) on Column1 from stime to etime step dt |
| 107 | +| extend _tsid='TS1'; |
| 108 | +ts |
| 109 | +| invoke series_uv_anomalies_fl('value') |
| 110 | +| lookup ts on _tsid |
| 111 | +| render anomalychart with(xcolumn=Column1, ycolumns=value, anomalycolumns=ad_ama) |
| 112 | +``` |
| 113 | + |
| 114 | +After you run the query, it renders a chart like this: |
| 115 | + |
| 116 | +:::image type="content" source="../media/data-explorer/anomaly.png" alt-text="Screenshot of line chart of anomalies." lightbox="../media/data-explorer/anomaly.png"::: |
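
The query derives the upper and lower boundaries by adding the upper margins to the expected values and subtracting the lower margins from them. The following Python sketch reproduces that arithmetic on made-up numbers. The function names are hypothetical, and the out-of-bounds check only illustrates the boundary definition: the chart itself uses the anomaly flags returned by the service.

```python
def anomaly_bounds(expected, upper_margins, lower_margins):
    """Return (upper, lower) boundary series, computed element by element."""
    if not (len(expected) == len(upper_margins) == len(lower_margins)):
        raise ValueError("all series must have the same length")
    upper = [e + m for e, m in zip(expected, upper_margins)]
    lower = [e - m for e, m in zip(expected, lower_margins)]
    return upper, lower


def flag_out_of_bounds(values, upper, lower):
    """Mark each point outside [lower, upper] with 1 and all others with 0."""
    return [1 if (v > u or v < l) else 0 for v, u, l in zip(values, upper, lower)]


# Illustrative numbers only; these are not taken from the sample data.
upper, lower = anomaly_bounds([10.0, 12.0, 11.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
flags = flag_out_of_bounds([10.5, 15.0, 9.0], upper, lower)
print(upper, lower, flags)
```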
| 117 | + |
| 118 | +## Next steps |
| 119 | + |
| 120 | +* [Best practices of Univariate Anomaly Detection](../concepts/anomaly-detection-best-practices.md) |