Commit 6c37e59

Merge pull request #216668 from mrbullwinkle/mrb_10_31_2022_prepare_data
[Cognitive Services] [Anomaly Detector] Prepare data
2 parents 861af83 + 1578567 commit 6c37e59

10 files changed

+109
-0
lines changed

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
---
2+
title: Prepare your data and upload to Storage Account
3+
titleSuffix: Azure Cognitive Services
4+
description: Prepare your data and upload to Storage Account
5+
services: cognitive-services
6+
author: mrbullwinkle
7+
manager: nitinme
8+
ms.service: cognitive-services
9+
ms.subservice: anomaly-detector
10+
ms.topic: conceptual
11+
ms.date: 11/01/2022
12+
ms.author: mbullwin
13+
---
14+
15+
16+
# Prepare your data and upload to Storage Account
17+
18+
Multivariate Anomaly Detection requires a training step to process your data, and an Azure Storage Account to store that data for the subsequent training and inference steps.
19+
20+
## Data preparation
21+
22+
First you need to prepare your data for training and inference.
23+
24+
### Input data schema
25+
26+
Multivariate Anomaly Detection supports two types of data schemas: **OneTable** and **MultiTable**. You can use either of these schemas to prepare your data and upload it to your Storage Account for training and inference.
27+
28+
:::image type="content" source="../media/prepare-data/two-schemas.png" alt-text="Diagram of two data schemas with three steps: data preparation, training, inference." lightbox="../media/prepare-data/two-schemas.png":::
29+
30+
#### Schema 1: OneTable
31+
**OneTable** is one CSV file that contains one `timestamp` column and all the variables that you want to use to train a Multivariate Anomaly Detection model. Download [One Table sample data](https://mvaddataset.blob.core.windows.net/public-sample-data/sample_data_5_3000.csv).
32+
* The `timestamp` values should conform to *ISO 8601*; the values of the variables in the other columns can be *integers* or *decimals* with any number of decimal places.
33+
34+
* Variables for training and variables for inference should be consistent. For example, if you're using `series_1`, `series_2`, `series_3`, `series_4`, and `series_5` for training, you should provide exactly the same variables for inference.
35+
36+
***Example:***
37+
38+
![Diagram of one table schema.](../media/prepare-data/onetable-schema.png)
39+
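As a rough illustration of the OneTable rules above, the following sketch checks CSV text for a `timestamp` column, parseable timestamps, and numeric values. The function name `validate_onetable` is hypothetical and not part of any Anomaly Detector SDK. Python's `datetime.fromisoformat` accepts only a subset of ISO 8601, so treat this as an approximate local pre-check rather than the service's own validation.

```python
import csv
import io
from datetime import datetime


def validate_onetable(csv_text):
    """Return a list of problems found in OneTable-style CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if "timestamp" not in columns:
        return ["missing 'timestamp' column"]
    variables = [c for c in columns if c != "timestamp"]
    if not variables:
        problems.append("no variable columns besides 'timestamp'")
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        stamp = row["timestamp"] or ""
        try:
            # Older Python versions reject a trailing "Z" in fromisoformat(),
            # so map it to an explicit UTC offset first.
            datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"row {row_number}: bad timestamp {stamp!r}")
        for name in variables:
            try:
                float(row[name])  # accepts integers and decimals
            except (TypeError, ValueError):
                problems.append(f"row {row_number}: {name} is not numeric: {row[name]!r}")
    return problems
```

Running the same check on both the training file and the inference file, and comparing their header rows, is one simple way to confirm that the variables are consistent between the two.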
40+
#### Schema 2: MultiTable
41+
42+
**MultiTable** is a set of CSV files in one folder. Each CSV file holds one variable and contains exactly two columns, which must be named **timestamp** and **value**. Download [Multiple Tables sample data](https://mvaddataset.blob.core.windows.net/public-sample-data/sample_data_5_3000.zip) and unzip it.
43+
44+
* The `timestamp` values should conform to *ISO 8601*; the `value` can be an *integer* or a *decimal* with any number of decimal places.
45+
46+
* The name of each CSV file is used as the variable name and should be unique. For example, *temperature.csv* and *humidity.csv*.
47+
48+
* Variables for training and variables for inference should be consistent. For example, if you're using `series_1`, `series_2`, `series_3`, `series_4`, and `series_5` for training, you should provide exactly the same variables for inference.
49+
50+
***Example:***
51+
52+
> [!div class="mx-imgBorder"]
53+
> ![Diagram of multi table schema.](../media/prepare-data/multitable.png)
54+
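To make the relationship between the two schemas concrete, here is a minimal sketch that splits OneTable-style CSV text into one two-column table per variable, following the MultiTable layout described above. The helper name `split_onetable` is hypothetical, and the service does not require this conversion; it only illustrates the expected file names and column names.

```python
import csv
import io


def split_onetable(csv_text):
    """Map '<variable>.csv' to two-column CSV text with 'timestamp' and 'value'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = [c for c in (reader.fieldnames or []) if c != "timestamp"]
    buffers = {name: io.StringIO() for name in variables}
    writers = {}
    for name, buffer in buffers.items():
        writers[name] = csv.writer(buffer, lineterminator="\n")
        # MultiTable requires exactly these two column names.
        writers[name].writerow(["timestamp", "value"])
    for row in reader:
        for name in variables:
            writers[name].writerow([row["timestamp"], row[name]])
    # The file name (without extension) becomes the variable name.
    return {f"{name}.csv": buffer.getvalue() for name, buffer in buffers.items()}
```

Each returned entry could then be written to a file of that name in a single folder before uploading.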
55+
> [!NOTE]
56+
> If your timestamps include hours, minutes, or seconds, make sure that they're rounded to match your data frequency before calling the APIs.
57+
> For example, if your data frequency is supposed to be one data point every 30 seconds, but you're seeing timestamps like "12:00:01" and "12:00:28", it's a strong signal that you should pre-process the timestamps to new values like "12:00:00" and "12:00:30".
58+
> For details, please refer to the ["Timestamp round-up" section](../concepts/best-practices-multivariate.md#timestamp-round-up) in the best practices document.
59+
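The rounding described in the note above can be sketched as follows. This assumes that rounding to the nearest multiple of the data frequency is what you want, which matches the 30-second example in the note, and that timestamps without an offset are in UTC. The function name `round_timestamp` is hypothetical; see the best practices document for the authoritative guidance.

```python
import math
from datetime import datetime, timedelta, timezone


def round_timestamp(stamp, frequency_seconds):
    """Round an ISO 8601 timestamp to the nearest multiple of the frequency."""
    moment = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    seconds = (moment - epoch).total_seconds()
    # Round half up to the nearest whole multiple of the frequency.
    rounded = math.floor(seconds / frequency_seconds + 0.5) * frequency_seconds
    result = epoch + timedelta(seconds=rounded)
    return result.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For data that arrives every 30 seconds, you would call `round_timestamp(value, 30)` on each timestamp before writing the CSV file.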
60+
## Upload your data to Storage Account
61+
62+
Once you've prepared your data with either of the two schemas above, you can upload your CSV file (OneTable) or your data folder (MultiTable) to your Storage Account.
63+
64+
1. [Create a Storage Account](https://portal.azure.com/#create/Microsoft.StorageAccount-ARM) and fill out the fields, which are similar to those you completed when creating your Anomaly Detector resource.
65+
66+
> [!div class="mx-imgBorder"]
67+
> ![Screenshot of Azure Storage account setup page.](../media/prepare-data/create-blob.png)
68+
69+
2. Select **Container** on the left in your Storage Account resource, then select **+Container** to create a container to store your data.
70+
71+
3. Upload your data to the container.
72+
73+
**Upload *OneTable* data**
74+
75+
Go to the container that you created, and select **Upload**, then choose your prepared CSV file and upload.
76+
77+
Once your data is uploaded, select your CSV file and copy the **blob URL** by using the small blue button. (Paste the URL somewhere convenient; you'll need it in later steps.)
78+
79+
> [!div class="mx-imgBorder"]
80+
> ![Screenshot of copy blob url for one table.](../media/prepare-data/onetable-copy-url.png)
81+
82+
**Upload *MultiTable* data**
83+
84+
Go to the container that you created, select **Upload**, then select **Advanced** and enter a folder name in **Upload to folder**. Select the separate CSV files for all your variables and upload them.
85+
86+
Once your data is uploaded, open the folder, select any one CSV file in it, and copy its **blob URL**. Keep only the part before the name of the CSV file, so that the final blob URL ***links to the folder***. (Paste the URL somewhere convenient; you'll need it in later steps.)
87+
88+
> [!div class="mx-imgBorder"]
89+
> ![Screenshot of copy blob url for multi table.](../media/prepare-data/multitable-copy-url.png)
90+
91+
4. Grant Anomaly Detector access to read the data in your Storage Account.
92+
* In your container, select **Access Control (IAM)** on the left, then select **+ Add** and choose **Add role assignment**. If **Add role assignment** is disabled, contact your Storage Account owner to grant you the Owner role on the container.
93+
94+
> [!div class="mx-imgBorder"]
95+
> ![Screenshot of set access control UI.](../media/prepare-data/add-role-assignment.png)
96+
97+
* Search for the **Storage Blob Data Reader** role, select it, and then select **Next**. Technically, any of the roles highlighted below, as well as the *Owner* role, should work.
98+
99+
> [!div class="mx-imgBorder"]
100+
> ![Screenshot of add role assignment with reader roles selected.](../media/prepare-data/add-reader-role.png)
101+
102+
* For **Assign access to**, select **Managed identity**, then choose **Select Members** and pick the Anomaly Detector resource that you created earlier. Finally, select **Review + assign**.
103+
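The MultiTable upload step above asks you to keep only the part of the blob URL that precedes the CSV file name. A minimal sketch of that trimming is shown here. The function name `folder_url_from_blob_url` and the storage account, container, and folder names in the usage note are hypothetical. The sketch also drops any query string or fragment, which is an assumption about what you want to keep.

```python
from urllib.parse import urlsplit, urlunsplit


def folder_url_from_blob_url(blob_url):
    """Return the URL of the folder that contains the given CSV blob."""
    parts = urlsplit(blob_url)
    # Drop the last path segment (the CSV file name) and keep the trailing slash.
    folder_path = parts.path.rsplit("/", 1)[0] + "/"
    # Assumption: the query string and fragment are not needed, so omit them.
    return urlunsplit((parts.scheme, parts.netloc, folder_path, "", ""))
```

For example, a copied URL such as `https://example.blob.core.windows.net/mycontainer/myfolder/temperature.csv` would be reduced to `https://example.blob.core.windows.net/mycontainer/myfolder/`.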
104+
## Next steps
105+
106+
* [Train a multivariate anomaly detection model](train-model.md)
107+
* [Best practices of multivariate anomaly detection](../concepts/best-practices-multivariate.md)

articles/cognitive-services/Anomaly-Detector/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -54,6 +54,8 @@
5454
items:
5555
- name: Create an Anomaly Detector Resource
5656
href: how-to/create-resource.md
57+
- name: Prepare and upload your data
58+
href: how-to/prepare-data.md
5759
- name: Train a model
5860
href: how-to/train-model.md
5961
- name: Batch inference
