---
title: Prepare your data and upload to Storage Account
titleSuffix: Azure Cognitive Services
description: Prepare your data and upload to Storage Account
services: cognitive-services
author: mrbullwinkle
manager: nitinme
ms.service: cognitive-services
ms.subservice: anomaly-detector
ms.topic: conceptual
ms.date: 11/01/2022
ms.author: mbullwin
---

# Prepare your data and upload to Storage Account

Multivariate Anomaly Detection requires a trained model to process your data, and an Azure Storage Account to store your data for the training and inference steps.

## Data preparation

First you need to prepare your data for training and inference.

### Input data schema

Multivariate Anomaly Detection supports two types of data schemas: **OneTable** and **MultiTable**. You can use either schema to prepare your data and upload it to your Storage Account for training and inference.

:::image type="content" source="../media/prepare-data/two-schemas.png" alt-text="Diagram of two data schemas with three steps: data preparation, training, inference." lightbox="../media/prepare-data/two-schemas.png":::

#### Schema 1: OneTable

**OneTable** is a single CSV file that contains all the variables on which you want to train a Multivariate Anomaly Detection model, plus one `timestamp` column. Download the [One Table sample data](https://mvaddataset.blob.core.windows.net/public-sample-data/sample_data_5_3000.csv).

* The `timestamp` values should conform to *ISO 8601*; the values of the other variables in other columns can be *integers* or *decimals* with any number of decimal places.

* Variables for training and variables for inference should be consistent. For example, if you're using `series_1`, `series_2`, `series_3`, `series_4`, and `series_5` for training, you should provide exactly the same variables for inference.

    ***Example:***
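As an illustration (this is not the official sample data), a minimal OneTable CSV can be generated with Python's standard library; the variable names `series_1` and `series_2` and the file name are placeholders:

```python
import csv
from datetime import datetime, timedelta, timezone

# Write a minimal OneTable CSV: one ISO 8601 "timestamp" column
# plus one column per variable.
start = datetime(2022, 11, 1, tzinfo=timezone.utc)
with open("sample_onetable.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "series_1", "series_2"])
    for i in range(5):
        # One data point per minute, timestamps in ISO 8601 with a "Z" suffix.
        ts = (start + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%SZ")
        writer.writerow([ts, 1.0 + i, 0.5 * i])

with open("sample_onetable.csv") as f:
    print(f.read())
```

The same column set used here for training must be supplied again at inference time.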

#### Schema 2: MultiTable

**MultiTable** is multiple CSV files in one folder, where each CSV file contains exactly two columns for one variable, with the exact column names **timestamp** and **value**. Download the [Multiple Tables sample data](https://mvaddataset.blob.core.windows.net/public-sample-data/sample_data_5_3000.zip) and unzip it.

* The `timestamp` values should conform to *ISO 8601*; the `value` can be *integers* or *decimals* with any number of decimal places.

* The name of each CSV file is used as the variable name and should be unique. For example, *temperature.csv* and *humidity.csv*.

* Variables for training and variables for inference should be consistent. For example, if you're using `series_1`, `series_2`, `series_3`, `series_4`, and `series_5` for training, you should provide exactly the same variables for inference.

    ***Example:***

> [!div class="mx-imgBorder"]
> 

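If your data is already in OneTable form, converting it to MultiTable is a simple split: one two-column file per variable. A minimal sketch, assuming an in-memory OneTable layout and a placeholder output folder name:

```python
import csv
import os

# A tiny OneTable-style dataset: header row, then one row per timestamp.
onetable = [
    ["timestamp", "series_1", "series_2"],
    ["2022-11-01T00:00:00Z", "1.0", "0.0"],
    ["2022-11-01T00:01:00Z", "2.0", "0.5"],
]

# Write one CSV per variable, each with exactly the columns
# "timestamp" and "value"; the file name becomes the variable name.
os.makedirs("multitable", exist_ok=True)
header, rows = onetable[0], onetable[1:]
for col, name in enumerate(header[1:], start=1):
    with open(os.path.join("multitable", f"{name}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value"])
        for row in rows:
            writer.writerow([row[0], row[col]])
```

This produces *series_1.csv* and *series_2.csv* in the *multitable* folder, matching the schema described above.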
> [!NOTE]
> If your timestamps have hours, minutes, and/or seconds, make sure that they're properly rounded up before calling the APIs.
> For example, if your data frequency is supposed to be one data point every 30 seconds, but you're seeing timestamps like "12:00:01" and "12:00:28", it's a strong signal that you should pre-process the timestamps to new values like "12:00:00" and "12:00:30".
> For details, refer to the ["Timestamp round-up" section](../concepts/best-practices-multivariate.md#timestamp-round-up) in the best practices document.

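The timestamp rounding described in the note above can be sketched as follows; the 30-second frequency is just the example from the note, and `round_timestamp` is a hypothetical helper, not part of any Anomaly Detector SDK:

```python
from datetime import datetime, timedelta

def round_timestamp(ts: str, freq_seconds: int = 30) -> str:
    """Round an ISO 8601 timestamp to the nearest multiple of freq_seconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    # Seconds elapsed since midnight, rounded to the nearest interval.
    elapsed = dt.hour * 3600 + dt.minute * 60 + dt.second
    rounded = round(elapsed / freq_seconds) * freq_seconds
    midnight = dt.replace(hour=0, minute=0, second=0)
    return (midnight + timedelta(seconds=rounded)).strftime("%Y-%m-%dT%H:%M:%SZ")

print(round_timestamp("2022-11-01T12:00:01Z"))  # 2022-11-01T12:00:00Z
print(round_timestamp("2022-11-01T12:00:28Z"))  # 2022-11-01T12:00:30Z
```

Applying a step like this before upload keeps your timestamps aligned to the data frequency the model expects.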
## Upload your data to Storage Account

Once you've prepared your data with either of the two schemas above, you can upload your CSV file (OneTable) or your data folder (MultiTable) to your Storage Account.

1. [Create a Storage Account](https://portal.azure.com/#create/Microsoft.StorageAccount-ARM) and fill out the fields, which are similar to the steps for creating an Anomaly Detector resource.

    > [!div class="mx-imgBorder"]
    > 

2. Select **Containers** in the left pane of your Storage Account resource, then select **+ Container** to create one to store your data.

3. Upload your data to the container.
| 72 | + |
| 73 | + **Upload *OneTable* data** |
| 74 | + |
| 75 | + Go to the container that you created, and select **Upload**, then choose your prepared CSV file and upload. |
| 76 | + |
| 77 | + Once your data is uploaded, select your CSV file and copy the **blob URL** through the small blue button. (Please paste the URL somewhere convenient for further steps.) |
| 78 | + |
| 79 | + > [!div class="mx-imgBorder"] |
| 80 | + >  |
| 81 | +
|
    **Upload *MultiTable* data**

    Go to the container that you created, select **Upload**, then select **Advanced**, enter a folder name under **Upload to folder**, and select all the variable CSV files to upload.

    Once your data is uploaded, go into the folder, select one CSV file in it, and copy the **blob URL**, keeping only the part before the name of that CSV file, so that the final blob URL ***links to the folder***. (Paste the URL somewhere convenient for later steps.)

    > [!div class="mx-imgBorder"]
    > 

4. Grant Anomaly Detector access to read the data in your Storage Account.
    * In your container, select **Access Control (IAM)** in the left pane, then select **+ Add** > **Add role assignment**. If **Add role assignment** is disabled, contact your Storage Account owner to add the Owner role to your container.

        > [!div class="mx-imgBorder"]
        > 

    * Search for the **Storage Blob Data Reader** role, select it, and then select **Next**. Technically, the roles highlighted below and the *Owner* role should all work.

        > [!div class="mx-imgBorder"]
        > 

    * For **Assign access to**, select **Managed identity**, then **Select members**, choose the Anomaly Detector resource that you created earlier, and select **Review + assign**.

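The URL trimming in step 3 for MultiTable data amounts to dropping the file name from a blob URL. A small sketch; the account, container, and folder names in the example URL are hypothetical:

```python
def folder_url(blob_url: str) -> str:
    """Trim a blob URL down to its containing folder by dropping the file name."""
    return blob_url.rsplit("/", 1)[0] + "/"

# Hypothetical blob URL copied from the portal for one variable file.
url = "https://myaccount.blob.core.windows.net/mycontainer/mvad-data/series_1.csv"
print(folder_url(url))  # https://myaccount.blob.core.windows.net/mycontainer/mvad-data/
```

The resulting URL points at the folder itself, which is what you pass along for MultiTable training and inference.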
## Next steps

* [Train a multivariate anomaly detection model](train-model.md)
* [Best practices of multivariate anomaly detection](../concepts/best-practices-multivariate.md)