|
| 1 | +--- |
| 2 | +title: Azure Key Vault monitoring and alerting | Microsoft Docs |
| 3 | +description: Create a dashboard to monitor the health of your key vault and configure alerts. |
| 4 | +services: key-vault |
| 5 | +author: ShaneBala-keyvault |
| 6 | +manager: ravijan |
| 7 | +tags: azure-resource-manager |
| 8 | + |
| 9 | +ms.service: key-vault |
| 10 | +ms.subservice: general |
| 11 | +ms.topic: conceptual |
| 12 | +ms.date: 04/06/2020 |
| 13 | +ms.author: sudbalas |
| 14 | +Customer intent: As a key vault administrator, I want to learn the options available to monitor the health of my vaults |
| 15 | +--- |
| 16 | + |
| 17 | + |
| 18 | +# Monitoring and alerting for Azure Key Vault |
| 19 | + |
| 20 | +## Overview |
| 21 | + |
| 22 | +Once you start using Key Vault to store your production secrets, it is important to monitor the health of your key vault to make sure your service operates as intended. As you scale your service, the number of requests sent to your key vault will rise. This can increase the latency of your requests and, in extreme cases, cause your requests to be throttled, which will degrade the performance of your service. You also need to be alerted if your key vault is returning an unusual number of error codes, so you can quickly identify access policy or firewall configuration issues. |
| 23 | +This document will cover the following topics: |
| 24 | + |
| 25 | ++ Basic Key Vault metrics to monitor |
| 26 | ++ How to configure metrics and create a dashboard |
| 27 | ++ How to create alerts at specified thresholds |
| 28 | + |
| 29 | +## Basic Key Vault metrics to monitor |
| 30 | + |
| 31 | ++ Vault Availability |
| 32 | ++ Vault Saturation |
| 33 | ++ Service API Latency |
| 34 | ++ Total Service API Hits (Filter by Activity Type) |
| 35 | ++ Error Codes (Filter by Status Code) |
| 36 | + |
| 37 | +**Vault Availability** - This metric should always be at 100%. It is an important metric to monitor, since it can quickly show you whether your key vault experienced an outage. |
| 38 | + |
| 39 | +**Vault Saturation** - The number of requests per second that a key vault can serve depends on the type of operation being performed. Some vault operations have a lower requests-per-second threshold. This metric aggregates the total usage of your key vault across all operation types into a percentage value that indicates your current key vault usage. For a full list of key vault service limits, see [Azure Key Vault Service Limits](service-limits.md). |
| 40 | + |
| 41 | +**Service API Latency** - This metric shows the average latency of a call to your key vault. Although your key vault may be within service limits, high utilization can introduce latency that causes downstream applications to fail. |
| 42 | + |
| 43 | +**Total Service API Hits** - This metric shows all of the calls made to your key vault. This will help you identify which applications are calling your key vault. |
| 44 | + |
| 45 | +**Error Codes** - This metric shows whether your key vault is returning an unusual number of errors. For a full list of error codes and troubleshooting guidance, see [Azure Key Vault REST API Error Codes](rest-error-codes.md). |
| 46 | + |
| 47 | +## How to configure metrics and create a dashboard |
| 48 | + |
| 49 | +1. Log in to the Azure portal |
| 50 | +2. Navigate to your Key Vault |
| 51 | +3. Select **Metrics** under **Monitoring** |
| 52 | + |
| 53 | +> [!div class="mx-imgBorder"] |
| 54 | +>  |
| 55 | +
|
| 56 | +4. Update the title of the chart to what you want to see on your dashboard. |
| 57 | +5. Select the scope. In this example we will select a single key vault. |
| 58 | +6. Select the Metric **Overall Vault Availability** and Aggregation **Avg** |
| 59 | +7. Update the time range to the Last 24 Hours and update the time granularity to 1 minute. |
| 60 | + |
| 61 | +> [!div class="mx-imgBorder"] |
| 62 | +>  |
| 63 | +
|
| 64 | +8. Repeat the steps above for the Vault Saturation and Service API Latency metrics. Select **Pin to Dashboard** to save your metrics into a dashboard. |
| 65 | + |
| 66 | +> [!IMPORTANT] |
| 67 | +> Select "Pin to Dashboard" and save every metric you configure. If you leave the page and return to it without saving, your configuration changes will be lost. |
| 68 | +
|
| 69 | +9. To monitor all of the types of operations on the key vault, use the **Total Service API Hits** metric, and select **Apply Splitting by Activity Type** |
| 70 | + |
| 71 | +> [!div class="mx-imgBorder"] |
| 72 | +>  |
| 73 | +
|
| 74 | +10. To monitor for error codes on the key vault, use the **Total Service API Results** metric, and select **Apply Splitting by Status Code** |
| 75 | + |
| 76 | +> [!div class="mx-imgBorder"] |
| 77 | +>  |
| 78 | +
|
| 79 | +You now have a dashboard that looks like the one shown below. Select the three dots at the top right of each tile to rearrange and resize the tiles as needed. |
| 80 | + |
| 81 | +Once you save and publish the dashboard, it will create a new resource in your Azure subscription. You will be able to see it at any time by searching for "shared dashboard". |
| 82 | + |
| 83 | +> [!div class="mx-imgBorder"] |
| 84 | +>  |
| 85 | +
|
| 86 | +## How to configure alerts on your Key Vault |
| 87 | + |
| 88 | +This section shows you how to configure alerts on your key vault so that your team can take action immediately if your key vault is in an unhealthy state. You can configure alerts that send an email (preferably to a team distribution list), fire an Event Grid notification, or call or text a phone number. You can choose a static alert based on a fixed value, or a dynamic alert that notifies you if a monitored metric exceeds the average for your key vault a certain number of times within a defined time range. |
| 89 | + |
| 90 | +> [!IMPORTANT] |
| 91 | +> It can take up to 10 minutes for newly configured alerts to start sending notifications. |
| 92 | +
|
| 93 | +### Configure an action group |
| 94 | + |
| 95 | +An action group is a configurable list of notifications and properties. |
| 96 | + |
| 97 | +1. Log in to the Azure portal |
| 98 | +2. Search for **Alerts** in the search box |
| 99 | +3. Select **Manage Actions** |
| 100 | + |
| 101 | +> [!div class="mx-imgBorder"] |
| 102 | +>  |
| 103 | +
|
| 104 | +4. Select **+ Add Action Group** |
| 105 | + |
| 106 | +> [!div class="mx-imgBorder"] |
| 107 | +>  |
| 108 | +
|
| 109 | +5. Choose the **Action Type** for your Action Group. In this example, we will create an email alert. |
| 110 | + |
| 111 | +> [!div class="mx-imgBorder"] |
| 112 | +>  |
| 113 | +
|
| 114 | +> [!div class="mx-imgBorder"] |
| 115 | +>  |
| 116 | +
|
| 117 | +6. Click **OK** at the bottom of the page. You have successfully created an action group. |
| 118 | + |
| 119 | +Now that you have configured an action group, we will configure the key vault alert thresholds. |
| 120 | + |
| 121 | +### Configure alert thresholds |
| 122 | + |
| 123 | +1. Select your key vault resource in the Azure portal and select **Alerts** under **Monitoring** |
| 124 | + |
| 125 | +> [!div class="mx-imgBorder"] |
| 126 | +>  |
| 127 | +
|
| 128 | +2. Select **New Alert Rule** |
| 129 | + |
| 130 | +> [!div class="mx-imgBorder"] |
| 131 | +>  |
| 132 | +
|
| 133 | +3. Select the scope of your alert rule. You can select a single vault or multiple vaults. |
| 134 | + |
| 135 | +> [!IMPORTANT] |
| 136 | +> When you select multiple vaults for the scope of your alerts, all selected vaults must be in the same region. You will have to configure separate alert rules for vaults in different regions. |
| 137 | +
|
| 138 | +> [!div class="mx-imgBorder"] |
| 139 | +>  |
| 140 | +
|
| 141 | +4. Select the conditions for your alerts. You can choose any of the following signals and define your logic for alerting. The Key Vault team recommends configuring the following alerting thresholds. |
| 142 | + |
| 143 | + + Key Vault Availability drops below 100% (Static Threshold) |
| 144 | + + Key Vault Latency is greater than 500 ms (Static Threshold) |
| 145 | + + Overall Vault Saturation is greater than 75% (Static Threshold) |
| 146 | + + Overall Vault Saturation exceeds average (Dynamic Threshold) |
| 147 | + + Total Error Codes higher than average (Dynamic Threshold) |
| 148 | + |
| 149 | +> [!div class="mx-imgBorder"] |
| 150 | +>  |
| 151 | +
|
| 152 | +### Example 1: Configuring a static alert threshold for latency |
| 153 | + |
| 154 | +Select **Overall Service API Latency** as the signal name |
| 155 | +> [!div class="mx-imgBorder"] |
| 156 | +>  |
| 157 | +
|
| 158 | +Use the following configuration parameters. |
| 159 | + |
| 160 | ++ Set the Threshold to **Static** |
| 161 | ++ Set the Operator to **Greater Than** |
| 162 | ++ Set the Aggregation Type to **Average** |
| 163 | ++ Set the Threshold Value to **500** |
| 164 | ++ Set the Aggregation Period to **5 minutes** |
| 165 | ++ Set the Evaluation Frequency to **1 minute** |
| 166 | ++ Select **Done** |
| 167 | + |
| 168 | +> [!div class="mx-imgBorder"] |
| 169 | +>  |
| 170 | +
|
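
The static rule in Example 1 can be pictured with a small simulation. This is only a sketch of the idea, not how Azure Monitor is implemented: it assumes one average-latency sample per minute, and the function name `static_alert_fires` and the sample values are hypothetical.

```python
from statistics import mean


def static_alert_fires(latencies_ms, threshold_ms=500.0, window=5):
    """Evaluate a static "average greater than threshold" rule once per minute.

    latencies_ms: one latency sample per minute (hypothetical data).
    window: aggregation period in minutes (5 in Example 1).
    Returns one boolean per evaluation, starting once a full window exists.
    """
    results = []
    for end in range(window, len(latencies_ms) + 1):
        # Average the most recent `window` samples, then compare to the threshold.
        window_avg = mean(latencies_ms[end - window:end])
        results.append(window_avg > threshold_ms)
    return results


# Ten minutes of made-up latency samples with a spike in the middle.
samples = [120, 130, 900, 1100, 950, 1000, 200, 150, 140, 130]
print(static_alert_fires(samples))
```

Because the rule averages over the whole aggregation period, the condition stays true for a few evaluations after the spike ends and clears once the high samples leave the window.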
| 171 | +### Example 2: Configuring a dynamic alert threshold for vault saturation |
| 172 | + |
| 173 | +When you use a dynamic alert, you can see historical data for the key vault you have selected. The blue area represents the average usage of your key vault. The red area shows spikes that would have triggered an alert, provided the other criteria in the alert configuration are met. The red dots show violations where the criteria for the alert were met during the aggregated time window. You can set an alert to fire after a certain number of violations within a set time. If you don't want to include past data, you can exclude old data under Advanced Settings. |
| 174 | + |
| 175 | +> [!div class="mx-imgBorder"] |
| 176 | +>  |
| 177 | +
|
| 178 | +Use the following configuration parameters. |
| 179 | + |
| 180 | ++ Set the Threshold to **Dynamic** |
| 181 | ++ Set the Operator to **Greater Than** |
| 182 | ++ Set the Aggregation Type to **Average** |
| 183 | ++ Set the Threshold Sensitivity to **Medium** |
| 184 | ++ Set the Aggregation Period to **5 minutes** |
| 185 | ++ Set the Evaluation Frequency to **1 minute** |
| 186 | ++ **Optional** Configure Advanced Settings |
| 187 | ++ Select **Done** |
| 188 | + |
| 189 | +> [!div class="mx-imgBorder"] |
| 190 | +>  |
| 191 | +
|
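
The "number of violations within a set time" idea from Example 2 can be sketched as follows. This is a deliberately simplified illustration and not the algorithm Azure Monitor uses to learn dynamic thresholds: the mean-plus-standard-deviation band, the `SENSITIVITY_WIDTH` mapping, and the function names are all assumptions made for this example.

```python
from statistics import mean, pstdev

# Hypothetical mapping from sensitivity to band width (in standard deviations).
# Higher sensitivity means a tighter band and therefore more alerts.
SENSITIVITY_WIDTH = {"High": 1.0, "Medium": 2.0, "Low": 3.0}


def upper_bound(history, sensitivity="Medium"):
    """Derive an upper bound from historical saturation values (simplified)."""
    return mean(history) + SENSITIVITY_WIDTH[sensitivity] * pstdev(history)


def dynamic_alert_fires(history, recent, sensitivity="Medium", min_violations=3):
    """Fire when at least `min_violations` recent points exceed the bound."""
    bound = upper_bound(history, sensitivity)
    violations = sum(1 for value in recent if value > bound)
    return violations >= min_violations


# Made-up saturation percentages: a quiet history, then a recent burst.
history = [20, 22, 18, 21, 19, 20, 22, 18]
print(dynamic_alert_fires(history, recent=[21, 30, 35, 40]))
```

Requiring several violations before firing is what keeps a single short spike from paging the team; the advanced settings in the portal expose the equivalent knobs.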
| 192 | +5. Add the action group that you have configured |
| 193 | + |
| 194 | +> [!div class="mx-imgBorder"] |
| 195 | +>  |
| 196 | +
|
| 197 | +6. Enable the alert and assign a severity |
| 198 | + |
| 199 | +> [!div class="mx-imgBorder"] |
| 200 | +>  |
| 201 | +
|
| 202 | +7. Create the alert |
| 203 | + |
| 204 | + |
| 205 | +## Next steps |
| 206 | + |
| 207 | +Congratulations, you have now successfully created a monitoring dashboard and configured alerts for your key vault! |
| 208 | +Once you have followed all of the steps above, you should receive email alerts when your key vault meets the alert criteria you configured. An example is shown below. Use the tools you have set up in this article to actively monitor the health of your key vault. |
| 209 | + |
| 210 | +### Example email alert |
| 211 | + |
| 212 | +> [!div class="mx-imgBorder"] |
| 213 | +>  |