Skip to content

Commit 29e61a0

Browse files
Merge pull request #297181 from wmgries/consistent-media-tiers
Clean up analyze-file-metrics
2 parents acf1dd1 + 9d6eb0b commit 29e61a0

File tree

1 file changed

+22
-18
lines changed

1 file changed

+22
-18
lines changed

articles/storage/files/analyze-files-metrics.md

Lines changed: 22 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -17,11 +17,18 @@ Understanding how to monitor file share performance is critical to ensuring that
1717
See [Monitor Azure Files](storage-files-monitoring.md) for details on the monitoring data you can collect for Azure Files and how to use it.
1818

1919
## Applies to
20-
| File share type | SMB | NFS |
21-
|-|:-:|:-:|
22-
| Standard file shares (GPv2), LRS/ZRS | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
23-
| Standard file shares (GPv2), GRS/GZRS | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
24-
| Premium file shares (FileStorage), LRS/ZRS | ![Yes](../media/icons/yes-icon.png) | ![Yes](../media/icons/yes-icon.png) |
20+
| Management model | Billing model | Media tier | Redundancy | SMB | NFS |
21+
|-|-|-|-|:-:|:-:|
22+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | Local (LRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
23+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | Zone (ZRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
24+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | Geo (GRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
25+
| Microsoft.Storage | Provisioned v2 | HDD (standard) | GeoZone (GZRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
26+
| Microsoft.Storage | Provisioned v1 | SSD (premium) | Local (LRS) | ![Yes](../media/icons/yes-icon.png) | ![Yes](../media/icons/yes-icon.png) |
27+
| Microsoft.Storage | Provisioned v1 | SSD (premium) | Zone (ZRS) | ![Yes](../media/icons/yes-icon.png) | ![Yes](../media/icons/yes-icon.png)|
28+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | Local (LRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
29+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | Zone (ZRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
30+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | Geo (GRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
31+
| Microsoft.Storage | Pay-as-you-go | HDD (standard) | GeoZone (GZRS) | ![Yes](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
2532

2633
## Supported metrics
2734

@@ -266,7 +273,7 @@ Now you can select a metric depending on what you want to monitor.
266273

267274
In Azure Monitor, the **Availability** metric can be useful when something is visibly wrong from either an application or user perspective, or when troubleshooting alerts.
268275

269-
When using this metric with Azure Files, its important to always view the aggregation as **Average** as opposed to **Max** or **Min**. Using **Average** will help you understand what percentage of your requests are experiencing errors, and if they are within the [SLA for Azure Files](https://azure.microsoft.com/support/legal/sla/storage/).
276+
When using this metric with Azure Files, it's important to always view the aggregation as **Average** as opposed to **Max** or **Min**. Using **Average** shows you what percentage of your requests are experiencing errors, and if they are within the [SLA for Azure Files](https://azure.microsoft.com/support/legal/sla/storage/).
270277

271278
:::image type="content" source="media/analyze-files-metrics/transaction-metrics-menu.png" alt-text="Screenshot showing the available transaction metrics in Azure Monitor." lightbox="media/analyze-files-metrics/transaction-metrics-menu.png":::
272279

@@ -276,23 +283,23 @@ The two most important latency metrics are **Success E2E Latency** and **Success
276283

277284
In the following charts, the blue line indicates how much time is spent in total latency (Success E2E Latency), and the pink line indicates time spent only in the Azure Files service (Success Server Latency).
278285

279-
This chart is an example of a client machine that has mounted an Azure file share from an on-premises environment. This will likely represent a typical user connecting from either an office, home, or other remote location. You'll see that the physical distance between the client and Azure region is closely correlated to the corresponding client-side latency, which represents the difference between the E2E and Server latency.
286+
This chart shows an on-premises client with a mounted Azure file share, representing, for example, a typical user connecting from a remote location. The physical distance between the client and Azure region is closely correlated to the corresponding client-side latency, which represents the difference between the E2E and Server latency.
280287

281288
:::image type="content" source="media/analyze-files-metrics/latency-remote.png" alt-text="Screenshot showing latency metrics with a remote user connecting to an Azure file share." lightbox="media/analyze-files-metrics/latency-remote.png" border="false":::
282289

283290
In comparison, the following chart shows a situation where both the client and the Azure file share are located within the same region. Note that the client-side latency is only 0.17ms compared to 43.9ms in the first chart. This illustrates why minimizing client-side latency is imperative in order to achieve optimal performance.
284291

285292
:::image type="content" source="media/analyze-files-metrics/latency-same-region.png" alt-text="Screenshot showing latency metrics when the client and Azure file share are located in the same region." lightbox="media/analyze-files-metrics/latency-same-region.png" border="false":::
286293

287-
Another latency indicator to look that for might suggest a problem is an increased frequency or abnormal spikes in **Success Server Latency**. This is commonly due to throttling due to exceeding the Azure Files [scale limits](storage-files-scale-targets.md) for standard file shares, or an under-provisioned [Azure Files Premium Share](understanding-billing.md#provisioned-v1-model).
294+
Another latency indicator to look that for might suggest a problem is an increased frequency or abnormal spikes in **Success Server Latency**. This is commonly due to throttling due to exceeding the provisioned limit for a provisioned file share (or an overall scale limit a pay-as-you-go file share). See [Understanding Azure Files billing](./understanding-billing.md) and the [Scalability and performance targets for Azure Files](storage-files-scale-targets.md).
288295

289296
For more information, see [Troubleshoot high latency, low throughput, or low IOPS](/troubleshoot/azure/azure-storage/files-troubleshoot-performance?toc=%2Fazure%2Fstorage%2Ffiles%2Ftoc.json&tabs=windows#high-latency-low-throughput-or-low-iops).
290297

291298
### Monitor utilization
292299

293300
Utilization metrics that measure the amount of data being transmitted (throughput) or operations being serviced (IOPS) are commonly used to determine how much work is being performed by the application or workload. Transaction metrics can determine the number of operations or requests against the Azure Files service over various time granularity.
294301

295-
If you're using the **Egress** or **Ingress** metrics to determine the volume of inbound or outbound data, use the **Sum** aggregation to determine the total amount of data being transmitted to and from the file share over a 1 minute to 1 day time granularity. Other aggregations such as **Average**, **Max**, and **Min** only display the value of the individual I/O size. This is why most customers will typically see 1 MiB when using the **Max** aggregation. While it can be useful to understand the size of your largest, smallest, or even average I/O size, it isn't possible to display the distribution of I/O size generated by the workload's usage pattern.
302+
If you're using the **Egress** or **Ingress** metrics to determine the volume of inbound or outbound data, use the **Sum** aggregation to determine the total amount of data being transmitted to and from the file share over a 1 minute to 1 day time granularity. Other aggregations such as **Average**, **Max**, and **Min** only display the value of the individual I/O size. This is why most customers typically see 1 MiB when using the **Max** aggregation. While it can be useful to understand the size of your largest, smallest, or even average I/O size, it isn't possible to display the distribution of I/O size generated by the workload's usage pattern.
296303

297304
You can also select **Apply splitting** on response types (success, failures, errors) or API operations (read, write, create, close) to display additional details as shown in the following chart.
298305

@@ -302,11 +309,9 @@ To determine the average I/O per second (IOPS) for your workload, first determin
302309

303310
To determine the average throughput for your workload, take the total amount of transmitted data by combining the **Ingress** and **Egress** metrics (total throughput) and divide that by 60 seconds. For example, 1 GiB total throughput over 1 minute / 60 seconds = 17 MiB average throughput.
304311

305-
### Monitor utilization by maximum IOPS and bandwidth (premium only)
312+
### Monitor utilization by maximum IOPS and bandwidth (provisioned only)
306313

307-
Because Azure Premium file shares are billed on a provisioned model in which each GiB of storage capacity that you provision entitles you to more IOPS and throughput, it's often useful to determine maximum IOPS and bandwidth. Whereas throughput measures the actual amount of data successfully transmitted, bandwidth refers to the maximum data transfer rate.
308-
309-
With Azure Premium file shares, you can use **Transactions by Max IOPS** and **Bandwidth by Max MiB/s** metrics to display what your workload is achieving at peak times. Using these metrics to analyze your workload will help you understand true capability at scale, as well as establish a baseline to understand the impact of more throughput and IOPS so you can optimally provision your Azure Premium file share.
314+
Provisioned file shares provide **Transactions by Max IOPS** and **Bandwidth by Max MiB/s** metrics to display what your workload is achieving at peak times. Using these metrics to analyze your workload help you understand true capability at scale, as well as establish a baseline to understand the impact of more throughput and IOPS so you can optimally provision your Azure file share.
310315

311316
The following chart shows a workload that generated 2.63 million transactions over 1 hour. When 2.63 million transactions is divided by 3,600 seconds, we get an average of 730 IOPS.
312317

@@ -325,26 +330,25 @@ Compared against the **Bandwidth by Max MiB/s**, we achieved 123 MiB/s at peak.
325330
:::image type="content" source="media/analyze-files-metrics/bandwidth-by-max-mibs.png" alt-text="Screenshot showing bandwidth by max MIBS." lightbox="media/analyze-files-metrics/bandwidth-by-max-mibs.png" border="false":::
326331

327332
### Monitor utilization by metadata IOPS
328-
329-
On Premium SSD and Standard HDD file shares, our current metadata capabilities scale up to 12K metadata IOPS. This means that running a metadata-heavy workload with a high volume of open, close, or delete operations increases the likelihood of metadata IOPS throttling. This limitation is independent of the file share's overall IOPS capacity on Standard or IOPS provisioning on Premium.
333+
On Azure file shares scale up to 12K metadata IOPS. This means that running a metadata-heavy workload with a high volume of open, close, or delete operations increases the likelihood of metadata IOPS throttling. This limitation is independent of the file share's overall provisioned IOPS.
330334

331335
Because no two metadata-heavy workloads follow the same usage pattern, it can be challenging for customers to proactively monitor their workload and set accurate alerts.
332336

333337
To address this, we've introduced two metadata-specific metrics for Azure file shares:
334338

335339
- **Success with Metadata Warning:** Indicates that metadata IOPS are approaching their limit and might be throttled if they remain high or continue increasing. A rise in the volume or frequency of these warnings suggests an increasing risk of metadata throttling.
336340

337-
- **Success with Metadata Throttling:** Indicates that metadata IOPS have exceeded the file shares capacity, resulting in throttling. While IOPS operations will never fail and will eventually succeed after retries, latency will be impacted during throttling.
341+
- **Success with Metadata Throttling:** Indicates that metadata IOPS have exceeded the file share's capacity, resulting in throttling. While IOPS operations never fail and eventually succeed after retries, latency is impacted during throttling.
338342

339-
To view in Azure Monitor, select the **Transactions** metric and **Apply splitting** on response types. The Metadata response types will only appear in the drop-down if the activity occurs within the timeframe selected.
343+
To view in Azure Monitor, select the **Transactions** metric and **Apply splitting** on response types. The Metadata response types only appear in the drop-down if the activity occurs within the timeframe selected.
340344

341345
The following chart illustrates a workload that experienced a sudden increase in metadata IOPS (transactions), triggering Success with Metadata Warnings, which indicates a risk of metadata throttling. In this example, the workload subsequently reduced its transaction volume, preventing metadata throttling from occurring.
342346

343347
:::image type="content" source="media/analyze-files-metrics/metadata-warnings.png" alt-text="Screenshot showing Metadata Warnings by response type." lightbox="media/analyze-files-metrics/metadata-warnings.png" border="false":::
344348

345349
If your workload encounters **Success with Metadata Warnings** or **Success with Metadata Throttling** response types, consider implementing one or more of the following recommendations:
346350

347-
- For Premium SMB file shares, enable [Metadata Caching](smb-performance.md#metadata-caching-for-premium-smb-file-shares).
351+
- For SSD SMB file shares, enable [Metadata Caching](smb-performance.md#metadata-caching-for-premium-smb-file-shares).
348352
- Distribute (shard) your workload across multiple file shares.
349353
- Reduce the volume of metadata IOPS.
350354

0 commit comments

Comments
 (0)