Commit e358e82

Merge pull request #99067 from dimitri-furman/dimitri-furman
Added Data IO resource stats
2 parents 4cf157d + 27e81e6 commit e358e82

1 file changed

+26
-26
lines changed

articles/sql-database/sql-database-hyperscale-performance-diagnostics.md

Lines changed: 26 additions & 26 deletions
@@ -12,16 +12,12 @@ ms.reviewer: sstein
1212
ms.date: 10/18/2019
1313
---
1414

15-
1615
# SQL Hyperscale performance troubleshooting diagnostics
1716

18-
1917
To troubleshoot performance problems in a Hyperscale database, start with the [general performance tuning methodologies](sql-database-monitor-tune-overview.md) on the Azure SQL database compute node. However, given the [distributed architecture](sql-database-service-tier-hyperscale.md#distributed-functions-architecture) of Hyperscale, additional diagnostics have been added to assist. This article describes Hyperscale-specific diagnostic data.
2018

21-
2219
## Log rate throttling waits
2320

24-
2521
Every Azure SQL Database service level has log generation rate limits enforced via [log rate governance](sql-database-resource-limits-database-server.md#transaction-log-rate-governance). In Hyperscale, the log generation limit is currently set to 100 MB/sec, regardless of the service level. However, there are times when the log generation rate on the primary compute replica has to be throttled to maintain recoverability SLAs. This throttling happens when a [page server or another compute replica](sql-database-service-tier-hyperscale.md#distributed-functions-architecture) is significantly behind applying new log records from the Log service.
2622

2723
The following wait types (in [sys.dm_os_wait_stats](/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql/)) describe the reasons why log rate can be throttled on the primary compute replica:
@@ -33,14 +29,13 @@ The following wait types (in [sys.dm_os_wait_stats](/sql/relational-databases/sy
3329
|RBIO_RG_REPLICA | Occurs when a Hyperscale database compute node log generation rate is being throttled due to delayed log consumption by the readable secondary replica(s). |
3430
|RBIO_RG_LOCALDESTAGE | Occurs when a Hyperscale database compute node log generation rate is being throttled due to delayed log consumption by the log service. |
3531

32+
## Page server reads
3633

37-
## Page Server Reads
38-
39-
The compute replicas do not cache a full copy of the database locally. The data local to the compute replica is stored in the Buffer Pool (in memory) and in the local Resilient Buffer Pool Extension (RBPEX) cache that is a partial (non-covering) cache of data pages. This local RBPEX cache is sized proportionally to the compute size and is 3 times the memory of the compute tier. RBPEX is similar to the Buffer Pool in that it has the most frequently accessed data. Each page server, on the other hand, has a covering RBPEX cache for the portion of the database it maintains.
34+
The compute replicas do not cache a full copy of the database locally. The data local to the compute replica is stored in the Buffer Pool (in memory) and in the local Resilient Buffer Pool Extension (RBPEX) cache that is a partial (non-covering) cache of data pages. This local RBPEX cache is sized proportionally to the compute size and is three times the memory of the compute tier. RBPEX is similar to the Buffer Pool in that it has the most frequently accessed data. Each page server, on the other hand, has a covering RBPEX cache for the portion of the database it maintains.
4035

4136
When a read is issued on a compute replica, if the data doesn't exist in the Buffer Pool or local RBPEX cache, a getPage(pageId, LSN) function call is issued, and the page is fetched from the corresponding page server. Reads from page servers are remote reads and are thus slower than reads from the local RBPEX. When troubleshooting IO-related performance problems, we need to be able to tell how many IOs were done via relatively slower remote page server reads.
4237

43-
Several DMVs and extended events have columns and fields that specify the number of remote reads from a page server which can be compared against the total reads. Query store also captures remote reads as part of the query run time stats.
38+
Several DMVs and extended events have columns and fields that specify the number of remote reads from a page server, which can be compared against the total reads. Query store also captures remote reads as part of the query run time stats.
4439

4540
- Columns to report page server reads are available in execution DMVs and catalog views, such as:
4641
- [sys.dm_exec_requests](/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-requests-transact-sql/)
@@ -61,45 +56,50 @@ Several DMVs and extended events have columns and fields that specify the number
6156
`<RunTimeCountersPerThread Thread="8" ActualRows="90466461" ActualRowsRead="90466461" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="133645" ActualCPUms="85105" ActualScans="1" ActualLogicalReads="6032256" ActualPhysicalReads="0" ActualPageServerReads="0" ActualReadAheads="6027814" ActualPageServerReadAheads="5687297" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobPageServerReads="0" ActualLobReadAheads="0" ActualLobPageServerReadAheads="0" />`
6257

6358
> [!NOTE]
64-
> To view these attributes in the query plan properties window in SSMS you will need SSMS 18.3 or later.
65-
66-
67-
## Virtual File Stats and IO accounting
59+
> To view these attributes in the query plan properties window, SSMS 18.3 or later is required.
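
The page server attributes in the `RunTimeCountersPerThread` element above can be compared against the total reads programmatically. The following is a minimal Python sketch, not part of the original article: the function name `page_server_read_share` is hypothetical, the sample element is trimmed to the four attributes used, and the code assumes that the page server counters are subsets of `ActualPhysicalReads` and `ActualReadAheads`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the RunTimeCountersPerThread sample shown above.
SAMPLE = (
    '<RunTimeCountersPerThread Thread="8" ActualPhysicalReads="0" '
    'ActualPageServerReads="0" ActualReadAheads="6027814" '
    'ActualPageServerReadAheads="5687297" />'
)


def page_server_read_share(counters_xml: str) -> float:
    """Return the fraction of physical reads and read-aheads served by page servers.

    Assumption: page server counters are counted within the corresponding
    physical read and read-ahead counters.
    """
    attrs = ET.fromstring(counters_xml).attrib
    total = int(attrs["ActualPhysicalReads"]) + int(attrs["ActualReadAheads"])
    remote = int(attrs["ActualPageServerReads"]) + int(attrs["ActualPageServerReadAheads"])
    # Avoid dividing by zero when the operator did no physical reads at all.
    return remote / total if total else 0.0


if __name__ == "__main__":
    print(f"{page_server_read_share(SAMPLE):.1%} of reads came from page servers")
```

A high share suggests that most of the data touched by this operator was not in the Buffer Pool or the local RBPEX cache.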
6860
69-
In Azure SQL Database, the [sys.dm_io_virtual_file_stats()](/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql/) DMF is the primary way to monitor SQL Server IO. IO characteristics on Hyperscale are different due to its [distributed architecture](sql-database-service-tier-hyperscale.md#distributed-functions-architecture). In this section, we focus on IO (reads and writes) to data files as seen in this DMF. In Hyperscale, each data file visible in this DMF corresponds to a remote page server. The RBPEX cache mentioned here is a local SSD-based cache that is a non-covering cache on the compute replica.
61+
## Virtual file stats and IO accounting
7062

63+
In Azure SQL Database, the [sys.dm_io_virtual_file_stats()](/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql/) DMF is the primary way to monitor SQL Server IO. IO characteristics in Hyperscale are different due to its [distributed architecture](sql-database-service-tier-hyperscale.md#distributed-functions-architecture). In this section, we focus on IO (reads and writes) to data files as seen in this DMF. In Hyperscale, each data file visible in this DMF corresponds to a remote page server. The RBPEX cache mentioned here is a local SSD-based, non-covering cache on the compute replica.
7164

7265
### Local RBPEX cache usage
7366

74-
Local RBPEX cache exists on the compute node on local SSD storage. Thus, IO on this RBPEX cache is faster than IO on remote page servers. Currently, [sys.dm_io_virtual_file_stats()](/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql/) in a Hyperscale database has a special row reporting the IO done on the local RBPEX cache on the compute replica. This row has the value of 0 for both `database_id` and `file_id` columns. For example, the query below returns RBPEX usage statistics since database startup.
67+
Local RBPEX cache exists on the compute replica, on local SSD storage. Thus, IO against this cache is faster than IO against remote page servers. Currently, [sys.dm_io_virtual_file_stats()](/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql/) in a Hyperscale database has a special row reporting the IO against the local RBPEX cache on the compute replica. This row has the value of 0 for both `database_id` and `file_id` columns. For example, the query below returns RBPEX usage statistics since database startup.
7568

7669
`select * from sys.dm_io_virtual_file_stats(0,NULL);`
7770

78-
A ratio of reads done on RBPEX to aggregated reads done on all other data files provides the RBPEX cache hit ratio.
71+
A ratio of reads done on RBPEX to aggregated reads done on all other data files provides RBPEX cache hit ratio.
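
As a rough illustration of the ratio described above, the following Python sketch works on rows already fetched from `sys.dm_io_virtual_file_stats()`. The function name `rbpex_read_ratio` and the sample numbers are hypothetical. The sketch also assumes that file_id 2 (the log file) should be excluded from "all other data files".

```python
RBPEX_FILE_ID = 0  # special row for the local RBPEX cache
LOG_FILE_ID = 2    # log file; assumed not to count as a data file


def rbpex_read_ratio(rows):
    """Return reads on RBPEX divided by reads on all other data files.

    rows: iterable of dicts with the DMF columns file_id and num_of_reads.
    Returns None when no reads were recorded against the other data files.
    """
    rows = list(rows)  # allow generators, since the rows are scanned twice
    rbpex_reads = sum(r["num_of_reads"] for r in rows if r["file_id"] == RBPEX_FILE_ID)
    other_reads = sum(
        r["num_of_reads"]
        for r in rows
        if r["file_id"] not in (RBPEX_FILE_ID, LOG_FILE_ID)
    )
    return rbpex_reads / other_reads if other_reads else None


if __name__ == "__main__":
    # Made-up counters for illustration only.
    sample_rows = [
        {"file_id": 0, "num_of_reads": 9000},
        {"file_id": 1, "num_of_reads": 600},
        {"file_id": 2, "num_of_reads": 50},
        {"file_id": 3, "num_of_reads": 400},
    ]
    print(rbpex_read_ratio(sample_rows))
```

Because the counters are cumulative since database startup, comparing two snapshots taken some time apart gives a better picture of the current workload than a single reading.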
7972

80-
81-
### Data Reads
73+
### Data reads
8274

8375
- When reads are issued by the SQL Server engine on a compute replica, they may be served either by the local RBPEX cache, or by remote page servers, or by a combination of the two if reading multiple pages.
8476
- When the compute replica reads some pages from a specific file, for example file_id 1, if this data resides solely on the local RBPEX cache, all IO for this read is accounted against file_id 0 (RBPEX). If some part of that data is in the local RBPEX cache, and some part is on a remote page server, then IO is accounted towards file_id 0 for the part served from RBPEX, and the part served from the remote page server is accounted towards file_id 1.
85-
- When a compute replica requests a page at a particular [LSN](/sql/relational-databases/sql-server-transaction-log-architecture-and-management-guide/) from a page server, if the page server has not caught up to the LSN requested, the read on the compute replica will wait until the page server catches up before the page is returned to the compute replica. For any read from a page server on the compute replica, you will see the PAGEIOLATCH_* wait type if it is waiting on that IO. This wait time includes both the time to catch up the requested page on the page server to the LSN required, and the time needed to transfer the page from the page server to the compute replica.
86-
- Large reads such as read-ahead are often done using ["Scatter-Gather" Reads](/sql/relational-databases/reading-pages/). This allows reads of up to 4 MB of pages at a time, considered a single read in the SQL Server engine. However, when data being read is in RBPEX, these reads are accounted as multiple individual 8 KB reads since the buffer pool and RBPEX always use 8 KB pages. As the result, the number of read IOs seen against RBPEX may be larger than the actual number of IOs performed by the engine.
87-
77+
- When a compute replica requests a page at a particular [LSN](/sql/relational-databases/sql-server-transaction-log-architecture-and-management-guide/) from a page server, if the page server has not caught up to the LSN requested, the read on the compute replica will wait until the page server catches up before the page is returned to the compute replica. For any read from a page server on the compute replica, you will see the PAGEIOLATCH_* wait type if it is waiting on that IO. In Hyperscale, this wait time includes both the time to catch up the requested page on the page server to the LSN required, and the time needed to transfer the page from the page server to the compute replica.
78+
- Large reads such as read-ahead are often done using ["Scatter-Gather" Reads](/sql/relational-databases/reading-pages/). This allows reads of up to 4 MB of pages at a time, considered a single read in the SQL Server engine. However, when data being read is in RBPEX, these reads are accounted as multiple individual 8 KB reads, since the buffer pool and RBPEX always use 8 KB pages. As a result, the number of read IOs seen against RBPEX may be larger than the actual number of IOs performed by the engine.
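
To make the 8 KB accounting concrete, the following Python sketch estimates how many RBPEX IOs one engine read is counted as. The function name `rbpex_io_count` is hypothetical, and the sketch assumes that 1 KB is 1024 bytes and that a partially filled page still counts as one IO.

```python
PAGE_SIZE_BYTES = 8 * 1024  # the buffer pool and RBPEX always use 8 KB pages


def rbpex_io_count(read_size_bytes: int) -> int:
    """Return the number of 8 KB IOs accounted against RBPEX for one engine read."""
    if read_size_bytes < 0:
        raise ValueError("read size must be non-negative")
    # Ceiling division using integers only.
    return -(-read_size_bytes // PAGE_SIZE_BYTES)


if __name__ == "__main__":
    # A single 4 MB scatter-gather read served entirely from RBPEX.
    print(rbpex_io_count(4 * 1024 * 1024))
```

Under these assumptions, a single 4 MB read-ahead appears as 512 reads against file_id 0, which is why RBPEX read counts can far exceed the number of reads the engine issued.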
8879

89-
### Data Writes
80+
### Data writes
9081

9182
- The primary compute replica does not write directly to page servers. Instead, log records from the Log service are replayed on corresponding page servers.
92-
- Writes that happen on the compute replica are predominantly writes to the local RBPEX (file_id 0). For writes on logical files that are larger than 8 KB, i.e. those done using [Gather-write](/sql/relational-databases/writing-pages/), each write operation is translated into multiple 8 KB individual writes to RBPEX since the buffer pool and RBPEX always use 8 KB pages. As the result, the number of write IOs seen against RBPEX may be larger than the actual number of IOs performed by the engine.
83+
- Writes that happen on the compute replica are predominantly writes to the local RBPEX (file_id 0). For writes on logical files that are larger than 8 KB, in other words those done using [Gather-write](/sql/relational-databases/writing-pages/), each write operation is translated into multiple individual 8 KB writes to RBPEX, since the buffer pool and RBPEX always use 8 KB pages. As a result, the number of write IOs seen against RBPEX may be larger than the actual number of IOs performed by the engine.
9384
- Non-RBPEX files, or data files other than file_id 0 that correspond to page servers, also show writes. In the Hyperscale service tier, these writes are simulated, because the compute replicas never write directly to page servers. Write IOPS and throughput are accounted as they occur on the compute replica, but latency for data files other than file_id 0 does not reflect the actual latency of page server writes.
9485

95-
### Log Writes
86+
### Log writes
9687

9788
- On the primary compute, a log write is accounted for in file_id 2 of sys.dm_io_virtual_file_stats. A log write on primary compute is a write to the log Landing Zone.
98-
- Log records are not hardened on the secondary replica on a commit. In Hyperscale, log is applied by the Xlog service to the remote replicas. Because log writes don't actually occur on secondary replicas, any accounting of Log IO's on the secondary replicas is for tracking purposes only.
89+
- Log records are not hardened on the secondary replica on a commit. In Hyperscale, log is applied by the Log service to the secondary replicas asynchronously. Because log writes don't actually occur on secondary replicas, any accounting of Log IOs on the secondary replicas is for tracking purposes only.
90+
91+
## Data IO in resource utilization statistics
92+
93+
In a non-Hyperscale database, combined read and write IOPS against data files, relative to the [resource governance](/sql-database-resource-limits-database-server#resource-governance) data IOPS limit, are reported in [sys.dm_db_resource_stats](/sql/relational-databases/system-dynamic-management-views/sys-dm-db-resource-stats-azure-sql-database) and [sys.resource_stats](/sql/relational-databases/system-catalog-views/sys-resource-stats-azure-sql-database) views, in the `avg_data_io_percent` column. The same value is reported in the portal as _Data IO Percentage_.
94+
95+
In a Hyperscale database, this column reports data IOPS utilization relative to the limit for local storage on the compute replica only, specifically IO against RBPEX and `tempdb`. A 100% value in this column indicates that resource governance is limiting local storage IOPS. If this is correlated with a performance problem, tune the workload to generate less IO, or increase the database service objective to raise the resource governance _Max Data IOPS_ [limit](/sql-database-vcore-resource-limits-single-databases). For resource governance of RBPEX reads and writes, the system counts individual 8 KB IOs, rather than larger IOs that may be issued by the SQL Server engine.
96+
97+
Data IO against remote page servers is not reported in resource utilization views or in the portal, but is reported in the [sys.dm_io_virtual_file_stats()](/sql/relational-databases/system-dynamic-management-views/sys-dm-io-virtual-file-stats-transact-sql/) DMF, as noted earlier.
98+
9999

100-
## Additional Resources
100+
## Additional resources
101101

102-
- For vCore resource limits for a hyperscale single database see [Hyperscale service tier vCore Limits](sql-database-vcore-resource-limits-single-databases.md#hyperscale---provisioned-compute---gen5)
102+
- For vCore resource limits for a Hyperscale single database, see [Hyperscale service tier vCore Limits](sql-database-vcore-resource-limits-single-databases.md#hyperscale---provisioned-compute---gen5)
103103
- For Azure SQL Database performance tuning, see [Query performance in Azure SQL Database](sql-database-performance-guidance.md)
104104
- For performance tuning using Query Store, see [Performance monitoring using Query store](/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store/)
105105
- For DMV monitoring scripts, see [Monitoring performance Azure SQL Database using dynamic management views](sql-database-monitoring-with-dmvs.md)
