Commit cdfd322

Merge pull request #251638 from varun-dhawan/varund-july
[PostgreSQL] updates the details for metrics visualization
2 parents 3a71a44 + a1129d6 commit cdfd322

3 files changed

+45
-29
lines changed

articles/postgresql/flexible-server/concepts-monitoring.md

Lines changed: 42 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ ms.author: varundhawan
66
ms.service: postgresql
77
ms.subservice: flexible-server
88
ms.topic: conceptual
9-
ms.date: 9/5/2023
9+
ms.date: 9/15/2023
1010
---
1111

1212
# Monitor metrics on Azure Database for PostgreSQL - Flexible Server
@@ -22,7 +22,7 @@ Azure Database for PostgreSQL provides various metrics that give insight into th
2222
> [!NOTE]
2323
> While metrics are stored for 93 days, you can only query (in the Metrics tile) for a maximum of 30 days' worth of data on any single chart. If you see a blank chart or your chart displays only part of metric data, verify that the difference between start and end dates in the time picker doesn't exceed the 30-day interval. After you've selected a 30-day interval, you can pan the chart to view the full retention window.
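The interaction between the 93-day retention and the 30-day per-chart limit in the note above can be sketched in Python. This is only an illustration: `chart_windows` is a hypothetical helper, not part of any Azure SDK, and it assumes the limits apply exactly as the note states.

```python
from datetime import datetime, timedelta

MAX_CHART_WINDOW = timedelta(days=30)  # per-chart query limit from the note
RETENTION = timedelta(days=93)         # metric retention from the note


def chart_windows(start, end, now):
    """Split [start, end] into consecutive windows of at most 30 days.

    The start is clamped to the 93-day retention period ending at `now`,
    because older data points are assumed to be unavailable.
    """
    cursor = max(start, now - RETENTION)
    windows = []
    while cursor < end:
        window_end = min(cursor + MAX_CHART_WINDOW, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


now = datetime(2023, 9, 15)
for window_start, window_end in chart_windows(now - timedelta(days=100), now, now):
    print(window_start.date(), "->", window_end.date())
```

Each resulting window fits on a single chart; panning between windows covers the full retention period.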
2424
25-
### List of metrics
25+
### Default Metrics
2626

2727
The following metrics are available for a flexible server instance of Azure Database for PostgreSQL:
2828

@@ -51,18 +51,16 @@ The following metrics are available for a flexible server instance of Azure Data
5151
|**Write IOPS** |`write_iops` |Count |Number of data disk I/O write operations per second. |Yes |
5252

5353

54-
## Enhanced metrics
54+
### Enhanced metrics
5555

56-
You can use enhanced metrics for Azure Database for PostgreSQL - Flexible Server to get fine-grained monitoring and alerting on databases. You can configure alerts on the metrics.
56+
You can use enhanced metrics for Azure Database for PostgreSQL - Flexible Server to get fine-grained monitoring and alerting on databases. You can configure alerts on the metrics. Some enhanced metrics include a `Dimension` parameter that you can use to split and filter metrics data by using a dimension like database name or state.
5757

58-
Some enhanced metrics include a `Dimension` parameter that you can use to split and filter metrics data by using a dimension like database name or state.
59-
60-
### Enable enhanced metrics
58+
#### Enabling enhanced metrics
6159

6260
- Most of these new metrics are *disabled* by default. A few exceptions are described in the next table.
6361
- To enable these metrics, set the server parameter `metrics.collector_database_activity` to `ON`. This parameter is dynamic and doesn't require an instance restart.
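As a rough sketch, the parameter change described above could be scripted by building an Azure CLI command. The example assumes the `az postgres flexible-server parameter set` command with its `--resource-group`, `--server-name`, `--name`, and `--value` options; the resource group and server names are placeholders, and `build_parameter_set_command` is a hypothetical helper. The command is only printed, not executed.

```python
import shlex


def build_parameter_set_command(resource_group, server_name, name, value):
    """Return the Azure CLI argument list that sets one server parameter."""
    return [
        "az", "postgres", "flexible-server", "parameter", "set",
        "--resource-group", resource_group,
        "--server-name", server_name,
        "--name", name,
        "--value", value,
    ]


# Placeholder resource names; replace them with your own.
cmd = build_parameter_set_command(
    "my-resource-group",
    "my-flexible-server",
    "metrics.collector_database_activity",
    "ON",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a machine where the Azure CLI is installed and signed in.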
6462

65-
### List of enhanced metrics
63+
##### List of enhanced metrics
6664

6765
You can choose from the following categories of enhanced metrics:
6866

@@ -73,7 +71,7 @@ You can choose from the following categories of enhanced metrics:
7371
- Saturation
7472
- Traffic
7573

76-
#### Activity
74+
##### Activity
7775

7876
|Display name|Metric ID|Unit|Description|Dimension|Default enabled|
7977
|---|---|---|---|---|---|
@@ -85,7 +83,7 @@ You can choose from the following categories of enhanced metrics:
8583
|**Oldest xmin** |`oldest_backend_xmin`|Count|The actual value of the oldest `xmin`. If `xmin` isn't increasing, it indicates that there are some long-running transactions that can potentially hold dead tuples from being removed. |Doesn't apply|No|
8684
|**Oldest xmin Age** |`oldest_backend_xmin_age`|Count|Age in units of the oldest `xmin`. Indicates how many transactions passed since the oldest `xmin`. |Doesn't apply|No|
8785

88-
#### Database
86+
##### Database
8987

9088
|Display name |Metric ID |Unit |Description |Dimension |Default enabled|
9189
|---------------------------------|-------------|-----|----------------------------------------------------------------------------------------------------|------------|---------------|
@@ -105,52 +103,52 @@ You can choose from the following categories of enhanced metrics:
105103
|**Tuples Returned** |`tup_returned` |Count|Number of rows that were returned by queries in this database. |DatabaseName|No |
106104
|**Tuples Updated** |`tup_updated` |Count|Number of rows that were updated by queries in this database. |DatabaseName|No |
107105

108-
#### Logical replication
106+
##### Logical replication
109107

110108
|Display name|Metric ID|Unit|Description|Dimension|Default enabled|
111109
|---|---|---|---|---|---|
112110
|**Max Logical Replication Lag** |`logical_replication_delay_in_bytes`|Bytes|Maximum lag across all logical replication slots.|Doesn't apply|Yes |
113111

114-
#### Replication
112+
##### Replication
115113

116114
|Display name|Metric ID|Unit|Description|Dimension|Default enabled|
117115
|---|---|---|---|---|---|
118116
|**Max Physical Replication Lag** |`physical_replication_delay_in_bytes`|Bytes|Maximum lag across all asynchronous physical replication slots.|Doesn't apply|Yes |
119117
|**Read Replica Lag** |`physical_replication_delay_in_seconds`|Seconds|Read replica lag in seconds. |Doesn't apply|Yes |
120118

121-
#### Saturation
119+
##### Saturation
122120

123121
|Display name|Metric ID|Unit|Description|Dimension|Default enabled|
124122
|---|---|---|---|---|---|
125123
|**Disk Bandwidth Consumed Percentage**|`disk_bandwidth_consumed_percentage`|Percent|Percentage of data disk bandwidth consumed per minute.|Doesn't apply|Yes |
126124
|**Disk IOPS Consumed Percentage** |`disk_iops_consumed_percentage` |Percent|Percentage of data disk I/Os consumed per minute. |Doesn't apply|Yes |
127125

128-
#### Traffic
126+
##### Traffic
129127

130128
|Display name|Metric ID|Unit|Description|Dimension|Default enabled|
131129
|---|---|---|---|---|---|
132130
|**Max Connections** ^|`max_connections`|Count|Number of maximum connections. |Doesn't apply|Yes |
133131

134132
^ **Max Connections** represents the configured value for the `max_connections` server parameter. This metric is polled every 30 minutes.
135133

136-
#### Considerations for using enhanced metrics
134+
##### Considerations for using enhanced metrics
137135

138136
- Enhanced metrics that use the DatabaseName dimension have a *50-database* limit.
139137
- On the *Burstable* SKU, the limit is 10 databases for metrics that use the DatabaseName dimension.
140138
- The DatabaseName dimension limit is applied on the object identifier (OID) column, which reflects the order of creation for the database.
141139
- The DatabaseName in the metrics dimension is *case insensitive*. The metrics for database names that are the same except for case (for example, *contoso_database* and *Contoso_database*) will be merged and might not show accurate data.
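Because names that differ only by case are merged in the DatabaseName dimension, it can help to check a server's database list for such collisions before relying on per-database metrics. The following Python sketch is illustrative only; `case_insensitive_collisions` is a hypothetical helper, and the list of names is assumed to come from your own inventory (for example, a query against `pg_database`).

```python
from collections import defaultdict


def case_insensitive_collisions(database_names):
    """Group names that are identical except for case.

    Returns a dict that maps the lowercased name to the original
    spellings, only for groups with more than one distinct spelling.
    """
    groups = defaultdict(list)
    for name in database_names:
        groups[name.lower()].append(name)
    return {
        key: names
        for key, names in groups.items()
        if len(set(names)) > 1
    }


names = ["contoso_database", "Contoso_database", "postgres"]
print(case_insensitive_collisions(names))
```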
142140

143-
## Autovacuum metrics
141+
### Autovacuum metrics
144142

145143
Autovacuum metrics can be used to monitor and tune autovacuum performance for Azure Database for PostgreSQL - Flexible Server. Each metric is emitted at a *30-minute* interval and has up to *93 days* of retention. You can create alerts for specific metrics, and you can split and filter metrics data by using the DatabaseName dimension.
146144

147-
### Enable autovacuum metrics
145+
#### How to enable autovacuum metrics
148146

149147
- Autovacuum metrics are disabled by default.
150148
- To enable these metrics, set the server parameter `metrics.autovacuum_diagnostics` to `ON`.
151149
- This parameter is dynamic, so an instance restart isn't required.
152150

153-
### List of autovacuum metrics
151+
#### List of autovacuum metrics
154152

155153
|Display name |Metric ID |Unit |Description |Dimension |Default enabled|
156154
|---------------------------------------|---------------------------------|-------|-----------------------------------------------------------------------------------------------------------|------------|---------------|
@@ -168,23 +166,23 @@ Autovacuum metrics can be used to monitor and tune autovacuum performance for Az
168166
|**User Tables Vacuumed** |`tables_vacuumed_user_tables` |Count |Number of user-only tables that have been vacuumed in this database. |DatabaseName|No |
169167
|**Vacuum Counter User Tables** |`vacuum_count_user_tables` |Count |Number of times user-only tables have been manually vacuumed in this database (not counting `VACUUM FULL`).|DatabaseName|No |
170168

171-
### Considerations for using autovacuum metrics
169+
#### Considerations for using autovacuum metrics
172170

173171
- Autovacuum metrics that use the DatabaseName dimension have a *30-database* limit.
174172
- On the *Burstable* SKU, the limit is 10 databases for metrics that use the DatabaseName dimension.
175173
- The DatabaseName dimension limit is applied on the OID column, which reflects the order of creation for the database.
176174

177-
## PgBouncer metrics
175+
### PgBouncer metrics
178176

179177
You can use PgBouncer metrics to monitor the performance of the PgBouncer process, including details for active connections, idle connections, total pooled connections, and the number of connection pools. Each metric is emitted at a *30-minute* interval and has up to *93 days* of history. Customers can configure alerts on the metrics and also access the new metrics dimensions to split and filter metrics data by database name.
180178

181-
### Enable PgBouncer metrics
179+
#### How to enable PgBouncer metrics
182180

183181
- PgBouncer metrics are disabled by default.
184182
- For PgBouncer metrics to work, both the server parameters `pgbouncer.enabled` and `metrics.pgbouncer_diagnostics` must be enabled.
185183
- These parameters are dynamic and don't require an instance restart.
186184

187-
### List of PgBouncer metrics
185+
#### List of PgBouncer metrics
188186

189187
|Display name|Metric ID|Unit|Description|Dimension|Default enabled|
190188
|---|---|---|---|---|---|
@@ -195,13 +193,13 @@ You can use PgBouncer metrics to monitor the performance of the PgBouncer proces
195193
|**Total pooled connections** |`total_pooled_connections`|Count|Current number of pooled connections. |DatabaseName|No |
196194
|**Number of connection pools** |`num_pools` |Count|Total number of connection pools. |DatabaseName|No |
197195

198-
### Considerations for using the PgBouncer metrics
196+
#### Considerations for using the PgBouncer metrics
199197

200198
- PgBouncer metrics that use the DatabaseName dimension have a *30-database* limit.
201199
- On the *Burstable* SKU, the limit is 10 databases that have the DatabaseName dimension.
202200
- The DatabaseName dimension limit is applied to the OID column, which reflects the order of creation for the database.
203201

204-
## Database availability metric
202+
### Database availability metric
205203

206204
Is-db-alive is a database server availability metric for Azure Postgres Flexible Server that returns `[1 for available]` and `[0 for not-available]`. The metric is emitted at a *1-minute* frequency and has up to *93 days* of retention. Customers can configure alerts on the metric.
207205

@@ -215,7 +213,7 @@ Is-db-alive is a database server availability metric for Azure Postgres Flexibl
215213
- Customers have the option to further aggregate these metrics at any desired frequency (5m, 10m, 30m, and so on) to suit their alerting requirements and avoid false positives.
216214
- Other possible aggregations are `AVG()` and `MIN()`.
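To illustrate how the aggregation choice affects alerting, the sketch below groups 1-minute availability samples (`1` or `0`) into fixed windows. It is a simplified model, not how Azure Monitor computes aggregations; the `aggregate` helper is hypothetical, and `max` is included as an assumed counterpart to the `AVG()` and `MIN()` aggregations named above.

```python
def aggregate(samples, window, how="max"):
    """Aggregate per-minute 0/1 samples into windows of `window` minutes."""
    funcs = {
        "max": max,
        "min": min,
        "avg": lambda values: sum(values) / len(values),
    }
    fn = funcs[how]
    return [
        fn(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


# One missed sample in the first 5 minutes, a full outage in the next 5.
samples = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(aggregate(samples, 5, "max"))
print(aggregate(samples, 5, "min"))
print(aggregate(samples, 5, "avg"))
```

In this model, a maximum over a window ignores a single missed sample, a minimum reports any missed sample as downtime, and an average gives the fraction of the window in which the server was available.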
217215

218-
## Filter and split on dimension metrics
216+
### Filter and split on dimension metrics
219217

220218
In the preceding tables, some metrics have dimensions like DatabaseName or State. You can use [filtering](../../azure-monitor/essentials/metrics-charts.md#filters) and [splitting](../../azure-monitor/essentials/metrics-charts.md#apply-splitting) for the metrics that have dimensions. These features show how various metric segments (or *dimension values*) affect the overall value of the metric. You can use them to identify possible outliers.
221219

@@ -228,10 +226,28 @@ The following example demonstrates splitting by the State dimension and filterin
228226

229227
For more information about setting up charts for dimensional metrics, see [Metric chart examples](../../azure-monitor/essentials/metric-chart-samples.md).
230228

231-
## Server logs
229+
### Metrics visualization
230+
231+
There are several options to visualize Azure Monitor metrics:
232+
233+
|Component |Description | Required training and/or configuration|
234+
|---------|---------|--------|
235+
|Overview page|Most Azure services have an **Overview** page in the Azure portal that includes a **Monitor** section with charts that show recent critical metrics. This information is intended for owners of individual services to quickly assess the performance of the resource. |This page is based on platform metrics that are collected automatically. No configuration is required. |
236+
|[Metrics Explorer](../../azure-monitor/essentials/metrics-getting-started.md)|You can use Metrics Explorer to interactively work with metric data and create metric alerts. You need minimal training to use Metrics Explorer, but you must be familiar with the metrics you want to analyze. |- Once data collection is configured, no other configuration is required.<br>- Platform metrics for Azure resources are automatically available.<br>- Guest metrics for virtual machines are available after an Azure Monitor agent is deployed to the virtual machine.<br>- Application metrics are available after Application Insights is configured. |
237+
| [Grafana](https://grafana.com/grafana/dashboards/19556-azure-azure-postgresql-flexible-server-monitoring/) | You can use Grafana for visualizing and alerting on metrics. All versions of Grafana include the [Azure Monitor datasource plug-in](../../azure-monitor/visualize/grafana-plugin.md) to visualize your Azure Monitor metrics and logs. | Some training is required for you to become familiar with Grafana dashboards, although you can download the prebuilt [Azure PostgreSQL Grafana monitoring dashboard](https://grafana.com/grafana/dashboards/19556-azure-azure-postgresql-flexible-server-monitoring/) to easily monitor all Azure PostgreSQL servers in your organization. |
238+
239+
240+
## Logs
232241

233242
In addition to the metrics, you can use Azure Database for PostgreSQL to configure and access Azure Database for PostgreSQL standard logs. For more information, see [Logging concepts](concepts-logging.md).
234243

244+
### Logs visualization
245+
246+
|Component |Description | Required training and/or configuration|
247+
|---------|---------|--------|
248+
|[Log Analytics](../../azure-monitor/logs/log-analytics-overview.md)|With Log Analytics, you can create log queries to interactively work with log data and create log query alerts.| Some training is required for you to become familiar with the query language, although you can use prebuilt queries for common requirements. |
249+
250+
235251
## Next steps
236252

237253
- Learn more about how to [configure and access logs](howto-configure-and-access-logs.md).

articles/postgresql/flexible-server/concepts-read-replicas.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -56,7 +56,7 @@ When you start the create replica workflow, a blank Azure Database for PostgreSQ
5656

5757
In Azure Database for PostgreSQL - Flexible Server, the create operation of replicas is considered successful only when the entire backup of the primary instance has been copied to the replica destination and the transaction logs have been synchronized up to a maximum lag of 1 GB.
5858

59-
To ensure the success of the create operation, it's recommended to avoid creating replicas during periods of high transactional load. For example, it's best to avoid creating replicas during migrations from other sources to Azure Database for PostgreSQL - Flexible Server, or during excessive bulk load operations. If you are currently in the process of performing a migration or bulk load operation, it's recommended that you wait until the operation has completed before proceeding with the creation of replicas. Once the migration or bulk load operation has finished, check whether the transaction log size has returned to its normal size. Typically, the transaction log size should be close to the value defined in the `max_wal_size` server parameter for your instance. You can track the transaction log storage footprint using the [Transaction Log Storage Used](concepts-monitoring.md#list-of-metrics) metric, which provides insights into the amount of storage used by the transaction log. By monitoring this metric, you can ensure that the transaction log size is within the expected range and that the replica creation process can be started.
59+
To ensure the success of the create operation, it's recommended to avoid creating replicas during periods of high transactional load. For example, it's best to avoid creating replicas during migrations from other sources to Azure Database for PostgreSQL - Flexible Server, or during excessive bulk load operations. If you are currently in the process of performing a migration or bulk load operation, it's recommended that you wait until the operation has completed before proceeding with the creation of replicas. Once the migration or bulk load operation has finished, check whether the transaction log size has returned to its normal size. Typically, the transaction log size should be close to the value defined in the `max_wal_size` server parameter for your instance. You can track the transaction log storage footprint using the [Transaction Log Storage Used](concepts-monitoring.md#default-metrics) metric, which provides insights into the amount of storage used by the transaction log. By monitoring this metric, you can ensure that the transaction log size is within the expected range and that the replica creation process can be started.
6060

6161
> [!IMPORTANT]
6262
> Read Replicas are currently supported for the General Purpose and Memory Optimized server compute tiers. The Burstable server compute tier isn't supported.
@@ -86,7 +86,7 @@ At the prompt, enter the password for the user account.
8686
The read replica feature in Azure Database for PostgreSQL - Flexible Server relies on the replication slots mechanism. The main advantage of replication slots is the ability to automatically adjust the number of transaction logs (WAL segments) needed by all replica servers and therefore avoid situations in which one or more replicas go out of sync because WAL segments that weren't yet sent to the replicas are removed on the primary. The disadvantage of the approach is the risk of running out of space on the primary if a replication slot remains inactive for a long period of time. In such situations, the primary accumulates WAL files, causing incremental growth of the storage usage. When the storage usage reaches 95% or if the available capacity is less than 5 GiB, the server is automatically switched to read-only mode to avoid errors associated with disk-full situations.
8787
Therefore, monitoring the replication lag and replication slots status is crucial for read replicas.
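The read-only condition described above can be expressed as a small predicate, which is useful when choosing alert thresholds that fire well before the switch. This Python sketch is illustrative; `switches_to_read_only` is a hypothetical helper, and the 95% and 5 GiB values are taken from the text as stated.

```python
GIB = 1024 ** 3


def switches_to_read_only(used_bytes, total_bytes):
    """Return True when storage usage reaches 95% or free space drops below 5 GiB."""
    usage = used_bytes / total_bytes
    free_bytes = total_bytes - used_bytes
    return usage >= 0.95 or free_bytes < 5 * GIB


# A small disk can hit the 5 GiB floor long before 95% usage.
print(switches_to_read_only(28 * GIB, 32 * GIB))
# A large disk hits the 95% rule while tens of GiB are still free.
print(switches_to_read_only(973 * GIB, 1024 * GIB))
print(switches_to_read_only(900 * GIB, 1024 * GIB))
```

On small disks the free-space rule is the binding one, so a purely percentage-based alert might fire too late.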
8888

89-
We recommend setting alert rules for storage used or storage percentage, as well as for replication lags, when they exceed certain thresholds so that you can proactively act, increase the storage size, and delete lagging read replicas. For example, you can set an alert if the storage percentage exceeds 80% usage, as well as on the replica lag being higher than 1 hour. The [Transaction Log Storage Used](concepts-monitoring.md#list-of-metrics) metric will show you if the WAL files accumulation is the main reason for the excessive storage usage.
89+
We recommend setting alert rules for storage used or storage percentage, as well as for replication lags, when they exceed certain thresholds so that you can proactively act, increase the storage size, and delete lagging read replicas. For example, you can set an alert if the storage percentage exceeds 80% usage, as well as on the replica lag being higher than 1 hour. The [Transaction Log Storage Used](concepts-monitoring.md#default-metrics) metric will show you if the WAL files accumulation is the main reason for the excessive storage usage.
9090

9191
Azure Database for PostgreSQL - Flexible Server provides [two metrics](concepts-monitoring.md#replication) for monitoring replication. The two metrics are **Max Physical Replication Lag** and **Read Replica Lag**. To learn how to view these metrics, see the **Monitor a replica** section of the [read replica how-to article](how-to-read-replicas-portal.md#monitor-a-replica).
9292
