Commit f2ef789

Merge pull request #196484 from sarat0681/updateToC
Added New Troubleshooting documents for postges-flexible server
2 parents 3e22667 + 40e9924 commit f2ef789

File tree

4 files changed

+638
-3
lines changed

articles/postgresql/TOC.yml

Lines changed: 16 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -529,9 +529,22 @@
529529
- name: Azure Advisor recommendations
530530
href: flexible-server/concepts-azure-advisor-recommendations.md
531531
- name: Troubleshooting
532-
items:
533-
- name: Troubleshoot CLI errors
534-
href: flexible-server/how-to-troubleshoot-cli-errors.md
532+
items:
533+
- name: Functional troubleshooting
534+
items:
535+
- name: Troubleshoot CLI errors
536+
href: flexible-server/how-to-troubleshoot-cli-errors.md
537+
- name: Performance troubleshooting
538+
items:
539+
- name: Troubleshoot high CPU utilization
540+
href: flexible-server/how-to-high-cpu-utilization.md
541+
displayName: High CPU Utilization
542+
- name: Troubleshoot high memory utilization
543+
href: flexible-server/how-to-high-memory-utilization.md
544+
displayName: High Memory Utilization
545+
- name: Troubleshoot autovacuum
546+
href: flexible-server/how-to-autovacuum-tuning.md
547+
displayName: Autovacuum troubleshooting, tuning
535548
- name: How-to guides
536549
items:
537550
- name: Manage a server
Lines changed: 307 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,307 @@
1+
---
2+
title: Autovacuum Tuning
3+
description: Troubleshooting guide for autovacuum in Azure Database for PostgreSQL - Flexible Server
4+
ms.author: sbalijepalli
5+
author: sarat0681
6+
ms.service: postgresql
7+
ms.subservice: flexible-server
8+
ms.topic: conceptual
9+
ms.date: 08/03/2022
10+
---
11+
12+
# Autovacuum Tuning in Azure Database for PostgreSQL - Flexible Server
13+
14+
This article provides an overview of the autovacuum feature for [Azure Database for PostgreSQL - Flexible Server](overview.md).
15+
16+
## What is autovacuum
17+
18+
Internal data consistency in PostgreSQL is based on the Multi-Version Concurrency Control (MVCC) mechanism, which allows the database engine to maintain multiple versions of a row and provides greater concurrency with minimal blocking between the different processes.
19+
20+
PostgreSQL databases need appropriate maintenance. For example, when a row is deleted, it is not removed physically. Instead, the row is marked as "dead". Similarly, for updates, the old row is marked as "dead" and a new version of the row is inserted. These operations leave behind dead records, called dead tuples, even after all the transactions that might see those versions finish. Unless cleaned up, dead tuples remain, consuming disk space and bloating tables and indexes, which results in slow query performance.
21+
22+
PostgreSQL uses a process called autovacuum to automatically clean up dead tuples.
23+
24+
25+
## Autovacuum internals
26+
27+
Autovacuum reads pages looking for dead tuples. If it finds none, it discards the page; if it finds dead tuples, it removes them. The cost of this work is based on the following parameters:
28+
29+
- `vacuum_cost_page_hit`: Cost of reading a page that is already in shared buffers and does not need a disk read. The default value is set to 1.
30+
- `vacuum_cost_page_miss`: Cost of fetching a page that is not in shared buffers. The default value is set to 10 (2 in PostgreSQL 14 and later).
31+
- `vacuum_cost_page_dirty`: Cost of writing to a page when dead tuples are found in it. The default value is set to 20.
32+
33+
The amount of work autovacuum does depends on two parameters:
34+
35+
- `autovacuum_vacuum_cost_limit` is the amount of work (accumulated cost) autovacuum does in one go before it pauses.
36+
- `autovacuum_vacuum_cost_delay` is the number of milliseconds autovacuum sleeps once the cost limit is reached.
37+
38+
39+
In Postgres versions 9.6, 10 and 11 the default for `autovacuum_vacuum_cost_limit` is 200 and `autovacuum_vacuum_cost_delay` is 20 milliseconds.
40+
In Postgres versions 12 and above the default `autovacuum_vacuum_cost_limit` is 200 and `autovacuum_vacuum_cost_delay` is 2 milliseconds.
41+
42+
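With a delay of 20 ms, autovacuum wakes up 50 times every second (50 * 20 ms = 1000 ms). Every time it wakes up, autovacuum can accumulate a cost of up to 200, which corresponds to 200 pages when every page is found in shared buffers.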
43+
44+
That means that in one second autovacuum can process:
45+
46+
- ~80 MB/Sec [ (200 pages/`vacuum_cost_page_hit`) * 50 * 8 KB per page] if all pages with dead tuples are found in shared buffers.
47+
- ~8 MB/Sec [ (200 pages/`vacuum_cost_page_miss`) * 50 * 8 KB per page] if all pages with dead tuples are read from disk.
48+
- ~4 MB/Sec [ (200 pages/`vacuum_cost_page_dirty`) * 50 * 8 KB per page] if all pages contain dead tuples and must be written.
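The figures above can be reproduced with a quick calculation. The following is a minimal sketch, assuming the values quoted in this article (cost limit 200, delay 20 ms, 8 KB pages, page costs 1, 10, and 20). The numbers are hard-coded for illustration and are not read from the server.

```postgresql
-- Illustrative arithmetic only: cost limit 200, delay 20 ms, 8 KB pages.
SELECT cost_name,
       page_cost,
       -- pages per wake-up * wake-ups per second * KB per page, shown in MB
       round((200 / page_cost) * (1000 / 20) * 8 / 1000.0, 1) AS approx_mb_per_sec
FROM (VALUES ('vacuum_cost_page_hit', 1),
             ('vacuum_cost_page_miss', 10),
             ('vacuum_cost_page_dirty', 20)) AS c(cost_name, page_cost);
```

With these inputs the query returns roughly 80, 8, and 4 MB per second. Replacing the 20 ms delay with 2 ms shows how much more work autovacuum can do per second with the PostgreSQL 12 default.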
49+
50+
51+
52+
## Monitoring autovacuum
53+
54+
Use the following queries to monitor autovacuum:
55+
56+
```postgresql
SELECT schemaname,
       relname,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::float / n_live_tup::float * 100) AS dead_pct,
       autovacuum_count,
       last_vacuum,
       last_autovacuum,
       last_autoanalyze,
       last_analyze
FROM pg_stat_all_tables
WHERE n_live_tup > 0;
```
59+
60+
61+
The following columns help determine if autovacuum is catching up to table activity:
62+
63+
64+
- **Dead_pct**: percentage of dead tuples when compared to live tuples.
65+
- **Last_autovacuum**: The date of the last time the table was autovacuumed.
66+
- **Last_autoanalyze**: The date of the last time the table was automatically analyzed.
67+
68+
69+
## When does PostgreSQL trigger autovacuum
70+
71+
An autovacuum action (either *ANALYZE* or *VACUUM*) triggers when the number of dead tuples exceeds a particular number that is dependent on two factors: the total count of rows in a table, plus a fixed threshold. *ANALYZE*, by default, triggers when 10% of the table plus 50 rows changes, while *VACUUM* triggers when 20% of the table plus 50 rows changes. Since the *VACUUM* scale factor is twice as high as the *ANALYZE* scale factor, *ANALYZE* gets triggered earlier than *VACUUM*.
72+
73+
The exact equations for each action are:
74+
75+
- **Autoanalyze** = autovacuum_analyze_scale_factor * tuples + autovacuum_analyze_threshold
76+
- **Autovacuum** = autovacuum_vacuum_scale_factor * tuples + autovacuum_vacuum_threshold
77+
78+
79+
For example, analyze triggers after 60 rows change on a table that contains 100 rows, and vacuum triggers when 70 rows change on the table, using the following equations:
80+
81+
`Autoanalyze = 0.1 * 100 + 50 = 60`
82+
`Autovacuum = 0.2 * 100 + 50 = 70`
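The same equations can be evaluated against the server's current settings. This is a minimal sketch, assuming a table with 100 rows; the row count is a hard-coded placeholder, and on a real table you would use the table's estimated tuple count instead.

```postgresql
-- Evaluate both trigger equations for a hypothetical table of 100 rows.
SELECT current_setting('autovacuum_analyze_scale_factor')::numeric * 100
         + current_setting('autovacuum_analyze_threshold')::integer AS autoanalyze_trigger,
       current_setting('autovacuum_vacuum_scale_factor')::numeric * 100
         + current_setting('autovacuum_vacuum_threshold')::integer AS autovacuum_trigger;
```

If the server uses the default values described above, the two columns come out as 60 and 70.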
83+
84+
85+
Use the following query to list the tables in a database and identify the tables that qualify for the autovacuum process:
86+
87+
88+
```postgresql
SELECT *
    ,n_dead_tup > av_threshold AS av_needed
    ,CASE
        WHEN reltuples > 0
            THEN round(100.0 * n_dead_tup / (reltuples))
        ELSE 0
        END AS pct_dead
FROM (
    SELECT N.nspname
        ,C.relname
        ,pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins
        ,pg_stat_get_tuples_updated(C.oid) AS n_tup_upd
        ,pg_stat_get_tuples_deleted(C.oid) AS n_tup_del
        ,pg_stat_get_live_tuples(C.oid) AS n_live_tup
        ,pg_stat_get_dead_tuples(C.oid) AS n_dead_tup
        ,C.reltuples AS reltuples
        -- Server-level vacuum trigger: threshold + scale factor * estimated rows
        ,round(current_setting('autovacuum_vacuum_threshold')::INTEGER + current_setting('autovacuum_vacuum_scale_factor')::NUMERIC * C.reltuples) AS av_threshold
        ,date_trunc('minute', greatest(pg_stat_get_last_vacuum_time(C.oid), pg_stat_get_last_autovacuum_time(C.oid))) AS last_vacuum
        ,date_trunc('minute', greatest(pg_stat_get_last_analyze_time(C.oid), pg_stat_get_last_autoanalyze_time(C.oid))) AS last_analyze
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    -- Ordinary tables ('r') and TOAST tables ('t') outside the system schemas
    WHERE C.relkind IN (
            'r'
            ,'t'
            )
        AND N.nspname NOT IN (
            'pg_catalog'
            ,'information_schema'
            )
        AND N.nspname !~ '^pg_toast'
    ) AS av
ORDER BY av_needed DESC ,n_dead_tup DESC;
```
122+
123+
> [!NOTE]
124+
> The query does not take into consideration that autovacuum can be configured on a per-table basis using the "alter table" DDL command. 
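To see which tables carry their own autovacuum settings, and would therefore be misjudged by the query above, the storage options in `pg_class` can be inspected. This is a sketch that only lists the overrides; it does not recompute the thresholds with them.

```postgresql
-- Tables (and TOAST tables) that have at least one per-table autovacuum option set.
SELECT c.oid::regclass AS table_name,
       c.reloptions
FROM pg_class AS c
WHERE c.relkind IN ('r', 't')
  AND EXISTS (
        SELECT 1
        FROM unnest(c.reloptions) AS opt
        WHERE opt LIKE 'autovacuum%'
      );
```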
125+
126+
127+
## Common autovacuum problems
128+
129+
Review the possible common problems with the autovacuum process.
130+
131+
### Not keeping up with busy server
132+
133+
The autovacuum process estimates the cost of every I/O operation, accumulates a total for each operation it performs and pauses once the upper limit of the cost is reached. `autovacuum_vacuum_cost_delay` and `autovacuum_vacuum_cost_limit` are the two server parameters that are used in the process.
134+
135+
136+
By default, `autovacuum_vacuum_cost_limit` is set to `-1`, meaning the autovacuum cost limit takes the value of the `vacuum_cost_limit` parameter, which defaults to 200. `vacuum_cost_limit` is the cost limit for a manual vacuum.
137+
138+
If `autovacuum_vacuum_cost_limit` is set to `-1` then autovacuum uses the `vacuum_cost_limit` parameter, but if `autovacuum_vacuum_cost_limit` itself is set to greater than `-1` then `autovacuum_vacuum_cost_limit` parameter is considered.
139+
140+
In case the autovacuum is not keeping up, the following parameters may be changed:
141+
142+
|Parameter |Description |
143+
|---------|---------|
144+
|`autovacuum_vacuum_scale_factor`| Default: `0.2`, suggested range: `0.05 - 0.1`. The scale factor is workload-specific and should be set depending on the amount of data in the tables. Before changing the value, investigate the workload and individual table volumes. |
145+
|`autovacuum_vacuum_cost_limit`|Default: `200`. Cost limit may be increased. CPU and I/O utilization on the database should be monitored before and after making changes. |
146+
|`autovacuum_vacuum_cost_delay` | **Postgres Versions 9.6, 10, 11** - Default: `20 ms`. The parameter may be decreased to `2-10 ms`. <br> **Postgres Versions 12 and above** - Default: `2 ms`. |
147+
148+
> [!NOTE]
149+
> The `autovacuum_vacuum_cost_limit` value is distributed proportionally among the running autovacuum workers, so that if there is more than one, the sum of the limits for each worker does not exceed the value of the `autovacuum_vacuum_cost_limit` parameter.
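Before changing any of these parameters, it can help to record their current values. The following sketch reads them from the standard `pg_settings` view; the choice of which parameters to list is an assumption based on the discussion above.

```postgresql
-- Current values of the cost-related autovacuum parameters.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('vacuum_cost_limit',
               'autovacuum_vacuum_cost_limit',
               'autovacuum_vacuum_cost_delay',
               'autovacuum_vacuum_scale_factor',
               'autovacuum_max_workers')
ORDER BY name;
```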
150+
151+
### Autovacuum constantly running
152+
153+
Continuously running autovacuum may affect CPU and IO utilization on the server. The following might be possible reasons:
154+
155+
#### `maintenance_work_mem`
156+
157+
The autovacuum daemon uses `autovacuum_work_mem`, which is set to `-1` by default, meaning it takes the value of the `maintenance_work_mem` parameter. This document assumes `autovacuum_work_mem` is set to `-1` and `maintenance_work_mem` is used by the autovacuum daemon.
158+
159+
If `maintenance_work_mem` is low, it may be increased to up to 2 GB on Flexible Server. A general rule of thumb is to allocate 50 MB to `maintenance_work_mem` for every 1 GB of RAM. 
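The rule of thumb can be written out as simple arithmetic. This is a sketch only: the RAM sizes are hypothetical examples, and the result is capped at the 2 GB maximum mentioned above.

```postgresql
-- 50 MB of maintenance_work_mem per 1 GB of RAM, capped at 2 GB (2048 MB).
SELECT ram_gb,
       least(ram_gb * 50, 2048) AS suggested_maintenance_work_mem_mb
FROM (VALUES (4), (16), (64)) AS s(ram_gb);
```

For these example sizes the suggestion works out to 200 MB, 800 MB, and 2048 MB respectively.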
160+
161+
162+
#### Large number of databases
163+
164+
Autovacuum tries to start a worker on each database every `autovacuum_naptime` seconds.
165+
166+
For example, if a server has 60 databases and `autovacuum_naptime` is set to 60 seconds, then an autovacuum worker starts every second (`autovacuum_naptime` / number of databases).
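The same ratio can be estimated for a running server. This is a rough sketch that assumes every database accepting connections is a candidate for autovacuum; it reads `autovacuum_naptime` (reported in seconds) from `pg_settings`.

```postgresql
-- Approximate interval between autovacuum worker starts.
SELECT s.setting::integer AS naptime_seconds,
       d.db_count,
       round(s.setting::numeric / d.db_count, 2) AS seconds_between_worker_starts
FROM pg_settings AS s
CROSS JOIN (
    SELECT count(*) AS db_count
    FROM pg_database
    WHERE datallowconn
) AS d
WHERE s.name = 'autovacuum_naptime';
```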
167+
168+
It is a good idea to increase `autovacuum_naptime` if there are more databases in a cluster. At the same time, the autovacuum process can be made more aggressive by increasing `autovacuum_vacuum_cost_limit`, decreasing `autovacuum_vacuum_cost_delay`, and increasing `autovacuum_max_workers` from the default of 3 to 4 or 5.
169+
170+
171+
### Out of memory errors
172+
173+
Overly aggressive `maintenance_work_mem` values could periodically cause out-of-memory errors in the system. It is important to understand available RAM on the server before any change to the `maintenance_work_mem` parameter is made.
174+
175+
176+
### Autovacuum is too disruptive
177+
178+
If autovacuum is consuming a lot of resources, the following can be done:
179+
180+
#### Autovacuum parameters
181+
182+
Evaluate the parameters `autovacuum_vacuum_cost_delay`, `autovacuum_vacuum_cost_limit`, `autovacuum_max_workers`. Improperly setting autovacuum parameters may lead to scenarios where autovacuum becomes too disruptive.
183+
184+
If autovacuum is too disruptive, consider the following:
185+
186+
- Increase `autovacuum_vacuum_cost_delay` and reduce `autovacuum_vacuum_cost_limit` if set higher than the default of 200.
187+
- Reduce the number of `autovacuum_max_workers` if it is set higher than the default of 3. 
188+
189+
#### Too many autovacuum workers 
190+
191+
Increasing the number of autovacuum workers will not necessarily increase the speed of vacuum. Having a high number of autovacuum workers is not recommended.
192+
193+
Increasing the number of autovacuum workers will result in more memory consumption and, depending on the value of `maintenance_work_mem`, could cause performance degradation.
194+
195+
Each autovacuum worker process only gets (1/autovacuum_max_workers) of the total `autovacuum_vacuum_cost_limit`, so having a high number of workers causes each one to go slower.
196+
197+
If the number of workers is increased, `autovacuum_vacuum_cost_limit` should also be increased and/or `autovacuum_vacuum_cost_delay` should be decreased to make the vacuum process faster.
198+
199+
However, if table-level `autovacuum_vacuum_cost_delay` or `autovacuum_vacuum_cost_limit` parameters have been set, the workers running on those tables are excluded from the balancing algorithm (`autovacuum_vacuum_cost_limit` / `autovacuum_max_workers`).
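To judge whether the worker count is appropriate, it helps to see how many workers are actually running. This sketch assumes PostgreSQL 10 or later, where `pg_stat_activity` exposes the `backend_type` column.

```postgresql
-- Autovacuum workers that are running right now and what they are processing.
SELECT pid,
       datname,
       xact_start,
       query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker'
ORDER BY xact_start;
```

If the number of rows is regularly equal to `autovacuum_max_workers`, all worker slots are busy, and the cost limit and delay are worth reviewing before adding more workers.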
200+
201+
### Autovacuum transaction ID (TXID) wraparound protection
202+
203+
When a database runs into transaction ID wraparound protection, an error message like the following can be observed:
204+
205+
```
Database is not accepting commands to avoid wraparound data loss in database "xx"
Stop the postmaster and vacuum that database in single-user mode.
```
209+
210+
> [!NOTE]
211+
> This error message is a long-standing oversight. Usually, you do not need to switch to single-user mode. Instead, you can run the required VACUUM commands and perform tuning for VACUUM to run fast. While you cannot run any data manipulation language (DML), you can still run VACUUM.
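To see how far each database has progressed towards wraparound, the age of its oldest unfrozen transaction ID can be checked. This is a sketch; the percentage column compares the age to the roughly 2 billion transaction limit and is an approximation for orientation only.

```postgresql
-- Transaction ID age per database, oldest first.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483647, 2) AS pct_towards_wraparound
FROM pg_database
ORDER BY xid_age DESC;
```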
212+
213+
214+
The wraparound problem occurs when the database is either not vacuumed or there are too many dead tuples that could not be removed by autovacuum. The reasons for this might be:
215+
216+
#### Heavy workload
217+
218+
The workload could generate too many dead tuples in a brief period, making it difficult for autovacuum to catch up. Dead tuples accumulate over time, degrading query performance and eventually leading to a wraparound situation. One cause is autovacuum parameters that aren't set adequately for a busy server.
219+
220+
221+
#### Long-running transactions
222+
223+
Long-running transactions prevent dead tuples from being removed while autovacuum is running; they block the vacuum process. Ending the long-running transactions allows dead tuples to be removed the next time autovacuum runs.
224+
225+
Long-running transactions can be detected using the following query:
226+
227+
```postgresql
SELECT pid, age(backend_xid) AS age_in_xids,
    now() - xact_start AS xact_age,
    now() - query_start AS query_age,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY 2 DESC
LIMIT 10;
```
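Once a blocking session has been identified and confirmed safe to end, it can be cancelled or terminated by its `pid`. This is a sketch under the assumption that your role is allowed to signal that backend; the `pid` value 12345 is a placeholder taken from the output of the previous query.

```postgresql
-- Cancel only the running query of the session (gentler option).
SELECT pg_cancel_backend(12345);

-- Terminate the whole session, which rolls back its open transaction.
SELECT pg_terminate_backend(12345);
```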
238+
239+
#### Prepared transactions
240+
241+
If there are prepared transactions that have not been committed, they prevent dead tuples from being removed.
242+
The following query helps find uncommitted prepared transactions:
243+
244+
```postgresql
SELECT gid, prepared, owner, database, transaction
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;
```
249+
250+
Use `COMMIT PREPARED` or `ROLLBACK PREPARED` to commit or roll back these transactions.
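Both commands take the `gid` value reported by `pg_prepared_xacts`. In this sketch, `example_gid` is a placeholder, and the commands are alternatives: run one or the other, while connected to the database in which the transaction was prepared.

```postgresql
-- Option 1: discard the prepared transaction.
ROLLBACK PREPARED 'example_gid';

-- Option 2: make the prepared transaction permanent instead.
-- COMMIT PREPARED 'example_gid';
```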
251+
252+
#### Unused replication slots
253+
254+
Unused replication slots prevent autovacuum from reclaiming dead tuples. The following query helps identify unused replication slots:
255+
256+
```postgresql
SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;
```
261+
262+
Use `pg_drop_replication_slot()` to delete unused replication slots.
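A slot should only be dropped after confirming that no consumer still needs it. The following sketch first lists inactive slots and then drops one by name; `unused_slot` is a placeholder.

```postgresql
-- Slots with no connected consumer are candidates for review.
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE NOT active;

-- Drop a slot that is confirmed to be unused.
SELECT pg_drop_replication_slot('unused_slot');
```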
263+
264+
When the database runs into transaction ID wraparound protection, check for any blockers as mentioned previously, and remove those manually for autovacuum to continue and complete. You can also increase the speed of autovacuum by setting `autovacuum_vacuum_cost_delay` to 0 and increasing `autovacuum_vacuum_cost_limit` to a value much greater than 200. However, changes to these parameters will not be applied to existing autovacuum workers. Either restart the database or kill existing workers manually to apply parameter changes.
265+
266+
267+
### Table-specific requirements 
268+
269+
Autovacuum parameters may be set for individual tables. This is especially important for very small and very large tables. For example, for a small table that contains only 100 rows, autovacuum triggers a VACUUM operation when 70 rows change (as calculated previously). If this table is frequently updated, you might see hundreds of autovacuum operations a day, which prevents autovacuum from maintaining other tables on which the percentage of changes isn't as large. Conversely, a table containing a billion rows needs 200 million rows to change before autovacuum operations are triggered. Setting autovacuum parameters appropriately prevents such scenarios.
270+
271+
To configure autovacuum settings per table, set the table's storage parameters as in the following examples:
272+
273+
```postgresql
ALTER TABLE <table name> SET (autovacuum_analyze_scale_factor = xx);
ALTER TABLE <table name> SET (autovacuum_analyze_threshold = xx);
ALTER TABLE <table name> SET (autovacuum_vacuum_scale_factor = xx);
ALTER TABLE <table name> SET (autovacuum_vacuum_threshold = xx);
ALTER TABLE <table name> SET (autovacuum_vacuum_cost_delay = xx);
ALTER TABLE <table name> SET (autovacuum_vacuum_cost_limit = xx);
```
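As a concrete illustration of the two cases described above, the following sketch uses hypothetical table names (`public.small_hot_table`, `public.big_table`) and example values. Appropriate values depend on the workload and should be chosen after investigating it.

```postgresql
-- Small, frequently updated table: rely on a fixed threshold instead of a percentage.
ALTER TABLE public.small_hot_table
    SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 1000);

-- Very large table: trigger vacuum after a much smaller fraction of rows change.
ALTER TABLE public.big_table
    SET (autovacuum_vacuum_scale_factor = 0.01);

-- Revert a per-table setting to the server-level value.
ALTER TABLE public.big_table
    RESET (autovacuum_vacuum_scale_factor);
```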
281+
282+
### Insert-only workloads 
283+
284+
In versions of PostgreSQL prior to 13, autovacuum will not run on tables with an insert-only workload, because if there are no updates or deletes, there are no dead tuples and no free space that needs to be reclaimed. However, autoanalyze will run for insert-only workloads since there is new data. The disadvantages of this are:
285+
286+
- The visibility map of the tables is not updated, and thus query performance, especially where there are Index Only Scans, starts to suffer over time.
287+
- The database can run into transaction ID wraparound protection.
288+
- Hint bits will not be set.
289+
290+
#### Solutions 
291+
292+
##### Postgres versions prior to 13 
293+
294+
Using the **pg_cron** extension, a cron job can be set up to schedule a periodic vacuum analyze on the table. The frequency of the cron job depends on the workload.  
295+
296+
For step-by-step guidance using pg_cron, review [Extensions](./concepts-extensions.md).
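A scheduled job might look like the following sketch. It assumes the pg_cron extension is already enabled on the server and created in the database, and it uses a hypothetical table name (`public.events`) and an example schedule of 03:00 every day.

```postgresql
-- Schedule a daily VACUUM ANALYZE on an insert-only table.
SELECT cron.schedule('0 3 * * *', 'VACUUM ANALYZE public.events');

-- Review the scheduled jobs.
SELECT jobid, schedule, command
FROM cron.job;
```

A job that is no longer needed can be removed with `cron.unschedule`, passing the `jobid` shown in `cron.job`.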
297+
298+
299+
##### Postgres 13 and higher versions
300+
301+
Autovacuum will run on tables with an insert-only workload. Two new server parameters `autovacuum_vacuum_insert_threshold` and  `autovacuum_vacuum_insert_scale_factor` help control when autovacuum can be triggered on insert-only tables. 
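The two parameters can be inspected at the server level and, like the other autovacuum thresholds, overridden per table. This sketch assumes PostgreSQL 13 or later; the table name `public.events` and the values are illustrative only.

```postgresql
-- Current server-level values.
SHOW autovacuum_vacuum_insert_threshold;
SHOW autovacuum_vacuum_insert_scale_factor;

-- Make insert-triggered vacuum run more often on one large insert-only table.
ALTER TABLE public.events
    SET (autovacuum_vacuum_insert_scale_factor = 0.05,
         autovacuum_vacuum_insert_threshold = 10000);
```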
302+
303+
## Next steps
304+
305+
- Troubleshoot [high CPU utilization](./how-to-high-cpu-utilization.md).
306+
- Troubleshoot [high memory utilization](./how-to-high-memory-utilization.md).
307+
- Configure [server parameters](./howto-configure-server-parameters-using-portal.md).
