
Commit 815d932

Merge pull request #209691 from sarat0681/perfdocs
Perfdocs
2 parents 3a714a9 + c22c835 commit 815d932

File tree

6 files changed: +562, -28 lines changed

articles/postgresql/TOC.yml

Lines changed: 9 additions & 0 deletions
````diff
@@ -546,9 +546,18 @@
 - name: Troubleshoot high memory utilization
   href: flexible-server/how-to-high-memory-utilization.md
   displayName: High Memory Utilization
+- name: Troubleshoot High IO utilization
+  href: flexible-server/how-to-high-io-utilization.md
+  displayName: High IOPS Utilization
 - name: Troubleshoot autovacuum
   href: flexible-server/how-to-autovacuum-tuning.md
   displayName: Autovacuum troubleshooting, tuning
+- name: Best practices for bulk data upload
+  href: flexible-server/how-to-bulk-load-data.md
+  displayName: Best practices for bulk data upload
+- name: Best practices for pg_dump and restore
+  href: flexible-server/how-to-pgdump-restore.md
+  displayName: Best practices for pg_dump and restore
 - name: How-to guides
   items:
   - name: Manage a server
````
Lines changed: 246 additions & 0 deletions

---
title: Bulk data uploads for Azure Database for PostgreSQL - Flexible Server
description: Best practices to bulk load data in Azure Database for PostgreSQL - Flexible Server
author: sarat0681
ms.author: sbalijepalli
ms.reviewer: maghan
ms.service: postgresql
ms.topic: conceptual
ms.date: 08/16/2022
ms.custom: template-how-to #Required; leave this attribute/value as-is.
---

# Best practices for bulk data upload for Azure Database for PostgreSQL - Flexible Server

There are two types of bulk loads:

- Initial data load of an empty database
- Incremental data loads

This article discusses loading techniques, along with best practices, for both initial and incremental data loads.

## Loading methods

Ordered from most time consuming to least time consuming, the data loading methods are:

- Single-record INSERTs
- Batched INSERTs of 100-1000 rows per commit (you can use a transaction block to wrap multiple records per commit)
- INSERT with multirow VALUES
- The COPY command

The preferred method for loading data into the database is the COPY command. If COPY isn't possible, batched INSERTs are the next best method. Multi-threading with the COPY command is optimal for bulk data loads.
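
As an illustrative sketch (the `orders` table and the file name are hypothetical), the four methods look like this:

```sql
-- Hypothetical target table used only for illustration.
CREATE TABLE orders (id bigint, amount numeric);

-- 1. Single-record INSERTs: one commit per row (slowest).
INSERT INTO orders VALUES (1, 10.50);

-- 2. Batched INSERTs: wrap many rows in one transaction block.
BEGIN;
INSERT INTO orders VALUES (2, 11.00);
INSERT INTO orders VALUES (3, 12.00);
COMMIT;

-- 3. Multirow VALUES: many rows per statement.
INSERT INTO orders VALUES (4, 13.00), (5, 14.00), (6, 15.00);

-- 4. COPY: stream rows from a client-side file via psql (fastest).
\copy orders FROM 'orders.csv' WITH (FORMAT csv)
```

For very large loads, several COPY sessions over non-overlapping slices of the input data can run in parallel.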

## Best practices for initial data loads

#### Drop indexes

Before an initial data load, it's advised to drop all the indexes on the tables. It's always more efficient to create the indexes after the data load.

#### Drop constraints

##### Unique key constraints

To achieve strong performance, it's advised to drop unique key constraints before an initial data load and recreate them once the data load is completed. However, dropping unique key constraints cancels the safeguards against duplicated data.

##### Foreign key constraints

It's advised to drop foreign key constraints before an initial data load and recreate them once the data load is completed.

Changing the `session_replication_role` parameter to `replica` also disables all foreign key checks. However, be aware that making the change can leave data in an inconsistent state if not properly used.
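
As a sketch, the constraint handling around an initial load might look like this (the table and constraint names are hypothetical):

```sql
-- Drop constraints before the load; keep the DDL so they can be recreated.
ALTER TABLE orders DROP CONSTRAINT orders_pkey;
ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;

-- Alternatively, skip foreign key checks for this session only.
-- Use with care: rows loaded in this mode aren't validated.
SET session_replication_role = 'replica';

-- ... bulk load runs here ...

-- Restore normal behavior, then recreate the constraints.
SET session_replication_role = 'origin';
ALTER TABLE orders ADD CONSTRAINT orders_pkey PRIMARY KEY (id);
ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id);
```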
#### Unlogged tables

Using unlogged tables makes the data load faster. Data written to unlogged tables isn't written to the write-ahead log.

The disadvantages of using unlogged tables are:

- They aren't crash-safe. An unlogged table is automatically truncated after a crash or unclean shutdown.
- Data from unlogged tables can't be replicated to standby servers.

Weigh these pros and cons before using unlogged tables in initial data loads.

Use the following options to create an unlogged table or change an existing table to an unlogged table.

Create a new unlogged table:

```sql
CREATE UNLOGGED TABLE <tablename>;
```

Convert an existing logged table to an unlogged table:

```sql
ALTER TABLE <tablename> SET UNLOGGED;
```
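
A sketch of the full cycle for a hypothetical staging table; once the load completes, `SET LOGGED` converts the table back so it's crash-safe again (note the conversion rewrites the table contents into the WAL):

```sql
-- Load into an unlogged table, then make it durable afterward.
CREATE UNLOGGED TABLE staging_orders (id bigint, amount numeric);

\copy staging_orders FROM 'orders.csv' WITH (FORMAT csv)

-- Re-enable WAL logging once the initial load has finished.
ALTER TABLE staging_orders SET LOGGED;
```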
#### Server parameter tuning

`autovacuum`

During the initial data load, it's best to turn off autovacuum. Once the initial load is completed, run a manual VACUUM ANALYZE on all tables in the database, and then turn autovacuum back on.
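
As a sketch, the post-load step might look like this (on a flexible server, the `autovacuum` parameter itself is toggled through the server parameters settings rather than in SQL):

```sql
-- After the initial load, refresh planner statistics for all tables
-- in the current database before re-enabling autovacuum.
VACUUM ANALYZE;

-- Optional sanity check: confirm statistics timestamps are populated.
SELECT relname, last_vacuum, last_analyze
FROM pg_stat_all_tables
WHERE schemaname = 'public';
```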

> [!NOTE]
> Follow the recommendations below only if there is enough memory and disk space.

`maintenance_work_mem`

`maintenance_work_mem` can be set to a maximum of 2 GB on a flexible server. It helps speed up autovacuum, index creation, and foreign key creation.

`checkpoint_timeout`

On a flexible server, `checkpoint_timeout` can be increased to a maximum of 24 hours from the default of 5 minutes. It's advised to increase the value to 1 hour before initial data loads.

`checkpoint_completion_target`

A value of 0.9 is always recommended.

`max_wal_size`

`max_wal_size` can be set to the maximum allowed value on a flexible server, which is 64 GB, while you do the initial data load.

`wal_compression`

`wal_compression` can be turned on. Enabling the parameter has some extra CPU cost for compression during WAL logging and for decompression during WAL replay.
#### Flexible server recommendations

Before the start of an initial data load on a flexible server, it's recommended to:

- Disable high availability (HA) on the server. You can enable HA once the initial load is completed on the primary.
- Create read replicas after the initial data load is completed.
- Make logging minimal, or disable it altogether, during initial data loads (for example, disable pgaudit, pg_stat_statements, and Query Store).

#### Recreating indexes and adding constraints

Assuming the indexes and constraints were dropped before the initial load, it's recommended to use high values of `maintenance_work_mem` (as recommended above) when creating indexes and adding constraints. In addition, starting with PostgreSQL version 11, the following parameters can be modified for faster parallel index creation after the initial data load:

`max_parallel_workers`

Sets the maximum number of workers that the system can support for parallel queries.

`max_parallel_maintenance_workers`

Controls the maximum number of worker processes that can be used by CREATE INDEX.

You can also apply the recommended settings at the session level when creating the indexes. For example:

```sql
SET maintenance_work_mem = '2GB';
SET max_parallel_workers = 16;
SET max_parallel_maintenance_workers = 8;
CREATE INDEX test_index ON test_table (test_column);
```
## Best practices for incremental data loads

#### Table partitioning

It's always recommended to partition large tables. Some advantages of partitioning, especially during incremental loads:

- Creating new partitions based on the new deltas makes it efficient to add new data to the table.
- Table maintenance becomes easier. You can drop a partition during incremental data loads, avoiding time-consuming deletes on large tables.
- Autovacuum is triggered only on the partitions that were changed or added during incremental loads, which makes maintaining statistics on the table easier.
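
A minimal range-partitioning sketch (table, column, and partition names are hypothetical):

```sql
-- Parent table partitioned by load window.
CREATE TABLE events (
    event_time timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (event_time);

-- Add a partition for the next incremental load before running it.
CREATE TABLE events_2022_08 PARTITION OF events
    FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');

-- Retiring a load window is a fast metadata operation, not a DELETE.
ALTER TABLE events DETACH PARTITION events_2022_08;
DROP TABLE events_2022_08;
```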

#### Maintain up-to-date table statistics

Monitoring and maintaining table statistics is important for query performance on the database, including in scenarios with incremental loads. PostgreSQL uses the autovacuum daemon process to clean up dead tuples and to analyze tables so their statistics stay updated. For more information about autovacuum monitoring and tuning, see [Autovacuum Tuning](./how-to-autovacuum-tuning.md).

#### Index creation on foreign key constraints

Creating indexes on foreign keys in the child tables is beneficial in the following scenarios:

- Data updates or deletions in the parent table. When data is updated or deleted in the parent table, lookups are performed on the child table. To make these lookups faster, you can index the foreign keys on the child table.
- Queries that join the parent and child tables on the key columns.
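
A sketch with hypothetical parent/child tables; PostgreSQL indexes the referenced primary key automatically, but not the referencing column, so the child-side index must be created explicitly:

```sql
CREATE TABLE customers (id bigint PRIMARY KEY);
CREATE TABLE invoices (
    id          bigint PRIMARY KEY,
    customer_id bigint REFERENCES customers (id)
);

-- Speeds up parent-side UPDATE/DELETE checks and parent-child joins.
CREATE INDEX idx_invoices_customer_id ON invoices (customer_id);
```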

#### Unused indexes

Identify unused indexes in the database and drop them. Indexes are an overhead on data loads. The fewer indexes on a table, the better the performance during data ingestion.

Unused indexes can be identified in two ways: by Query Store and by an index usage query.

##### Query Store

Query Store helps identify indexes that can be dropped based on the query usage patterns on the database. For step-by-step guidance, see [Query Store](./concepts-query-store.md).

Once Query Store is enabled on the server, connect to the azure_sys database and use the following query to identify indexes that can be dropped:

```sql
SELECT * FROM IntelligentPerformance.DropIndexRecommendations;
```
##### Index usage

The following query can also be used to identify unused indexes:

```sql
SELECT
    t.schemaname,
    t.tablename,
    c.reltuples::bigint AS num_rows,
    pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
    psai.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END AS "unique",
    psai.idx_scan AS number_of_scans,
    psai.idx_tup_read AS tuples_read,
    psai.idx_tup_fetch AS tuples_fetched
FROM
    pg_tables t
    LEFT JOIN pg_class c ON t.tablename = c.relname
    LEFT JOIN pg_index i ON c.oid = i.indrelid
    LEFT JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    t.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;
```

The `number_of_scans`, `tuples_read`, and `tuples_fetched` columns indicate index usage. A `number_of_scans` value of zero points to an index that isn't being used.
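
Once an index is confirmed to be unused (the index name below is hypothetical), it can be dropped; `CONCURRENTLY` avoids blocking concurrent writes on the table while the index is removed:

```sql
DROP INDEX CONCURRENTLY IF EXISTS idx_invoices_customer_id;
```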

#### Server parameter tuning

> [!NOTE]
> Follow the recommendations below only if there is enough memory and disk space.

`maintenance_work_mem`

The `maintenance_work_mem` parameter can be set to a maximum of 2 GB on a flexible server. `maintenance_work_mem` helps speed up index creation and foreign key additions.

`checkpoint_timeout`

On a flexible server, the `checkpoint_timeout` parameter can be increased to 10 or 15 minutes from the default of 5 minutes. Increasing `checkpoint_timeout` to a larger value, such as 15 minutes, can reduce the I/O load, but the downside is that it takes longer to recover if there's a crash. Careful consideration is recommended before making the change.

`checkpoint_completion_target`

A value of 0.9 is always recommended.

`max_wal_size`

The right `max_wal_size` value depends on SKU, storage, and workload. One way to arrive at a value is to measure WAL generation during peak business hours, using the following steps:

- Take the current WAL LSN:

```sql
SELECT pg_current_wal_lsn();
```

- Wait `checkpoint_timeout` seconds, then take the current WAL LSN again:

```sql
SELECT pg_current_wal_lsn();
```

- Use the two results to check the difference, in GB:

```sql
SELECT round(pg_wal_lsn_diff('LSN value when run second time', 'LSN value when run first time')/1024/1024/1024, 2) AS wal_change_gb;
```

`wal_compression`

`wal_compression` can be turned on. Enabling the parameter has some extra CPU cost for compression during WAL logging and for decompression during WAL replay.

## Next steps

- Troubleshoot high CPU utilization: [High CPU Utilization](./how-to-high-cpu-utilization.md).
- Troubleshoot high memory utilization: [High Memory Utilization](./how-to-high-memory-utilization.md).
- Configure server parameters: [Server Parameters](./howto-configure-server-parameters-using-portal.md).
- Troubleshoot and tune autovacuum: [Autovacuum Tuning](./how-to-autovacuum-tuning.md).
- Troubleshoot high IOPS utilization: [High IOPS Utilization](./how-to-high-io-utilization.md).

articles/postgresql/flexible-server/how-to-high-cpu-utilization.md

Lines changed: 9 additions & 14 deletions
````diff
@@ -3,6 +3,7 @@ title: High CPU Utilization
 description: Troubleshooting guide for high cpu utilization in Azure Database for PostgreSQL - Flexible Server
 ms.author: sbalijepalli
 author: sarat0681
+ms.reviewer: maghan
 ms.service: postgresql
 ms.subservice: flexible-server
 ms.topic: conceptual
````
````diff
@@ -11,15 +12,16 @@ ms.date: 08/03/2022
 
 # Troubleshoot high CPU utilization in Azure Database for PostgreSQL - Flexible Server
 
+[!INCLUDE [applies-to-postgresql-flexible-server](../includes/applies-to-postgresql-flexible-server.md)]
+
 This article shows you how to quickly identify the root cause of high CPU utilization, and possible remedial actions to control CPU utilization when using [Azure Database for PostgreSQL - Flexible Server](overview.md).
 
-In this article, you will learn:
+In this article, you'll learn:
 
 - About tools to identify high CPU utilization such as Azure Metrics, Query Store, and pg_stat_statements.
 - How to identify root causes, such as long running queries and total connections.
 - How to resolve high CPU utilization by using Explain Analyze, Connection Pooling, and Vacuuming tables.
 
-
 ## Tools to identify high CPU utilization
 
 Consider these tools to identify high CPU utilization.
````
````diff
@@ -28,7 +30,6 @@ Consider these tools to identify high CPU utilization.
 
 Azure Metrics is a good starting point to check the CPU utilization for the definite date and period. Metrics give information about the time duration during which the CPU utilization is high. Compare the graphs of Write IOPs, Read IOPs, Read Throughput, and Write Throughput with CPU utilization to find out times when the workload caused high CPU. For proactive monitoring, you can configure alerts on the metrics. For step-by-step guidance, see [Azure Metrics](./howto-alert-on-metrics.md).
 
-
 ### Query Store
 
 Query Store automatically captures the history of queries and runtime statistics, and it retains them for your review. It slices the data by time so that you can see temporal usage patterns. Data for all users, databases and queries is stored in a database named azure_sys in the Azure Database for PostgreSQL instance. For step-by-step guidance, see [Query Store](./concepts-query-store.md).
````
````diff
@@ -41,7 +42,6 @@ The pg_stat_statements extension helps identify queries that consume time on the
 
 ##### [Postgres v13 & above](#tab/postgres-13)
 
-
 For Postgres versions 13 and above, use the following statement to view the top five SQL statements by mean or average execution time:
 
 ```postgresql
````
````diff
@@ -51,19 +51,16 @@ ORDER BY mean_exec_time
 DESC LIMIT 5;
 ```
 
-
 ##### [Postgres v9.6-12](#tab/postgres9-12)
 
 For Postgres versions 9.6, 10, 11, and 12, use the following statement to view the top five SQL statements by mean or average execution time:
 
-
 ```postgresql
 SELECT userid::regrole, dbid, query
 FROM pg_stat_statements
 ORDER BY mean_time
 DESC LIMIT 5;
 ```
-
 ---
 
 #### Total execution time
````
````diff
@@ -115,7 +112,7 @@ ORDER BY duration DESC;
 
 ### Total number of connections and number connections by state
 
-A large number of connections to the database is also another issue that might lead to increased CPU as well as memory utilization.
+A large number of connections to the database is also another issue that might lead to increased CPU and memory utilization.
 
 
 The following query gives information about the number of connections by state:
````
````diff
@@ -139,7 +136,7 @@ For more information about the **EXPLAIN** command, review [Explain Plan](https:
 
 ### PGBouncer and connection pooling
 
-In situations where there are lots of idle connections or lot of connections which are consuming the CPU consider use of a connection pooler like PgBouncer.
+In situations where there are lots of idle connections or lot of connections, which are consuming the CPU consider use of a connection pooler like PgBouncer.
 
 For more details about PgBouncer, review:
 
````
````diff
@@ -149,12 +146,11 @@ For more details about PgBouncer, review:
 
 Azure Database for Flexible Server offers PgBouncer as a built-in connection pooling solution. For more information, see [PgBouncer](./concepts-pgbouncer.md)
 
-
 ### Terminating long running transactions
 
 You could consider killing a long running transaction as an option.
 
-To terminate a session's PID, you will need to detect the PID using the following query:
+To terminate a session's PID, you'll need to detect the PID using the following query:
 
 ```postgresql
 SELECT pid, usename, datname, query, now() - xact_start as duration
````
````diff
@@ -165,7 +161,7 @@ ORDER BY duration DESC;
 
 You can also filter by other properties like `usename` (username), `datname` (database name) etc.
 
-Once you have the session's PID you can terminate using the following query:
+Once you have the session's PID, you can terminate using the following query:
 
 ```postgresql
 SELECT pg_terminate_backend(pid);
````
````diff
@@ -175,15 +171,14 @@ SELECT pg_terminate_backend(pid);
 
 Keeping table statistics up to date helps improve query performance. Monitor whether regular autovacuuming is being carried out.
 
-
 The following query helps to identify the tables that need vacuuming:
 
 ```postgresql
 select schemaname,relname,n_dead_tup,n_live_tup,last_vacuum,last_analyze,last_autovacuum,last_autoanalyze
 from pg_stat_all_tables where n_live_tup > 0;
 ```
 
-`last_autovacuum` and `last_autoanalyze` columns give the date and time when the table was last autovacuumed or analyzed. If the tables are not being vacuumed regularly, take steps to tune autovacuum. For more information about autovacuum troubleshooting and tuning, see [Autovacuum Troubleshooting](./how-to-autovacuum-tuning.md).
+`last_autovacuum` and `last_autoanalyze` columns give the date and time when the table was last autovacuumed or analyzed. If the tables aren't being vacuumed regularly, take steps to tune autovacuum. For more information about autovacuum troubleshooting and tuning, see [Autovacuum Troubleshooting](./how-to-autovacuum-tuning.md).
 
 
 A short-term solution would be to do a manual vacuum analyze of the tables where slow queries are seen:
````
