
Commit 6290d5d

tglsfdctuhaihe authored and committed
Fix pg_dump for hash partitioning on enum columns.
Hash partitioning on an enum is problematic because the hash codes are derived from the OIDs assigned to the enum values, which will almost certainly be different after a dump-and-reload than they were before. This means that some rows probably end up in different partitions than before, causing restore to fail because of partition constraint violations. (pg_upgrade dodges this problem by using hacks to force the enum values to keep the same OIDs, but that's not possible nor desirable for pg_dump.) Users can work around that by specifying --load-via-partition-root, but since that's a dump-time not restore-time decision, one might find out the need for it far too late. Instead, teach pg_dump to apply that option automatically when dealing with a partitioned table that has hash-on-enum partitioning. Also deal with a pre-existing issue for --load-via-partition-root mode: in a parallel restore, we try to TRUNCATE target tables just before loading them, in order to enable some backend optimizations. This is bad when using --load-via-partition-root because (a) we're likely to suffer deadlocks from restore jobs trying to restore rows into other partitions than they came from, and (b) if we miss getting a deadlock we might still lose data due to a TRUNCATE removing rows from some already-completed restore job. The fix for this is conceptually simple: just don't TRUNCATE if we're dealing with a --load-via-partition-root case. The tricky bit is for pg_restore to identify those cases. In dumps using COPY commands we can inspect each COPY command to see if it targets the nominal target table or some ancestor. However, in dumps using INSERT commands it's pretty impractical to examine the INSERTs in advance. To provide a solution for that going forward, modify pg_dump to mark TABLE DATA items that are using --load-via-partition-root with a comment. (This change also responds to a complaint from Robert Haas that the dump output for --load-via-partition-root is pretty confusing.) 
pg_restore checks for the special comment as well as checking the COPY command if present. This will fail to identify the combination of --load-via-partition-root and --inserts in pre-existing dump files, but that should be a pretty rare case in the field. If it does happen you will probably get a deadlock failure that you can work around by not using parallel restore, which is the same as before this bug fix. Having done this, there seems no remaining reason for the alarmism in the pg_dump man page about combining --load-via-partition-root with parallel restore, so remove that warning. Patch by me; thanks to Julien Rouhaud for review. Back-patch to v11 where hash partitioning was introduced. Discussion: https://postgr.es/m/[email protected]
1 parent 54a402c commit 6290d5d

File tree

8 files changed (+392, −60 lines)


doc/src/sgml/ref/pg_dump.sgml

Lines changed: 0 additions & 10 deletions

@@ -863,16 +863,6 @@ PostgreSQL documentation
     and the two systems have different definitions of the collation used
     to sort the partitioning column.
    </para>
-
-   <para>
-    It is best not to use parallelism when restoring from an archive made
-    with this option, because <application>pg_restore</application> will
-    not know exactly which partition(s) a given archive data item will
-    load data into.  This could result in inefficiency due to lock
-    conflicts between parallel jobs, or perhaps even reload failures due
-    to foreign key constraints being set up before all the relevant data
-    is loaded.
-   </para>
   </listitem>
  </varlistentry>

doc/src/sgml/ref/pg_dumpall.sgml

Lines changed: 0 additions & 4 deletions

@@ -395,10 +395,6 @@ PostgreSQL documentation
     and the two systems have different definitions of the collation used
     to sort the partitioning column.
    </para>
-
-    <!-- Currently, we don't need pg_dump's warning about parallelism here,
-         since parallel restore from a pg_dumpall script is impossible.
-    -->
   </listitem>
  </varlistentry>

src/bin/pg_dump/common.c

Lines changed: 10 additions & 16 deletions

@@ -250,6 +250,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	pg_log_info("flagging inherited columns in subtables");
 	flagInhAttrs(fout->dopt, tblinfo, numTables);
 
+	pg_log_info("reading partitioning data");
+	getPartitioningInfo(fout);
+
 	pg_log_info("reading indexes");
 	getIndexes(fout, tblinfo, numTables);
 
@@ -302,7 +305,6 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 			  InhInfo *inhinfo, int numInherits)
 {
-	DumpOptions *dopt = fout->dopt;
 	int			i,
 				j;
 
@@ -318,26 +320,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 			continue;
 
 		/*
-		 * FIXME: In PostgreSQL, foreign tables can be inherited.  But
-		 * pg_dump chokes on external tables, if an external table is
-		 * used as a partition, and a column has attislocal=false.
-		 */
-		if (tblinfo[i].relstorage == 'x' /* RELSTORAGE_EXTERNAL */)
-			continue;
-
-		/*
-		 * Normally, we don't bother computing anything for non-target tables,
-		 * but if load-via-partition-root is specified, we gather information
-		 * on every partition in the system so that getRootTableInfo can trace
-		 * from any given leaf partition all the way up to the root.  (We
-		 * don't need to mark them as interesting for getTableAttrs, though.)
+		 * Normally, we don't bother computing anything for non-target tables.
+		 * However, we must find the parents of non-root partitioned tables in
+		 * any case, so that we can trace from leaf partitions up to the root
+		 * (in case a leaf is to be dumped but its parents are not).  We need
+		 * not mark such parents interesting for getTableAttrs, though.
 		 */
 		if (!tblinfo[i].dobj.dump)
 		{
 			mark_parents = false;
 
-			if (!dopt->load_via_partition_root ||
-				!tblinfo[i].ispartition)
+			if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
+				  tblinfo[i].ispartition))
 				find_parents = false;
 		}

src/bin/pg_dump/meson.build

Lines changed: 105 additions & 0 deletions (new file)

# Copyright (c) 2022-2023, PostgreSQL Global Development Group

pg_dump_common_sources = files(
  'compress_gzip.c',
  'compress_io.c',
  'compress_lz4.c',
  'compress_none.c',
  'dumputils.c',
  'parallel.c',
  'pg_backup_archiver.c',
  'pg_backup_custom.c',
  'pg_backup_db.c',
  'pg_backup_directory.c',
  'pg_backup_null.c',
  'pg_backup_tar.c',
  'pg_backup_utils.c',
)

pg_dump_common = static_library('libpgdump_common',
  pg_dump_common_sources,
  c_pch: pch_postgres_fe_h,
  dependencies: [frontend_code, libpq, lz4, zlib],
  kwargs: internal_lib_args,
)


pg_dump_sources = files(
  'common.c',
  'pg_dump.c',
  'pg_dump_sort.c',
)

if host_system == 'windows'
  pg_dump_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
    '--NAME', 'pg_dump',
    '--FILEDESC', 'pg_dump - backup one PostgreSQL database',])
endif

pg_dump = executable('pg_dump',
  pg_dump_sources,
  link_with: [pg_dump_common],
  dependencies: [frontend_code, libpq, zlib],
  kwargs: default_bin_args,
)
bin_targets += pg_dump


pg_dumpall_sources = files(
  'pg_dumpall.c',
)

if host_system == 'windows'
  pg_dumpall_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
    '--NAME', 'pg_dumpall',
    '--FILEDESC', 'pg_dumpall - backup PostgreSQL databases'])
endif

pg_dumpall = executable('pg_dumpall',
  pg_dumpall_sources,
  link_with: [pg_dump_common],
  dependencies: [frontend_code, libpq, zlib],
  kwargs: default_bin_args,
)
bin_targets += pg_dumpall


pg_restore_sources = files(
  'pg_restore.c',
)

if host_system == 'windows'
  pg_restore_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
    '--NAME', 'pg_restore',
    '--FILEDESC', 'pg_restore - restore PostgreSQL databases'])
endif

pg_restore = executable('pg_restore',
  pg_restore_sources,
  link_with: [pg_dump_common],
  dependencies: [frontend_code, libpq, zlib],
  kwargs: default_bin_args,
)
bin_targets += pg_restore

tests += {
  'name': 'pg_dump',
  'sd': meson.current_source_dir(),
  'bd': meson.current_build_dir(),
  'tap': {
    'env': {
      'GZIP_PROGRAM': gzip.path(),
      'LZ4': program_lz4.found() ? program_lz4.path() : '',
      'with_icu': icu.found() ? 'yes' : 'no',
    },
    'tests': [
      't/001_basic.pl',
      't/002_pg_dump.pl',
      't/003_pg_dump_with_server.pl',
      't/004_pg_dump_parallel.pl',
      't/010_dump_connstr.pl',
    ],
  },
}

subdir('po', if_found: libintl)

src/bin/pg_dump/pg_backup_archiver.c

Lines changed: 64 additions & 9 deletions

@@ -93,6 +93,7 @@ static RestorePass _tocEntryRestorePass(TocEntry *te);
 static bool _tocEntryIsACL(TocEntry *te);
 static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
 static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
+static bool is_load_via_partition_root(TocEntry *te);
 static void buildTocEntryArrays(ArchiveHandle *AH);
 static void _moveBefore(TocEntry *pos, TocEntry *te);
 static int	_discoverArchiveFormat(ArchiveHandle *AH);
@@ -892,6 +893,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 		}
 		else
 		{
+			bool		use_truncate;
+
 			_disableTriggersIfNecessary(AH, te);
 
 			/* Select owner and schema as necessary */
@@ -903,13 +906,24 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 
 				/*
 				 * In parallel restore, if we created the table earlier in
-				 * the run then we wrap the COPY in a transaction and
-				 * precede it with a TRUNCATE.  If archiving is not on
-				 * this prevents WAL-logging the COPY.  This obtains a
-				 * speedup similar to that from using single_txn mode in
-				 * non-parallel restores.
+				 * this run (so that we know it is empty) and we are not
+				 * restoring a load-via-partition-root data item then we
+				 * wrap the COPY in a transaction and precede it with a
+				 * TRUNCATE.  If wal_level is set to minimal this prevents
+				 * WAL-logging the COPY.  This obtains a speedup similar
+				 * to that from using single_txn mode in non-parallel
+				 * restores.
+				 *
+				 * We mustn't do this for load-via-partition-root cases
+				 * because some data might get moved across partition
+				 * boundaries, risking deadlock and/or loss of previously
+				 * loaded data.  (We assume that all partitions of a
+				 * partitioned table will be treated the same way.)
 				 */
-				if (is_parallel && te->created)
+				use_truncate = is_parallel && te->created &&
+					!is_load_via_partition_root(te);
+
+				if (use_truncate)
 				{
 					/*
 					 * Parallel restore is always talking directly to a
@@ -950,7 +964,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 			AH->outputKind = OUTPUT_SQLCMDS;
 
 			/* close out the transaction started above */
-			if (is_parallel && te->created)
+			if (use_truncate)
 				CommitTransaction(&AH->public);
 
 			_enableTriggersIfNecessary(AH, te);
@@ -1048,6 +1062,43 @@ _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te)
 					 fmtQualifiedId(te->namespace, te->tag));
 }
 
+/*
+ * Detect whether a TABLE DATA TOC item is performing "load via partition
+ * root", that is the target table is an ancestor partition rather than the
+ * table the TOC item is nominally for.
+ *
+ * In newer archive files this can be detected by checking for a special
+ * comment placed in te->defn.  In older files we have to fall back to seeing
+ * if the COPY statement targets the named table or some other one.  This
+ * will not work for data dumped as INSERT commands, so we could give a false
+ * negative in that case; fortunately, that's a rarely-used option.
+ */
+static bool
+is_load_via_partition_root(TocEntry *te)
+{
+	if (te->defn &&
+		strncmp(te->defn, "-- load via partition root ", 27) == 0)
+		return true;
+	if (te->copyStmt && *te->copyStmt)
+	{
+		PQExpBuffer copyStmt = createPQExpBuffer();
+		bool		result;
+
+		/*
+		 * Build the initial part of the COPY as it would appear if the
+		 * nominal target table is the actual target.  If we see anything
+		 * else, it must be a load-via-partition-root case.
+		 */
+		appendPQExpBuffer(copyStmt, "COPY %s ",
+						  fmtQualifiedId(te->namespace, te->tag));
+		result = (strncmp(te->copyStmt, copyStmt->data, copyStmt->len) != 0);
+		destroyPQExpBuffer(copyStmt);
+		return result;
+	}
+	/* Assume it's not load-via-partition-root */
+	return false;
+}
+
 /*
  * This is a routine that is part of the dumper interface, hence the 'Archive*' parameter.
  */
@@ -3020,8 +3071,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 		res = res & ~REQ_DATA;
 	}
 
-	/* If there's no definition command, there's no schema component */
-	if (!te->defn || !te->defn[0])
+	/*
+	 * If there's no definition command, there's no schema component.  Treat
+	 * "load via partition root" comments as not schema.
+	 */
+	if (!te->defn || !te->defn[0] ||
+		strncmp(te->defn, "-- load via partition root ", 27) == 0)
 		res = res & ~REQ_SCHEMA;
 
 	/*
