Commit 3f58c32

Author: Wen Lin
Use ExclusiveLock for accessing the table pg_resgroupcapability when CREATE/ALTER resource group.

In some scenarios, holding AccessExclusiveLock on the table pg_resgroupcapability can leave database setup or recovery pending. Below is why this lock needs to be changed from AccessExclusiveLock to ExclusiveLock.

The lock on pg_resgroupcapability serializes concurrent updates to the table when a "CREATE/ALTER RESOURCE GROUP" statement runs: because of the CPU limit, after modifying one resource group we must check that the total CPU usage of all resource groups does not exceed 100%. Before this fix, AccessExclusiveLock was used for that purpose.

Suppose a user runs an "ALTER RESOURCE GROUP" statement. The QD dispatches the statement to all QEs, so it is a two-phase commit (2PC) transaction. When the QD dispatches the statement, each QE acquires AccessExclusiveLock on pg_resgroupcapability, and a QE can only release that lock once the distributed transaction has committed. In the second phase, the QD calls doNotifyingCommitPrepared to broadcast the "commit prepared" command to all QEs; at that point the QEs have already finished preparing, so the transaction is a prepared transaction.

Suppose that at this moment a primary segment goes down and its mirror is promoted to primary. The mirror gets the "promote" message from the coordinator and recovers by replaying the xlog it received from the primary; to recover the prepared transaction, it reads the prepared-transaction log entry and re-acquires AccessExclusiveLock on pg_resgroupcapability. The call stack is:

#0  lock_twophase_recover (xid=, info=, recdata=, len=) at lock.c:4697
#1  ProcessRecords (callbacks=, xid=2933, bufptr=0x1d575a8 "") at twophase.c:1757
#2  RecoverPreparedTransactions () at twophase.c:2214
#3  StartupXLOG () at xlog.c:8013
#4  StartupProcessMain () at startup.c:231
#5  AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7fff84b94a70) at bootstrap.c:459
#6  StartChildProcess (type=StartupProcess) at postmaster.c:5917
#7  PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x1d555b0) at postmaster.c:1581
#8  main (argc=7, argv=0x1d555b0) at main.c:240

After that, the database instance starts up and the related initialization functions are called. One of them, InitResGroups, opens pg_resgroupcapability with AccessShareLock to do its initialization work. The call stack is:

#6  WaitOnLock (locallock=locallock@entry=0x1c7f248, owner=owner@entry=0x1ca0a40) at lock.c:1999
#7  LockAcquireExtended (locktag=locktag@entry=0x7ffd15d18d90, lockmode=lockmode@entry=1, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0x7ffd15d18d88) at lock.c:1192
#8  LockRelationOid (relid=6439, lockmode=1) at lmgr.c:126
#9  relation_open (relationId=relationId@entry=6439, lockmode=lockmode@entry=1) at relation.c:56
#10 table_open (relationId=relationId@entry=6439, lockmode=lockmode@entry=1) at table.c:47
#11 InitResGroups () at resgroup.c:581
#12 InitResManager () at resource_manager.c:83
#13 initPostgres (in_dbname=, dboid=dboid@entry=0, username=username@entry=0x1c5b730 "linw", useroid=useroid@entry=0, out_dbname=out_dbname@entry=0x0, override_allow_connections=override_allow_connections@entry=false) at postinit.c:1284
#14 PostgresMain (argc=1, argv=argv@entry=0x1c8af78, dbname=0x1c89e70 "postgres", username=0x1c5b730 "linw") at postgres.c:4812
#15 BackendRun (port=, port=) at postmaster.c:4922
#16 BackendStartup (port=0x1c835d0) at postmaster.c:4607
#17 ServerLoop () at postmaster.c:1963
#18 PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x1c595b0) at postmaster.c:1589
#19 in main (argc=7, argv=0x1c595b0) at main.c:240

The AccessExclusiveLock has not been released, and it is not compatible with any other lock mode, so the startup process blocks on this lock and the mirror cannot successfully become the primary. Even if users run "gprecoverseg" to recover the failed primary segment, the result is similar: the primary segment recovers from xlog, recovers the prepared transactions, and acquires AccessExclusiveLock on pg_resgroupcapability, so the startup process again blocks on this lock. Only if users switch the resource manager type to "queue" is InitResGroups not called at all; nothing blocks, and the primary segment can start up normally.

After this fix, ExclusiveLock is acquired when altering a resource group. In the case above, the startup process acquires AccessShareLock, and ExclusiveLock and AccessShareLock are compatible, so the startup process can complete successfully. After startup, the QE receives the RECOVERY_COMMIT_PREPARED command from the QD, finishes the second phase of the distributed transaction, and releases the ExclusiveLock on pg_resgroupcapability. The call stack is:

#0  lock_twophase_postcommit (xid=, info=, recdata=0x3303458, len=) at lock.c:4758
#1  ProcessRecords (callbacks=, xid=, bufptr=0x3303458 "") at twophase.c:1757
#2  FinishPreparedTransaction (gid=gid@entry=0x323caf5 "25", isCommit=isCommit@entry=true, raiseErrorIfNotFound=raiseErrorIfNotFound@entry=false) at twophase.c:1704
#3  in performDtxProtocolCommitPrepared (gid=gid@entry=0x323caf5 "25", raiseErrorIfNotFound=raiseErrorIfNotFound@entry=false) at cdbtm.c:2107
#4  performDtxProtocolCommand (dtxProtocolCommand=dtxProtocolCommand@entry=DTX_PROTOCOL_COMMAND_RECOVERY_COMMIT_PREPARED, gid=gid@entry=0x323caf5 "25", contextInfo=contextInfo@entry=0x10e1820 ) at cdbtm.c:2279
#5  exec_mpp_dtx_protocol_command (contextInfo=0x10e1820 , gid=0x323caf5 "25", loggingStr=0x323cad8 "Recovery Commit Prepared", dtxProtocolCommand=DTX_PROTOCOL_COMMAND_RECOVERY_COMMIT_PREPARED) at postgres.c:1570
#6  PostgresMain (argc=, argv=argv@entry=0x3268f98, dbname=0x3267e90 "postgres", username=) at postgres.c:5482

The test case in this commit simulates a reproduction of this bug.
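To make the fix concrete, here is a minimal sketch of the post-fix locking pattern. It is not the actual Greenplum source: the function name and the commented-out validation/update steps are illustrative, while table_open()/table_close(), the lock-mode constants, and the catalog OID macro match the identifiers shown in the diff below (the catalog header name is assumed).

/*
 * Minimal illustrative sketch of the locking pattern after this fix.
 * The function name is hypothetical; the validation and update steps are
 * shown only as comments.
 */
#include "postgres.h"
#include "access/table.h"                  /* table_open()/table_close() */
#include "catalog/pg_resgroupcapability.h" /* assumed to define ResGroupCapabilityRelationId */
#include "storage/lockdefs.h"              /* ExclusiveLock, NoLock */

static void
alter_resgroup_capability_sketch(void)
{
	Relation	rel;

	/*
	 * ExclusiveLock makes concurrent CREATE/ALTER RESOURCE GROUP statements
	 * conflict with each other, but it does not conflict with the
	 * AccessShareLock taken by InitResGroups() during startup/recovery.
	 */
	rel = table_open(ResGroupCapabilityRelationId, ExclusiveLock);

	/* ... scan all groups and verify the summed cpu_rate_limit stays <= 100 ... */
	/* ... update the capability rows of the altered group ... */

	/*
	 * Close the relation but keep the lock (NoLock); in the 2PC case the
	 * lock is released only when the prepared transaction finally commits.
	 */
	table_close(rel, NoLock);
}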
1 parent 2fa7c06 commit 3f58c32

4 files changed: +187 -5 lines changed


src/backend/commands/resgroupcmds.c

Lines changed: 12 additions & 5 deletions
@@ -136,10 +136,10 @@ CreateResourceGroup(CreateResourceGroupStmt *stmt)
 	/*
 	 * both CREATE and ALTER resource group need check the sum of cpu_rate_limit
 	 * and memory_limit and make sure the sum don't exceed 100. To make it simple,
-	 * acquire AccessExclusiveLock lock on pg_resgroupcapability at the beginning
+	 * acquire ExclusiveLock lock on pg_resgroupcapability at the beginning
 	 * of CREATE and ALTER
 	 */
-	pg_resgroupcapability_rel = table_open(ResGroupCapabilityRelationId, AccessExclusiveLock);
+	pg_resgroupcapability_rel = table_open(ResGroupCapabilityRelationId, ExclusiveLock);
 	pg_resgroup_rel = table_open(ResGroupRelationId, RowExclusiveLock);
 
 	/* Check if MaxResourceGroups limit is reached */
@@ -428,11 +428,18 @@ AlterResourceGroup(AlterResourceGroupStmt *stmt)
 	/*
 	 * In validateCapabilities() we scan all the resource groups
 	 * to check whether the total cpu_rate_limit exceed 100 or not.
-	 * We need to use AccessExclusiveLock here to prevent concurrent
-	 * increase on different resource group.
+	 * We use ExclusiveLock here to prevent concurrent
+	 * increase on different resource group.
+	 * We can't use AccessExclusiveLock here, the reason is that,
+	 * if there is a database recovery happened when run "alter resource group"
+	 * and acquire this kind of lock, the initialization of resource group
+	 * in function InitResGroups will be pending during database startup,
+	 * since this function will open this table with AccessShareLock,
+	 * AccessExclusiveLock is not compatible with any other lock.
+	 * ExclusiveLock and AccessShareLock are compatible.
 	 */
 	pg_resgroupcapability_rel = heap_open(ResGroupCapabilityRelationId,
-										  AccessExclusiveLock);
+										  ExclusiveLock);
 
 	/* Load current resource group capabilities */
 	GetResGroupCapabilities(pg_resgroupcapability_rel, groupid, &oldCaps);
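The compatibility claim in the new comment above can be checked against PostgreSQL's lock conflict table in src/backend/storage/lmgr/lock.c. The excerpt below is an abridged paraphrase for illustration only and is not part of this commit: AccessShareLock conflicts only with AccessExclusiveLock, ExclusiveLock conflicts with every mode except AccessShareLock, and AccessExclusiveLock conflicts with every mode.

/*
 * Abridged paraphrase of the LockConflicts[] table in lock.c, shown only to
 * illustrate why the lock downgrade works; not code added by this commit.
 */
#include "postgres.h"
#include "storage/lock.h"	/* LOCKMASK, LOCKBIT_ON(), lock mode constants */

static const LOCKMASK lock_conflicts_excerpt[] = {
	0,							/* NoLock: conflicts with nothing */

	/* AccessShareLock: conflicts only with AccessExclusiveLock */
	LOCKBIT_ON(AccessExclusiveLock),

	/* ... intermediate lock modes omitted in this excerpt ... */

	/* ExclusiveLock: conflicts with everything except AccessShareLock */
	LOCKBIT_ON(RowShareLock) | LOCKBIT_ON(RowExclusiveLock) |
	LOCKBIT_ON(ShareUpdateExclusiveLock) | LOCKBIT_ON(ShareLock) |
	LOCKBIT_ON(ShareRowExclusiveLock) | LOCKBIT_ON(ExclusiveLock) |
	LOCKBIT_ON(AccessExclusiveLock),

	/* AccessExclusiveLock: conflicts with every mode, including AccessShareLock */
	LOCKBIT_ON(AccessShareLock) | LOCKBIT_ON(RowShareLock) |
	LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
	LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
	LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock)
};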
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+-- This test performs segment reconfiguration when "alter resource group" is executed in the two phase commit.
+-- The steps are, when run "alter resource group", before QD broadcasts commit prepared command to QEs(the
+-- second phase of 2PC), we trigger an error and cause one primary segment down.
+-- The expectation is "alter resource group" can run successfully since the mirror segment is UP.
+-- After recover the segment, there is no error or blocking.
+
+-- set these values purely to cut down test time, as default fts trigger is
+-- every min and 5 retries
+alter system set gp_fts_probe_interval to 10;
+ALTER
+alter system set gp_fts_probe_retries to 0;
+ALTER
+select pg_reload_conf();
+pg_reload_conf
+----------------
+t
+(1 row)
+
+1:create resource group rgroup_seg_down with (CPU_RATE_LIMIT=35, MEMORY_LIMIT=35, CONCURRENCY=10);
+CREATE
+
+-- inject an error in function dtm_broadcast_commit_prepared, that is before QD broadcasts commit prepared command to QEs
+2:select gp_inject_fault_infinite('dtm_broadcast_commit_prepared', 'suspend', dbid) from gp_segment_configuration where role='p' and content=-1;
+gp_inject_fault_infinite
+--------------------------
+Success:
+(1 row)
+-- this session will pend here since the above injected fault
+1&:alter resource group rgroup_seg_down set CONCURRENCY 20; <waiting ...>
+-- this injected fault can make dispatcher think the primary is down
+2:select gp_inject_fault('fts_conn_startup_packet', 'error', dbid) from gp_segment_configuration where role='p' and content=0;
+gp_inject_fault
+-----------------
+Success:
+(1 row)
+2:select gp_request_fts_probe_scan();
+gp_request_fts_probe_scan
+---------------------------
+t
+(1 row)
+-- make sure one primary segment is down.
+2:select status = 'd' from gp_segment_configuration where content = 0 and role = 'm';
+?column?
+----------
+t
+(1 row)
+-- reset the injected fault on QD and the "alter resource group" in session1 can continue
+2:select gp_inject_fault('dtm_broadcast_commit_prepared', 'reset', dbid) from gp_segment_configuration where role='p' and content=-1;
+gp_inject_fault
+-----------------
+Success:
+(1 row)
+-- reset the injected fault on primary segment
+2:select gp_inject_fault('fts_conn_startup_packet', 'reset', dbid) from gp_segment_configuration where content=0;
+gp_inject_fault
+-----------------
+Success:
+Success:
+(2 rows)
+1<: <... completed>
+ALTER
+-- make sure "alter resource group" has taken effect.
+1:select concurrency from gp_toolkit.gp_resgroup_config where groupname = 'rgroup_seg_down';
+concurrency
+-------------
+20
+(1 row)
+2q: ... <quitting>
+
+!\retcode gprecoverseg -aF --no-progress;
+-- start_ignore
+-- end_ignore
+(exited with code 0)
+
+-- loop while segments come in sync
+1:select wait_until_all_segments_synchronized();
+wait_until_all_segments_synchronized
+--------------------------------------
+OK
+(1 row)
+
+!\retcode gprecoverseg -ar;
+-- start_ignore
+-- end_ignore
+(exited with code 0)
+
+-- loop while segments come in sync
+1:select wait_until_all_segments_synchronized();
+wait_until_all_segments_synchronized
+--------------------------------------
+OK
+(1 row)
+
+-- verify no segment is down after recovery
+1:select count(*) from gp_segment_configuration where status = 'd';
+count
+-------
+0
+(1 row)
+
+-- verify resource group
+1:select concurrency from gp_toolkit.gp_resgroup_config where groupname = 'rgroup_seg_down';
+concurrency
+-------------
+20
+(1 row)
+1:drop resource group rgroup_seg_down;
+DROP
+
+1:alter system reset gp_fts_probe_interval;
+ALTER
+1:alter system reset gp_fts_probe_retries;
+ALTER
+1:select pg_reload_conf();
+pg_reload_conf
+----------------
+t
+(1 row)
+1q: ... <quitting>
+
src/test/isolation2/isolation2_resgroup_schedule

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ test: resgroup/resgroup_name_convention
 # fault injection tests
 test: resgroup/resgroup_assign_slot_fail
 test: resgroup/resgroup_unassign_entrydb
+test: resgroup/resgroup_seg_down_2pc
 
 # functions
 test: resgroup/resgroup_concurrency
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+-- This test performs segment reconfiguration when "alter resource group" is executed in the two phase commit.
+-- The steps are, when run "alter resource group", before QD broadcasts commit prepared command to QEs(the
+-- second phase of 2PC), we trigger an error and cause one primary segment down.
+-- The expectation is "alter resource group" can run successfully since the mirror segment is UP.
+-- After recover the segment, there is no error or blocking.
+
+-- set these values purely to cut down test time, as default fts trigger is
+-- every min and 5 retries
+alter system set gp_fts_probe_interval to 10;
+alter system set gp_fts_probe_retries to 0;
+select pg_reload_conf();
+
+1:create resource group rgroup_seg_down with (CPU_RATE_LIMIT=35, MEMORY_LIMIT=35, CONCURRENCY=10);
+
+-- inject an error in function dtm_broadcast_commit_prepared, that is before QD broadcasts commit prepared command to QEs
+2:select gp_inject_fault_infinite('dtm_broadcast_commit_prepared', 'suspend', dbid) from gp_segment_configuration where role='p' and content=-1;
+-- this session will pend here since the above injected fault
+1&:alter resource group rgroup_seg_down set CONCURRENCY 20;
+-- this injected fault can make dispatcher think the primary is down
+2:select gp_inject_fault('fts_conn_startup_packet', 'error', dbid) from gp_segment_configuration where role='p' and content=0;
+2:select gp_request_fts_probe_scan();
+-- make sure one primary segment is down.
+2:select status = 'd' from gp_segment_configuration where content = 0 and role = 'm';
+-- reset the injected fault on QD and the "alter resource group" in session1 can continue
+2:select gp_inject_fault('dtm_broadcast_commit_prepared', 'reset', dbid) from gp_segment_configuration where role='p' and content=-1;
+-- reset the injected fault on primary segment
+2:select gp_inject_fault('fts_conn_startup_packet', 'reset', dbid) from gp_segment_configuration where content=0;
+1<:
+-- make sure "alter resource group" has taken effect.
+1:select concurrency from gp_toolkit.gp_resgroup_config where groupname = 'rgroup_seg_down';
+2q:
+
+!\retcode gprecoverseg -aF --no-progress;
+
+-- loop while segments come in sync
+1:select wait_until_all_segments_synchronized();
+
+!\retcode gprecoverseg -ar;
+
+-- loop while segments come in sync
+1:select wait_until_all_segments_synchronized();
+
+-- verify no segment is down after recovery
+1:select count(*) from gp_segment_configuration where status = 'd';
+
+-- verify resource group
+1:select concurrency from gp_toolkit.gp_resgroup_config where groupname = 'rgroup_seg_down';
+1:drop resource group rgroup_seg_down;
+
+1:alter system reset gp_fts_probe_interval;
+1:alter system reset gp_fts_probe_retries;
+1:select pg_reload_conf();
+1q:
+