Skip to content

Commit 27468f8

Browse files
PS-9453: percona_telemetry causes a long wait on COND_thd_list due to the absence of the root user
https://perconadev.atlassian.net/browse/PS-9453 Problem: If there is no 'root' user (it was renamed) and Percona Telemetry is enabled, server shutdown stuck. Additionally SHOW PROCESSLIST reports increasing number of processes with root user in state 'login'. Cause: Percona Telemetry component used hard codded 'root' user used for querying the server for metrics. If the user is not present, mysql_command_factory service connect() method fails, however it leaves opened/orphaned internal MYSQL_SESSION. It is because after opening MYSQL_SESSION we check if the user exists. It doesn't exist, so the method returns with STATE_MACHINE_FAILED error. Caller does not know anything about underlying session, does cleanup, frees memory. Moreover, the same problem potentially exists even if the user exists, but cssm_begin_connect() exits with error for any reason. Creation of session registers THD object in Global_THD_manager. That's why increasing number of processes is visible in SHOW PROCESSLIST. Why it hangs during shutdown: During shutdown, the server waits for signal handler thread (signal_hand()) to join the main thread. Then the component infrastructure is deinitialized. At the same time signal_hand() waits for all threads to finish Global_THD_manager::wait_till_no_thd(). The above described bug related to orphaned mysql sessions cause that there are orphaned THDs that never end. This causes server to wait infinitively. Solution: 1. Handle the error in cssm_begin_connect() and close opened MYSQL_SESSION. This part fixes hangs caused by nonexistent user. 2. Percona Telemetry Component uses mysql.session user. When Percona Telemetry Component is enabled, the user is granted a few more permissions during the server startup. Note that even without this permissions, the component is able to work, but not able to report some metrics. Additionally fixed potential memory leak if connection setup in Percona Telemetry Component fails.
1 parent a634d87 commit 27468f8

File tree

6 files changed

+167
-14
lines changed

6 files changed

+167
-14
lines changed

components/percona_telemetry/data_provider.cc

Lines changed: 31 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,20 @@
2323
namespace {
2424
inline const char *b2s(bool val) { return val ? "1" : "0"; }
2525

26+
/*
27+
mysql.session user is mostly enough, but it lacks the following privileges:
28+
29+
1. REPLICATION SLAVE
30+
2. REPLICATION CLIENT
31+
3. SELECT on mysql.component
32+
4. SELECT on performance_schema.replication_group_members
33+
34+
These privileges are added at server startup in setup_percona_telemetry()
35+
if Percona telemetry is enabled.
36+
*/
37+
constexpr const char default_command_user_name[] = "mysql.session";
38+
constexpr const char default_command_host_name[] = "localhost";
39+
2640
namespace JSONKey {
2741
const char *pillar_version = "pillar_version";
2842
const char *db_instance_id = "db_instance_id";
@@ -120,25 +134,29 @@ bool DataProvider::do_query(const std::string &query, QueryResult *result,
120134
}
121135
result->clear();
122136

137+
/* command_factory_service_.init() allocates memory for mysql_h
138+
We need to call close() always.
139+
Even if init() fails, becaues it doesn't allocate anything, calling close()
140+
is safe, because internally it checks if provided pointer is valid
141+
*/
142+
std::shared_ptr<MYSQL_H> mysql_h_close_guard(
143+
&mysql_h, [&srv = command_factory_service_](MYSQL_H *ptr) {
144+
srv.close(*ptr);
145+
});
146+
123147
mysql_service_status_t sstatus = command_factory_service_.init(&mysql_h);
148+
124149
if (!sstatus)
125150
sstatus |=
126151
command_options_service_.set(mysql_h, MYSQL_COMMAND_PROTOCOL, nullptr);
127152
if (!sstatus)
128-
sstatus |=
129-
command_options_service_.set(mysql_h, MYSQL_COMMAND_USER_NAME, "root");
153+
sstatus |= command_options_service_.set(mysql_h, MYSQL_COMMAND_USER_NAME,
154+
default_command_user_name);
130155
if (!sstatus)
131-
sstatus |=
132-
command_options_service_.set(mysql_h, MYSQL_COMMAND_HOST_NAME, nullptr);
156+
sstatus |= command_options_service_.set(mysql_h, MYSQL_COMMAND_HOST_NAME,
157+
default_command_host_name);
133158
if (!sstatus) sstatus |= command_factory_service_.connect(mysql_h);
134159

135-
// starting from this point, if the above succeeded we need to close mysql_h.
136-
std::shared_ptr<void> mysql_h_close_guard(
137-
mysql_h,
138-
[&srv = command_factory_service_, do_close = !sstatus](void *ptr) {
139-
if (do_close && ptr) srv.close(static_cast<MYSQL_H>(ptr));
140-
});
141-
142160
// if any of the above failed, just exit
143161
if (sstatus) {
144162
goto err;
@@ -212,7 +230,7 @@ bool DataProvider::collect_db_instance_id_info(rapidjson::Document *document) {
212230
so the SQL query failed. It will recover next time.
213231
2. Some other reason that caused selecting server_id to fail. */
214232
if (id.length() == 0) {
215-
logger_.warning(
233+
logger_.info(
216234
"Collecting db_instance_id failed. It may be caused by server still "
217235
"initializing.");
218236
return true;
@@ -346,7 +364,7 @@ bool DataProvider::collect_se_usage_info(rapidjson::Document *document) {
346364
QueryResult result;
347365
if (do_query("SELECT DISTINCT ENGINE FROM information_schema.tables WHERE "
348366
"table_schema NOT IN('mysql', 'information_schema', "
349-
"'performance_schema', 'sys');",
367+
"'performance_schema', 'sys')",
350368
&result)) {
351369
return true;
352370
}
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
SELECT * FROM information_schema.table_privileges WHERE grantee = "'mysql.session'@'localhost'" ORDER BY table_schema, table_name;
2+
GRANTEE TABLE_CATALOG TABLE_SCHEMA TABLE_NAME PRIVILEGE_TYPE IS_GRANTABLE
3+
'mysql.session'@'localhost' def mysql component SELECT NO
4+
'mysql.session'@'localhost' def mysql user SELECT NO
5+
'mysql.session'@'localhost' def performance_schema replication_group_members SELECT NO
6+
SHOW GRANTS FOR 'mysql.session'@'localhost';
7+
Grants for mysql.session@localhost
8+
GRANT SHUTDOWN, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `mysql.session`@`localhost`
9+
GRANT AUDIT_ABORT_EXEMPT,AUTHENTICATION_POLICY_ADMIN,BACKUP_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,FIREWALL_EXEMPT,PERSIST_RO_VARIABLES_ADMIN,SESSION_VARIABLES_ADMIN,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN ON *.* TO `mysql.session`@`localhost`
10+
GRANT SELECT ON `performance_schema`.* TO `mysql.session`@`localhost`
11+
GRANT SELECT ON `mysql`.`component` TO `mysql.session`@`localhost`
12+
GRANT SELECT ON `mysql`.`user` TO `mysql.session`@`localhost`
13+
GRANT SELECT ON `performance_schema`.`replication_group_members` TO `mysql.session`@`localhost`
14+
# restart:--percona_telemetry.grace_interval=30 --percona_telemetry.scrape_interval=30 --percona_telemetry.telemetry_root_dir=<telemetry_root_dir>
15+
RENAME USER 'root'@'localhost' to 'root.tmp'@'localhost';
16+
Warnings:
17+
Warning 4005 User 'root'@'localhost' is referenced as a definer account in a stored routine.
18+
Warning 4005 User 'root'@'localhost' is referenced as a definer account in a trigger.
19+
'root' user used by component's 1st verison does not exist. Telemetry dir should contain 1 file.
20+
1
21+
RENAME USER 'root.tmp'@'localhost' to 'root'@'localhost';
22+
Warnings:
23+
Warning 4005 User 'root'@'localhost' is referenced as a definer account in a stored routine.
24+
Warning 4005 User 'root'@'localhost' is referenced as a definer account in a trigger.
25+
# restart:--percona_telemetry.grace_interval=30 --percona_telemetry.scrape_interval=30 --percona_telemetry.telemetry_root_dir=<telemetry_root_dir>
26+
RENAME USER 'mysql.session'@'localhost' to 'mysql.session.tmp'@'localhost';
27+
include/assert.inc [No orphaned sessions expected in processlist]
28+
'mysql.session' user used by component does not exist. Telemetry dir should still contain 1 file.
29+
1
30+
RENAME USER 'mysql.session.tmp'@'localhost' to 'mysql.session'@'localhost';
31+
# restart:--percona_telemetry.grace_interval=30 --percona_telemetry.scrape_interval=30 --percona_telemetry.telemetry_root_dir=<telemetry_root_dir>
Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
# Test that lack of the user used by Percona Telemetry Component
2+
# doesn't cause hangs during server restart and no orphaned sessions are created.
3+
4+
--source include/have_percona_telemetry.inc
5+
6+
--let $telemetry_root_dir = $MYSQL_TMP_DIR/telemetry_dir
7+
--let $grace_interval = 30
8+
--let $scrape_interval = 30
9+
10+
--mkdir $telemetry_root_dir
11+
12+
# Record mysql.session user privileges
13+
SELECT * FROM information_schema.table_privileges WHERE grantee = "'mysql.session'@'localhost'" ORDER BY table_schema, table_name;
14+
SHOW GRANTS FOR 'mysql.session'@'localhost';
15+
16+
# restart the server with custom telemetry file path and timeouts
17+
--let $restart_parameters = "restart:--percona_telemetry.grace_interval=$grace_interval --percona_telemetry.scrape_interval=$scrape_interval --percona_telemetry.telemetry_root_dir=$telemetry_root_dir"
18+
--replace_regex /telemetry_root_dir=.*telemetry_dir/telemetry_root_dir=<telemetry_root_dir>/
19+
--source include/restart_mysqld.inc
20+
21+
# Rename 'root' user (1st version of Percona Telemetry Component used 'root' user)
22+
# 1st version will not collect any data and will not create telemetry file and the restart will hang.
23+
# Fixed version will work properly as it doesn't use 'root' user.
24+
RENAME USER 'root'@'localhost' to 'root.tmp'@'localhost';
25+
26+
# sleep more than grace_interval and check that telemetry file was created
27+
--let $timeout = `select $grace_interval + 10`
28+
--sleep $timeout
29+
30+
--echo 'root' user used by component's 1st verison does not exist. Telemetry dir should contain 1 file.
31+
--exec ls -1 $telemetry_root_dir | wc -l
32+
33+
#
34+
# It should be possible to restart the server.
35+
#
36+
RENAME USER 'root.tmp'@'localhost' to 'root'@'localhost';
37+
--let $restart_parameters = "restart:--percona_telemetry.grace_interval=$grace_interval --percona_telemetry.scrape_interval=$scrape_interval --percona_telemetry.telemetry_root_dir=$telemetry_root_dir"
38+
--replace_regex /telemetry_root_dir=.*telemetry_dir/telemetry_root_dir=<telemetry_root_dir>/
39+
--source include/restart_mysqld.inc
40+
41+
42+
#
43+
# Now rename the user used by component
44+
#
45+
RENAME USER 'mysql.session'@'localhost' to 'mysql.session.tmp'@'localhost';
46+
47+
# Wait a few cycles and ensure that SHOW PROCESSLIST does not contain rows related to orphaned sessions.
48+
--let $timeout = `select $grace_interval + 3 * $scrape_interval`
49+
--sleep $timeout
50+
51+
--let $assert_text = No orphaned sessions expected in processlist
52+
--let $assert_cond = [SELECT COUNT(*) as Result FROM performance_schema.processlist WHERE user = "mysql.session";, Result, 1] = 0
53+
--source include/assert.inc
54+
55+
# Check that no new telemetry file was created
56+
--echo 'mysql.session' user used by component does not exist. Telemetry dir should still contain 1 file.
57+
--exec ls -1 $telemetry_root_dir | wc -l
58+
59+
60+
#
61+
# It should be still possible to restart the server.
62+
#
63+
RENAME USER 'mysql.session.tmp'@'localhost' to 'mysql.session'@'localhost';
64+
--let $restart_parameters = "restart:--percona_telemetry.grace_interval=$grace_interval --percona_telemetry.scrape_interval=$scrape_interval --percona_telemetry.telemetry_root_dir=$telemetry_root_dir"
65+
--replace_regex /telemetry_root_dir=.*telemetry_dir/telemetry_root_dir=<telemetry_root_dir>/
66+
--source include/restart_mysqld.inc
67+
68+
# cleanup
69+
--force-rmdir $telemetry_root_dir
70+

sql/dd/impl/upgrade/server.cc

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -959,6 +959,15 @@ bool upgrade_system_schemas(THD *thd) {
959959
/* 1. We INSERT INTO, because prepared statements do not support
960960
INSTALL COMPONENT
961961
2. We use stored procedure to be able to do conditional action.
962+
3. Percona Telemetry Component uses mysql.session user. For the component to
963+
be fully functional, mysql.session user lacks the following privileges:
964+
1. REPLICATION SLAVE
965+
2. REPLICATION CLIENT
966+
3. SELECT on mysql.component
967+
4. SELECT on performance_schema.replication_group_members
968+
GRANT does not work yet as ACL is not initialized yet. Use UPDATES.
969+
For 3 and 4 we could set Select_priv = 'Y' in mysql.user table for
970+
mysql.session user, but let's allow only minimal required privileges.
962971
*/
963972
static const char *percona_telemetry_install[] = {
964973
"USE mysql;\n",
@@ -977,6 +986,15 @@ static const char *percona_telemetry_install[] = {
977986
"PREPARE stmt FROM @str;\n",
978987
"EXECUTE stmt;\n",
979988
"DROP PREPARE stmt;\n",
989+
"UPDATE mysql.user SET Repl_slave_priv = 'Y' WHERE User = 'mysql.session' "
990+
"AND Host = 'localhost';\n",
991+
"UPDATE mysql.user SET Repl_client_priv = 'Y' WHERE User = 'mysql.session' "
992+
"AND Host = 'localhost';\n",
993+
"INSERT IGNORE INTO mysql.tables_priv VALUES ('localhost', 'mysql', "
994+
"'mysql.session', 'component', 'root@localhost', NOW(), 'Select', '');\n",
995+
"INSERT IGNORE INTO mysql.tables_priv VALUES ('localhost', "
996+
"'performance_schema', 'mysql.session', 'replication_group_members', "
997+
"'root@localhost', NOW(), 'Select', '');\n",
980998
NULL};
981999

9821000
static const char *percona_telemetry_uninstall[] = {

sql/server_component/mysql_command_backend.cc

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,21 @@ mysql_state_machine_status cssm_begin_connect(mysql_async_connect *ctx) {
7575
my_h_service h_command_consumer_srv = nullptr;
7676

7777
MYSQL_SESSION mysql_session = nullptr;
78+
79+
/* We need to handle the failure in this function.
80+
Setting mcs_extn->session_svc right after session open is not enough
81+
to handle user lookup errors (and following errors as well)
82+
because in case of this function returns error, mysql->extension is
83+
cleaned up immediately by the caller. The caller does not take care of
84+
session_svc, because it is not aware of this structure.
85+
*/
86+
std::shared_ptr<MYSQL_SESSION> mysql_session_close_guard(
87+
&mysql_session, [mcs_extn](MYSQL_SESSION *mysql_session_ptr) {
88+
if (*mysql_session_ptr == nullptr) return;
89+
mcs_extn->session_svc = nullptr;
90+
srv_session_close(*mysql_session_ptr);
91+
});
92+
7893
if (mcs_extn->mcs_thd == nullptr || mcs_extn->session_svc == nullptr) {
7994
/*
8095
Avoid possibility of nested txn in the current thd.
@@ -239,6 +254,7 @@ mysql_state_machine_status cssm_begin_connect(mysql_async_connect *ctx) {
239254
}
240255
mysql->client_flag = 0; /* For handshake */
241256
mysql->server_status = SERVER_STATUS_AUTOCOMMIT;
257+
mysql_session = nullptr; // disable delete quard
242258
return STATE_MACHINE_DONE;
243259
}
244260

unittest/gunit/components/percona_telemetry/data_provider-t.cc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -287,7 +287,7 @@ TEST_F(DataProviderTest, collect_se_usage_info_test) {
287287
const std::string query(
288288
std::string("SELECT DISTINCT ENGINE FROM information_schema.tables WHERE "
289289
"table_schema NOT IN('mysql', 'information_schema', "
290-
"'performance_schema', 'sys');"));
290+
"'performance_schema', 'sys')"));
291291
const std::string expected_json_key("se_engines_in_use");
292292
collect_array_info_common(query, expected_json_key,
293293
&MockDataProvider::collect_se_usage_info);

0 commit comments

Comments
 (0)