Commit ee7d993

MDEV-31956 SSD based InnoDB buffer pool extension
In one practical cloud MariaDB setup, a server node accesses its datadir over the network but also has fast local SSD storage for temporary data. The contents of this temporary storage are lost when the server container is destroyed. This commit uses the ephemeral fast local storage (SSD) as an extension of the portion of the InnoDB buffer pool (DRAM) that caches persistent data pages. This cache is separate from the persistent storage of data files and ib_logfile0, and it is ignored during backup.

The following system variables are introduced:

innodb_extended_buffer_pool_size - the size of the external buffer pool file; if it equals 0, the external buffer pool is not used;

innodb_extended_buffer_pool_path - the path to the external buffer pool file.

If innodb_extended_buffer_pool_size is not equal to 0, the external buffer pool file is created on startup.

Only clean pages are flushed to the external buffer pool file. There is no need to flush dirty pages, because such pages become clean after flushing and are then evicted when they reach the tail of the LRU list.

The general idea of this commit is to flush clean pages to the external buffer pool file when they are evicted. A page can be evicted either by a transaction thread or by the background page cleaner thread. In some cases a transaction thread waits for the page cleaner thread to finish its job. We can't flush to the external buffer pool file while transaction threads are waiting for eviction, as that would hurt performance. That's why the only case for flushing is when the page cleaner thread evicts pages in the background and there are no waiters. For this purpose the buf_pool_t::done_flush_list_waiters_count variable was introduced; we flush evicted clean pages only if the variable is zero.

Clean pages are evicted in buf_flush_LRU_list_batch() to keep some number of pages in the buffer pool's free list.
That's why we flush only every second page to the external buffer pool file; otherwise there might not be enough pages in the free list to let transaction threads allocate buffer pool pages without waiting for the page cleaner. This might not be a good solution, but it is enough for prototyping.

An external buffer pool page is introduced to record in the buffer pool page hash that a certain page can be read from the external buffer pool file. The first several members of such a page must be the same as the members of an internal page. The frame of an external page must be equal to a certain value to distinguish an external page from an internal one. External buffer pages are preallocated on startup in the external pages array. We could get rid of the frame in the external page and instead check whether the page's address belongs to the array to distinguish external and internal pages.

There are also external pages free and LRU lists. When an internal page is chosen to be flushed to the external buffer pool file, a new external page is allocated either from the head of the external free list or from the tail of the external LRU list. Both lists are protected by buf_pool.mutex. This makes sense, because a page is removed from the internal LRU list during eviction under buf_pool.mutex.

Then the internal page is locked and the allocated external page is attached to the io request for the external buffer pool file. When the write request is completed, the internal page is replaced with the external one in the page hash, the external page is pushed to the head of the external LRU list, and the internal page is unlocked. After the external page is removed from the external free list, it is not placed in the external LRU list until the write completes, so the page can't be used by other threads until the write is completed.

The page hash chain get element function has an additional template parameter, which tells the function whether external pages must be ignored.
We don't ignore external pages in the page hash in two cases: when a page is initialized for read, and when one is reinitialized for new page creation.

When an internal page is initialized for read and an external page with the same page id is found in the page hash, the internal page is locked, the external page is replaced with the newly initialized internal page in the page hash chain, and the external page is removed from the external LRU list and attached to the io request to the external buffer pool file. When the io request is completed, the external page is returned to the external free list and the internal page is unlocked. So during the read the external page is absent from both the external LRU and free lists and can't be reused.

When an internal page is initialized for new page creation and an external page with the same page id is found in the page hash, we just remove the external page from the page hash chain and the external LRU list and push it to the head of the external free list, so the external page can be used for future flushing.

Pages are flushed to and read from the external buffer pool file in the same manner as they are flushed to their tablespaces, i.e. compressed and encrypted pages stay compressed and encrypted in the external buffer pool file.
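
The external page lifecycle described in the message can be modeled roughly as follows. This is a single-threaded toy sketch, not the code from this commit: ExtPool, its members and every method name are hypothetical, the real lists are protected by buf_pool.mutex, and dropping the old hash entry when an LRU-tail slot is recycled is an assumption the message does not spell out.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical model; a "slot" stands in for one external buffer pool page.
struct ExtPool {
  std::list<uint32_t> free_list;                 // slots not holding any page
  std::list<uint32_t> lru;                       // head = most recently written slot
  std::unordered_map<uint64_t, uint32_t> hash;   // page id -> slot (stands in for the page hash)
  std::unordered_map<uint32_t, uint64_t> owner;  // slot -> page id, needed to recycle the LRU tail
  unsigned waiters = 0;                          // stands in for done_flush_list_waiters_count
  unsigned evicted = 0;                          // clean pages evicted so far

  explicit ExtPool(uint32_t slots) {
    for (uint32_t s = 0; s < slots; ++s) free_list.push_back(s);
  }

  // Policy from the message: write only every second evicted clean page,
  // and only while no thread is waiting for eviction.
  bool should_write() {
    const bool second = (++evicted % 2 == 0);
    return waiters == 0 && second;
  }

  // Allocate from the free-list head, else recycle the LRU tail.
  // Assumption: recycling also drops the old page's hash entry.
  bool alloc(uint32_t &slot) {
    if (!free_list.empty()) {
      slot = free_list.front();
      free_list.pop_front();
      return true;
    }
    if (lru.empty()) return false;  // every slot is attached to an I/O request
    slot = lru.back();
    lru.pop_back();
    assert(owner.count(slot) == 1);  // a slot on the LRU always caches some page
    hash.erase(owner[slot]);
    owner.erase(slot);
    return true;
  }

  // Write completion: only now does the slot become visible to other users.
  void write_done(uint64_t page_id, uint32_t slot) {
    hash[page_id] = slot;
    owner[slot] = page_id;
    lru.push_front(slot);
  }

  // Read start: detach the slot from the hash and the LRU so it cannot be reused.
  bool read_begin(uint64_t page_id, uint32_t &slot) {
    auto it = hash.find(page_id);
    if (it == hash.end()) return false;
    slot = it->second;
    hash.erase(it);
    owner.erase(slot);
    lru.remove(slot);
    return true;
  }

  // Read completion, or discarding a cached copy when the page is re-created:
  // the slot returns to the free list (this sketch always pushes to the head).
  void release(uint32_t slot) { free_list.push_front(slot); }
};
```

In this model a slot sits on neither list between alloc() and write_done(), or between read_begin() and release(), which mirrors how the message rules out reuse of an external page while its I/O is in flight.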
1 parent eea4934 commit ee7d993

25 files changed: +1793 -321 lines changed

mysql-test/suite/innodb/r/ext_buf_pool.result

Lines changed: 510 additions & 0 deletions
Large diffs are not rendered by default.

mysql-test/suite/innodb/r/innodb_information_schema_buffer.result

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 SELECT * FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
-POOL_ID POOL_SIZE FREE_BUFFERS DATABASE_PAGES OLD_DATABASE_PAGES MODIFIED_DATABASE_PAGES PENDING_DECOMPRESS PENDING_READS PENDING_FLUSH_LRU PENDING_FLUSH_LIST PAGES_MADE_YOUNG PAGES_NOT_MADE_YOUNG PAGES_MADE_YOUNG_RATE PAGES_MADE_NOT_YOUNG_RATE NUMBER_PAGES_READ NUMBER_PAGES_CREATED NUMBER_PAGES_WRITTEN PAGES_READ_RATE PAGES_CREATE_RATE PAGES_WRITTEN_RATE NUMBER_PAGES_GET HIT_RATE YOUNG_MAKE_PER_THOUSAND_GETS NOT_YOUNG_MAKE_PER_THOUSAND_GETS NUMBER_PAGES_READ_AHEAD NUMBER_READ_AHEAD_EVICTED READ_AHEAD_RATE READ_AHEAD_EVICTED_RATE LRU_IO_TOTAL LRU_IO_CURRENT UNCOMPRESS_TOTAL UNCOMPRESS_CURRENT
-# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
+POOL_ID POOL_SIZE FREE_BUFFERS DATABASE_PAGES OLD_DATABASE_PAGES MODIFIED_DATABASE_PAGES PENDING_DECOMPRESS PENDING_READS PENDING_FLUSH_LRU PENDING_FLUSH_LIST PAGES_MADE_YOUNG PAGES_NOT_MADE_YOUNG PAGES_MADE_YOUNG_RATE PAGES_MADE_NOT_YOUNG_RATE NUMBER_PAGES_READ NUMBER_PAGES_CREATED NUMBER_PAGES_WRITTEN PAGES_READ_RATE PAGES_CREATE_RATE PAGES_WRITTEN_RATE NUMBER_PAGES_GET HIT_RATE YOUNG_MAKE_PER_THOUSAND_GETS NOT_YOUNG_MAKE_PER_THOUSAND_GETS NUMBER_PAGES_READ_AHEAD NUMBER_READ_AHEAD_EVICTED READ_AHEAD_RATE READ_AHEAD_EVICTED_RATE LRU_IO_TOTAL LRU_IO_CURRENT UNCOMPRESS_TOTAL UNCOMPRESS_CURRENT NUMBER_PAGES_WRITTEN_TO_EXTERNAL_BUFFER_POOL NUMBER_PAGES_READ_FROM_EXTERNAL_BUFFER_POOL
+# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
 CREATE TABLE infoschema_buffer_test (col1 INT) ENGINE = INNODB;
 INSERT INTO infoschema_buffer_test VALUES(9);
 SELECT * FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE

mysql-test/suite/innodb/r/innodb_status_variables.result

Lines changed: 2 additions & 0 deletions
@@ -41,8 +41,10 @@ INNODB_BUFFER_POOL_READ_AHEAD
 INNODB_BUFFER_POOL_READ_AHEAD_EVICTED
 INNODB_BUFFER_POOL_READ_REQUESTS
 INNODB_BUFFER_POOL_READS
+INNODB_EXT_BUFFER_POOL_READS
 INNODB_BUFFER_POOL_WAIT_FREE
 INNODB_BUFFER_POOL_WRITE_REQUESTS
+INNODB_EXT_BUFFER_POOL_PAGES_FLUSHED
 INNODB_CHECKPOINT_AGE
 INNODB_CHECKPOINT_MAX_AGE
 INNODB_DATA_FSYNCS

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+--innodb-buffer-pool-size=21M --innodb-extended-buffer-pool-size=1M

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
+--source include/have_innodb.inc
+--source include/have_debug.inc
+--source include/have_debug_sync.inc
+--source include/count_sessions.inc
+--source ../encryption/include/have_file_key_management_plugin.inc
+#--source include/innodb_page_size.inc
+
+--let $encrypted_row_compressed=6
+--let $unencrypted_row_compressed=5
+--let $unencrypted_uncompressed=4
+--let $encrypted_uncompressed=3
+--let $unencrypted_page_compressed=2
+--let $encrypted_page_compressed=1
+--let $i = $encrypted_row_compressed
+
+--let $page_size=`SELECT @@GLOBAL.innodb_page_size`
+if ($page_size != 16384) {
+--let $i=$unencrypted_uncompressed
+}
+
+--connect (prevent_purge,localhost,root)
+START TRANSACTION WITH CONSISTENT SNAPSHOT;
+
+--connection default
+
+--let $DATADIR = `select @@datadir`
+
+--disable_query_log
+--error 0,ER_UNKNOWN_SYSTEM_VARIABLE
+SET @old_innodb_limit_optimistic_insert_debug = @@innodb_limit_optimistic_insert_debug;
+SET @old_debug_dbug = @@debug_dbug;
+--enable_query_log
+
+--error 0,ER_UNKNOWN_SYSTEM_VARIABLE
+SET GLOBAL innodb_limit_optimistic_insert_debug = 3;
+SET GLOBAL DEBUG_DBUG='+d,ib_ext_bp_count_io_only_for_t';
+
+while($i) {
+
+SET GLOBAL DEBUG_DBUG='+d,ib_ext_bp_disable_LRU_eviction_for_t';
+if ($i == $unencrypted_uncompressed) {
+--echo ###################################################################
+--echo # Testing for unencrypted uncompressed table #
+--echo ###################################################################
+CREATE TABLE t (
+`a` INT NOT NULL,
+PRIMARY KEY (`a`)
+) ENGINE=InnoDB;
+}
+if ($i == $encrypted_uncompressed) {
+--echo ###################################################################
+--echo # Testing for encrypted uncompressed table #
+--echo ###################################################################
+CREATE TABLE t (
+`a` INT NOT NULL,
+PRIMARY KEY (`a`)
+) ENGINE=InnoDB encrypted=yes encryption_key_id=1;
+}
+if ($i == $unencrypted_page_compressed) {
+--echo ###################################################################
+--echo # Testing for unencrypted PAGE_COMPRESSED=1 table #
+--echo ###################################################################
+CREATE TABLE t (
+`a` INT NOT NULL,
+PRIMARY KEY (`a`)
+) ENGINE=InnoDB PAGE_COMPRESSED=1;
+}
+if ($i == $unencrypted_row_compressed) {
+--echo ###################################################################
+--echo # Testing for unencrypted ROW_FORMAT=COMPRESSED table #
+--echo ###################################################################
+CREATE TABLE t (
+`a` INT NOT NULL,
+PRIMARY KEY (`a`)
+) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
+}
+if ($i == $encrypted_page_compressed) {
+--echo ###################################################################
+--echo # Testing for encrypted PAGE_COMPRESSED=1 table #
+--echo ###################################################################
+CREATE TABLE t (
+`a` INT NOT NULL,
+PRIMARY KEY (`a`)
+) ENGINE=InnoDB PAGE_COMPRESSED=1 encrypted=yes encryption_key_id=1;
+}
+if ($i == $encrypted_row_compressed) {
+--echo ###################################################################
+--echo # Testing for encrypted ROW_FORMAT=COMPRESSED table #
+--echo ###################################################################
+CREATE TABLE t (
+`a` INT NOT NULL,
+PRIMARY KEY (`a`)
+) ENGINE=InnoDB ROW_FORMAT=COMPRESSED encrypted=yes encryption_key_id=1;
+}
+
+SELECT variable_value INTO @prev_flushed_gs
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_PAGES_FLUSHED';
+SELECT NUMBER_PAGES_WRITTEN_TO_EXTERNAL_BUFFER_POOL INTO @prev_written_ps
+FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
+SELECT variable_value INTO @prev_reads_gs
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_READS';
+SELECT NUMBER_PAGES_READ_FROM_EXTERNAL_BUFFER_POOL INTO @prev_reads_ps
+FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
+
+--eval SET @start_val = $i*100
+INSERT INTO t SET a = @start_val+1;
+INSERT INTO t SET a = @start_val+2;
+INSERT INTO t SET a = @start_val+3;
+INSERT INTO t SET a = @start_val+4;
+INSERT INTO t SET a = @start_val+5;
+INSERT INTO t SET a = @start_val+6;
+INSERT INTO t SET a = @start_val+7;
+INSERT INTO t SET a = @start_val+8;
+INSERT INTO t SET a = @start_val+9;
+INSERT INTO t SET a = @start_val+10;
+INSERT INTO t SET a = @start_val+11;
+INSERT INTO t SET a = @start_val+12;
+
+SET GLOBAL DEBUG_DBUG='-d,ib_ext_bp_disable_LRU_eviction_for_t';
+SET GLOBAL innodb_force_LRU_eviction = TRUE;
+
+let $wait_condition =
+SELECT (variable_value-@prev_flushed_gs) >= 9
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_PAGES_FLUSHED';
+--source include/wait_condition.inc
+
+SELECT variable_value-@prev_flushed_gs
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_PAGES_FLUSHED';
+SELECT NUMBER_PAGES_WRITTEN_TO_EXTERNAL_BUFFER_POOL-@prev_written_ps
+FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
+SELECT variable_value-@prev_reads_gs
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_READS';
+SELECT NUMBER_PAGES_READ_FROM_EXTERNAL_BUFFER_POOL-@prev_reads_ps
+FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
+
+SELECT * FROM t;
+
+SELECT variable_value-@prev_flushed_gs
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_PAGES_FLUSHED';
+SELECT NUMBER_PAGES_WRITTEN_TO_EXTERNAL_BUFFER_POOL-@prev_written_ps
+FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
+SELECT variable_value-@prev_reads_gs
+FROM information_schema.global_status
+WHERE variable_name LIKE 'INNODB_EXT_BUFFER_POOL_READS';
+SELECT NUMBER_PAGES_READ_FROM_EXTERNAL_BUFFER_POOL-@prev_reads_ps
+FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;
+
+DROP TABLE t;
+--dec $i
+}
+
+--disable_query_log
+SET GLOBAL DEBUG_DBUG=@old_debug_dbug;
+--error 0,ER_UNKNOWN_SYSTEM_VARIABLE
+SET GLOBAL innodb_limit_optimistic_insert_debug = @old_innodb_limit_optimistic_insert_debug;
+--enable_query_log
+
+--disconnect prevent_purge
+--source include/wait_until_count_sessions.inc

mysql-test/suite/innodb_i_s/innodb_buffer_pool_stats.result

Lines changed: 3 additions & 1 deletion
@@ -32,5 +32,7 @@ INNODB_BUFFER_POOL_STATS CREATE TEMPORARY TABLE `INNODB_BUFFER_POOL_STATS` (
 `LRU_IO_TOTAL` bigint(21) unsigned NOT NULL,
 `LRU_IO_CURRENT` bigint(21) unsigned NOT NULL,
 `UNCOMPRESS_TOTAL` bigint(21) unsigned NOT NULL,
-`UNCOMPRESS_CURRENT` bigint(21) unsigned NOT NULL
+`UNCOMPRESS_CURRENT` bigint(21) unsigned NOT NULL,
+`NUMBER_PAGES_WRITTEN_TO_EXTERNAL_BUFFER_POOL` bigint(21) unsigned NOT NULL,
+`NUMBER_PAGES_READ_FROM_EXTERNAL_BUFFER_POOL` bigint(21) unsigned NOT NULL
 ) ENGINE=MEMORY DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci

mysql-test/suite/sys_vars/r/sysvars_innodb.result

Lines changed: 26 additions & 1 deletion
@@ -6,7 +6,8 @@ variable_name not in (
 'innodb_use_native_aio', # default value depends on OS
 'innodb_log_file_buffering', # only available on Linux and Windows
 'innodb_linux_aio', # existence depends on OS
-'innodb_buffer_pool_load_pages_abort') # debug build only, and is only for testing
+'innodb_buffer_pool_load_pages_abort', # debug build only, and is only for testing
+'innodb_force_lru_eviction') # debug build only, and is only for testing
 order by variable_name;
 VARIABLE_NAME INNODB_ADAPTIVE_FLUSHING
 SESSION_VALUE NULL
@@ -572,6 +573,30 @@ NUMERIC_BLOCK_SIZE NULL
 ENUM_VALUE_LIST OFF,ON
 READ_ONLY YES
 COMMAND_LINE_ARGUMENT OPTIONAL
+VARIABLE_NAME INNODB_EXTENDED_BUFFER_POOL_PATH
+SESSION_VALUE NULL
+DEFAULT_VALUE
+VARIABLE_SCOPE GLOBAL
+VARIABLE_TYPE VARCHAR
+VARIABLE_COMMENT Path to extended buffer pool file
+NUMERIC_MIN_VALUE NULL
+NUMERIC_MAX_VALUE NULL
+NUMERIC_BLOCK_SIZE NULL
+ENUM_VALUE_LIST NULL
+READ_ONLY YES
+COMMAND_LINE_ARGUMENT REQUIRED
+VARIABLE_NAME INNODB_EXTENDED_BUFFER_POOL_SIZE
+SESSION_VALUE NULL
+DEFAULT_VALUE 0
+VARIABLE_SCOPE GLOBAL
+VARIABLE_TYPE BIGINT UNSIGNED
+VARIABLE_COMMENT The extended buffer pool file size
+NUMERIC_MIN_VALUE 0
+NUMERIC_MAX_VALUE 18446744073709551615
+NUMERIC_BLOCK_SIZE 0
+ENUM_VALUE_LIST NULL
+READ_ONLY NO
+COMMAND_LINE_ARGUMENT REQUIRED
 VARIABLE_NAME INNODB_FAST_SHUTDOWN
 SESSION_VALUE NULL
 DEFAULT_VALUE 1

mysql-test/suite/sys_vars/t/sysvars_innodb.test

Lines changed: 2 additions & 1 deletion
@@ -17,5 +17,6 @@ select VARIABLE_NAME, SESSION_VALUE, DEFAULT_VALUE, VARIABLE_SCOPE, VARIABLE_TYP
 'innodb_use_native_aio', # default value depends on OS
 'innodb_log_file_buffering', # only available on Linux and Windows
 'innodb_linux_aio', # existence depends on OS
-'innodb_buffer_pool_load_pages_abort') # debug build only, and is only for testing
+'innodb_buffer_pool_load_pages_abort', # debug build only, and is only for testing
+'innodb_force_lru_eviction') # debug build only, and is only for testing
 order by variable_name;
