Commit 535a6b5

PS-9837 [8.0]: Assertion failure `!cursor->index->is_committed()` during DELETE + INSERT load
https://perconadev.atlassian.net/browse/PS-9837

Background:
----------
The assertion failure means that a secondary index record that is expected
to be delete-marked is NOT delete-marked (i.e., it is a normal record). This
cannot happen for regular secondary indexes. The only exemption is for
indexes that are being built (online ALTER).

The pre-conditions:
-------------------
1. Compressed tables only (ROW_FORMAT=COMPRESSED)
2. Many non-unique secondary indexes
3. Change buffering enabled for all operations (innodb_change_buffering=all)
4. An IO-bound workload, to force change buffering

Threads involved:
-----------------
1. A purge thread that is a bit delayed, so that DELETE-MARKED records are
   present
2. The LRU thread, which evicts the uncompressed frames of compressed pages
   (eviction of pages from unzip_LRU)
3. A DELETE statement that change-buffers its change to a page while the same
   page is being evicted from unzip_LRU by the LRU thread
4. A new INSERT that re-inserts the just-deleted record

Let's walk through the steps leading to the assertion failure, using this
table:

  create table t1 (
    id int primary key,
    value text not null,
    uid int not null,
    key k1(uid)
  ) engine=innodb ROW_FORMAT=COMPRESSED;

  insert into t1 values (2, 'two value', 102);

This inserts the record (2, 'two value', 102) into the clustered index and
(102, 2) into the secondary index k1. Let's call this secondary index page P1.

P1 is a compressed page. Whenever it is accessed by SELECTs, the page also has
an uncompressed part, a.k.a. the frame. When a compressed page is read from
disk, it does not have the uncompressed frame yet. After decompression (see
zip_page_handler()), the page has block->frame, and the compressed part is in
block->zip.data. The page's access_time is also set after decompression, and
the page is put on unzip_LRU, the LRU list of pages that have both compressed
and uncompressed frames. Under LRU pressure, there is logic that evicts only
the uncompressed part and keeps the compressed part as zip.data in the bpage
(buf_page_t); see buf_free_from_unzip_LRU_list_batch().
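The compressed-page life cycle described above can be sketched as a small C++ toy model. This is not InnoDB code: `ToyPage`, `decompress()` and `evict_uncompressed_frame()` are hypothetical names, and the sketch assumes only what the text states, namely that access_time is set on first decompression and that unzip_LRU eviction drops the uncompressed frame while keeping both the compressed image and the access_time.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical toy model of a compressed buffer-pool page; not InnoDB code.
struct ToyPage {
  std::string zip_data;              // compressed image (kept as zip.data)
  std::optional<std::string> frame;  // uncompressed frame, present while on unzip_LRU
  std::uint64_t access_time = 0;     // 0 means "never accessed"

  bool on_unzip_lru() const { return frame.has_value(); }
};

// Decompress on access: rebuild the uncompressed frame and stamp
// access_time if it was never set. Returns true on the first access only.
bool decompress(ToyPage &page, std::uint64_t now) {
  page.frame = page.zip_data;  // stand-in for real decompression
  const bool first_access = (page.access_time == 0);
  if (first_access) {
    page.access_time = now;
  }
  return first_access;
}

// unzip_LRU eviction: drop only the uncompressed frame and keep the
// compressed image. access_time is deliberately left untouched, mirroring
// the copy into the compressed-only bpage described above.
void evict_uncompressed_frame(ToyPage &page) {
  assert(page.on_unzip_lru());  // only pages with both frames are on unzip_LRU
  page.frame.reset();
}
```

In this toy, a page that is decompressed, evicted from unzip_LRU, and decompressed again gets `false` from the second `decompress()` call even though its frame had to be rebuilt, which is the ambiguity that Step 2 relies on.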
Step 1:
-------
  DELETE FROM t1 WHERE id = 2;

This query delete-marks the record in the clustered index and the
corresponding secondary index records. Just as the secondary index record on
page P1 is about to be delete-marked, the LRU eviction thread evicts
uncompressed frames from unzip_LRU. The eviction process is as follows:

1. A copy of the bpage with only the compressed frame is created. During the
   copy, the access_time is copied from the old block to the compressed-only
   bpage.
2. The current block, which has both uncompressed and compressed frames, is
   freed (buf_LRU_free_page()).
3. buf_LRU_block_remove_hashed(), called from buf_LRU_free_page(), releases
   the page_hash lock and the block mutex.
4. Later, the page_hash lock and block mutex are reacquired, and the
   compressed-only page is inserted into the page_hash again.

There is a teeny-tiny window between eviction steps 3 and 4. Within this
window, if anyone asks whether the page is in the BP (buffer pool), the answer
is NO.

Now bring the DELETE statement back into the picture. When the DELETE searches
for the record to delete-mark in the secondary index, it checks whether page
P1 is in the BP. If the DELETE's connection thread happens to ask during the
window between eviction steps 3 and 4, the answer is NO, and the DELETE
proceeds to change-buffer the delete-marking. The record does not get the
DELETE-MARK flag until the change buffer is merged.

Step 2:
-------
A SELECT accesses the secondary index page P1. The SELECT sees that the page
is in the BP but has only compressed data, i.e., the block state is
BUF_BLOCK_ZIP_PAGE or BUF_BLOCK_ZIP_DIRTY. zip_page_handler() decompresses the
compressed page and turns it into a regular page (with both uncompressed and
compressed parts present). In doing so, ibuf_merge_or_delete_for_page() is
skipped if the page already has access_time set. As the eviction process
above shows, the access_time was copied from the uncompressed block to the
compressed-only bpage, so the merge is skipped.

This skipping of the change-buffer merge is the root cause of the bug.
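Steps 1 and 2 can be replayed on a single toy page to show how the delete-mark gets stranded. The sketch is hypothetical: `ToyP1` and its helper functions are invented for illustration, eviction steps 1 and 2 are folded into step 3, the change buffer is modeled as a counter of pending delete-marks, and the pre-fix rule is the access_time check described in Step 2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-page toy of Steps 1 and 2; not InnoDB code.
struct ToyP1 {
  bool in_page_hash = true;        // visible to "is P1 in the BP?" lookups
  bool has_frame = true;           // uncompressed frame present
  std::uint64_t access_time = 1;   // P1 was already accessed before Step 1
  bool rec_delete_marked = false;  // delete-mark flag of (102, 2) on the page
  int pending_ibuf = 0;            // buffered delete-marks not yet merged
};

// Eviction step 3: the frame is gone and the page vanishes from page_hash.
void evict_remove_hashed(ToyP1 &p) {
  p.has_frame = false;
  p.in_page_hash = false;
}

// Eviction step 4: the compressed-only page is back in page_hash.
// access_time is not touched, so the old value is kept.
void evict_reinsert_zip(ToyP1 &p) { p.in_page_hash = true; }

// Secondary-index delete-mark: modify the page if it is in the BP,
// otherwise record the change in the (toy) change buffer.
void delete_mark(ToyP1 &p) {
  if (p.in_page_hash) {
    p.rec_delete_marked = true;
  } else {
    ++p.pending_ibuf;
  }
}

// Pre-fix decompression rule: merge buffered changes only when
// access_time was never set.
void decompress_pre_fix(ToyP1 &p) {
  p.has_frame = true;
  if (p.access_time == 0) {
    if (p.pending_ibuf > 0) {
      p.rec_delete_marked = true;  // apply the buffered delete-mark
    }
    p.pending_ibuf = 0;
  }
}
```

Calling `evict_remove_hashed()`, `delete_mark()`, `evict_reinsert_zip()` and `decompress_pre_fix()` in that order leaves one buffered delete-mark and a record that is still not delete-marked; calling `delete_mark()` outside the window marks the record directly.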
Step 3:
-------
Re-insert the just-deleted record:

  INSERT INTO t1 VALUES (2, 'two value', 102);

This inserts the record into the clustered index and then proceeds to insert
into the secondary index, where it finds an exact match: the cursor search
leads to the record (102, 2). Because the match is exact, the insert is done
"by modify", meaning it only has to clear the delete-mark on the record.

Since the change-buffer merge was skipped, the DELETE-MARKING was never
applied to the record, so the record is in the NORMAL state. This is
unexpected: we should never find a record here that is NOT DELETE-MARKED. The
clustered index has already guaranteed that the record was not present, and
an "insert by modify" always requires the record to be DELETE-MARKED. Since
the record is in the NORMAL state, the assertion fails!

Analysis:
---------
When a compressed page is read from disk, it is typically not decompressed
immediately, and its access_time is 0. A SELECT on the page forces it to be
decompressed. Since the page's access_time is zero, this is treated as the
first transition to uncompressed, and the change-buffer merge is attempted.

But when the page transitions from uncompressed to compressed during the
unzip_LRU (uncompressed-frame-only) eviction of a compressed page, its
access_time is not reset, and the page is immediately back in the BP. Later
reads/SELECTs decompress the page again, but this time the change-buffer
merge is skipped because the page is considered already accessed.

Fix:
----
Although removing the page hash release and re-acquire in buf_LRU_free_page()
would be the ideal solution, it has latch-ordering complications. Luckily, a
simpler alternative exists: always attempt the ibuf merge if the block state
is BUF_BLOCK_ZIP_PAGE or BUF_BLOCK_ZIP_DIRTY.

Any transition to the BUF_BLOCK_ZIP_PAGE or ZIP_DIRTY state may leave ibuf
entries for the page. unzip_LRU uncompressed-frame evictions create a small
window in which ibuf entries for the page can be created.
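The invariant that Step 3 trips over can be sketched as follows. This is a hypothetical simplification, not the real insert code: `ToySecRec`, `ToyIndex` and `insert_by_modify()` are invented names, an exception stands in for the assertion in the commit title, and `committed == false` stands in for an index still being built by online ALTER, the one exemption mentioned in the Background.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical toy of the secondary-index "insert by modify" path; not InnoDB code.
struct ToySecRec {
  bool delete_marked;  // delete-mark flag on the existing record
};

struct ToyIndex {
  bool committed;  // false only while an online ALTER is building the index
};

// Re-inserting a key that exactly matches an existing secondary record must
// only clear its delete-mark. A record that is not delete-marked is tolerated
// only for an index still being built; otherwise this throws, standing in for
// the `!cursor->index->is_committed()` assertion failure.
void insert_by_modify(ToySecRec &rec, const ToyIndex &index) {
  if (!rec.delete_marked) {
    if (index.committed) {
      throw std::logic_error("assertion !cursor->index->is_committed() failed");
    }
    return;  // online index build already applied the change
  }
  rec.delete_marked = false;  // "undelete-mark" the record
}
```

With the delete-mark stranded in the change buffer, the record reaches `insert_by_modify()` with `delete_marked == false` on a committed index, which is exactly the failing case.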
buf_LRU_block_remove_hashed() in buf_LRU_free_page() releases the page hash
lock and the block mutex. Before they are re-acquired, ibuf entries for the
page are possible:

  DISK->ZIP_PAGE:       ibuf entries should be applied (page was out of BP)
  FILE_PAGE->ZIP_PAGE:  caused by unzip_LRU eviction
  ZIP_PAGE->ZIP_DIRTY:  page is dirty on unzip_LRU eviction
  ZIP_DIRTY->ZIP_PAGE:  a dirty page (previously on unzip_LRU) is flushed
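The effect of the fix on the merge decision can be sketched as a small table plus two predicates. Everything here is hypothetical (`ToyState`, `kZipTransitions`, `merge_before_fix()` and `merge_after_fix()` are illustrative names); the only details taken from the patch are that the old check compared access_time against `std::chrono::steady_clock::time_point{}` and that the new code attempts the merge for every compressed-only page it decompresses.

```cpp
#include <array>
#include <cassert>
#include <chrono>

// Hypothetical toy contrasting the merge decision made when a
// compressed-only page is decompressed; not InnoDB code.
enum class ToyState { DISK, FILE_PAGE, ZIP_PAGE, ZIP_DIRTY };

struct ToyTransition {
  ToyState from;
  ToyState to;
  const char *cause;
};

// The transitions into ZIP_PAGE / ZIP_DIRTY listed in the commit message.
constexpr std::array<ToyTransition, 4> kZipTransitions{{
    {ToyState::DISK, ToyState::ZIP_PAGE, "page was out of BP"},
    {ToyState::FILE_PAGE, ToyState::ZIP_PAGE, "unzip_LRU eviction"},
    {ToyState::ZIP_PAGE, ToyState::ZIP_DIRTY, "page is dirty on unzip_LRU eviction"},
    {ToyState::ZIP_DIRTY, ToyState::ZIP_PAGE, "dirty page is flushed"},
}};

using TimePoint = std::chrono::steady_clock::time_point;

// Pre-fix rule: merge only when access_time was never set.
bool merge_before_fix(TimePoint access_time) {
  return access_time == TimePoint{};
}

// Post-fix rule: any compressed-only page may have change-buffer entries,
// so the merge is always attempted when decompressing it.
bool merge_after_fix(ToyState state) {
  return state == ToyState::ZIP_PAGE || state == ToyState::ZIP_DIRTY;
}
```

For the FILE_PAGE->ZIP_PAGE row the copied access_time is non-zero, so the pre-fix predicate skips the merge, while the post-fix predicate accepts every transition in the table.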
1 parent 04ed7d4 commit 535a6b5

1 file changed (15 additions, 9 deletions)

storage/innobase/buf/buf0buf.cc

@@ -4055,8 +4055,6 @@ dberr_t Buf_fetch<T>::zip_page_handler(buf_block_t *&fix_block) {
 
   mutex_exit(&m_buf_pool->zip_mutex);
 
-  const auto access_time = buf_page_is_accessed(&block->page);
-
   buf_page_mutex_exit(block);
 
   m_buf_pool->n_pend_unzip.fetch_add(1);
@@ -4074,13 +4072,21 @@ dberr_t Buf_fetch<T>::zip_page_handler(buf_block_t *&fix_block) {
   }
 
   if (!recv_no_ibuf_operations) {
-    if (access_time != std::chrono::steady_clock::time_point{}) {
-#ifdef UNIV_IBUF_COUNT_DEBUG
-      ut_a(ibuf_count_get(m_page_id) == 0);
-#endif /* UNIV_IBUF_COUNT_DEBUG */
-    } else {
-      ibuf_merge_or_delete_for_page(block, m_page_id, &m_page_size, true);
-    }
+    /* All transitions to state of BUF_BLOCK_ZIP_PAGE or ZIP_DIRTY have
+    possibility of ibuf entries on the page.
+
+    unzip_LRU uncompressed frame evictions create a small window for ibuf
+    entries on the page. buf_LRU_block_remove_hashed() in buf_LRU_free_page()
+    releases page hash lock and block mutex. Before they are re-acquired, ibuf
+    is possible on the page
+
+    DISK->ZIP_PAGE : ibuf entry should be applied (page was out of BP)
+    FILE_PAGE->ZIP_PAGE: caused by unzip_LRU eviction
+    ZIP_PAGE -> ZIP_DIRTY: page dirty on unzip LRU eviction
+    ZIP_DIRTY->ZIP_PAGE: a dirty page (previously on unzip_LRU) is flushed.
+
+    So apply change-buffer merge on the page */
+    ibuf_merge_or_delete_for_page(block, m_page_id, &m_page_size, true);
   }
 
   buf_page_mutex_enter(block);
