Issue #2220: support skip invalid record in recovery#3437
Issue #2220: support skip invalid record in recovery#3437leizhiyuan wants to merge 3 commits intoapache:masterfrom
Conversation
StevenLuMT
left a comment
There was a problem hiding this comment.
Does the old testcase cover this function, do you need to add a new testcase?
done |
| } | ||
| LOG.info("Replaying journal {} from position {}", id, logPosition); | ||
| long scanOffset = journal.scanJournal(id, logPosition, scanner); | ||
| long scanOffset = journal.scanJournal(id, logPosition, scanner, this.conf.isSkipReplayJournalInvalidRecord()); |
There was a problem hiding this comment.
I have a question:
How do we decide to open the switch
What I understand is that open this switch ,maube cause to loss data,
does this need to discuss it on the dev@ mailing list @eolivelli , have a look,thanks
There was a problem hiding this comment.
what we meet is
a bk shutdown because of VM crash,then VM recovered , but we can not restart bk , because of the exception。we can not skip this, we only can do format data for this VM.. and re install bk
so if we want to recovery the bk in the scene, we can open the switch only on the machine,it will startup, and next time, we will close this switch
|
fix old workflow,please see #3455 for detail |
hangc0276
left a comment
There was a problem hiding this comment.
If one journal file is broken, this PR can skip the broker journal file and make the bookie can come up instead of keeping shut down. I support this fix, but I'm not sure whether there is a way to prevent the journal file broker or not, it will address the issue from the root cause.
Another way to bring the bookie up is to delete the mark delete files in ledgers directory, but it will also cause data loss.
| isPaddingRecord = true; | ||
| } else { | ||
| } else if (skipInvalidRecord){ | ||
| LOG.warn("Invalid record found with negative length: {},because of " + |
There was a problem hiding this comment.
Maybe we can catch the exception before finally.
| if (len == 0) { | ||
| continue; | ||
| } catch (IOException e) { | ||
| if (skipInvalidRecord) { |
There was a problem hiding this comment.
If we just skip one record and continue to read the next one, I'm afraid it will cause ledger data errors. Because when one journal file is broken, the rest of the data will be organized in the wrong way and we can't parse it.
|
@leizhiyuan Would you please send a discuss to the @dev mail list to have a discuss? Thanks. |
|
Bump |
|
@leizhiyuan Do you have any updates? |
|
I will take over this PR /cc @leizhiyuan |
|
@frankjkelly I have created a new PR to track it #3956 |
|
The issue has been fixed by #3956, close this PR. |
Descriptions of the changes in this PR:
Motivation
fix #2220
Changes
(Describe: what changes you have made)
Master Issue: #