ICU-23394 Validate binary RBBI data offsets in RBBIDataWrapper::init()#3961
TristanInSec wants to merge 3 commits into
Add bounds checking for all offset+length pairs (fFTable, fRTable, fTrie, fRuleSource, fStatusTable) against the total data length in the RBBI binary data header. Without this validation, crafted binary data with out-of-range offsets causes an out-of-bounds read when passed to RuleBasedBreakIterator(const uint8_t*, uint32_t, UErrorCode&). The overflow-safe checks verify that each offset does not exceed totalLen, and that the corresponding length does not exceed the remaining space.
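As a rough illustration of what "overflow-safe" means here, the sketch below contrasts a subtraction-based range check with a naive `offset + length` comparison. The helper names `rangeFits` and `naiveRangeFits` are hypothetical and are not taken from this PR's diff; the example only assumes 32-bit unsigned offsets and lengths, as in the description above.

```cpp
#include <cstdint>

// Hypothetical helper (not the code from this PR): true when the byte range
// [offset, offset + length) lies entirely inside a buffer of `total` bytes.
constexpr bool rangeFits(uint32_t offset, uint32_t length, uint32_t total) {
    // Compare the length against the remaining space instead of computing
    // offset + length, which can wrap around in 32-bit arithmetic.
    return offset <= total && length <= total - offset;
}

// Naive variant, shown only for contrast: the 32-bit sum wraps modulo 2^32,
// so a huge offset plus a small length can look like a small end position.
constexpr bool naiveRangeFits(uint32_t offset, uint32_t length, uint32_t total) {
    return static_cast<uint32_t>(offset + length) <= total;
}

// A range that ends exactly at the end of the buffer is accepted.
static_assert(rangeFits(16, 48, 64), "range ending at the buffer end fits");
// One byte past the end is rejected.
static_assert(!rangeFits(16, 49, 64), "range past the buffer end is rejected");
// An offset near UINT32_MAX is rejected by the safe check...
static_assert(!rangeFits(0xFFFFFFF0u, 0x20u, 64), "wrapping pair is rejected");
// ...but slips through the naive one, because the sum wraps to 0x10.
static_assert(naiveRangeFits(0xFFFFFFF0u, 0x20u, 64), "naive check is fooled");
```

The subtraction `total - offset` cannot underflow because it is only evaluated after `offset <= total` has been established.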
Force-pushed from 058350a to b2757e0.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
Hi @TristanInSec, It is true that if you give ICU corrupted binary data files -- via API or via constructing data files to be loaded -- there are all kinds of ways that ICU will "go into the weeds". Your additional checks here are the most basic, and might be reasonable, but it's not clear that it makes sense to do this partial validation. We definitely don't plan to do full validation. If we were going forward, we would want to see
Thanks,
Hi @markusicu,
Thank you for the detailed review. I'll address each item.
Jira tickets: I created two dedicated tickets:
PR template: Will restore and fill out on all four PRs.
Unit tests: I have crash inputs from libFuzzer that reproduce each issue under ASan. I'll wrap them as C++ test cases that fail before the fix (OOB read / SEGV) and pass after (returning
Java port: I'll look at the ICU4J equivalents for
On partial validation vs. GIGO: I understand the concern about incomplete coverage. The case for these checks:
The validation is minimal and targeted: it checks that header offsets fall within the buffer bounds, consistent with how
I'll update the PRs with the items above.
Best regards,
Test that RuleBasedBreakIterator returns U_INVALID_FORMAT_ERROR when given binary data with out-of-bounds offsets or a truncated header, rather than crashing with a SEGV.
Add comprehensive offset+length bounds checking for all header fields (fFTable, fRTable, fTrie, fRuleSource, fStatusTable) against fLength before using them. Includes unit test with crafted data.
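The validation described in these commits could look roughly like the following sketch. It is not the PR's actual diff: `SketchHeader` is a simplified stand-in for the real RBBI header (which has more fields), `SketchStatus` is a hypothetical substitute for `UErrorCode`, and `validateHeader`, `fitsWithin`, and `makeData` are illustrative names. Only the five offset+length pairs and `fLength` come from the text above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for the RBBI binary header (assumption: the real
// header carries additional fields such as magic and version numbers).
struct SketchHeader {
    uint32_t fLength;                        // total size of the data, in bytes
    uint32_t fFTable, fFTableLen;            // forward state table
    uint32_t fRTable, fRTableLen;            // reverse state table
    uint32_t fTrie, fTrieLen;                // character category trie
    uint32_t fRuleSource, fRuleSourceLen;    // rule source text
    uint32_t fStatusTable, fStatusTableLen;  // rule status values
};

// Hypothetical stand-in for UErrorCode / U_INVALID_FORMAT_ERROR.
enum class SketchStatus { Ok, InvalidFormat };

// Overflow-safe range check: never computes offset + length.
static bool fitsWithin(uint32_t offset, uint32_t length, uint32_t total) {
    return offset <= total && length <= total - offset;
}

// Validates the header before any pointer into the data is computed.
SketchStatus validateHeader(const uint8_t* data, size_t dataLen) {
    SketchHeader h;
    if (data == nullptr || dataLen < sizeof h) {
        return SketchStatus::InvalidFormat;  // truncated header
    }
    std::memcpy(&h, data, sizeof h);  // copy out to avoid unaligned reads
    if (h.fLength < sizeof h || h.fLength > dataLen) {
        return SketchStatus::InvalidFormat;  // declared length is inconsistent
    }
    const uint32_t pairs[][2] = {
        {h.fFTable, h.fFTableLen},         {h.fRTable, h.fRTableLen},
        {h.fTrie, h.fTrieLen},             {h.fRuleSource, h.fRuleSourceLen},
        {h.fStatusTable, h.fStatusTableLen},
    };
    for (const auto& p : pairs) {
        if (!fitsWithin(p[0], p[1], h.fLength)) {
            return SketchStatus::InvalidFormat;  // section leaves the buffer
        }
    }
    return SketchStatus::Ok;
}

// Test helper: serializes a header into a zero-filled buffer of `size` bytes
// (or sizeof(SketchHeader) bytes if `size` is smaller than the header).
std::vector<uint8_t> makeData(const SketchHeader& h, size_t size) {
    std::vector<uint8_t> bytes(size < sizeof h ? sizeof h : size, 0);
    std::memcpy(bytes.data(), &h, sizeof h);
    return bytes;
}
```

A crafted-data test in this style would build one well-formed buffer, then mutate a single offset or shorten the buffer and expect the invalid-format status rather than a crash.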
Validate all offset+length pairs (fFTable, fRTable, fTrie, fRuleSource, fStatusTable) against the total data length in RBBIDataWrapper::init() before computing any pointers. Malformed input now returns U_INVALID_FORMAT_ERROR instead of producing wild pointers.
Checklist