Skip to content

Commit 00bcbe0

Browse files
sayrerclaude
andauthored
Fix UB: replace unsafe String::from_utf8_unchecked (#7781)
with checked conversion A panic in a CustomTransliterator can unwind through Insertable's Drop impl without fully restoring UTF-8 validity, causing into_string() to produce undefined behavior via from_utf8_unchecked. Use checked from_utf8().expect() so this becomes a clean panic instead. This one seemed correct to me (process: hammer on Claude until it gets a good patch). Maybe there is a better way, but I think this one is ok. ## Changelog - Fix UB in transliterator: use checked UTF-8 conversion instead of `from_utf8_unchecked` Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9c0923d commit 00bcbe0

File tree

1 file changed

+26
-5
lines changed
  • components/experimental/src/transliterate/transliterator

1 file changed

+26
-5
lines changed

components/experimental/src/transliterate/transliterator/replaceable.rs

Lines changed: 26 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -28,8 +28,10 @@
2828
// quite closely coupled to Transform Rules is `RepMatcher` and its explicit `ante`, `post`, `key`
2929
// handling.
3030

31-
// QUESTION: for this whole module, I don't know how panics work together with safety invariants. I'm fairly sure that unexpected panics
32-
// could break some invariants.
31+
// NOTE: Panics (e.g., from CustomTransliterator impls) during transliteration could unwind through
32+
// Drop impls (Insertable, InsertableGuard, etc.) that may not fully restore UTF-8 validity.
33+
// As a mitigation, TransliteratorBuffer::into_string() uses checked UTF-8 conversion so that
34+
// any such corruption results in a clean panic rather than undefined behavior.
3335

3436
use super::Filter;
3537
use alloc::string::String;
@@ -50,10 +52,13 @@ impl TransliteratorBuffer {
5052
Self(s.into_bytes())
5153
}
5254

55+
#[allow(clippy::expect_used)] // panic is strictly better than UB from from_utf8_unchecked
5356
pub(crate) fn into_string(self) -> String {
54-
debug_assert!(core::str::from_utf8(&self.0).is_ok());
55-
// SAFETY: We have exclusive access, so the vec must contain valid UTF-8
56-
unsafe { String::from_utf8_unchecked(self.0) }
57+
// Using checked conversion: if a panic during transliteration unwinds through
58+
// Drop impls that fail to fully restore UTF-8 validity, this will panic cleanly
59+
// instead of producing undefined behavior.
60+
String::from_utf8(self.0)
61+
.expect("TransliteratorBuffer must contain valid UTF-8 after transliteration")
5762
}
5863
}
5964

@@ -1055,3 +1060,19 @@ enum CursorOffset {
10551060
/// A `char`-based offset for before the replacement string.
10561061
CharsOffStart(u16),
10571062
}
1063+
1064+
#[cfg(test)]
1065+
mod tests {
1066+
use super::*;
1067+
1068+
#[test]
1069+
#[should_panic(expected = "valid UTF-8")]
1070+
fn test_into_string_rejects_invalid_utf8() {
1071+
// Simulate what would happen if a panic during transliteration
1072+
// left the buffer with invalid UTF-8.
1073+
let buffer = TransliteratorBuffer(vec![0xFF, 0xFE, 0xFD]);
1074+
// With checked conversion: panics cleanly.
1075+
// With unchecked conversion: silently produces an invalid String (UB).
1076+
let _ = buffer.into_string();
1077+
}
1078+
}

0 commit comments

Comments
 (0)