Commit 4c6c95d

perf: Use get_unchecked in strpos for micro-optimization
Use `get_unchecked` to avoid redundant bounds checking when comparing substring slices, as we already validate the bounds in the if condition.

Benchmark results show modest improvements, especially for shorter strings:

- strpos_StringArray_ascii_str_len_8: 3.7% faster
- strpos_StringViewArray_ascii_str_len_8: 2.0% faster
- strpos_StringViewArray_utf8_str_len_8: 1.9% faster

Other benchmarks show no significant change (within noise threshold).
1 parent 5ed05f7 commit 4c6c95d

1 file changed: +10 -5 lines changed
datafusion/functions/src/unicode/strpos.rs

Lines changed: 10 additions & 5 deletions
@@ -232,11 +232,16 @@ where
         let mut char_pos = 0;
         for (byte_idx, _) in string.char_indices() {
             char_pos += 1;
-            if byte_idx + substring_bytes.len() <= string_bytes.len()
-                && &string_bytes[byte_idx..byte_idx + substring_bytes.len()]
-                    == substring_bytes
-            {
-                return T::Native::from_usize(char_pos);
+            if byte_idx + substring_bytes.len() <= string_bytes.len() {
+                // SAFETY: We just checked that byte_idx + substring_bytes.len() <= string_bytes.len()
+                let slice = unsafe {
+                    string_bytes.get_unchecked(
+                        byte_idx..byte_idx + substring_bytes.len(),
+                    )
+                };
+                if slice == substring_bytes {
+                    return T::Native::from_usize(char_pos);
+                }
             }
         }
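
To make the change easier to review in isolation, here is a minimal standalone sketch of the same loop. It is not the DataFusion implementation: the function names `strpos_unchecked` and `strpos_checked`, the plain `usize` return type, and the "0 means not found" convention are assumptions made for illustration. The second function is a fully safe variant using `starts_with`, included only as a reference to compare results against.

```rust
/// Hypothetical standalone version of the loop in this diff.
/// Returns the 1-based character position of `substring` in `string`,
/// or 0 if it does not occur (an assumed convention for this sketch).
fn strpos_unchecked(string: &str, substring: &str) -> usize {
    let string_bytes = string.as_bytes();
    let substring_bytes = substring.as_bytes();
    let mut char_pos = 0;
    for (byte_idx, _) in string.char_indices() {
        char_pos += 1;
        if byte_idx + substring_bytes.len() <= string_bytes.len() {
            // SAFETY: the condition above guarantees that the range
            // byte_idx..byte_idx + substring_bytes.len() lies within
            // string_bytes, so skipping the bounds check is sound.
            let slice = unsafe {
                string_bytes.get_unchecked(byte_idx..byte_idx + substring_bytes.len())
            };
            if slice == substring_bytes {
                return char_pos;
            }
        }
    }
    0
}

/// Safe reference variant (an assumption, not part of the commit):
/// `starts_with` performs the length check internally.
fn strpos_checked(string: &str, substring: &str) -> usize {
    let string_bytes = string.as_bytes();
    let substring_bytes = substring.as_bytes();
    for (char_idx, (byte_idx, _)) in string.char_indices().enumerate() {
        // byte_idx comes from char_indices, so it is always < string_bytes.len().
        if string_bytes[byte_idx..].starts_with(substring_bytes) {
            return char_idx + 1;
        }
    }
    0
}
```

Both functions should agree on every input, for example `strpos_unchecked("héllo", "llo")` and `strpos_checked("héllo", "llo")` both count characters rather than bytes. The soundness of the `unsafe` block rests entirely on the preceding `if` condition, so any later edit to that condition needs the SAFETY comment re-checked.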
