Commit 8f465d6
perf: optimize regexp_count to avoid String allocation when start position is provided
Replace `.chars().skip().collect::<String>()` with zero-copy string slicing
using `char_indices()` to find the byte offset, then slice with `&value[byte_offset..]`.
This eliminates unnecessary String allocation per row when a start position
is specified.
Changes:
- Use char_indices().nth() to find byte offset for start position (1-based)
- Use string slicing &value[byte_offset..] instead of collecting chars
- Add a benchmark to measure the performance improvement
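
A minimal sketch of the slicing technique listed above, not the code from this commit. The helper name `suffix_from` is hypothetical, and so are two behaviors: returning `None` for a start of 0 and returning an empty suffix when the start lies past the end. The actual function may handle both cases differently.

```rust
/// Returns the part of `value` that begins at the 1-based character
/// position `start`, without allocating.
/// Hypothetical helper for illustration; not taken from the commit.
fn suffix_from(value: &str, start: usize) -> Option<&str> {
    if start == 0 {
        // Assumption: positions are 1-based, so 0 is rejected here.
        return None;
    }
    // char_indices() yields (byte_offset, char) pairs, so nth(start - 1)
    // gives the byte offset at which the start-th character begins.
    let byte_offset = value
        .char_indices()
        .nth(start - 1)
        .map(|(idx, _)| idx)
        // Assumption: a start past the last character gives an empty suffix.
        .unwrap_or(value.len());
    // The offset came from char_indices() (or is value.len()), so it is
    // always on a UTF-8 character boundary and the slice cannot panic.
    Some(&value[byte_offset..])
}
```

For example, `suffix_from("héllo", 3)` borrows `"llo"` from the input even though `é` occupies two bytes, because the offset is counted in characters and then converted to bytes.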
Optimization:
- Before: Allocated new String via .collect() for each row with start position
- After: Uses zero-copy string slice
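
The before/after contrast can be illustrated with a small sketch. The function names are hypothetical and the code is a simplified stand-in for the two approaches, not the commit's code: both take the number of characters to skip (start position minus one), both produce the same text, and only the sliced version points into the caller's buffer.

```rust
/// Before: decodes the remaining characters and copies them into a new String.
fn collected_suffix(value: &str, skip_chars: usize) -> String {
    value.chars().skip(skip_chars).collect::<String>()
}

/// After: finds the byte offset and borrows the tail of the original string.
fn sliced_suffix(value: &str, skip_chars: usize) -> &str {
    let byte_offset = value
        .char_indices()
        .nth(skip_chars)
        .map(|(idx, _)| idx)
        .unwrap_or(value.len());
    &value[byte_offset..]
}

/// Returns true when `part` lies entirely inside the memory of `whole`,
/// which is what "zero-copy" means in this context.
fn borrows_from(whole: &str, part: &str) -> bool {
    let whole_start = whole.as_ptr() as usize;
    let whole_end = whole_start + whole.len();
    let part_start = part.as_ptr() as usize;
    part_start >= whole_start && part_start + part.len() <= whole_end
}
```

With `value = "abcdef"` and `skip_chars = 2`, both functions yield `"cdef"`, but `borrows_from(value, sliced_suffix(value, 2))` holds while the collected `String` lives in its own heap allocation.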
Benchmark results:
- size=1024, str_len=32: 96.361 µs -> 41.458 µs (57.0% faster, 2.3x speedup)
- size=1024, str_len=128: 210.16 µs -> 56.064 µs (73.3% faster, 3.7x speedup)
- size=4096, str_len=32: 376.90 µs -> 162.98 µs (56.8% faster, 2.3x speedup)
- size=4096, str_len=128: 855.68 µs -> 263.61 µs (69.2% faster, 3.2x speedup)
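
The percentages and speedup factors above follow from the timings. A short sketch of that arithmetic, with hypothetical function names:

```rust
/// Percentage reduction in run time, reported above as "% faster".
fn percent_faster(before_us: f64, after_us: f64) -> f64 {
    (1.0 - after_us / before_us) * 100.0
}

/// Ratio of old run time to new run time, reported above as "x speedup".
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}
```

Applied to the first row, `percent_faster(96.361, 41.458)` rounds to 57.0 and `speedup(96.361, 41.458)` rounds to 2.3.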
The optimization shows greater improvements for longer strings (up to 73% faster):
once the byte offset is found, slicing copies nothing, while the previous approach
allocated and copied the rest of the string, a cost that grew with string length.
1 parent 7c50448 commit 8f465d6
File tree
3 files changed, +116 -2 lines changed
- datafusion/functions
  - benches
  - src/regex
Diff (line content and file names not captured):
- File 1: 5 lines added (new lines 273-277)
- File 2: 101 lines added (new lines 1-101)
- File 3: original lines 572-573 replaced by new lines 572-581