Skip to content

Conversation

@lyne7-sc
Copy link
Contributor

@lyne7-sc lyne7-sc commented Jan 1, 2026

Which issue does this PR close?

Rationale for this change

This PR improves the performance of the substring_index function by optimizing
delimiter search and substring extraction:

  • Single-byte fast path: introduces a specialized byte-based search for single-byte delimiters (e.g. ., ,), avoiding UTF-8 pattern matching overhead.
  • Efficient index discovery: replaces the split-and-sum-length approach with direct index location using match_indices / rmatch_indices.

What changes are included in this PR?

  • Added a fast path for delimiter.len() == 1 using byte-based search.
  • Refactored the general path to use match_indices and rmatch_indices for more efficient positioning.

Benchmarks

  • Single-byte delimiter benchmarks show ~2–3× speedup across batch sizes.
  • Multi-byte delimiters see a consistent ~10–15% improvement.
group                                               main_substrindex                       perf_substrindex
-----                                               ----------------                       ----------------
substr_index/substr_index_10000_long_delimiter      1.12   548.2±18.39µs        ? ?/sec    1.00   488.4±15.62µs        ? ?/sec
substr_index/substr_index_10000_single_delimiter    2.14   543.4±15.12µs        ? ?/sec    1.00    254.0±7.80µs        ? ?/sec
substr_index/substr_index_1000_long_delimiter       1.12     43.1±1.63µs        ? ?/sec    1.00     38.6±2.03µs        ? ?/sec
substr_index/substr_index_1000_single_delimiter     3.51     46.4±2.21µs        ? ?/sec    1.00     13.2±0.99µs        ? ?/sec
substr_index/substr_index_100_long_delimiter        1.01      3.7±0.18µs        ? ?/sec    1.00      3.7±0.20µs        ? ?/sec
substr_index/substr_index_100_single_delimiter      2.15      3.6±0.14µs        ? ?/sec    1.00  1675.9±79.15ns        ? ?/sec

Are these changes tested?

  • Yes, Existing unit tests pass.
  • New benchmarks added to verify performance improvement.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the functions Changes to functions implementation label Jan 1, 2026
Copy link

@uzqw uzqw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I noticed some Clippy warnings when running locally:
cargo clippy -p datafusion-functions --all-targets --all-features

match string.get(string.len().saturating_sub(length)..) {
Some(substring) => builder.append_value(substring),
None => builder.append_null(),
if n > 0 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else { if .. } block can be collapsed


for batch_size in batch_sizes {
group.bench_function(
&format!("substr_index_{}_single_delimiter", batch_size),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables can be used directly in the format! string

);

group.bench_function(
&format!("substr_index_{}_long_delimiter", batch_size),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables can be used directly in the format! string

@lyne7-sc
Copy link
Contributor Author

lyne7-sc commented Jan 2, 2026

Hi! I noticed some Clippy warnings when running locally: cargo clippy -p datafusion-functions --all-targets --all-features
@uzqw Thanks for catching that! I've refactored the logic to address the Clippy warnings.

@lyne7-sc
Copy link
Contributor Author

lyne7-sc commented Jan 2, 2026

Hi @Jefffrey , could you please help trigger the workflows for this PR? I'd appreciate a review when you have a moment. Thanks!

@Dandandan
Copy link
Contributor

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing perf/substrindex (0d40231) to 9a9ff8d diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and perf_substrindex
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ perf_substrindex ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2463.94 ms │       2411.31 ms │ no change │
│ QQuery 1     │   925.89 ms │        966.74 ms │ no change │
│ QQuery 2     │  1906.50 ms │       1847.69 ms │ no change │
│ QQuery 3     │  1180.54 ms │       1161.63 ms │ no change │
│ QQuery 4     │  2283.95 ms │       2313.33 ms │ no change │
│ QQuery 5     │ 28082.76 ms │      28635.63 ms │ no change │
│ QQuery 6     │  4027.29 ms │       3943.87 ms │ no change │
│ QQuery 7     │  3988.58 ms │       4072.75 ms │ no change │
└──────────────┴─────────────┴──────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)               │ 44859.47ms │
│ Total Time (perf_substrindex)   │ 45352.93ms │
│ Average Time (HEAD)             │  5607.43ms │
│ Average Time (perf_substrindex) │  5669.12ms │
│ Queries Faster                  │          0 │
│ Queries Slower                  │          0 │
│ Queries with No Change          │          8 │
│ Queries with Failure            │          0 │
└─────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ perf_substrindex ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     1.41 ms │          1.45 ms │     no change │
│ QQuery 1     │    49.65 ms │         48.54 ms │     no change │
│ QQuery 2     │   134.63 ms │        130.98 ms │     no change │
│ QQuery 3     │   159.41 ms │        150.64 ms │ +1.06x faster │
│ QQuery 4     │  1103.96 ms │       1110.02 ms │     no change │
│ QQuery 5     │  1447.26 ms │       1366.42 ms │ +1.06x faster │
│ QQuery 6     │     1.44 ms │          1.45 ms │     no change │
│ QQuery 7     │    51.93 ms │         54.44 ms │     no change │
│ QQuery 8     │  1487.19 ms │       1445.29 ms │     no change │
│ QQuery 9     │  1926.80 ms │       1819.27 ms │ +1.06x faster │
│ QQuery 10    │   349.59 ms │        343.20 ms │     no change │
│ QQuery 11    │   396.95 ms │        389.93 ms │     no change │
│ QQuery 12    │  1292.55 ms │       1288.16 ms │     no change │
│ QQuery 13    │  2005.52 ms │       1950.11 ms │     no change │
│ QQuery 14    │  1298.90 ms │       1232.90 ms │ +1.05x faster │
│ QQuery 15    │  1318.91 ms │       1233.10 ms │ +1.07x faster │
│ QQuery 16    │  2638.37 ms │       2616.87 ms │     no change │
│ QQuery 17    │  2633.22 ms │       2563.63 ms │     no change │
│ QQuery 18    │  5341.65 ms │       5093.38 ms │     no change │
│ QQuery 19    │   122.60 ms │        119.54 ms │     no change │
│ QQuery 20    │  1907.51 ms │       1853.59 ms │     no change │
│ QQuery 21    │  2200.02 ms │       2183.62 ms │     no change │
│ QQuery 22    │  3799.20 ms │       3750.99 ms │     no change │
│ QQuery 23    │ 12375.07 ms │      12077.93 ms │     no change │
│ QQuery 24    │   209.85 ms │        215.28 ms │     no change │
│ QQuery 25    │   467.06 ms │        455.26 ms │     no change │
│ QQuery 26    │   212.27 ms │        209.21 ms │     no change │
│ QQuery 27    │  2734.00 ms │       2702.47 ms │     no change │
│ QQuery 28    │ 23269.43 ms │      21660.03 ms │ +1.07x faster │
│ QQuery 29    │   941.37 ms │        943.45 ms │     no change │
│ QQuery 30    │  1424.51 ms │       1298.70 ms │ +1.10x faster │
│ QQuery 31    │  1359.72 ms │       1341.83 ms │     no change │
│ QQuery 32    │  5087.35 ms │       4700.82 ms │ +1.08x faster │
│ QQuery 33    │  5842.88 ms │       5549.30 ms │ +1.05x faster │
│ QQuery 34    │  5970.99 ms │       5904.53 ms │     no change │
│ QQuery 35    │  1985.82 ms │       1913.51 ms │     no change │
│ QQuery 36    │    66.06 ms │         65.69 ms │     no change │
│ QQuery 37    │    45.26 ms │         43.20 ms │     no change │
│ QQuery 38    │    63.70 ms │         65.70 ms │     no change │
│ QQuery 39    │   101.94 ms │        100.97 ms │     no change │
│ QQuery 40    │    25.22 ms │         25.71 ms │     no change │
│ QQuery 41    │    22.83 ms │         21.02 ms │ +1.09x faster │
│ QQuery 42    │    18.85 ms │         18.84 ms │     no change │
└──────────────┴─────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)               │ 93892.88ms │
│ Total Time (perf_substrindex)   │ 90060.94ms │
│ Average Time (HEAD)             │  2183.56ms │
│ Average Time (perf_substrindex) │  2094.44ms │
│ Queries Faster                  │         10 │
│ Queries Slower                  │          0 │
│ Queries with No Change          │         33 │
│ Queries with Failure            │          0 │
└─────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ perf_substrindex ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 118.17 ms │        118.85 ms │     no change │
│ QQuery 2     │  28.48 ms │         26.09 ms │ +1.09x faster │
│ QQuery 3     │  37.14 ms │         37.63 ms │     no change │
│ QQuery 4     │  28.10 ms │         28.61 ms │     no change │
│ QQuery 5     │  85.50 ms │         85.88 ms │     no change │
│ QQuery 6     │  19.51 ms │         19.45 ms │     no change │
│ QQuery 7     │ 217.25 ms │        217.09 ms │     no change │
│ QQuery 8     │  33.35 ms │         32.14 ms │     no change │
│ QQuery 9     │  92.42 ms │         95.35 ms │     no change │
│ QQuery 10    │  61.06 ms │         61.71 ms │     no change │
│ QQuery 11    │  18.43 ms │         17.36 ms │ +1.06x faster │
│ QQuery 12    │  49.90 ms │         50.92 ms │     no change │
│ QQuery 13    │  48.50 ms │         48.99 ms │     no change │
│ QQuery 14    │  13.94 ms │         13.63 ms │     no change │
│ QQuery 15    │  24.39 ms │         25.02 ms │     no change │
│ QQuery 16    │  23.51 ms │         25.29 ms │  1.08x slower │
│ QQuery 17    │ 144.71 ms │        150.59 ms │     no change │
│ QQuery 18    │ 279.05 ms │        274.90 ms │     no change │
│ QQuery 19    │  36.50 ms │         36.64 ms │     no change │
│ QQuery 20    │  50.23 ms │         49.84 ms │     no change │
│ QQuery 21    │ 312.96 ms │        303.72 ms │     no change │
│ QQuery 22    │  17.38 ms │         17.36 ms │     no change │
└──────────────┴───────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary               ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)               │ 1740.47ms │
│ Total Time (perf_substrindex)   │ 1737.05ms │
│ Average Time (HEAD)             │   79.11ms │
│ Average Time (perf_substrindex) │   78.96ms │
│ Queries Faster                  │         2 │
│ Queries Slower                  │         1 │
│ Queries with No Change          │        19 │
│ Queries with Failure            │         0 │
└─────────────────────────────────┴───────────┘

@alamb
Copy link
Contributor

alamb commented Jan 5, 2026

run benchmark substr_index

@alamb-ghbot

This comment was marked as outdated.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @lyne7-sc and @Jefffrey and @uzqw

match string.get(..length) {
Some(substring) => builder.append_value(substring),
None => builder.append_null(),
let result_idx = if delimiter.len() == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that this checks for length in bytes

https://doc.rust-lang.org/std/string/struct.String.html#method.len

T::Native: OffsetSizeTrait,
{
let mut builder = StringBuilder::new();
let num_rows = string_array.len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to make this really fast, you could also implement special case code for the common cases where delimiter and count were scalar values -- I suspect you could make it quite fast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I’ll try adding a special-case implementation for that. Thanks!

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing perf/substrindex (0d40231) to 9a9ff8d diff
BENCH_NAME=substr_index
BENCH_COMMAND=cargo bench --features=parquet --bench substr_index
BENCH_FILTER=
BENCH_BRANCH_NAME=perf_substrindex
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                               main                                   perf_substrindex
-----                                               ----                                   ----------------
substr_index/substr_index_10000_long_delimiter                                             1.00   909.2±10.88µs        ? ?/sec
substr_index/substr_index_10000_single_delimiter                                           1.00   458.7±11.98µs        ? ?/sec
substr_index/substr_index_1000_long_delimiter                                              1.00     87.1±4.09µs        ? ?/sec
substr_index/substr_index_1000_single_delimiter                                            1.00     41.3±0.73µs        ? ?/sec
substr_index/substr_index_100_long_delimiter                                               1.00      7.4±0.39µs        ? ?/sec
substr_index/substr_index_100_single_delimiter                                             1.00      3.6±0.16µs        ? ?/sec
substr_index_array_array_1000                       1.00     96.7±1.33µs        ? ?/sec  

@alamb alamb added this pull request to the merge queue Jan 6, 2026
@alamb
Copy link
Contributor

alamb commented Jan 6, 2026

Onwards 🚀

Merged via the queue into apache:main with commit 166ef81 Jan 6, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants