perf: Optimize array_concat using MutableArrayData #20620

Open
neilconway wants to merge 4 commits into apache:main from neilconway:neilc/optimize-array-concat
@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

The current implementation of array_concat creates an ArrayRef for each row, uses Arrow's concat kernel to merge the elements together, and then uses concat again to produce the final results. This does a lot of unnecessary allocation and copying.

Instead, we can use MutableArrayData::extend to copy element ranges in bulk, which avoids much of this intermediate copying and allocation. This approach is roughly 5-16x faster on a microbenchmark.
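
To illustrate the idea of copying element ranges in bulk, here is a minimal, std-only sketch. It does not use the Arrow API and is not the code in this PR: `ListColumn`, `from_rows`, and `concat_list_columns` are hypothetical names, the element type is fixed to `i32`, and nulls are not modeled. It assumes a list column can be pictured as a flat values buffer plus row offsets.

```rust
/// Simplified stand-in for a list column (an assumption for illustration):
/// row `i` holds `values[offsets[i]..offsets[i + 1]]`. Nulls are not modeled.
#[derive(Debug, PartialEq)]
pub struct ListColumn {
    pub offsets: Vec<usize>,
    pub values: Vec<i32>,
}

impl ListColumn {
    /// Build a column from per-row vectors (convenience for examples).
    pub fn from_rows(rows: &[Vec<i32>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for row in rows {
            values.extend_from_slice(row);
            offsets.push(values.len());
        }
        ListColumn { offsets, values }
    }

    /// Number of rows in the column.
    pub fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the elements of row `i` as one contiguous slice.
    pub fn row(&self, i: usize) -> &[i32] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Row-wise concatenation of several list columns.
/// Each row's elements are appended to a single output buffer as a slice,
/// so no per-row intermediate array is allocated.
pub fn concat_list_columns(inputs: &[&ListColumn]) -> ListColumn {
    let num_rows = inputs.first().map_or(0, |c| c.num_rows());
    assert!(
        inputs.iter().all(|c| c.num_rows() == num_rows),
        "all inputs must have the same number of rows"
    );

    // Reserve the output buffers once, up front.
    let total: usize = inputs.iter().map(|c| c.values.len()).sum();
    let mut values = Vec::with_capacity(total);
    let mut offsets = Vec::with_capacity(num_rows + 1);
    offsets.push(0);

    for row in 0..num_rows {
        for col in inputs {
            // Bulk copy of one contiguous element range.
            values.extend_from_slice(col.row(row));
        }
        offsets.push(values.len());
    }
    ListColumn { offsets, values }
}
```

Calling `concat_list_columns(&[&a, &b])` on two columns with the same row count returns a new column whose row `i` is row `i` of `a` followed by row `i` of `b`. The point of the sketch is only the shape of the loop: one output buffer, one range copy per input row.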

What changes are included in this PR?

  • Add benchmark
  • Improve SLT test coverage for array_concat
  • Implement optimization

Are these changes tested?

Yes, and benchmarked.

Are there any user-facing changes?

No.

@github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on Feb 28, 2026
@neilconway
Contributor Author

Benchmarks:

  ┌────────────────┬────────────┬─────────────────┬─────────┐
  │   Benchmark    │ Old (main) │ New (optimized) │ Speedup │
  ├────────────────┼────────────┼─────────────────┼─────────┤
  │ 2_arrays/100   │ 20.6 µs    │ 1.73 µs         │ 11.9x   │
  ├────────────────┼────────────┼─────────────────┼─────────┤
  │ 2_arrays/1000  │ 200 µs     │ 13.1 µs         │ 15.3x   │
  ├────────────────┼────────────┼─────────────────┼─────────┤
  │ 2_arrays/10000 │ 2.02 ms    │ 126 µs          │ 16.0x   │
  ├────────────────┼────────────┼─────────────────┼─────────┤
  │ elem_size/5    │ 202 µs     │ 14.5 µs         │ 13.9x   │
  ├────────────────┼────────────┼─────────────────┼─────────┤
  │ elem_size/50   │ 224 µs     │ 18.4 µs         │ 12.2x   │
  ├────────────────┼────────────┼─────────────────┼─────────┤
  │ elem_size/500  │ 475 µs     │ 95.5 µs         │ 5.0x    │
  └────────────────┴────────────┴─────────────────┴─────────┘
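
The Speedup column can be rechecked from the two timing columns. The following is a small sketch, with hypothetical names (`speedup`, `RESULTS`, `speedups`), that converts every timing to microseconds and rounds the old/new ratio to one decimal place.

```rust
/// Ratio of old to new runtime, rounded to one decimal place.
pub fn speedup(old_us: f64, new_us: f64) -> f64 {
    (old_us / new_us * 10.0).round() / 10.0
}

/// (benchmark, old time in µs, new time in µs), copied from the table.
/// The 2.02 ms entry is written as 2020.0 µs.
pub const RESULTS: [(&str, f64, f64); 6] = [
    ("2_arrays/100", 20.6, 1.73),
    ("2_arrays/1000", 200.0, 13.1),
    ("2_arrays/10000", 2020.0, 126.0),
    ("elem_size/5", 202.0, 14.5),
    ("elem_size/50", 224.0, 18.4),
    ("elem_size/500", 475.0, 95.5),
];

/// Compute the speedup for every benchmark in `RESULTS`.
pub fn speedups() -> Vec<(&'static str, f64)> {
    RESULTS
        .iter()
        .map(|&(name, old, new)| (name, speedup(old, new)))
        .collect()
}
```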

Development

Successfully merging this pull request may close these issues.

Optimize array_concat to use MutableArrayData