@andygrove
Member

Which issue does this PR close?

N/A

Rationale for this change

Make benchmarks faster to run and cover all string functions.

What changes are included in this PR?

Create the input data once instead of once per expression.
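The restructuring described above can be sketched in plain Scala. This is only an illustration of the idea, not the code from this PR: `PrepareOnceSketch`, `ExprCase`, `runPerExpression`, and `runWithSharedInput` are hypothetical names, the queries are made up, and the `prepare` callback stands in for whatever writes the input table.

```scala
object PrepareOnceSketch {
  // Hypothetical stand-in for one benchmark case: a label plus the SQL to time.
  final case class ExprCase(name: String, query: String)

  // Before: the input data is rebuilt for every expression.
  def runPerExpression(cases: Seq[ExprCase])(prepare: () => Unit)(run: ExprCase => Unit): Unit =
    cases.foreach { c =>
      prepare()
      run(c)
    }

  // After: the input data is built once and shared by all expressions.
  def runWithSharedInput(cases: Seq[ExprCase])(prepare: () => Unit)(run: ExprCase => Unit): Unit = {
    prepare()
    cases.foreach(run)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative queries only; the real benchmark defines its own list.
    val cases = Seq(
      ExprCase("upper", "SELECT upper(c1) FROM parquetV1Table"),
      ExprCase("lower", "SELECT lower(c1) FROM parquetV1Table"),
      ExprCase("length", "SELECT length(c1) FROM parquetV1Table"))

    var prepared = 0
    runPerExpression(cases)(() => prepared += 1)(_ => ())
    println(s"per-expression setup ran $prepared times") // 3

    prepared = 0
    runWithSharedInput(cases)(() => prepared += 1)(_ => ())
    println(s"shared setup ran $prepared times") // 1
  }
}
```

With N expressions the setup cost drops from N table writes to one, which is where the faster benchmark runs would come from if table preparation dominates.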

How are these changes tested?

Current benchmark results:

string.txt

@codecov-commenter

codecov-commenter commented Dec 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.62%. Comparing base (f09f8af) to head (a0a220e).
⚠️ Report is 797 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2964      +/-   ##
============================================
+ Coverage     56.12%   59.62%   +3.49%     
- Complexity      976     1375     +399     
============================================
  Files           119      167      +48     
  Lines         11743    15488    +3745     
  Branches       2251     2567     +316     
============================================
+ Hits           6591     9234    +2643     
- Misses         4012     4956     +944     
- Partials       1140     1298     +158     

@andygrove andygrove changed the title from "chore: Improve string benchmarks" to "chore: Improve microbenchmark for string expressions" on Dec 23, 2025
@andygrove andygrove requested a review from comphead January 2, 2026 16:43
withTempTable("parquetV1Table") {
  prepareTable(
    dir,
    spark.sql(s"SELECT REPEAT(CAST(value AS STRING), 10) AS c1 FROM $tbl"))
Contributor

Should we have random string lengths?

Something like:

SELECT
  substring(
    repeat(
      value,
      20
    ),
    cast(rand() * 26 as int) + 1,
    cast(rand() * 20 as int) + 1
  ) AS random_string
FROM range(10);
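
To see what string lengths a query of this shape would produce, here is a rough plain-Scala emulation. It is a sketch under assumptions: `RandomLengthSketch` and `randomSlice` are hypothetical names, the two random draws are treated as independent, and the clamping mimics a 1-based SQL `substring` that returns whatever is left when the requested range runs past the end of the string.

```scala
import scala.util.Random

object RandomLengthSketch {
  // Emulates substring(repeat(value, 20), start, len) with a 1-based start,
  // clamping at the end of the repeated string.
  def randomSlice(value: String, rng: Random): String = {
    val repeated = value * 20
    val start = rng.nextInt(26) + 1 // 1..26, like cast(rand() * 26 as int) + 1
    val len = rng.nextInt(20) + 1   // 1..20, like cast(rand() * 20 as int) + 1
    val from = math.min(start - 1, repeated.length)
    val until = math.min(from + len, repeated.length)
    repeated.substring(from, until)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42L)
    // Single-digit inputs repeat to 20 characters, so start positions
    // 21..26 fall past the end and yield empty strings.
    val lengths = (0 until 10).map(i => randomSlice(i.toString, rng).length)
    println(lengths.mkString(", "))
  }
}
```

One consequence worth noting: when the input value is a single character, some rows come out empty. Whether that is desirable for the benchmark is a judgment call; a longer input value or a smaller start range would avoid it.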

Contributor

@comphead comphead left a comment

Thanks @andygrove

@andygrove andygrove merged commit 37cb5c9 into apache:main Jan 3, 2026
119 checks passed
@andygrove andygrove deleted the string-bench branch January 3, 2026 16:27
@coderfender
Contributor

================================================================================================
String expressions
================================================================================================

================================================================================================
Substring
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Substring:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                14             31          14          0.1       14037.9       1.0X
Comet (Scan)                                         14             34          13          0.1       13728.3       1.0X
Comet (Scan + Exec)                                  14             42          13          0.1       13484.0       1.0X


================================================================================================
ascii
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
ascii:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                11             30          12          0.1       10417.3       1.0X
Comet (Scan)                                         13             43          13          0.1       12275.7       0.8X
Comet (Scan + Exec)                                  13             37          11          0.1       12212.6       0.9X
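
The derived columns in these tables can be reproduced from the Per Row(ns) column. The sketch below assumes the usual convention for this report format: Relative is the baseline (first row) time divided by the candidate's time, and Rate(M/s) is 1000 divided by nanoseconds per row. `BenchmarkColumnsSketch` and its helpers are hypothetical names; the figures are the ascii rows above.

```scala
import java.util.Locale

object BenchmarkColumnsSketch {
  // Relative = baseline time / candidate time; below 1.0 means slower than the baseline.
  def relative(baselinePerRowNs: Double, perRowNs: Double): Double =
    baselinePerRowNs / perRowNs

  // Millions of rows per second is the inverse of nanoseconds per row, scaled by 1000.
  def rateMillionsPerSec(perRowNs: Double): Double = 1000.0 / perRowNs

  // One decimal place with a fixed locale, so the separator is always a dot.
  def oneDecimal(x: Double): String =
    String.format(Locale.ROOT, "%.1f", Double.box(x))

  def main(args: Array[String]): Unit = {
    val spark = 10417.3         // Spark, Per Row(ns)
    val cometScan = 12275.7     // Comet (Scan), Per Row(ns)
    val cometScanExec = 12212.6 // Comet (Scan + Exec), Per Row(ns)

    println(s"Comet (Scan):        ${oneDecimal(relative(spark, cometScan))}X")     // 0.8X
    println(s"Comet (Scan + Exec): ${oneDecimal(relative(spark, cometScanExec))}X") // 0.9X
    println(s"Spark rate:          ${oneDecimal(rateMillionsPerSec(spark))} M/s")   // 0.1 M/s
  }
}
```

The two Comet rows differ by well under 1% in per-row time yet land on different Relative values (0.8X and 0.9X) because of rounding to one decimal, so small differences in that column should be read with care, especially given the standard deviations reported here.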


================================================================================================
bitLength
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
bitLength:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             28          16          0.1        9242.2       1.0X
Comet (Scan)                                         10             43          16          0.1       10042.8       0.9X
Comet (Scan + Exec)                                  12             37          11          0.1       11717.4       0.8X


================================================================================================
octet_length
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
octet_length:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             37          22          0.1        9606.7       1.0X
Comet (Scan)                                         10             53          22          0.1        9840.2       1.0X
Comet (Scan + Exec)                                  11             34          13          0.1       11220.0       0.9X


================================================================================================
upper
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
upper:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             26          20          0.1        9233.7       1.0X
Comet (Scan)                                         10             40          20          0.1        9480.1       1.0X
Comet (Scan + Exec)                                  15             42          14          0.1       14291.5       0.6X


================================================================================================
lower
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
lower:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             57          22          0.1        9121.3       1.0X
Comet (Scan)                                         10             40          15          0.1        9534.1       1.0X
Comet (Scan + Exec)                                  12             43          14          0.1       11809.1       0.8X


================================================================================================
chr
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
chr:                                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             27          24          0.1        9016.2       1.0X
Comet (Scan)                                         10             43          19          0.1        9430.2       1.0X
Comet (Scan + Exec)                                  11             42          14          0.1       10895.3       0.8X


================================================================================================
initCap
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
initCap:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             51          29          0.1        9054.2       1.0X
Comet (Scan)                                         20             50          19          0.1       19110.4       0.5X
Comet (Scan + Exec)                                  10             35          16          0.1        9834.2       0.9X


================================================================================================
trim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
trim:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             31          21          0.1        8622.8       1.0X
Comet (Scan)                                         10             46          17          0.1        9474.0       0.9X
Comet (Scan + Exec)                                  11             58          18          0.1       10808.6       0.8X


================================================================================================
btrim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
btrim:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             40          19          0.1        8734.7       1.0X
Comet (Scan)                                         10             38          23          0.1        9875.3       0.9X
Comet (Scan + Exec)                                  12             56          18          0.1       11821.6       0.7X


================================================================================================
ltrim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
ltrim:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             27          17          0.1        8581.9       1.0X
Comet (Scan)                                          9             49          19          0.1        8992.0       1.0X
Comet (Scan + Exec)                                  12             44          16          0.1       11763.1       0.7X


================================================================================================
rtrim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
rtrim:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             35          17          0.1        8311.8       1.0X
Comet (Scan)                                         10             28          16          0.1        9815.4       0.8X
Comet (Scan + Exec)                                  13             45          13          0.1       12426.0       0.7X


================================================================================================
lpad
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
lpad:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             31          18          0.1        9624.0       1.0X
Comet (Scan)                                         10             52          20          0.1        9882.0       1.0X
Comet (Scan + Exec)                                  29             53          11          0.0       28350.2       0.3X


================================================================================================
rpad
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
rpad:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             32          18          0.1        9930.5       1.0X
Comet (Scan)                                         34             58          12          0.0       33057.5       0.3X
Comet (Scan + Exec)                                  12             47          16          0.1       11649.9       0.9X


================================================================================================
concat
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
concat:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             23          20          0.1        8708.3       1.0X
Comet (Scan)                                          9             53          13          0.1        9205.9       0.9X
Comet (Scan + Exec)                                  11             57          17          0.1       10575.1       0.8X


================================================================================================
concatws
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
concatws:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             22          18          0.1        8566.8       1.0X
Comet (Scan)                                         31             50          14          0.0       30137.5       0.3X
Comet (Scan + Exec)                                  32             61          15          0.0       31659.3       0.3X


================================================================================================
contains
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
contains:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             50          20          0.1        9767.3       1.0X
Comet (Scan)                                         11             46          16          0.1       10438.0       0.9X
Comet (Scan + Exec)                                  13             47          20          0.1       12454.9       0.8X


================================================================================================
startsWith
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
startsWith:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             36          20          0.1        8126.8       1.0X
Comet (Scan)                                         10             43          14          0.1        9694.1       0.8X
Comet (Scan + Exec)                                  37             74           9          0.0       36311.4       0.2X


================================================================================================
endsWith
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
endsWith:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             30          33          0.1        7704.6       1.0X
Comet (Scan)                                          9             31           6          0.1        9009.1       0.9X
Comet (Scan + Exec)                                  11             29          11          0.1       10399.3       0.7X


================================================================================================
length
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
length:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             24          13          0.1        8961.0       1.0X
Comet (Scan)                                         31             88          37          0.0       30298.2       0.3X
Comet (Scan + Exec)                                  10             40          16          0.1        9453.2       0.9X


================================================================================================
repeat
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
repeat:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             35          17          0.1        7871.5       1.0X
Comet (Scan)                                          9             32          16          0.1        8399.9       0.9X
Comet (Scan + Exec)                                  12             32          15          0.1       11398.4       0.7X


================================================================================================
reverse
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
reverse:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             20          13          0.1        8347.1       1.0X
Comet (Scan)                                          9             27          12          0.1        8532.8       1.0X
Comet (Scan + Exec)                                  11             42          13          0.1       10959.6       0.8X


================================================================================================
instr
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
instr:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             17          10          0.1        8244.3       1.0X
Comet (Scan)                                          9             40          11          0.1        9048.3       0.9X
Comet (Scan + Exec)                                  11             26          15          0.1       10335.0       0.8X


================================================================================================
replace
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
replace:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             24          16          0.1        7904.1       1.0X
Comet (Scan)                                          9             41          13          0.1        8899.6       0.9X
Comet (Scan + Exec)                                  11             35          15          0.1       10387.7       0.8X


================================================================================================
space
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
space:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             23          16          0.1        7371.1       1.0X
Comet (Scan)                                          8             15          10          0.1        7583.9       1.0X
Comet (Scan + Exec)                                   9             22          16          0.1        8781.6       0.8X


================================================================================================
translate
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
translate:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                12             31          18          0.1       11537.7       1.0X
Comet (Scan)                                         13             38          15          0.1       12661.8       0.9X
Comet (Scan + Exec)                                  16             42          16          0.1       15305.8       0.8X


