chore: Improve microbenchmark for string expressions #2964

andygrove · 2025-12-23T00:47:49Z

Which issue does this PR close?

N/A

Rationale for this change

Make benchmarks faster to run and cover all string functions.

What changes are included in this PR?

Create the input data once instead of once per expression.
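
As a rough illustration of that change, the sketch below prepares the input a single time and reuses it for every expression. It is plain Scala with no Spark dependency, and every name in it (`PrepareOnceSketch`, `prepareInput`, `runAll`, the three sample expressions) is a hypothetical stand-in rather than code from this PR.

```scala
// Hypothetical stand-ins: none of these names come from the PR itself.
object PrepareOnceSketch {
  // Counts how many times the (expensive) input preparation runs.
  var prepareCalls = 0

  // Stand-in for writing the benchmark table; here it only builds strings in memory.
  def prepareInput(rows: Int): Vector[String] = {
    prepareCalls += 1
    // Loosely mirrors REPEAT(CAST(value AS STRING), 10) from the benchmark query.
    Vector.tabulate(rows)(i => i.toString * 10)
  }

  // Expressions under test, modelled as plain functions over a string.
  val expressions: Seq[(String, String => Any)] = Seq(
    "upper" -> ((s: String) => s.toUpperCase),
    "length" -> ((s: String) => s.length),
    "reverse" -> ((s: String) => s.reverse))

  // Prepare the input once, then evaluate every expression against the same data.
  // Returns the number of rows processed per expression.
  def runAll(rows: Int): Map[String, Int] = {
    val input = prepareInput(rows)
    expressions.map { case (name, f) => name -> input.map(f).size }.toMap
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(1000)
    println(s"ran ${results.size} expressions with $prepareCalls prepare call(s)")
  }
}
```

Calling `prepareInput` inside the loop instead would repeat the setup cost for every expression, which is the overhead this PR removes.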

How are these changes tested?

Current benchmark results:

string.txt

codecov-commenter · 2025-12-23T01:11:56Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.62%. Comparing base (f09f8af) to head (a0a220e).
⚠️ Report is 797 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2964      +/-   ##
============================================
+ Coverage     56.12%   59.62%   +3.49%     
- Complexity      976     1375     +399     
============================================
  Files           119      167      +48     
  Lines         11743    15488    +3745     
  Branches       2251     2567     +316     
============================================
+ Hits           6591     9234    +2643     
- Misses         4012     4956     +944     
- Partials       1140     1298     +158


comphead · 2026-01-02T16:57:17Z

spark/src/test/scala/org/apache/spark/sql/benchmark/CometStringExpressionBenchmark.scala

+        withTempTable("parquetV1Table") {
+          prepareTable(
+            dir,
+            spark.sql(s"SELECT REPEAT(CAST(value AS STRING), 10) AS c1 FROM $tbl"))


should we have random string lengths?

something like

SELECT substring(repeat(value, 20), cast(rand() * 26 as int) + 1, cast(rand() * 20 as int) + 1) AS random_string FROM range(10);
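
For comparison, variable-length input could also be generated outside SQL. The following is a minimal sketch in plain Scala, not something proposed in this thread: the object name, the `generate` signature, and the fixed seed are all assumptions, with the seed added only so that repeated runs see identical data.

```scala
import scala.util.Random

// Hypothetical helper; not part of the PR or of Comet.
object RandomLengthStrings {
  // Builds `rows` alphanumeric strings whose lengths vary from 1 to maxLen inclusive,
  // similar in spirit to cast(rand() * 20 as int) + 1 in the query above.
  // A fixed seed (an assumption of this sketch) keeps the data reproducible.
  def generate(rows: Int, maxLen: Int, seed: Long): Vector[String] = {
    val rng = new Random(seed)
    Vector.fill(rows) {
      val len = rng.nextInt(maxLen) + 1 // uniform in 1..maxLen
      rng.alphanumeric.take(len).mkString
    }
  }

  def main(args: Array[String]): Unit = {
    val sample = generate(rows = 5, maxLen = 20, seed = 42L)
    sample.foreach(s => println(s"${s.length}: $s"))
  }
}
```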

comphead

Thanks @andygrove

coderfender · 2026-01-05T00:56:54Z

String expressions
================================================================================================

================================================================================================
Substring
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Substring:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                14             31          14          0.1       14037.9       1.0X
Comet (Scan)                                         14             34          13          0.1       13728.3       1.0X
Comet (Scan + Exec)                                  14             42          13          0.1       13484.0       1.0X


================================================================================================
ascii
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
ascii:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                11             30          12          0.1       10417.3       1.0X
Comet (Scan)                                         13             43          13          0.1       12275.7       0.8X
Comet (Scan + Exec)                                  13             37          11          0.1       12212.6       0.9X


================================================================================================
bitLength
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
bitLength:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             28          16          0.1        9242.2       1.0X
Comet (Scan)                                         10             43          16          0.1       10042.8       0.9X
Comet (Scan + Exec)                                  12             37          11          0.1       11717.4       0.8X


================================================================================================
octet_length
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
octet_length:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             37          22          0.1        9606.7       1.0X
Comet (Scan)                                         10             53          22          0.1        9840.2       1.0X
Comet (Scan + Exec)                                  11             34          13          0.1       11220.0       0.9X


================================================================================================
upper
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
upper:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             26          20          0.1        9233.7       1.0X
Comet (Scan)                                         10             40          20          0.1        9480.1       1.0X
Comet (Scan + Exec)                                  15             42          14          0.1       14291.5       0.6X


================================================================================================
lower
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
lower:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             57          22          0.1        9121.3       1.0X
Comet (Scan)                                         10             40          15          0.1        9534.1       1.0X
Comet (Scan + Exec)                                  12             43          14          0.1       11809.1       0.8X


================================================================================================
chr
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
chr:                                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             27          24          0.1        9016.2       1.0X
Comet (Scan)                                         10             43          19          0.1        9430.2       1.0X
Comet (Scan + Exec)                                  11             42          14          0.1       10895.3       0.8X


================================================================================================
initCap
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
initCap:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             51          29          0.1        9054.2       1.0X
Comet (Scan)                                         20             50          19          0.1       19110.4       0.5X
Comet (Scan + Exec)                                  10             35          16          0.1        9834.2       0.9X


================================================================================================
trim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
trim:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             31          21          0.1        8622.8       1.0X
Comet (Scan)                                         10             46          17          0.1        9474.0       0.9X
Comet (Scan + Exec)                                  11             58          18          0.1       10808.6       0.8X


================================================================================================
btrim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
btrim:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             40          19          0.1        8734.7       1.0X
Comet (Scan)                                         10             38          23          0.1        9875.3       0.9X
Comet (Scan + Exec)                                  12             56          18          0.1       11821.6       0.7X


================================================================================================
ltrim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
ltrim:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             27          17          0.1        8581.9       1.0X
Comet (Scan)                                          9             49          19          0.1        8992.0       1.0X
Comet (Scan + Exec)                                  12             44          16          0.1       11763.1       0.7X


================================================================================================
rtrim
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
rtrim:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             35          17          0.1        8311.8       1.0X
Comet (Scan)                                         10             28          16          0.1        9815.4       0.8X
Comet (Scan + Exec)                                  13             45          13          0.1       12426.0       0.7X


================================================================================================
lpad
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
lpad:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             31          18          0.1        9624.0       1.0X
Comet (Scan)                                         10             52          20          0.1        9882.0       1.0X
Comet (Scan + Exec)                                  29             53          11          0.0       28350.2       0.3X


================================================================================================
rpad
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
rpad:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             32          18          0.1        9930.5       1.0X
Comet (Scan)                                         34             58          12          0.0       33057.5       0.3X
Comet (Scan + Exec)                                  12             47          16          0.1       11649.9       0.9X


================================================================================================
concat
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
concat:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             23          20          0.1        8708.3       1.0X
Comet (Scan)                                          9             53          13          0.1        9205.9       0.9X
Comet (Scan + Exec)                                  11             57          17          0.1       10575.1       0.8X


================================================================================================
concatws
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
concatws:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             22          18          0.1        8566.8       1.0X
Comet (Scan)                                         31             50          14          0.0       30137.5       0.3X
Comet (Scan + Exec)                                  32             61          15          0.0       31659.3       0.3X


================================================================================================
contains
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
contains:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                10             50          20          0.1        9767.3       1.0X
Comet (Scan)                                         11             46          16          0.1       10438.0       0.9X
Comet (Scan + Exec)                                  13             47          20          0.1       12454.9       0.8X


================================================================================================
startsWith
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
startsWith:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             36          20          0.1        8126.8       1.0X
Comet (Scan)                                         10             43          14          0.1        9694.1       0.8X
Comet (Scan + Exec)                                  37             74           9          0.0       36311.4       0.2X


================================================================================================
endsWith
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
endsWith:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             30          33          0.1        7704.6       1.0X
Comet (Scan)                                          9             31           6          0.1        9009.1       0.9X
Comet (Scan + Exec)                                  11             29          11          0.1       10399.3       0.7X


================================================================================================
length
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
length:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             24          13          0.1        8961.0       1.0X
Comet (Scan)                                         31             88          37          0.0       30298.2       0.3X
Comet (Scan + Exec)                                  10             40          16          0.1        9453.2       0.9X


================================================================================================
repeat
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
repeat:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             35          17          0.1        7871.5       1.0X
Comet (Scan)                                          9             32          16          0.1        8399.9       0.9X
Comet (Scan + Exec)                                  12             32          15          0.1       11398.4       0.7X


================================================================================================
reverse
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
reverse:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 9             20          13          0.1        8347.1       1.0X
Comet (Scan)                                          9             27          12          0.1        8532.8       1.0X
Comet (Scan + Exec)                                  11             42          13          0.1       10959.6       0.8X


================================================================================================
instr
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
instr:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             17          10          0.1        8244.3       1.0X
Comet (Scan)                                          9             40          11          0.1        9048.3       0.9X
Comet (Scan + Exec)                                  11             26          15          0.1       10335.0       0.8X


================================================================================================
replace
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
replace:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             24          16          0.1        7904.1       1.0X
Comet (Scan)                                          9             41          13          0.1        8899.6       0.9X
Comet (Scan + Exec)                                  11             35          15          0.1       10387.7       0.8X


================================================================================================
space
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
space:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                 8             23          16          0.1        7371.1       1.0X
Comet (Scan)                                          8             15          10          0.1        7583.9       1.0X
Comet (Scan + Exec)                                   9             22          16          0.1        8781.6       0.8X


================================================================================================
translate
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
translate:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                12             31          18          0.1       11537.7       1.0X
Comet (Scan)                                         13             38          15          0.1       12661.8       0.9X
Comet (Scan + Exec)                                  16             42          16          0.1       15305.8       0.8X
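
When reading the tables above, the Relative column appears to be the Spark baseline's per-row time divided by each case's per-row time, rounded to one decimal place. That reading is inferred from the printed numbers rather than taken from the benchmark source, and the helper below is a hypothetical sketch that reproduces a few of the reported values.

```scala
import java.util.Locale

// Hypothetical helper for checking the Relative column by hand.
object RelativeColumn {
  // Baseline per-row time divided by the case's per-row time, formatted like "0.8X".
  // Locale.ROOT keeps the decimal separator a dot regardless of the machine's locale.
  def relative(baselineNs: Double, caseNs: Double): String =
    "%.1fX".formatLocal(Locale.ROOT, baselineNs / caseNs)

  def main(args: Array[String]): Unit = {
    // Per Row(ns) values copied from the ascii and startsWith tables.
    println(relative(10417.3, 12275.7)) // ascii, Comet (Scan)
    println(relative(10417.3, 12212.6)) // ascii, Comet (Scan + Exec)
    println(relative(8126.8, 36311.4))  // startsWith, Comet (Scan + Exec)
  }
}
```

Values below 1.0X mean the case took longer per row than the Spark baseline in that run.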

andygrove added 3 commits December 22, 2025 14:27

fixg

7f01f10

create input data once

d97afe0

smaller data

3df25fe

Merge remote-tracking branch 'apache/main' into string-bench

a0a220e

andygrove changed the title ~~chore: Improve string benchmarks~~ chore: Improve microbenchmark for string expressions Dec 23, 2025

andygrove mentioned this pull request Dec 24, 2025

chore: Add microbenchmarks for all expressions #2984

Closed

andygrove requested a review from comphead January 2, 2026 16:43

comphead reviewed Jan 2, 2026

comphead approved these changes Jan 2, 2026

andygrove merged commit 37cb5c9 into apache:main Jan 3, 2026
119 checks passed

andygrove deleted the string-bench branch January 3, 2026 16:27
