chore: Improve microbenchmark for string expressions #2964
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2964      +/-   ##
============================================
+ Coverage     56.12%   59.62%   +3.49%
- Complexity      976     1375     +399
============================================
  Files           119      167      +48
  Lines         11743    15488    +3745
  Branches       2251     2567     +316
============================================
+ Hits           6591     9234    +2643
- Misses         4012     4956     +944
- Partials       1140     1298     +158
withTempTable("parquetV1Table") {
  prepareTable(
    dir,
    spark.sql(s"SELECT REPEAT(CAST(value AS STRING), 10) AS c1 FROM $tbl"))
should we have random string lengths?
something like
SELECT
  substring(
    repeat(value, 20),
    cast(rand() * 26 as int) + 1,
    cast(rand() * 20 as int) + 1
  ) AS random_string
FROM range(10);
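To get a feel for what the suggested expression would produce, here is a minimal sketch in plain Scala with no Spark dependency. It assumes SQL-style `substring` semantics (1-based start, result clipped at the end of the input, empty string when the start is past the end); `RandomLengthStrings`, `sqlSubstring`, and `randomStrings` are hypothetical names used only for illustration and are not part of the PR.

```scala
import scala.util.Random

object RandomLengthStrings {
  // Emulates SQL-style substring (assumption): 1-based start,
  // result clipped to the end of the input.
  def sqlSubstring(s: String, pos: Int, len: Int): String = {
    val start = math.min(math.max(pos - 1, 0), s.length)
    val end = math.min(start + math.max(len, 0), s.length)
    s.substring(start, end)
  }

  // Builds one string per input value with a random start and a random length,
  // mirroring the ranges used in the suggested query.
  def randomStrings(values: Seq[Long], seed: Long): Seq[String] = {
    val rng = new Random(seed)
    values.map { v =>
      val repeated = v.toString * 20
      val pos = (rng.nextDouble() * 26).toInt + 1 // 1..26
      val len = (rng.nextDouble() * 20).toInt + 1 // 1..20
      sqlSubstring(repeated, pos, len)
    }
  }

  def main(args: Array[String]): Unit = {
    // Print the length next to each generated string.
    val strings = randomStrings(0L until 10L, seed = 42L)
    strings.foreach(s => println(s"${s.length}\t$s"))
  }
}
```

Under these assumptions, a single-character input repeats to 20 characters, so start positions above 20 yield empty strings. Whether empty inputs are wanted in the benchmark data is a judgment call.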
comphead left a comment
Thanks @andygrove
Which issue does this PR close?
N/A
Rationale for this change
Make benchmarks faster to run and cover all string functions.
What changes are included in this PR?
Create the input data once instead of once per expression.
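The general pattern can be sketched as follows. This is an illustration in plain Scala, not the benchmark code from this PR: `SharedInputBenchmark`, `prepareInput`, and the three sample expressions are hypothetical stand-ins, and the timing is a naive `System.nanoTime` measurement rather than a proper benchmark harness.

```scala
object SharedInputBenchmark {
  // Hypothetical stand-in for the expensive table preparation step.
  def prepareInput(rows: Int): Vector[String] =
    Vector.tabulate(rows)(i => i.toString * 10)

  // Hypothetical expressions under test, keyed by name.
  val expressions: Seq[(String, String => Any)] = Seq(
    "upper" -> ((s: String) => s.toUpperCase),
    "length" -> ((s: String) => s.length),
    "reverse" -> ((s: String) => s.reverse))

  // Times each expression against the same pre-built input.
  def run(rows: Int): Seq[(String, Long)] = {
    val input = prepareInput(rows) // built once, shared by every case
    expressions.map { case (name, f) =>
      val start = System.nanoTime()
      input.foreach(f)
      name -> (System.nanoTime() - start)
    }
  }

  def main(args: Array[String]): Unit =
    run(100000).foreach { case (name, nanos) =>
      println(f"$name%-8s ${nanos / 1e6}%.2f ms")
    }
}
```

Because `prepareInput` is called once outside the loop, adding another expression adds only its own evaluation time to the run, not another round of data generation.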
How are these changes tested?
Current benchmark results:
string.txt