GH-44615: [C++][Compute] Add extract_regex_span function#45577
GH-44615: [C++][Compute] Add extract_regex_span function#45577pitrou merged 3 commits intoapache:mainfrom
Conversation
228c2c5 to
88da027
Compare
|
@pitrou could you review this issue? |
mapleFU
left a comment
There was a problem hiding this comment.
Would the compute document also need to add this?
Yes, this is a new compute function called extract_regex_span. Should I add changes somewhere? |
pitrou
left a comment
There was a problem hiding this comment.
Thanks for this @arashandishgar . There are a number of things to improve in this PR even though the overall implementation is sound, see comments below.
|
Appreciate your feedback! I’ll look into it and follow up soon. |
29ff6ea to
9442b40
Compare
9442b40 to
2b82534
Compare
|
@pitrou I've applied all changes that you suggest. Could You review again? |
pitrou
left a comment
There was a problem hiding this comment.
Thanks for the update. Here are some more comments.
Also, can you add some documentation to https://github.com/apache/arrow/blob/main/docs/source/cpp/compute.rst#string-component-extraction?
|
Thanks for your review, I'll look into it |
Thanks for mentioning it; I wrote the relevant documentation. |
2b82534 to
229ed3b
Compare
pitrou
left a comment
There was a problem hiding this comment.
Thanks for the update @arashandishgar . Here are a couple more comments, can you take a look and address them?
|
Thanks for your comment. I will look into it |
d6c52ea to
2764dc9
Compare
2764dc9 to
703a5ac
Compare
pitrou
left a comment
There was a problem hiding this comment.
I've made some minor changes and added Python bindings. I'll merge this PR if CI is green, thank you @arashandishgar !
|
@github-actions crossbow submit -g cpp |
|
Revision: 703a5ac Submitted crossbow builds: ursacomputing/crossbow @ actions-4358c6f365 |
I wanted to take a moment to sincerely thank you for your patience and constructive feedback during the code review process. As this was my first experience contributing to Apache Arrow, I learned a lot throughout the review, and your thoughtful suggestions were extremely valuable. I realized that I made a mistake by not properly generalizing your suggestion across the entire codebase, especially in my second commit. In my rush to submit the pull request, I overlooked that point. I truly appreciate your understanding, and I hope to be more diligent in applying feedback consistently in future contributions. Thanks again for your time and support. |
|
You're welcome, @arashandishgar . Feel free to contribute again! |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 0494115. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
While the
extract_regexfunction returns substrings of the matching regex captures,extract_regex_spanreturns (index, length) pairs of these substrings relative to the original string values.Are these changes tested?
Yes, by dedicated unit tests.
Are there any user-facing changes?
No, except a new compute function.
compute.extract_regex#44615