SNOW-2443512: Add support for scalar string and binary functions ( part 2 )#3946
SNOW-2443512: Add support for scalar string and binary functions ( part 2 )#3946sfc-gh-aherreraaguilar wants to merge 24 commits intomainfrom
Conversation
…ingAndBinaryPart2
…ingAndBinaryPart2 # Conflicts: # CHANGELOG.md
…ingAndBinaryPart2
| if position is None: | ||
| position = lit(1) | ||
| if occurrence is None: | ||
| occurrence = lit(1) | ||
| if option is None: | ||
| option = lit(0) | ||
| if regexp_parameters is None: | ||
| regexp_parameters = lit("c") |
There was a problem hiding this comment.
I have two questions here:
-
do we need to check the dependency between the optional args? what happen if
occurrenceis set butpositionis not set. I feel this is not allowed in sql as in sql the arguments are positional, users have to specifypositionfirst thenoccurrencethen possibly the rest. -
I see there is a long note about
Note By default, REGEXP_INSTR returns the begin or end character offset for the entire matching part of the subject. However, if the e (for “extract”) parameter is specified, REGEXP_INSTR returns the begin or end character offset for the part of the subject that matches the first sub-expression in the pattern. If e is specified but a group_num is not also specified, then the group_num defaults to 1 (the first group). If there is no sub-expression in the pattern, REGEXP_INSTR behaves as if e was not set. For examples that use e, see [Examples](https://docs.snowflake.com/en/sql-reference/functions/regexp_instr#examples) in this topic.
shall we not set the default value from client at all and let server decide the default rule? I feel this way it's simpler
There was a problem hiding this comment.
Hello @sfc-gh-aling ,
regarding these questions, the reason why I am setting the default values is that when I call the function
builtin("regexp_instr", _emit_ast=_emit_ast)(*args)
, passing the *args list, it does not explicitly indicate the name for each parameter. So, for example, if I send regexp_instr(A,B,C) the order will be the same when calling the builtin, but when calling something like regexp_instr(A, C=C), the call in the builtin will be regexp_instr(A, C)
| subject: ColumnOrName, | ||
| pattern: ColumnOrName, | ||
| position: ColumnOrName = None, | ||
| occurrence: ColumnOrName = None, | ||
| regex_parameters: ColumnOrName = None, | ||
| group_num: ColumnOrName = None, | ||
| _emit_ast: bool = True, |
There was a problem hiding this comment.
same comment as above
|
|
||
|
|
||
| @publicapi | ||
| def regexp_substr_all( |
…yPart2 # Conflicts: # CHANGELOG.md
…yPart2 # Conflicts: # CHANGELOG.md # docs/source/snowpark/functions.rst # src/snowflake/snowpark/_functions/scalar_functions.py
…yPart2 # Conflicts: # CHANGELOG.md # src/snowflake/snowpark/_functions/scalar_functions.py
|
@sfc-gh-mvegaalvarez @sfc-gh-aling , I tried to reopen this PR, but it is not possible, so I created this new PR to merge these changes. |
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2443512
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Add support for the scalar string and binary functions:
hex_decode_stringjarowinkler_similarityparse_urlregexp_instrregexp_likeregexp_substrregexp_substr_allrtrimmed_lengthspacesplit_part