Add string_to_array function #32045
All contributors have signed the CLA.

I have read the Contributor License Agreement (CLA) and I hereby sign the CLA.
ParkMyCar
left a comment
Overall looks good, but can you address the OIDs and argument types before merging? Want to make sure we adhere to Postgres compatibility.
WOOHOO on your first PR!
src/sql/src/func.rs (outdated)
params!(String, Any) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3947;
params!(String, Any, String) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3948;
We match the OIDs from Postgres; it looks like these should be 394 and 376, respectively (see the Postgres source).
Also, shouldn't the second argument be String in both cases? Even with type String, you should still be able to pass NULL as that argument.
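To make the argument-type discussion concrete, here is a minimal Rust sketch of what we understand Postgres's documented `string_to_array` semantics to be. It is not the code in this PR: the standalone function and its `Option`-based signature are hypothetical, with SQL NULL modeled as `None`. It illustrates why a `String`-typed delimiter can still receive NULL (a NULL delimiter splits the input into individual characters), and that the result is NULL only when the first argument is NULL.

```rust
/// Hypothetical standalone sketch of Postgres-style `string_to_array`.
/// SQL NULL is modeled as `None`; this is not Materialize's implementation.
fn string_to_array(
    input: Option<&str>,
    delimiter: Option<&str>,
    null_string: Option<&str>,
) -> Option<Vec<Option<String>>> {
    // A NULL input string is the only way to get a NULL result.
    let input = input?;
    // A zero-length input yields an empty array.
    if input.is_empty() {
        return Some(Vec::new());
    }
    let fields: Vec<&str> = match delimiter {
        // NULL delimiter: every character becomes its own element.
        None => input
            .char_indices()
            .map(|(i, c)| &input[i..i + c.len_utf8()])
            .collect(),
        // Empty delimiter: the whole input is a single field.
        Some("") => vec![input],
        // Otherwise split on every occurrence of the delimiter.
        Some(d) => input.split(d).collect(),
    };
    // Fields equal to a non-NULL `null_string` become NULL elements.
    Some(
        fields
            .into_iter()
            .map(|f| if Some(f) == null_string { None } else { Some(f.to_string()) })
            .collect(),
    )
}
```

For example, in this sketch `string_to_array(Some("xx~~yy~~zz"), Some("~~"), Some("yy"))` yields `xx`, NULL, `zz`, mirroring the three-argument form.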
src/expr/src/scalar/func.rs (outdated)
ScalarType::Array(Box::new(ScalarType::String)).nullable(in_nullable)
}
RegexpReplace => ScalarType::String.nullable(in_nullable),
StringToArray => ScalarType::Array(Box::new(ScalarType::String)).nullable(true),
I think the output is null if and only if the first argument is null. If this is true, you could give a tighter nullability here by copying the nullability of the first argument. (This would be similar to some of the above lines that use in_nullable, but you can't use that directly, because that takes the nullability of all arguments, whereas you'd need only the first argument.)
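A minimal sketch of the tighter nullability rule, using simplified stand-in types. `ScalarType`, `ColumnType`, and both helper functions here are hypothetical, not Materialize's real definitions: the output copies only the first argument's nullability, whereas an `in_nullable`-style rule ORs together the nullability of all arguments.

```rust
/// Simplified stand-ins for a SQL type system (hypothetical, not Materialize's types).
#[derive(Debug, Clone, PartialEq)]
enum ScalarType {
    String,
    Array(Box<ScalarType>),
}

#[derive(Debug, Clone, PartialEq)]
struct ColumnType {
    scalar_type: ScalarType,
    nullable: bool,
}

impl ScalarType {
    /// Attach a nullability flag to a scalar type.
    fn nullable(self, nullable: bool) -> ColumnType {
        ColumnType { scalar_type: self, nullable }
    }
}

/// Looser rule: nullable if *any* input is nullable (like `in_nullable`).
fn any_input_nullable(input_types: &[ColumnType]) -> bool {
    input_types.iter().any(|t| t.nullable)
}

/// Tighter rule for `string_to_array`: copy only the first argument's
/// nullability. With no arguments, conservatively report nullable.
fn string_to_array_output_type(input_types: &[ColumnType]) -> ColumnType {
    let first_nullable = input_types.first().map_or(true, |t| t.nullable);
    ScalarType::Array(Box::new(ScalarType::String)).nullable(first_nullable)
}
```

With a non-nullable input string and a nullable delimiter, the tighter rule reports a non-nullable array, while the "any input" rule would report nullable.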
ah neat, thanks! what difference does making the nullability more stringent like that make to the behaviour of the database?
... `IS NULL`/`IS NOT NULL` are rewritten to `false`/`true` (or not introduced in the first place in other cases, e.g., here). Generally, we have a lot of `IS NOT NULL` checks, because `MirRelationExpr::Join` matches up nulls, which doesn't correspond to SQL join behavior for nulls, so we introduce many null checks just below joins to match the SQL join behavior. So eliminating these checks usually means slightly less CPU work, simply because the null checks are not performed. In some cases it can also have more dramatic consequences, for example:
- Certain subquery simplifications (`try_simplify_quantified_comparisons`) are possible only for non-nullable columns.
- `NOT IN` subqueries choke on nullable inputs:
  - https://github.com/MaterializeInc/database-issues/issues/382#issuecomment-2368827498
  - https://github.com/MaterializeInc/database-issues/issues/8396
- `coalesce` call argument lists are truncated after the first non-nullable argument.
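To illustrate how a tighter nullability fact enables these rewrites, here is a toy Rust sketch. The `Expr` enum, `may_be_null`, and `simplify` are hypothetical and far simpler than Materialize's `MirScalarExpr` and its actual transforms; the sketch only shows the two rewrites named above: `IS NULL` on a provably non-null expression folds to `false`, and a `coalesce` argument list is cut after its first non-nullable argument.

```rust
/// Toy scalar expression (hypothetical, not Materialize's MirScalarExpr).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to an input column by index.
    Column(usize),
    /// Boolean literal.
    Bool(bool),
    /// `expr IS NULL`.
    IsNull(Box<Expr>),
    /// `coalesce(args...)`.
    Coalesce(Vec<Expr>),
}

/// Can `e` evaluate to NULL, given per-column nullability?
fn may_be_null(e: &Expr, col_nullable: &[bool]) -> bool {
    match e {
        Expr::Column(i) => col_nullable[*i],
        Expr::Bool(_) | Expr::IsNull(_) => false,
        // coalesce is NULL only if every argument may be NULL.
        Expr::Coalesce(args) => args.iter().all(|a| may_be_null(a, col_nullable)),
    }
}

/// Apply the two nullability-driven rewrites bottom-up.
fn simplify(e: Expr, col_nullable: &[bool]) -> Expr {
    match e {
        Expr::IsNull(inner) => {
            let inner = simplify(*inner, col_nullable);
            if may_be_null(&inner, col_nullable) {
                Expr::IsNull(Box::new(inner))
            } else {
                // Provably non-null: the check is always false.
                Expr::Bool(false)
            }
        }
        Expr::Coalesce(args) => {
            let mut out = Vec::new();
            for a in args {
                let a = simplify(a, col_nullable);
                let stop = !may_be_null(&a, col_nullable);
                out.push(a);
                // Later arguments can never be reached.
                if stop {
                    break;
                }
            }
            if out.len() == 1 { out.pop().unwrap() } else { Expr::Coalesce(out) }
        }
        other => other,
    }
}
```

With columns `[nullable, non-nullable]`, `coalesce(#0, #1, #0)` becomes `coalesce(#0, #1)`, and `#1 IS NULL` becomes `false`.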
ah thanks so much! that makes sense
ggevay
left a comment
Looks good!
There are some CI failures, but they are straightforward to resolve:
The "Cargo test" failure is only due to changed IDs of some system objects. It just needs a rewrite:
REWRITE=1 COCKROACH_URL=postgres://root@localhost:26257 cargo test test_http_sql
The "Fast SQL logic tests" failure is also due to changed system object IDs, and likewise just needs a rewrite:
bin/sqllogictest -- -v test/sqllogictest/mz_catalog_server_index_accounting.slt --rewrite-results
test/sqllogictest/string.slt (outdated)
----
{" "}

# string_to_array - whitespace
(This test comment looks copy-pasted from the one above.)
ah good catch, thanks
kay-kim
left a comment
LGTM. I left a suggestion, mostly about rendering; feel free to ignore it.
Co-authored-by: Kay Kim <kaykim00@gmail.com>
add string_to_array function
Motivation
https://github.com/MaterializeInc/database-issues/issues/7101
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.