Add string_to_array function by ptravers · Pull Request #32045 · MaterializeInc/materialize

ptravers · 2025-03-28T20:19:17Z

add string_to_array function

Motivation

https://github.com/MaterializeInc/database-issues/issues/7101

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

github-actions · 2025-03-28T20:19:31Z

All contributors have signed the CLA.
Posted by the CLA Assistant Lite bot.

ptravers · 2025-03-28T20:20:01Z

I have read the Contributor License Agreement (CLA) and I hereby sign the CLA.

ParkMyCar

Overall looks good, but can you address the OIDs and argument types before merging? Want to make sure we adhere to Postgres compatibility.

WOOHOO on your first PR!


ParkMyCar · 2025-03-28T22:15:16Z

src/sql/src/func.rs

+            params!(String, Any) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3947;
+            params!(String, Any, String) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3948;


We match the OIDs from Postgres; it looks like these should be 394 and 376, respectively (per Postgres).

Also, I think the second argument in both of these cases should be String? Even with the type String, you should still be able to pass NULL as an argument.
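For illustration, here is a minimal Rust sketch of the two review points above, under loud assumptions: `ParamType`, `Overload`, `STRING_TO_ARRAY`, and `resolve` are hypothetical names and not Materialize's catalog API (the real overloads are declared with the `params!` macro in src/sql/src/func.rs). It only shows that each overload carries the Postgres OID it mirrors (394 for the two-argument form, 376 for the three-argument form, as the comment states) and that a `String`-typed parameter can still receive NULL, modeled here as `None`.

```rust
/// Hypothetical parameter type; only `String` is needed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ParamType {
    String,
}

/// Hypothetical overload entry: parameter types plus the OID it mirrors.
#[derive(Debug)]
struct Overload {
    params: &'static [ParamType],
    /// Postgres OID this overload is meant to match.
    oid: u32,
}

/// string_to_array(text, text) and string_to_array(text, text, text),
/// with every parameter typed as a string, as suggested in review.
const STRING_TO_ARRAY: &[Overload] = &[
    Overload {
        params: &[ParamType::String, ParamType::String],
        oid: 394,
    },
    Overload {
        params: &[ParamType::String, ParamType::String, ParamType::String],
        oid: 376,
    },
];

/// Picks an overload by arity. Each argument is an `Option<&str>`, so a
/// `String`-typed parameter still accepts SQL NULL (`None`).
fn resolve(args: &[Option<&str>]) -> Option<&'static Overload> {
    STRING_TO_ARRAY.iter().find(|o| o.params.len() == args.len())
}
```

Passing `None` as the delimiter still resolves to the two-argument overload, which is the point of the suggestion: a NULL literal does not need an `Any`-typed parameter to be accepted.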


ggevay · 2025-03-29T20:45:03Z

src/expr/src/scalar/func.rs

                ScalarType::Array(Box::new(ScalarType::String)).nullable(in_nullable)
            }
            RegexpReplace => ScalarType::String.nullable(in_nullable),
+            StringToArray => ScalarType::Array(Box::new(ScalarType::String)).nullable(true),


I think the output is null if and only if the first argument is null. If this is true, you could give a tighter nullability here by copying the nullability of the first argument. (This would be similar to some of the above lines that use in_nullable, but you can't use that directly, because that takes the nullability of all arguments, whereas you'd need only the first argument.)
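A minimal, hypothetical Rust sketch of Postgres-style string_to_array semantics may make the nullability argument concrete. It is not Materialize's implementation; SQL NULL is modeled as `None`, and `string_to_array` and `output_nullable` are illustrative names. In this model the only path to a NULL result is a NULL first argument: a NULL delimiter or NULL null_string changes the elements, not whether the array exists.

```rust
/// Hypothetical sketch of Postgres-style `string_to_array` semantics.
/// SQL NULL is modeled as `None`; elements may themselves be NULL.
fn string_to_array(
    input: Option<&str>,
    delimiter: Option<&str>,
    null_string: Option<&str>,
) -> Option<Vec<Option<String>>> {
    // A NULL input is the only way to get a NULL result.
    let input = input?;
    // An empty input yields an empty array, regardless of the delimiter.
    if input.is_empty() {
        return Some(Vec::new());
    }
    let fields: Vec<&str> = match delimiter {
        // NULL delimiter: every character becomes its own element.
        None => input
            .char_indices()
            .map(|(i, c)| &input[i..i + c.len_utf8()])
            .collect(),
        // Empty delimiter: the whole input is a single element.
        Some("") => vec![input],
        // Otherwise split on the delimiter; adjacent delimiters give "".
        Some(d) => input.split(d).collect(),
    };
    Some(
        fields
            .into_iter()
            // Fields equal to `null_string` become NULL; a NULL
            // `null_string` matches nothing, so it is effectively ignored.
            .map(|f| if null_string == Some(f) { None } else { Some(f.to_owned()) })
            .collect(),
    )
}

/// Output nullability under this model: only the first argument matters,
/// unlike a rule that looks at the nullability of every argument.
fn output_nullable(arg_nullable: &[bool]) -> bool {
    arg_nullable[0]
}
```

Contrast `output_nullable` with an "any argument nullable" rule such as `in_nullable`, which would mark the result nullable whenever the delimiter or null_string column is nullable.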

ah neat, thanks! what difference does making the nullability more stringent like that make to the behaviour of the database?

... IS NULL/IS NOT NULL are rewritten to false/true. (Or not introduced in other cases, e.g., here.) Generally, we have a lot of IS NOT NULL checks, because MirRelationExpr::Join matches up nulls, which doesn't correspond to SQL join behavior for nulls, so we introduce a lot of null checks just below joins to match the SQL join behavior. So, eliminating these checks usually means slightly less CPU work due to simply not performing null checks. Also, in some cases it can have more dramatic consequences, for example:

Certain subquery simplifications (try_simplify_quantified_comparisons) are possible only for non-nullable columns.

NOT IN subqueries choke on nullable inputs: https://github.com/MaterializeInc/database-issues/issues/382#issuecomment-2368827498

https://github.com/MaterializeInc/database-issues/issues/8396

coalesce call argument lists are truncated after the first non-nullable argument.
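To make the IS NULL/IS NOT NULL rewrite and the coalesce truncation concrete, here is a toy expression simplifier in Rust. It is a hypothetical model, not Materialize's MIR or its transforms (`Expr`, `nullable`, and `simplify` are illustrative names): once an operand is known to be non-nullable, IS NULL over it folds to false, IS NOT NULL folds to true, and a coalesce argument list is cut after its first non-nullable argument.

```rust
/// Toy scalar expression; not Materialize's `MirScalarExpr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference with its inferred nullability.
    Column { index: usize, nullable: bool },
    /// Integer constant; `None` models SQL NULL.
    Literal(Option<i64>),
    IsNull(Box<Expr>),
    IsNotNull(Box<Expr>),
    Coalesce(Vec<Expr>),
    /// Boolean constant produced by folding.
    BoolConst(bool),
}

impl Expr {
    /// Conservative answer to "can this expression evaluate to NULL?".
    fn nullable(&self) -> bool {
        match self {
            Expr::Column { nullable, .. } => *nullable,
            Expr::Literal(value) => value.is_none(),
            Expr::IsNull(_) | Expr::IsNotNull(_) | Expr::BoolConst(_) => false,
            // COALESCE is NULL only if every argument can be NULL.
            Expr::Coalesce(args) => args.iter().all(Expr::nullable),
        }
    }
}

fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::IsNull(inner) => {
            let inner = simplify(*inner);
            if inner.nullable() {
                Expr::IsNull(Box::new(inner))
            } else {
                // A non-nullable operand is never NULL: constant false.
                Expr::BoolConst(false)
            }
        }
        Expr::IsNotNull(inner) => {
            let inner = simplify(*inner);
            if inner.nullable() {
                Expr::IsNotNull(Box::new(inner))
            } else {
                // A non-nullable operand is never NULL: constant true.
                Expr::BoolConst(true)
            }
        }
        Expr::Coalesce(args) => {
            let mut kept = Vec::new();
            for arg in args.into_iter().map(simplify) {
                let non_nullable = !arg.nullable();
                kept.push(arg);
                // Later arguments can never be reached once one is non-nullable.
                if non_nullable {
                    break;
                }
            }
            // COALESCE(x) is just x.
            if kept.len() == 1 {
                kept.pop().unwrap()
            } else {
                Expr::Coalesce(kept)
            }
        }
        other => other,
    }
}
```

A real optimizer tracks nullability per column in the relation's type and propagates it through every operator; here it is just a flag on each column reference.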

ah thanks so much! that makes sense

src/expr/src/scalar/func.rs

ggevay

Looks good!

There are some CI failures, but they are straightforward to resolve:

The "Cargo test" failure is just because the IDs of some system objects changed. It needs:

REWRITE=1 COCKROACH_URL=postgres://root@localhost:26257 cargo test test_http_sql

The "Fast SQL logic tests" failure is also because of changed IDs of system objects, and it also just needs a rewrite:

bin/sqllogictest -- -v test/sqllogictest/mz_catalog_server_index_accounting.slt --rewrite-results

ggevay · 2025-04-01T13:03:15Z

test/sqllogictest/string.slt

+----
+{" "}
+
+# string_to_array - whitespace


(comment copy-pasted from above)

ah good catch, thanks

ggevay · 2025-04-01T13:03:23Z

doc/user/data/sql_funcs.yml

kay-kim

lgtm. I left a suggestion more for rendering ... feel free to ignore.

Co-authored-by: Kay Kim <kaykim00@gmail.com>


ptravers requested review from a team as code owners March 28, 2025 20:19

ptravers requested a review from ParkMyCar March 28, 2025 20:19

ptravers requested a review from a team as a code owner March 28, 2025 22:13

ParkMyCar approved these changes Mar 28, 2025

ggevay reviewed Mar 29, 2025

src/expr/src/scalar/func.rs (resolved)

ggevay reviewed Mar 29, 2025

ptravers commented Mar 31, 2025

src/expr/src/scalar/func.rs (resolved)

ptravers requested a review from ggevay March 31, 2025 21:21

ggevay approved these changes Apr 1, 2025

ptravers added 14 commits April 1, 2025 10:48

Add string_to_array function

be051a5

Add additional comments

123420c

Add checking null_string=NULL is ignored

1e0c9c2

Add checking delimited word -> empty string

cbbec27

fix incorrect oids

9be7962

fix match oid to postgres oid

2458a38

refactor for clarity

c69798d

refactor to use std::str::split

9c252f4

refactor to one line return

ff8535f

add null handling for first param

4980665

fix broken snapshot

dc103fd

fix test naming

6844961

fix broken snapshot

c769a53

fix broken snapshot

00bf600

ptravers force-pushed the pt/7101 branch from 0b3a6db to 00bf600 April 1, 2025 15:04

kay-kim reviewed Apr 1, 2025

doc/user/data/sql_funcs.yml (outdated, resolved)

kay-kim approved these changes Apr 1, 2025

ptravers enabled auto-merge April 1, 2025 15:25

ptravers disabled auto-merge April 1, 2025 15:25

update docs with clearer doc layout

6ee300a

Co-authored-by: Kay Kim <kaykim00@gmail.com>

ptravers enabled auto-merge April 1, 2025 15:26

fix multiling comment requires |

bf6ca27

kay-kim reviewed Apr 1, 2025

doc/user/data/sql_funcs.yml (outdated, resolved)

fix whitespaces!

771dec6

Co-authored-by: Kay Kim <kaykim00@gmail.com>

ptravers disabled auto-merge April 1, 2025 16:21

fix dropped string_to_array signature

2aa249c

ptravers enabled auto-merge April 1, 2025 16:24

ptravers merged commit 22b8308 into MaterializeInc:main Apr 1, 2025
84 checks passed


4 participants
