-
Notifications
You must be signed in to change notification settings - Fork 71
add type checking #6284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add type checking #6284
Conversation
This commit adds type checking to the semantic analyzer thus bringing static typing to dynamic data. As far as we know, this has not been done before in a general fashion in any SQL-like query language for dynamically typed data (e.g., SQL++, Asterix, search languages, etc). This works by computing fused types of each operator's output and propagating these types in a dataflow analysis. When types are unknown, the analysis flexibly models them as having any possible type. The CSUP and BSUP formats for dynamically typed data will be updated in future PRs to include fused-type information so robust type checking can be carried out for any super-structured data. Type checking for built-in functions and aggregate functions is not yet done as we need support from the functions packages to provide type signatures. This will be done in a subsequent PR. Many existing tests were updated since they had problematic type behavior. A number of new tests were added to test the type checker but coverage is light.
|
The sqllogictests spotted something that broke in this branch. Here's the baseline working as expected on current tip of Here's Postgres handling it the same. And here it is failing on the branch: |
|
@philrz this behavior was previously present but exposed by the type-checking system. Currently, the RHS of I would recommend merging this PR then fixing the existing problems in a subsequent PR. |
| indexed entity is not indexable at line 1, column 9: | ||
| fn z(a):a[0] | ||
| ~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This error will be confusing in a program with multiple calls to z. Is there any way to get include the call site as context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but we don't have a way to report errors tying together different locations in the code. We need to add this. How about we fix this in a subsequent PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit so later is fine.
| outputs: | ||
| - name: stderr | ||
| data: | | ||
| "z" no such field at line 1, column 25: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Error might read better like this.
| "z" no such field at line 1, column 25: | |
| no such field "z" at line 1, column 25: |
| values 10.1.1.1 + 1 | ||
| ~~~~~~~~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe underline the entire offending expression instead of just one operand?
| fn foo(f,x):f(x) | ||
| ~~~~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is covered by this test only. If you can't maintain it here, maybe add a new test that covers it?
runtime/ztests/op/join-error.yaml
Outdated
| - name: stderr | ||
| data: | | ||
| join requires two upstream parallel query paths | ||
| join requires two query inputs at line 1, column 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "query" feels unnecessary here.
| join requires two query inputs at line 1, column 1: | |
| join requires two inputs at line 1, column 1: |
compiler/semantic/checker.go
Outdated
| for _, t := range u.Types { | ||
| if hasUnknown(t) { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits:
| for _, t := range u.Types { | |
| if hasUnknown(t) { | |
| return true | |
| } | |
| } | |
| if slices.ContainsFunc(u.Types, hasUnknown) { | |
| return true | |
| } |
compiler/semantic/checker.go
Outdated
| for _, t := range typ.Types { | ||
| if hasString(t) { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| for _, t := range typ.Types { | |
| if hasString(t) { | |
| return true | |
| } | |
| } | |
| return slices.ContainsFunc(typ.Types, hasString(t)) |
compiler/semantic/checker.go
Outdated
| {Name: op.LeftAlias, Type: types[0]}, | ||
| {Name: op.RightAlias, Type: types[1]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| {Name: op.LeftAlias, Type: types[0]}, | |
| {Name: op.RightAlias, Type: types[1]}, | |
| super.NewField(op.LeftAlias, types[0]), | |
| super.NewField(op.RightAlias, types[1]), |
compiler/semantic/checker.go
Outdated
| for _, t := range u.Types { | ||
| if hasNumber(t) { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| for _, t := range u.Types { | |
| if hasNumber(t) { | |
| return true | |
| } | |
| } | |
| if slices.ContainsFunc(u.Types, hasNumber) { | |
| return true | |
| } |
|
Ok, regarding the SQL problem with the IN operator, we decided on zoom today that IN semantics will include scalar equality in addition to containment equality. This means we don't need to change the runtime. Instead, I just pushed changes to update the type checker. Docs for the IN operator will come soon to reflect this. |
|
FYI, once this merged to |
This commit adds type checking to the semantic analyzer thus bringing static typing to dynamic data. As far as we know, this has not been done before in a general fashion in any SQL-like query language for dynamically typed data (e.g., SQL++, Asterix, search languages, etc).
This works by computing fused types of each operator's output and propagating these types in a dataflow analysis. When types are unknown, the analysis flexibly models them as having any possible type. The CSUP and BSUP formats for dynamically typed data will be updated in future PRs to include fused-type information so robust type checking can be carried out for any super-structured data.
Type checking for built-in functions and aggregate functions is not yet done as we need support from the functions packages to provide type signatures. This will be done in a subsequent PR.
Many existing tests were updated since they had problematic type behavior. A number of new tests were added to test the type checker but coverage is light.