Conversation
This commit adds type checking to the semantic analyzer thus bringing static typing to dynamic data. As far as we know, this has not been done before in a general fashion in any SQL-like query language for dynamically typed data (e.g., SQL++, Asterix, search languages, etc). This works by computing fused types of each operator's output and propagating these types in a dataflow analysis. When types are unknown, the analysis flexibly models them as having any possible type. The CSUP and BSUP formats for dynamically typed data will be updated in future PRs to include fused-type information so robust type checking can be carried out for any super-structured data. Type checking for built-in functions and aggregate functions is not yet done as we need support from the functions packages to provide type signatures. This will be done in a subsequent PR. Many existing tests were updated since they had problematic type behavior. A number of new tests were added to test the type checker but coverage is light.
|
The sqllogictests spotted something that broke in this branch. Here's the baseline working as expected on current tip of Here's Postgres handling it the same. And here it is failing on the branch: |
|
@philrz this behavior was previously present but exposed by the type-checking system. Currently, the RHS of I would recommend merging this PR then fixing the existing problems in a subsequent PR. |
| indexed entity is not indexable at line 1, column 9: | ||
| fn z(a):a[0] | ||
| ~ |
There was a problem hiding this comment.
Nit: This error will be confusing in a program with multiple calls to z. Is there any way to get include the call site as context?
There was a problem hiding this comment.
I agree but we don't have a way to report errors tying together different locations in the code. We need to add this. How about we fix this in a subsequent PR?
| outputs: | ||
| - name: stderr | ||
| data: | | ||
| "z" no such field at line 1, column 25: |
There was a problem hiding this comment.
Nit: Error might read better like this.
| "z" no such field at line 1, column 25: | |
| no such field "z" at line 1, column 25: |
| values 10.1.1.1 + 1 | ||
| ~~~~~~~~ |
There was a problem hiding this comment.
Nit: Maybe underline the entire offending expression instead of just one operand?
| fn foo(f,x):f(x) | ||
| ~~~~ |
There was a problem hiding this comment.
This error is covered by this test only. If you can't maintain it here, maybe add a new test that covers it?
runtime/ztests/op/join-error.yaml
Outdated
| - name: stderr | ||
| data: | | ||
| join requires two upstream parallel query paths | ||
| join requires two query inputs at line 1, column 1: |
There was a problem hiding this comment.
Nit: "query" feels unnecessary here.
| join requires two query inputs at line 1, column 1: | |
| join requires two inputs at line 1, column 1: |
compiler/semantic/checker.go
Outdated
| for _, t := range u.Types { | ||
| if hasUnknown(t) { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Nits:
| for _, t := range u.Types { | |
| if hasUnknown(t) { | |
| return true | |
| } | |
| } | |
| if slices.ContainsFunc(u.Types, hasUnknown) { | |
| return true | |
| } |
compiler/semantic/checker.go
Outdated
| for _, t := range typ.Types { | ||
| if hasString(t) { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Nit:
| for _, t := range typ.Types { | |
| if hasString(t) { | |
| return true | |
| } | |
| } | |
| return slices.ContainsFunc(typ.Types, hasString(t)) |
compiler/semantic/checker.go
Outdated
| {Name: op.LeftAlias, Type: types[0]}, | ||
| {Name: op.RightAlias, Type: types[1]}, |
There was a problem hiding this comment.
Nit:
| {Name: op.LeftAlias, Type: types[0]}, | |
| {Name: op.RightAlias, Type: types[1]}, | |
| super.NewField(op.LeftAlias, types[0]), | |
| super.NewField(op.RightAlias, types[1]), |
compiler/semantic/checker.go
Outdated
| for _, t := range u.Types { | ||
| if hasNumber(t) { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Nit:
| for _, t := range u.Types { | |
| if hasNumber(t) { | |
| return true | |
| } | |
| } | |
| if slices.ContainsFunc(u.Types, hasNumber) { | |
| return true | |
| } |
|
Ok, regarding the SQL problem with the IN operator, we decided on zoom today that IN semantics will include scalar equality in addition to containment equality. This means we don't need to change the runtime. Instead, I just pushed changes to update the type checker. Docs for the IN operator will come soon to reflect this. |
|
FYI, once this merged to |
This commit adds type checking to the semantic analyzer thus bringing static typing to dynamic data. As far as we know, this has not been done before in a general fashion in any SQL-like query language for dynamically typed data (e.g., SQL++, Asterix, search languages, etc).
This works by computing fused types of each operator's output and propagating these types in a dataflow analysis. When types are unknown, the analysis flexibly models them as having any possible type. The CSUP and BSUP formats for dynamically typed data will be updated in future PRs to include fused-type information so robust type checking can be carried out for any super-structured data.
Type checking for built-in functions and aggregate functions is not yet done as we need support from the functions packages to provide type signatures. This will be done in a subsequent PR.
Many existing tests were updated since they had problematic type behavior. A number of new tests were added to test the type checker but coverage is light.