add type checking by mccanne · Pull Request #6284 · brimdata/super

mccanne · 2025-10-06T21:58:56Z

This commit adds type checking to the semantic analyzer, thus bringing static typing to dynamic data. As far as we know, this has not been done before in a general fashion in any SQL-like query language for dynamically typed data (e.g., SQL++, Asterix, search languages).

This works by computing fused types of each operator's output and propagating these types in a dataflow analysis. When types are unknown, the analysis flexibly models them as having any possible type. The CSUP and BSUP formats for dynamically typed data will be updated in future PRs to include fused-type information so robust type checking can be carried out for any super-structured data.
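To make the "unknown means any type" rule concrete, here is a toy model of fusing the types that flow into one point of the dataflow graph. It is a sketch under simplifying assumptions: `Type`, `Prim`, `Unknown`, `Union`, and `fuse` are hypothetical names, and fusion is modeled as a flattened, de-duplicated union. super's actual fusion, which also merges record shapes, is not shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a toy stand-in for a type in the analysis (hypothetical,
// not the super type system).
type Type interface{ String() string }

// Prim is a named primitive type such as int64 or string.
type Prim string

func (p Prim) String() string { return string(p) }

// Unknown models a value whose type is not known statically; the
// analysis treats it as possibly having any type.
type Unknown struct{}

func (Unknown) String() string { return "unknown" }

// Union is a set of alternative types.
type Union []Type

func (u Union) String() string {
	parts := make([]string, len(u))
	for i, t := range u {
		parts[i] = t.String()
	}
	return strings.Join(parts, "|")
}

// fuse merges the types reaching one point in the dataflow graph.
// Unknown absorbs everything, unions are flattened, duplicates are
// dropped, and a single surviving type is returned as-is.
func fuse(ts ...Type) Type {
	var out Union
	seen := make(map[string]bool)
	// add folds t into out; it returns false if t is or contains unknown.
	var add func(t Type) bool
	add = func(t Type) bool {
		switch t := t.(type) {
		case Unknown:
			return false
		case Union:
			for _, m := range t {
				if !add(m) {
					return false
				}
			}
		default:
			if key := t.String(); !seen[key] {
				seen[key] = true
				out = append(out, t)
			}
		}
		return true
	}
	for _, t := range ts {
		if !add(t) {
			return Unknown{}
		}
	}
	if len(out) == 1 {
		return out[0]
	}
	return out
}

func main() {
	fmt.Println(fuse(Prim("int64"), Prim("string"), Prim("int64"))) // int64|string
	fmt.Println(fuse(Prim("int64"), Unknown{}))                      // unknown
	fmt.Println(fuse(Prim("int64"), Union{Prim("int64")}))           // int64
}
```

Treating unknown as absorbing keeps the analysis conservative: a check that depends on an unknown input cannot rule out any type, so it should not report a static error.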

Type checking for built-in functions and aggregate functions is not yet done as we need support from the functions packages to provide type signatures. This will be done in a subsequent PR.

Many existing tests were updated since they had problematic type behavior. A number of new tests were added to test the type checker but coverage is light.

philrz · 2025-10-07T00:50:25Z

The sqllogictests spotted something that broke in this branch. Here's the baseline working as expected on current tip of main:

$ super -version
Version: 525853d26

$ super -f parquet -o data.parquet -c "values {col1:1,col2:1},{col1:2,col2:3},{col1:4,col2:4}" &&
  super -c "SELECT * FROM data.parquet WHERE col1 IN (col2);"

{col1:1,col2:1}
{col1:4,col2:4}

Here's Postgres handling it the same.

$ psql postgres
psql (17.6 (Homebrew))
Type "help" for help.

postgres=# CREATE TABLE DATA (col1 INTEGER, col2 INTEGER);
CREATE TABLE
postgres=# INSERT INTO DATA (col1, col2) VALUES (1,1),(2,3),(4,4);
INSERT 0 3

postgres=# SELECT * FROM data WHERE col1 IN (col2);
 col1 | col2 
------+------
    1 |    1
    4 |    4
(2 rows)

And here it is failing on the branch:

$ super -f parquet -o data.parquet -c "values {col1:1,col2:1},{col1:2,col2:3},{col1:4,col2:4}" &&
  super -c "SELECT * FROM data.parquet WHERE col1 IN (col2);"

bad type for right-hand side of in operator: int64 at line 1, column 43:
SELECT * FROM data.parquet WHERE col1 IN (col2);
                                          ~~~~

mccanne · 2025-10-07T14:01:04Z

@philrz this behavior was already present but is now exposed by the type-checking system. Currently, the RHS of ... IN ( expr ) doesn't compile into a tuple/array, but the runtime IN operator returns true for scalar IN scalar. I think this is questionable and we should revisit these semantics. So, we need to fix the parser to recognize ... IN ( expr ) as a tuple and update the runtime to produce dynamic errors when testing whether something is IN a scalar value.

I would recommend merging this PR then fixing the existing problems in a subsequent PR.

nwt · 2025-10-07T00:25:07Z

compiler/semantic/ztests/checker-func.yaml

+  indexed entity is not indexable at line 1, column 9:
+  fn z(a):a[0]
+          ~


Nit: This error will be confusing in a program with multiple calls to z. Is there any way to include the call site as context?

I agree but we don't have a way to report errors tying together different locations in the code. We need to add this. How about we fix this in a subsequent PR?

Just a nit so later is fine.

nwt · 2025-10-07T00:28:47Z

compiler/semantic/ztests/checker-parquet.yaml

+outputs:
+  - name: stderr
+    data: |
+      "z" no such field at line 1, column 25:


Nit: Error might read better like this.

Suggested change

-      "z" no such field at line 1, column 25:
+      no such field "z" at line 1, column 25:

nwt · 2025-10-07T14:06:26Z

compiler/semantic/ztests/checker-plus-ip.yaml

+  values 10.1.1.1 + 1
+         ~~~~~~~~


Nit: Maybe underline the entire offending expression instead of just one operand?

nwt · 2025-10-07T14:09:24Z

compiler/semantic/ztests/no-such-builtin.yaml

-  fn foo(f,x):f(x)
-              ~~~~


This error is covered by this test only. If you can't maintain it here, maybe add a new test that covers it?

nwt · 2025-10-07T14:53:52Z

runtime/ztests/op/join-error.yaml

  - name: stderr
    data: |
-      join requires two upstream parallel query paths
+      join requires two query inputs at line 1, column 1:


Nit: "query" feels unnecessary here.

Suggested change

-      join requires two query inputs at line 1, column 1:
+      join requires two inputs at line 1, column 1:

nwt · 2025-10-07T15:48:06Z

compiler/semantic/checker.go

+		for _, t := range u.Types {
+			if hasUnknown(t) {
+				return true
+			}
+		}


Nits:

Suggested change

-		for _, t := range u.Types {
-			if hasUnknown(t) {
-				return true
-			}
-		}
+		if slices.ContainsFunc(u.Types, hasUnknown) {
+			return true
+		}
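The `slices.ContainsFunc` form works because the predicate is passed as a function value, and it still works when the predicate recurses into nested unions. The sketch below uses a hypothetical toy `Type` struct rather than super's type representation, and it requires Go 1.21 or later for the standard `slices` package.

```go
package main

import (
	"fmt"
	"slices"
)

// Type is a hypothetical toy type used only to illustrate the pattern;
// it is not super's type representation.
type Type struct {
	Name  string // e.g. "int64", "string", "unknown", or "union"
	Types []Type // member types when Name == "union"
}

// hasUnknown reports whether t is unknown or is a union that contains an
// unknown member at any depth. The function value itself (hasUnknown,
// not a call like hasUnknown(t)) is what slices.ContainsFunc expects.
func hasUnknown(t Type) bool {
	switch t.Name {
	case "unknown":
		return true
	case "union":
		return slices.ContainsFunc(t.Types, hasUnknown)
	}
	return false
}

func main() {
	u := Type{Name: "union", Types: []Type{
		{Name: "int64"},
		{Name: "union", Types: []Type{{Name: "unknown"}}},
	}}
	fmt.Println(hasUnknown(u))                   // true
	fmt.Println(hasUnknown(Type{Name: "int64"})) // false
}
```

The same shape applies to the `hasString` and `hasNumber` helpers in this file.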

nwt · 2025-10-07T15:49:01Z

compiler/semantic/checker.go

+		for _, t := range typ.Types {
+			if hasString(t) {
+				return true
+			}
+		}


Nit:

Suggested change

-		for _, t := range typ.Types {
-			if hasString(t) {
-				return true
-			}
-		}
+		return slices.ContainsFunc(typ.Types, hasString)

nwt · 2025-10-07T15:52:30Z

compiler/semantic/checker.go

+		{Name: op.LeftAlias, Type: types[0]},
+		{Name: op.RightAlias, Type: types[1]},


Nit:

Suggested change

-		{Name: op.LeftAlias, Type: types[0]},
-		{Name: op.RightAlias, Type: types[1]},
+		super.NewField(op.LeftAlias, types[0]),
+		super.NewField(op.RightAlias, types[1]),
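The snippet suggests that the checker types a join's output as a record with one field per input, named by the left and right aliases. The sketch below illustrates that shape with hypothetical `Field`, `Record`, `newField`, and `joinOutputType` definitions. It does not reproduce super's actual `NewField` signature or record types, and types are plain strings for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Field and Record are hypothetical stand-ins for super's record type
// machinery; types are represented as strings for brevity.
type Field struct {
	Name string
	Type string
}

// newField plays the role of a field constructor, used instead of a
// composite literal as the review suggests.
func newField(name, typ string) Field { return Field{Name: name, Type: typ} }

type Record struct{ Fields []Field }

func (r Record) String() string {
	parts := make([]string, len(r.Fields))
	for i, f := range r.Fields {
		parts[i] = f.Name + ":" + f.Type
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// joinOutputType sketches a join's output type: one field per input,
// named by that input's alias and typed by that input's fused type.
func joinOutputType(leftAlias, rightAlias string, types [2]string) Record {
	return Record{Fields: []Field{
		newField(leftAlias, types[0]),
		newField(rightAlias, types[1]),
	}}
}

func main() {
	fmt.Println(joinOutputType("left", "right", [2]string{"{a:int64}", "{b:string}"}))
	// {left:{a:int64},right:{b:string}}
}
```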

nwt · 2025-10-07T15:58:49Z

compiler/semantic/checker.go

+		for _, t := range u.Types {
+			if hasNumber(t) {
+				return true
+			}
+		}


Nit:

Suggested change

-		for _, t := range u.Types {
-			if hasNumber(t) {
-				return true
-			}
-		}
+		if slices.ContainsFunc(u.Types, hasNumber) {
+			return true
+		}

mccanne · 2025-10-07T17:45:11Z

Ok, regarding the SQL problem with the IN operator, we decided on Zoom today that IN semantics will include scalar equality in addition to containment equality. This means we don't need to change the runtime. Instead, I just pushed changes to update the type checker. Docs for the IN operator will come soon to reflect this.
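The semantics described in this comment can be sketched in a few lines: a container on the right-hand side is tested for membership, and any other right-hand side falls back to scalar equality, so `col1 IN (col2)` behaves like `col1 = col2`. This is an illustration using plain Go values as hypothetical stand-ins for super values (a `[]any` plays the role of an array or set). It is not the runtime's implementation, and `inOp` is a made-up name.

```go
package main

import "fmt"

// inOp sketches "IN" with scalar equality in addition to containment:
// a []any right-hand side is treated as a container and searched; any
// other right-hand side is compared for equality. Comparable scalars
// are assumed; == on two identical uncomparable dynamic types panics.
func inOp(lhs, rhs any) bool {
	if elems, ok := rhs.([]any); ok {
		for _, e := range elems {
			if e == lhs {
				return true
			}
		}
		return false
	}
	return lhs == rhs
}

func main() {
	rows := [][2]int64{{1, 1}, {2, 3}, {4, 4}}
	for _, r := range rows {
		// WHERE col1 IN (col2): the right-hand side is a scalar.
		if inOp(r[0], r[1]) {
			fmt.Printf("{col1:%d,col2:%d}\n", r[0], r[1])
		}
	}
	// Containment: 3 is an element of [1,3].
	fmt.Println(inOp(int64(3), []any{int64(1), int64(3)}))
}
```

Run on the three rows from the earlier Parquet example, the filter keeps `{col1:1,col2:1}` and `{col1:4,col2:4}`, matching both the pre-branch output and Postgres.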

philrz · 2025-10-07T18:36:33Z

FYI, once this was merged to main (commit c7dcd26), I re-ran the 1+ million sqllogictests that had previously been successful and 100% of them passed. So the IN fix is confirmed a success, and no other surprises were hiding behind that one. 👍

nwt approved these changes Oct 7, 2025

mccanne added 2 commits October 7, 2025 10:35

address PR feedback (c7fc9b2)
change type checker to allow scalar equality for 'in' operator (c70dbc8)

mccanne merged commit c7dcd26 into main Oct 7, 2025
3 checks passed

mccanne deleted the type-checker branch October 7, 2025 18:11
