Skip to content

Conversation

@fulghum
Copy link
Contributor

@fulghum fulghum commented Jun 2, 2025

Postgres semantics for IS NULL and IS NOT NULL differ slightly from the the MySQL semantics implemented in GMS. For records and composites, IS NULL returns true if the record/composite itself is NULL or if all values in the record/composite are NULL. IS NOT NULL returns true only if all values in the record/composite are not NULL. Note that this means, for records and composites in Postgres, IS NOT NULL is not equivalent to NOT(IS NULL).

This change adds custom implementations of IS NULL and IS NOT NULL to support Postgres' behavior with records and composites.

Depends on: dolthub/go-mysql-server#3064

@github-actions
Copy link
Contributor

github-actions bot commented Jun 2, 2025

Main PR
covering_index_scan_postgres 347.58/s 339.78/s -2.3%
index_join_postgres 157.83/s 157.69/s -0.1%
index_join_scan_postgres 190.62/s 190.30/s -0.2%
index_scan_postgres 12.40/s 12.50/s +0.8%
oltp_point_select 2495.94/s 2490.76/s -0.3%
oltp_read_only 1826.41/s 1828.51/s +0.1%
select_random_points 118.44/s 117.19/s -1.1%
select_random_ranges 133.76/s 131.74/s -1.6%
table_scan_postgres 11.70/s 11.64/s -0.6%
types_table_scan_postgres 5.48/s 5.43/s -1.0%

@github-actions
Copy link
Contributor

github-actions bot commented Jun 2, 2025

Main PR
Total 42090 42090
Successful 16646 16656
Failures 25444 25434
Partial Successes1 5539 5542
Main PR
Successful 39.5486% 39.5723%
Failures 60.4514% 60.4277%

${\color{lightgreen}Progressions (10)}$

arrays

QUERY: select array_to_string(NULL::int4[], ',') IS NULL;

create_view

QUERY: CREATE VIEW key_dependent_view AS
   SELECT * FROM view_base_table GROUP BY key;

json

QUERY: select (test_json->'field3') is null as expect_false
from test_json
where json_type = 'object';
QUERY: select (test_json->3) is null as expect_false
from test_json
where json_type = 'array';

jsonb

QUERY: SELECT (test_json->'field3') IS NULL AS expect_false FROM test_jsonb WHERE json_type = 'object';
QUERY: SELECT (test_json->3) IS NULL AS expect_false FROM test_jsonb WHERE json_type = 'array';

rowtypes

QUERY: select ROW() IS NULL;

strings

QUERY: SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
QUERY: SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;

text

QUERY: select concat_ws(NULL,10,20,null,30) is null;

Footnotes

  1. These are tests that we're marking as Successful, however they do not match the expected output in some way. This is due to small differences, such as different wording on the error messages, or the column names being incorrect while the data itself is correct.

@fulghum fulghum requested a review from zachmu June 2, 2025 22:13
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but see comments

Copy link
Collaborator

@Hydrocharged Hydrocharged left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the wrong approach. In dolthub/go-mysql-server#3004, you mentioned that the analyzer uses *expression.IsNull for optimization passes, but what exactly are those and how do they break? In GMS we make the assumption that IS NULL == NOT(IS NULL), but from reading the Postgres documentation, this does not seem to be true.

https://www.postgresql.org/docs/15/functions-comparison.html

If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests.

This implies that IS NULL and IS NOT NULL are fundamentally different comparison operations, which would evaluate to them being different functions/expressions. I ask about the optimization passes, as assumptions made there are for specific MySQL characteristics, and while MySQL and Postgres share a lot, they are overall different engines.

I ran two test queries through this PR, and it indeed fails as expected:

SELECT ROW(null, 1) IS NULL;      ->  0  (right value, wrong type)
SELECT ROW(null, 1) IS NOT NULL;  -> "t" (should be "f")

@Hydrocharged
Copy link
Collaborator

Hydrocharged commented Jun 3, 2025

Another (somewhat side) comment, types.TupleValue was introduced in the GMS PR, but our RecordValue type is only for Postgres concepts. To my knowledge, no equivalent structure exists in MySQL (or GMS), and we aren't even completely sure if RecordValue actually works for all RECORD values in all situations (hence why we're not allowing created tables with them). It makes more sense for the value type to stay in Doltgres since it's only applicable to this project, and it makes future development easier and modifications don't have to cross package boundaries.

I think the better abstraction for GMS would be to have types as a pluggable architecture, where MySQL implements their types, and Postgres implements theirs. This would be a part of the transition to go-sql-server, with GMS and a possible go-postgres-server being specific dependencies of go-sql-server, but of course that's too large of a project for us to tackle at this time. But I feel that's the better direction for us to make changes in. I absolutely feel that, rather than try to get existing GMS constructs to work with Postgres, it makes more sense to implement the correct Postgres functionality, and factor out the commonality in GMS. It's much harder considering GMS is quite well optimized for MySQL engine functionality replication so we're having to work through those optimizations, but I think it makes our jobs easier in both the short and longer term.

All of the above I feel @zachmu should read as well, and we can go into more detail in our next sync.

@fulghum
Copy link
Contributor Author

fulghum commented Jun 4, 2025

Good callout on the fact that Postgres does not treat IS NOT NULL as equivalent to NOT(IS NULL) for record types. Thanks for calling out that section in the Postgres docs – I had missed that when scanning for data on records, but will dig in deeper.

Yup, this means we'll have to rework some of this expression logic in GMS. I'll start digging into how/where expression.IsNull is used in more detail and work on some ideas.

@fulghum fulghum force-pushed the fulghum/tuple_is_null branch 5 times, most recently from 8e25fde to 2dadaa6 Compare July 8, 2025 20:25
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one comment

@fulghum fulghum force-pushed the fulghum/tuple_is_null branch from 2dadaa6 to c9e9a62 Compare July 11, 2025 19:10
@fulghum fulghum merged commit 1a7be78 into main Jul 11, 2025
14 checks passed
@fulghum fulghum deleted the fulghum/tuple_is_null branch July 11, 2025 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants