-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-4813] ANY_VALUE assumes that arguments should be comparable #4699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| !ok | ||
|
|
||
| # [CALCITE-4813] ANY_VALUE assumes that arguments should be comparable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the limitations of pgsql, it cannot support any_value, so it was verified in duckdb and works well.
duckdb> select any_value(r) over(), s from(select array[f, s] r, s from (select 1 as f, 2 as s) t) t;
┌──────────────────────┬───┐
│ any_value(r) OVER () ┆ s │
╞══════════════════════╪═══╡
│ [1, 2] ┆ 2 │
└──────────────────────┴───┘
duckdb> select any_value(r) over(), s from(select map{f: s} r, s from (select 1 as f, 2 as s) t) t;
┌──────────────────────┬───┐
│ any_value(r) OVER () ┆ s │
╞══════════════════════╪═══╡
│ {1: 2} ┆ 2 │
└──────────────────────┴───┘
duckdb> select any_value(r) over(), s from(select row(f, s) r, s from (select 1 as f, 2 as s) t) t;
┌──────────────────────┬───┐
│ any_value(r) OVER () ┆ s │
╞══════════════════════╪═══╡
│ {: 1, : 2} ┆ 2 │
└──────────────────────┴───┘
|
|
||
| !update | ||
|
|
||
| select |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also has verified in duckdb and works well.
CREATE TABLE complex_t (
a INTEGER[],
m MAP(VARCHAR, DOUBLE),
r STRUCT(
r1 VARCHAR,
r2 INTEGER,
r3 VARCHAR
)
);
INSERT INTO complex_t VALUES
([1,2,3,4,5],MAP{'math':95.5,'science':88.0,'english':92.3},ROW('Alice Johnson',30,'a')),
([10,20,30,40,50,60],MAP{'physics':96.2,'chemistry':91.8,'biology':89.5,'computer_science':98.7},ROW('Bob Smith',25,'b')),
([100,200,300],MAP{'leadership':88.9,'teamwork':94.2,'communication':91.5,'problem_solving':97.8},ROW('Charlie Chen',35,'c'));
duckdb> SELECT max(a) AS max_a, max(m) AS max_m , max(r) AS max_r, min(a) AS min_a , min(m) AS min_m, min(r) AS min_r FROM complex_t;
┌─────────────────┬────────────────────────────┬───────────────────────────────────────┬─────────────────┬──────────────────────────────┬──────────────────────────────────────┐
│ max_a ┆ max_m ┆ max_r ┆ min_a ┆ min_m ┆ min_r │
╞═════════════════╪════════════════════════════╪═══════════════════════════════════════╪═════════════════╪══════════════════════════════╪══════════════════════════════════════╡
│ [100, 200, 300] ┆ {physics: 96.2, chemistry: ┆ {r1: Charlie Chen, r2: 35, r3: c} ┆ [1, 2, 3, 4, 5] ┆ {leadership: 88.9, teamwork: ┆ {r1: Alice Johnson, r2: 30, r3: a} │
│ ┆ 91.8, biology: 89.5, ┆ ┆ ┆ 94.2, communication: 91.5, ┆ │
│ ┆ computer_science: 98.7} ┆ ┆ ┆ problem_solving: 97.8} ┆ │
└─────────────────┴────────────────────────────┴───────────────────────────────────────┴─────────────────┴──────────────────────────────┴──────────────────────────────────────┘
fed9f3c to
100a8db
Compare
| /** | ||
| * Compares two maps. | ||
| * | ||
| * <p>Since maps in Calcite are implemented using {@link java.util.LinkedHashMap}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not crazy about this. Is the order of insertion of elements in a map deterministic? Probably not in general, e.g., if you use an aggregate to build the map.
In our implementation the maps are actually sorted by key. What do other system say about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that a LinkedHashMap can maintain the insertion order, allowing for comparisons based on that sequence. Initially, I sorted the data before comparing, but I found that this approach was inconsistent with DuckDB's behavior. The main issue is that I couldn’t find another database to compare results with. Honestly, I don’t have a strong preference for whether sorting should be done before comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, linkedhashmap retains insertion order, but that order is not defined in general for SQL, e.g., SELECT MAP(SELECT a, b FROM x) does not guarantee a specific row order.
You can leave it as it is, but the results are non-deterministic as defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, SQL is indeed nondeterministic, but in the current implementation of Calcite, it is deterministic. So I understand that executing
"SELECT MAP(SELECT a, b FROM x)" multiple times yields consistent results, which also explains why the test cases with SQL queries lacking
"ORDER BY" can still run stably. However, this is admittedly not rigorous. So, are you leaning towards sorting before comparison? Although this is inconsistent with DuckDB's behavior, it is justifiable. Perhaps other databases do the same, and I just haven't noticed it yet. What do you think is the better approach for me to choose?
(P.S.: There is currently an issue, though unrelated to our discussion, that I will also need to fix later.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are two separate issues: the order of insertion in a MAP is not defined (depending on how the map is created), and even if the order of insertion is defined, the order of elements in a MAP is not defined for comparisons.
For the tested implementation both may be deterministic, but we have to be careful to explain that these orders may not be the same as in other implementations. Maybe that's all that's required: add a comment that for linkedhashmap the order is deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have already fixed this method, but I haven’t sorted it, and I have also improved the comments. What do you think?
3beeb8d to
186c468
Compare
|
Thanks for review @mihaibudiu, I will wait for 24-48 hours, if no futrue comments I will merge ths PR. |
|



See CALCITE-4813