Skip to content

Conversation

@xiedeyantu
Copy link
Member


!ok

# [CALCITE-4813] ANY_VALUE assumes that arguments should be comparable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the limitations of pgsql, it cannot support any_value, so it was verified in duckdb and works well.

duckdb> select any_value(r) over(), s from(select array[f, s] r, s from (select 1 as f, 2 as s) t) t;
┌──────────────────────┬───┐
│ any_value(r) OVER () ┆ s │
╞══════════════════════╪═══╡
│ [1, 2]               ┆ 2 │
└──────────────────────┴───┘

duckdb> select any_value(r) over(), s from(select map{f: s} r, s from (select 1 as f, 2 as s) t) t;
┌──────────────────────┬───┐
│ any_value(r) OVER () ┆ s │
╞══════════════════════╪═══╡
│ {1: 2}               ┆ 2 │
└──────────────────────┴───┘

duckdb> select any_value(r) over(), s from(select row(f, s) r, s from (select 1 as f, 2 as s) t) t;
┌──────────────────────┬───┐
│ any_value(r) OVER () ┆ s │
╞══════════════════════╪═══╡
│ {: 1, : 2}           ┆ 2 │
└──────────────────────┴───┘


!update

select
Copy link
Member Author

@xiedeyantu xiedeyantu Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also has verified in duckdb and works well.

CREATE TABLE complex_t (
    a INTEGER[],
    m MAP(VARCHAR, DOUBLE),
    r STRUCT(
        r1 VARCHAR,
        r2 INTEGER,
        r3 VARCHAR
    )
);

INSERT INTO complex_t VALUES 
([1,2,3,4,5],MAP{'math':95.5,'science':88.0,'english':92.3},ROW('Alice Johnson',30,'a')),
([10,20,30,40,50,60],MAP{'physics':96.2,'chemistry':91.8,'biology':89.5,'computer_science':98.7},ROW('Bob Smith',25,'b')),
([100,200,300],MAP{'leadership':88.9,'teamwork':94.2,'communication':91.5,'problem_solving':97.8},ROW('Charlie Chen',35,'c'));

duckdb> SELECT max(a) AS max_a, max(m) AS max_m , max(r) AS max_r, min(a) AS min_a , min(m) AS min_m, min(r) AS min_r FROM complex_t;
┌─────────────────┬────────────────────────────┬───────────────────────────────────────┬─────────────────┬──────────────────────────────┬──────────────────────────────────────┐
│ max_a           ┆ max_m                      ┆ max_r                                 ┆ min_a           ┆ min_m                        ┆ min_r                                │
╞═════════════════╪════════════════════════════╪═══════════════════════════════════════╪═════════════════╪══════════════════════════════╪══════════════════════════════════════╡
│ [100, 200, 300] ┆ {physics: 96.2, chemistry: ┆ {r1: Charlie Chen, r2: 35, r3: c}     ┆ [1, 2, 3, 4, 5] ┆ {leadership: 88.9, teamwork: ┆ {r1: Alice Johnson, r2: 30, r3: a}   │
│                 ┆ 91.8, biology: 89.5,       ┆                                       ┆                 ┆ 94.2, communication: 91.5,   ┆                                      │
│                 ┆ computer_science: 98.7}    ┆                                       ┆                 ┆ problem_solving: 97.8}       ┆                                      │
└─────────────────┴────────────────────────────┴───────────────────────────────────────┴─────────────────┴──────────────────────────────┴──────────────────────────────────────┘

@xiedeyantu xiedeyantu force-pushed the CALCITE-4813-1 branch 2 times, most recently from fed9f3c to 100a8db Compare December 19, 2025 15:55
/**
* Compares two maps.
*
* <p>Since maps in Calcite are implemented using {@link java.util.LinkedHashMap},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not crazy about this. Is the order of insertion of elements in a map deterministic? Probably not in general, e.g., if you use an aggregate to build the map.

In our implementation the maps are actually sorted by key. What do other system say about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that a LinkedHashMap can maintain the insertion order, allowing for comparisons based on that sequence. Initially, I sorted the data before comparing, but I found that this approach was inconsistent with DuckDB's behavior. The main issue is that I couldn’t find another database to compare results with. Honestly, I don’t have a strong preference for whether sorting should be done before comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, linkedhashmap retains insertion order, but that order is not defined in general for SQL, e.g., SELECT MAP(SELECT a, b FROM x) does not guarantee a specific row order.

You can leave it as it is, but the results are non-deterministic as defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, SQL is indeed nondeterministic, but in the current implementation of Calcite, it is deterministic. So I understand that executing
"SELECT MAP(SELECT a, b FROM x)" multiple times yields consistent results, which also explains why the test cases with SQL queries lacking
"ORDER BY" can still run stably. However, this is admittedly not rigorous. So, are you leaning towards sorting before comparison? Although this is inconsistent with DuckDB's behavior, it is justifiable. Perhaps other databases do the same, and I just haven't noticed it yet. What do you think is the better approach for me to choose?

(P.S.: There is currently an issue, though unrelated to our discussion, that I will also need to fix later.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are two separate issues: the order of insertion in a MAP is not defined (depending on how the map is created), and even if the order of insertion is defined, the order of elements in a MAP is not defined for comparisons.

For the tested implementation both may be deterministic, but we have to be careful to explain that these orders may not be the same as in other implementations. Maybe that's all that's required: add a comment that for linkedhashmap the order is deterministic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already fixed this method, but I haven’t sorted it, and I have also improved the comments. What do you think?

@xiedeyantu
Copy link
Member Author

Thanks for review @mihaibudiu, I will wait for 24-48 hours, if no futrue comments I will merge ths PR.

@sonarqubecloud
Copy link

@xiedeyantu xiedeyantu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Dec 22, 2025
@xiedeyantu xiedeyantu merged commit ae42f77 into apache:main Dec 24, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM-will-merge-soon Overall PR looks OK. Only minor things left.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants