Skip to content

Conversation

@treysp
Copy link
Collaborator

@treysp treysp commented Nov 18, 2025

Notes:

  • Bigquery does not document the specific algorithm they use, so results may differ from DuckDB (which uses t-digest)
  • DuckDB does not support the BQ RESPECT NULLS option

@treysp treysp force-pushed the trey/approx-quantiles branch from b25bcd4 to 726e0df Compare November 20, 2025 19:24
@treysp treysp force-pushed the trey/approx-quantiles branch 6 times, most recently from a84c84c to 69a5fa6 Compare November 21, 2025 18:49
Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only thing I'd like to understand a bit more is what the results look like in source/target. Do they mostly match, with the only difference being how the estimation happens? Types and all included?

@treysp treysp force-pushed the trey/approx-quantiles branch from 69a5fa6 to a735c18 Compare November 21, 2025 20:28
@treysp treysp force-pushed the trey/approx-quantiles branch from a735c18 to 1aa03ab Compare November 21, 2025 20:30
@treysp
Copy link
Collaborator Author

treysp commented Nov 21, 2025

Looks good, only thing I'd like to understand a bit more is what the results look like in source/target. Do they mostly match, with the only difference being how the estimation happens? Types and all included?

Yep, good match other than algorithm

test_type input_values BigQuery Duckdb
BigQuery BIGNUMERIC → DuckDB DECIMAL/HUGEINT [1, 50, 100] [1,50,100] [1.000,50.500,100.000]
BigQuery FLOAT64 → DuckDB DOUBLE [1, 50, 100] [1,50,100] [1.0,50.5,100.0]
BigQuery INT64 → DuckDB BIGINT [1, 50, 100] [1,50,100] [1,50,100]
Tests with negative numbers [-100, -50, 0] [-100,-51,-1] [-100,-50,0]
Tests DISTINCT modifier Removes duplicates [0,4,9] [0,4,9]
Tests NULL handling Mixed with NULLs [1,49,99] [1,50,99]
BigQuery NUMERIC → DuckDB DECIMAL [1, 50, 100] [1,50,100] [1.000,50.500,100.000]
BigQuery DATE → DuckDB DATE [2024-01-01, 2024-02-19, 2024-04-10] [2024-01-02, 2024-02-20, 2024-04-10] [2024-01-02, 2024-02-20, 2024-04-10]
BigQuery DATETIME → DuckDB TIMESTAMP [2024-01-01 00:00:00] [2024-01-01T00:00:00, 2024-01-01T00:00:00, 2024-01-01T00:00:00] [2024-01-01 00:00:00+00, 2024-01-01 00:00:00+00, 2024-01-01 00:00:00+00]
BigQuery TIME → DuckDB TIME [00:00:00] [00:00:00, 00:00:00, 00:00:00] [00:00:00, 00:00:00, 01:27:49.182464]
BigQuery TIMESTAMP → DuckDB TIMESTAMP WITH TIME ZONE [2024-01-01 00:00:00 UTC] [2024-01-01T00:00:00Z, 2024-01-01T00:00:00Z, 2024-01-01T00:00:00Z] [2024-01-01 00:00:00+00, 2024-01-01 00:00:00+00, 2024-01-01 00:00:00+00]

@georgesittas
Copy link
Collaborator

Great, thanks!

Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work

@georgesittas georgesittas merged commit e4ea6cc into main Nov 24, 2025
8 checks passed
@georgesittas georgesittas deleted the trey/approx-quantiles branch November 24, 2025 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants