feat(snowflake)!: Transpilation support for Snowflake's BITMAP_CONSTRUCT_AGG function to DuckDB #6745
SQLGlot Integration Test Results
Comparing main against sqlglot:feature/transpile-bitmapconstruct.
Overall:
- main: 10938 total, 9870 passed (pass rate: 90.2%)
- sqlglot:feature/transpile-bitmapconstruct: 10938 total, 9870 passed (pass rate: 90.2%)
- Difference: no change
# Phase 1: Data Preparation (SELECT LIST_SORT): removes NULLs, deduplicates, and sorts the input list
# Phase 2: Hex String Construction (LIST_TRANSFORM): builds the hex representation of the values
# Phase 3: Final Assembly (CASE): constructs the final bitmap based on the number of unique values
BITMAP_CONSTRUCT_AGG_TEMPLATE: exp.Expression = exp.maybe_parse(
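To make the first two phases easier to follow, here is a minimal Python reference model of what they compute. It is plain Python, not the DuckDB template itself (whose body is not shown in this excerpt), and it assumes each value is encoded as 2 little-endian bytes. The function names are hypothetical and are not part of sqlglot.

```python
from typing import Iterable, Optional


def prepare_values(values: Iterable[Optional[int]]) -> list[int]:
    """Phase 1 (mirrors SELECT LIST_SORT): drop NULLs, deduplicate, sort ascending."""
    return sorted({v for v in values if v is not None})


def value_to_le_hex(value: int) -> str:
    """Phase 2 (mirrors LIST_TRANSFORM): render one value as 2 little-endian bytes in hex.

    Assumption: each value occupies 2 bytes; the template's actual width
    is not shown in the excerpt.
    """
    low, high = value & 0xFF, (value >> 8) & 0xFF
    return f"{low:02X}{high:02X}"


def values_hex(values: Iterable[Optional[int]]) -> str:
    """Concatenate the per-value hex chunks in sorted order."""
    return "".join(value_to_le_hex(v) for v in prepare_values(values))


if __name__ == "__main__":
    print(prepare_values([3, None, 1, 3, 2]))  # [1, 2, 3]
    print(values_hex([256, 1, None, 1]))  # 01000001
```

Working on hex strings rather than raw bytes mirrors the "pack it into hex" approach described in the review thread, where the bitmap is assembled as text before being emitted.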
Hey @fivetran-kwoodbeck, how did you arrive at this template? Is this documented somewhere? It looks fairly complicated.
I'm unable to review the logic as-is and am also hesitant about merging it. If we need something this complicated to transpile, I'd rather we just use self.unsupported instead. These BITMAP_* functions should also likely be deprioritized; I doubt they're that frequent in Snowflake land. I'll adjust the task priorities shortly.
lol, I picked the construct one because it was hard. The flow is to sanitize the input, pack it into hex, then output the final bitmap. Snowflake has some nuances that make it more complicated, but it passes an exhaustive test set. Why would we not include it if it works?
I think all of the other BITMAP functions have already been transpiled. BITMAP_CONSTRUCT_AGG is the gateway to the BITMAP functions, since it's how bitmaps are created in the first place.
What was this implementation based on? I can't reason about it at the moment; there's a lot going on. At the very least, the template should be documented thoroughly in a docstring or comment to help folks debug it in the future if needed. From what I saw, the Snowflake docs on this function are lacking.
I read through the Snowflake documentation to understand what it does, and also experimented to observe its actual behavior (see Jira). I can expand the documentation; I'm not sure how much detail you want, but I'll update it.
OK, let's make sure the implementation is properly documented with a comment next to the template, and we can merge it afterwards.
I expanded the documentation. Good idea, because doing so made me realize we didn't have a range check (now added).
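A minimal sketch of what such a range check could look like, modeled in Python. The bounds (0 to 32767, i.e. one 32,768-bit Snowflake bitmap bucket) and the choice to raise an error are assumptions, since the check itself isn't shown in this conversation; `check_positions` is a hypothetical name.

```python
from typing import Iterable, Optional

# Assumption: valid bit positions are 0..32767 (one 32,768-bit bitmap bucket).
# The actual bounds and error behavior of the PR's check are not shown here.
MIN_POSITION = 0
MAX_POSITION = 32767


def check_positions(values: Iterable[Optional[int]]) -> list[Optional[int]]:
    """Hypothetical range check: raise if any non-NULL position is out of range.

    NULLs pass through untouched, since Phase 1 drops them later anyway.
    """
    checked = list(values)
    for v in checked:
        if v is not None and not MIN_POSITION <= v <= MAX_POSITION:
            raise ValueError(
                f"bit position {v} outside [{MIN_POSITION}, {MAX_POSITION}]"
            )
    return checked
```

Checking before packing matters because an out-of-range value would otherwise be silently truncated when encoded into a fixed-width slot.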
Added transpilation of BITMAP_CONSTRUCT_AGG from Snowflake to DuckDB.
Details:
- Uses BITMAP_CONSTRUCT_AGG_TEMPLATE, a pre-parsed SQL template
- Added bitmapconstructagg_sql() method that replicates Snowflake's bitmap binary format:
- Small (<5 values): 2-byte big-endian count + little-endian values + padding to 10 bytes
- Large (≥5 values): 10-byte header (0x08 + 9 zeros) + little-endian values
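The layout above can be modeled byte by byte with Python's `struct` module. This is a hedged sketch, not the PR's implementation: it assumes each value is an unsigned 16-bit integer and that the small-format padding uses zero bytes, neither of which is spelled out above; `bitmap_construct_agg` is a hypothetical name.

```python
import struct
from typing import Iterable, Optional

SMALL_LIMIT = 5  # fewer than 5 distinct values -> small format
SMALL_SIZE = 10  # small format is padded to 10 bytes
LARGE_HEADER = bytes([0x08]) + bytes(9)  # 0x08 followed by 9 zero bytes


def bitmap_construct_agg(values: Iterable[Optional[int]]) -> bytes:
    """Sketch of the small/large layouts (assumes 2-byte unsigned values)."""
    # Drop NULLs, deduplicate, and sort, as in Phase 1 of the template.
    positions = sorted({v for v in values if v is not None})
    # Each value as a little-endian unsigned 16-bit integer ("<H"); width is an assumption.
    payload = b"".join(struct.pack("<H", v) for v in positions)
    if len(positions) < SMALL_LIMIT:
        # 2-byte big-endian count (">H"), then the values, then zero padding to 10 bytes.
        body = struct.pack(">H", len(positions)) + payload
        return body.ljust(SMALL_SIZE, b"\x00")
    # Large format: fixed 10-byte header followed by the little-endian values.
    return LARGE_HEADER + payload
```

For example, `bitmap_construct_agg([2, None, 1, 2]).hex()` returns `'00020100020000000000'`: the count `0002`, the values `0100` and `0200`, then four padding bytes. The 10-byte padding lines up with the small format's capacity, since a 2-byte count plus four 2-byte values is exactly 10 bytes.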
See Jira for full testing.