Merged
22 changes: 21 additions & 1 deletion macros/utils/data_types/data_type_list.sql
@@ -129,7 +129,7 @@
{%- else %}
{{ return([]) }}
{%- endif %}

{% endmacro %}

{% macro clickhouse__data_type_list(data_type) %}
@@ -151,3 +151,23 @@
{%- endif %}

{% endmacro %}

{% macro dremio__data_type_list(data_type) %}
{% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
{% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
{% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
{% set boolean_list = ['BOOLEAN', 'BIT'] | list %}

{%- if data_type == 'string' %}
{{ return(string_list) }}
{%- elif data_type == 'numeric' %}
{{ return(numeric_list) }}
{%- elif data_type == 'timestamp' %}
{{ return(timestamp_list) }}
{%- elif data_type == "boolean" %}
{{ return(boolean_list) }}
{%- else %}
{{ return([]) }}
{%- endif %}

{% endmacro %}
Comment on lines +155 to +173

💡 Verification agent

🧩 Analysis chain

Dremio type list added — verify aliases and TZ types; consider adding REAL/BOOL
Good addition and consistent with the dispatch pattern. A couple of points to tighten correctness vs Dremio’s reported data_type values:

  • Add aliases: REAL (numeric) and BOOL (boolean) are commonly seen; consider including them.
  • Verify whether BIT is actually returned by Dremio INFORMATION_SCHEMA vs upstream sources; it might be safer to include BOOL and keep BIT only if confirmed.
  • Verify support for TIME WITH TIME ZONE and TIMESTAMP WITH TIME ZONE in Dremio; if unsupported, they should be removed to avoid false-positive matches.

Proposed minimal improvement (adds REAL and BOOL; leave BIT and TZ types pending verification):

 {% macro dremio__data_type_list(data_type) %}
-    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
-    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
-    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
-    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
+    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','REAL','SMALLINT','TINYINT'] | list %}
+    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
+    {% set boolean_list = ['BOOLEAN', 'BOOL', 'BIT'] | list %}

Additionally, if Dremio reports fixed-length CHAR/CHARACTER in INFORMATION_SCHEMA for some sources, consider including:

  • string_list: add 'CHAR', 'CHARACTER'

Please confirm with Dremio docs and/or INFORMATION_SCHEMA outputs before finalizing removals.
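
A quick way to check this against a live instance is to list the distinct type strings that Dremio itself reports. The query below is a minimal sketch, assuming `INFORMATION_SCHEMA."COLUMNS"` exposes a `DATA_TYPE` column as in standard SQL; the `column_count` alias is only illustrative.

```sql
-- Sketch: list every distinct type string reported for existing columns,
-- together with how many columns carry it.
-- Assumption: INFORMATION_SCHEMA."COLUMNS" has a DATA_TYPE column.
SELECT
    DATA_TYPE,
    COUNT(*) AS column_count
FROM INFORMATION_SCHEMA."COLUMNS"
GROUP BY DATA_TYPE
ORDER BY DATA_TYPE
```

Only the strings that appear in this output can ever match an entry in the macro's lists, so the result should be treated as the source of truth over any secondary summary.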

To validate support and normalize to actual INFORMATION_SCHEMA values, please check Dremio docs for supported data types and the canonical strings reported in INFORMATION_SCHEMA:


🌐 Web query:

Dremio SQL data types and INFORMATION_SCHEMA data_type values. Specifically: does Dremio support SMALLINT, TINYINT, REAL, BOOL, BIT, TIME WITH TIME ZONE, and TIMESTAMP WITH TIME ZONE, and what are the exact canonical type names returned?

💡 Result:

Dremio does not natively support canonical SQL types SMALLINT, TINYINT, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE; it maps them to other types. The canonical type names returned in INFORMATION_SCHEMA and Dremio metadata for supported types are as follows:

Support for Specific Types & Canonical Type Names

Supported by Dremio

  • BIT: Mapped to BOOLEAN[7][1].
  • BOOL: Returned as BOOLEAN[7][1].
  • REAL: Returned as FLOAT (single-precision float)[1][3][4][7].
  • SMALLINT: Mapped to INTEGER[1][6][7][9].
  • TINYINT: Mapped to INTEGER[1][6][9].
  • TIME: Returned as TIME (without time zone)[1][6][7].
  • TIMESTAMP: Returned as TIMESTAMP (without time zone)[1][6][7].
  • TIMESTAMP WITH TIME ZONE: Mapped to TIMESTAMP (Dremio strips the time zone information)[6][7].

Not Supported as Canonical Types

  • TIME WITH TIME ZONE: Mapped to TIME (no TZ)[6][7].
  • TIMESTAMP WITH TIME ZONE: Mapped to TIMESTAMP (no TZ)[6][7].
  • SMALLINT/TINYINT: Returned as INTEGER (not as "SMALLINT" or "TINYINT")[1][6][7][9].

Exact Canonical Type Names

The values Dremio reports in metadata (including INFORMATION_SCHEMA.COLUMNS) are the canonical type names listed below (case-sensitive, all uppercase):

  • INTEGER
  • BIGINT
  • FLOAT
  • DOUBLE
  • BOOLEAN
  • DECIMAL
  • VARBINARY
  • VARCHAR
  • DATE
  • TIME
  • TIMESTAMP
  • INTERVAL (day to seconds)
  • INTERVAL (years to months)

These are the types you will observe when querying for column data types in Dremio's metadata tables and via INFORMATION_SCHEMA[1][3][4][6][7].

Mappings from External Sources

  • SQL Server: SMALLINT, TINYINT, BIT → INTEGER; BOOL/BIT → BOOLEAN; REAL → FLOAT; no support for TIME WITH TIME ZONE or true TIMESTAMP WITH TIME ZONE (mapped to TIME/TIMESTAMP)[1].
  • PostgreSQL: BOOL/BIT → BOOLEAN; SMALLINT/TINYINT → INTEGER; REAL → FLOAT; TIMESTAMPTZ/TIMETZ → TIMESTAMP/TIME (zone info stripped)[7].
  • Teradata: SMALLINT/TINYINT/BYTEINT → INTEGER; TIME WITH TIMEZONE → TIME; TIMESTAMP WITH TIMEZONE → TIMESTAMP[6].

Summary Table

| SQL Type | Canonical Dremio Type | Supported/Mapping Notes |
|---|---|---|
| SMALLINT | INTEGER | Mapped |
| TINYINT | INTEGER | Mapped |
| REAL | FLOAT | Canonical name is FLOAT |
| BIT | BOOLEAN | Mapped |
| BOOL | BOOLEAN | Canonical name is BOOLEAN |
| TIME WITH TIME ZONE | TIME | Zone info not stored |
| TIMESTAMP WITH TIME ZONE | TIMESTAMP | Zone info not stored |

If you query INFORMATION_SCHEMA.COLUMNS, these canonical types (INTEGER, FLOAT, BOOLEAN, TIME, TIMESTAMP, etc.) will be returned—not the original SQL type names from the source system[1][3][6][7][9].

If you need to preserve the distinction between, for example, SMALLINT/TINYINT and INTEGER or retain time zone information, Dremio does not natively support this at the type level. You'll need to manage such semantics in your data model or ETL process outside Dremio.

Citations:


Update Dremio data_type_list to use only canonical INFORMATION_SCHEMA types

Dremio only returns the uppercase canonical type names—it does not emit SMALLINT, TINYINT, REAL, BOOL, BIT, or TZ-qualified types in INFORMATION_SCHEMA.COLUMNS. Tighten the macro to match exactly what Dremio reports:

• File: macros/utils/data_types/data_type_list.sql
• Lines: 155–173

 {% macro dremio__data_type_list(data_type) %}
-    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
-    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
-    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
-    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+    {% set string_list    = ['VARCHAR']                                         | list %}
+    {% set numeric_list   = ['INTEGER','BIGINT','FLOAT','DOUBLE','DECIMAL']       | list %}
+    {% set timestamp_list = ['DATE','TIME','TIMESTAMP']                           | list %}
+    {% set boolean_list   = ['BOOLEAN']                                          | list %}
 
     {%- if data_type == 'string' %}
         {{ return(string_list) }}
@@ unchanged @@
 {% endmacro %}

This aligns the lists with what Dremio’s INFORMATION_SCHEMA actually returns.
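
To see whether the tightened lists leave anything uncategorized, the following sketch lists reported type strings that match none of the four lists. It assumes `INFORMATION_SCHEMA."COLUMNS"` exposes a `DATA_TYPE` column; the literal values are copied from the proposed macro above.

```sql
-- Sketch: find reported type strings that none of the proposed lists cover.
-- Assumption: INFORMATION_SCHEMA."COLUMNS" has a DATA_TYPE column.
SELECT DISTINCT DATA_TYPE
FROM INFORMATION_SCHEMA."COLUMNS"
WHERE UPPER(DATA_TYPE) NOT IN (
    'VARCHAR',                                            -- string_list
    'INTEGER', 'BIGINT', 'FLOAT', 'DOUBLE', 'DECIMAL',    -- numeric_list
    'DATE', 'TIME', 'TIMESTAMP',                          -- timestamp_list
    'BOOLEAN'                                             -- boolean_list
)
ORDER BY DATA_TYPE
```

Types with no category in the macro, such as binary or interval types, are expected to show up here. If a string, numeric, timestamp, or boolean spelling appears in the result (for example a long-form name like CHARACTER VARYING), the corresponding list likely needs that entry before the removal is finalized.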

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-{% macro dremio__data_type_list(data_type) %}
-    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
-    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
-    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
-    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
-    {%- if data_type == 'string' %}
-        {{ return(string_list) }}
-    {%- elif data_type == 'numeric' %}
-        {{ return(numeric_list) }}
-    {%- elif data_type == 'timestamp' %}
-        {{ return(timestamp_list) }}
-    {%- elif data_type == "boolean" %}
-        {{ return(boolean_list) }}
-    {%- else %}
-        {{ return([]) }}
-    {%- endif %}
-{% endmacro %}
+{% macro dremio__data_type_list(data_type) %}
+    {% set string_list = ['VARCHAR'] | list %}
+    {% set numeric_list = ['INTEGER','BIGINT','FLOAT','DOUBLE','DECIMAL'] | list %}
+    {% set timestamp_list = ['DATE','TIME','TIMESTAMP'] | list %}
+    {% set boolean_list = ['BOOLEAN'] | list %}
+    {%- if data_type == 'string' %}
+        {{ return(string_list) }}
+    {%- elif data_type == 'numeric' %}
+        {{ return(numeric_list) }}
+    {%- elif data_type == 'timestamp' %}
+        {{ return(timestamp_list) }}
+    {%- elif data_type == "boolean" %}
+        {{ return(boolean_list) }}
+    {%- else %}
+        {{ return([]) }}
+    {%- endif %}
+{% endmacro %}
