@GuyEshdat GuyEshdat commented Aug 17, 2025

Summary by CodeRabbit

  • New Features

    • Improved Dremio compatibility by adding standardized data-type lists (string, numeric, timestamp, boolean) for more reliable type-aware behavior across models and macros.
  • Tests

    • Updated integration tests to recognize Dremio as a target for exposure schema validations, ensuring correct type mappings and error detection for Dremio environments.

linear bot commented Aug 17, 2025

ELE-4931 dremio cll

coderabbitai bot commented Aug 17, 2025

Walkthrough

Adds a new macro dremio__data_type_list(data_type) that returns Dremio-compatible type-name lists for 'string', 'numeric', 'timestamp', and 'boolean'; updates two integration tests to include "dremio" target mappings; minor whitespace tweaks.

Changes

Cohort / File(s) Summary of changes
Dremio data type list macro
macros/utils/data_types/data_type_list.sql
Added macro dremio__data_type_list(data_type) defining string_list, numeric_list, timestamp_list, and boolean_list and returning the appropriate list for 'string', 'numeric', 'timestamp', or 'boolean'; returns empty list for other values. Minor whitespace adjustments.
Integration test updates
integration_tests/tests/test_exposure_schema_validity.py
Added "dremio" to the target-specific mappings in two tests so that Dremio uses the expected explicit target dtype ("other" in the first test, "int" in the second) instead of falling back to the default.
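
To make the macro's behavior concrete, here is a minimal Python sketch of the same dispatch. It is an illustration only, not the Jinja macro: the names `DREMIO_TYPE_LISTS` and `dremio_data_type_list` are hypothetical, and the type names are copied from the original macro lines shown in the review diff further down.

```python
from typing import Dict, List

# Hypothetical Python mirror of the four lists in dremio__data_type_list.
DREMIO_TYPE_LISTS: Dict[str, List[str]] = {
    "string": ["varchar", "character varying"],
    "numeric": [
        "int", "integer", "bigint", "double",
        "decimal", "float", "smallint", "tinyint",
    ],
    "timestamp": [
        "date", "time", "timestamp",
        "time with time zone", "timestamp with time zone",
    ],
    "boolean": ["boolean", "bit"],
}


def dremio_data_type_list(data_type: str) -> List[str]:
    """Return the type names for a category, or an empty list if unknown."""
    # Copy the list so callers cannot mutate the shared mapping.
    return list(DREMIO_TYPE_LISTS.get(data_type, []))
```

As in the macro, any category other than the four known ones yields an empty list rather than an error.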

Sequence Diagram(s)

(omitted — changes are a macro addition and small test adjustments; no new runtime control flow to diagram)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested reviewers

  • haritamar

Poem

I twitch my nose at types, four neat arrays,
Strings, nums, ticks, and truths in tidy trays.
Hop-hop I sort, no fuss, no fright—
Empty basket if it’s not quite right.
In Dremio fields I nibble and play,
Mapping data the bun-bun way. 🐇✨


@github-actions

👋 @GuyEshdat
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
macros/utils/data_types/data_type_list.sql (1)

155-173: Dremio mapping added — consider case coverage and a few type synonyms; verify BIT and “WITH TIME ZONE”

Good addition and consistent with the adapter pattern. Three follow-ups to improve robustness across information_schema/introspection outputs:

  • Include upper- and lower-case variants to avoid case-sensitivity mismatches (other adapters trend toward matching the adapter’s native casing; Dremio often surfaces uppercase).
  • Consider adding common string synonyms CHAR/CHARACTER.
  • Verify whether Dremio surfaces BIT as BOOLEAN and whether TIME/TIMESTAMP WITH TIME ZONE are valid type names. If yes, include both cases; if not, drop to reduce false positives.

Proposed update (keeps BIT and WITH TIME ZONE entries but adds uppercase variants and CHAR/CHARACTER):

 {% macro dremio__data_type_list(data_type) %}
-    {% set string_list = ['varchar', 'character varying'] | list %}
-    {% set numeric_list = ['int','integer','bigint','double','decimal','float','smallint','tinyint'] | list %}
-    {% set timestamp_list = ['date','time','timestamp', 'time with time zone', 'timestamp with time zone'] | list %}
-    {% set boolean_list = ['boolean', 'bit'] | list %}
+    {# Include both lower/upper-case synonyms to match information_schema/introspection outputs #}
+    {% set string_list = ['varchar','character varying','char','character','VARCHAR','CHARACTER VARYING','CHAR','CHARACTER'] | list %}
+    {% set numeric_list = ['int','integer','bigint','double','decimal','float','smallint','tinyint','INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
+    {% set timestamp_list = ['date','time','timestamp','time with time zone','timestamp with time zone','DATE','TIME','TIMESTAMP','TIME WITH TIME ZONE','TIMESTAMP WITH TIME ZONE'] | list %}
+    {% set boolean_list = ['boolean','bit','BOOLEAN','BIT'] | list %}
 
     {%- if data_type == 'string' %}
         {{ return(string_list) }}
     {%- elif data_type == 'numeric' %}
         {{ return(numeric_list) }}
     {%- elif data_type == 'timestamp' %}
         {{ return(timestamp_list) }}
     {%- elif data_type == "boolean" %}
         {{ return(boolean_list) }}
     {%- else %}
         {{ return([]) }}
     {%- endif %}
 
 {% endmacro %}

If docs confirm BIT is not a Dremio type, or WITH TIME ZONE variants are invalid, we should remove those entries to avoid misclassification. I can adjust the diff accordingly once confirmed.

Would you like me to check Dremio’s latest type docs and update the lists precisely, or add adapter-level unit tests to validate the classification against a mocked information_schema?
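
One alternative to listing every type name in both cases is to normalize the reported name before the membership check. The Python sketch below only illustrates that idea and is not the package's macro code: `DREMIO_TYPES` and `classify_dremio_type` are hypothetical names, the type names are copied from the original macro in the diff above, and whether Dremio actually reports `bit` or the `with time zone` variants is still unverified.

```python
from typing import Dict, FrozenSet, Optional

# Lower-case type names only; the lookup normalizes its input instead of
# storing upper- and lower-case duplicates. Entries mirror the original macro.
DREMIO_TYPES: Dict[str, FrozenSet[str]] = {
    "string": frozenset({"varchar", "character varying"}),
    "numeric": frozenset({
        "int", "integer", "bigint", "double",
        "decimal", "float", "smallint", "tinyint",
    }),
    "timestamp": frozenset({
        "date", "time", "timestamp",
        "time with time zone", "timestamp with time zone",
    }),
    "boolean": frozenset({"boolean", "bit"}),  # "bit" is unverified for Dremio
}


def classify_dremio_type(raw_type: str) -> Optional[str]:
    """Return the category for a reported type name, ignoring case and spacing."""
    # Lower-case the name and collapse any run of whitespace to one space.
    normalized = " ".join(raw_type.lower().split())
    for category, names in DREMIO_TYPES.items():
        if normalized in names:
            return category
    return None
```

With this approach `VARCHAR` and `varchar` classify identically, so the lists stay short. Parameterized names such as `DECIMAL(38, 9)` are not handled by this sketch.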

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 178e57c and 3de3486.

📒 Files selected for processing (1)
  • macros/utils/data_types/data_type_list.sql (2 hunks)
🔇 Additional comments (2)
macros/utils/data_types/data_type_list.sql (2)

132-132: No-op whitespace change

Trivial whitespace-only change. No action needed.


154-154: No-op empty line addition

This is a formatting-only change. No action needed.

@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 17, 2025 12:26 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat had a problem deploying to elementary_test_env August 17, 2025 12:26 — with GitHub Actions Failure
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 17, 2025 13:32 — with GitHub Actions Inactive
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
integration_tests/tests/test_exposure_schema_validity.py (1)

58-62: Rename variable for clarity; confirm “other” is the intended Dremio mapping

Adding "dremio" makes the name explicit_target_for_bigquery misleading. A more neutral name clarifies intent. Also, please confirm that "other" is the correct canonical value for the referenced string dtype on Dremio in this test context.

Apply this small rename for clarity:

-    explicit_target_for_bigquery = (
+    exposure_string_dtype = (
         "other"
         if dbt_project.dbt_runner.target in ["bigquery", "snowflake", "dremio", ""]
         else "string"
     )
...
-                            "data_type": explicit_target_for_bigquery,
+                            "data_type": exposure_string_dtype,

Also applies to: 71-73
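
If the same conditional is needed at both locations, it could also be pulled into a small helper. The sketch below is one possible shape under stated assumptions, not code from the PR: `TARGETS_REPORTING_OTHER` and `expected_string_dtype` are hypothetical names, and the target list, including the empty string, is copied as-is from the diff above.

```python
from typing import FrozenSet

# Targets for which the test expects "other" instead of "string".
# Copied from the diff above; the empty string is kept as in the original.
TARGETS_REPORTING_OTHER: FrozenSet[str] = frozenset(
    {"bigquery", "snowflake", "dremio", ""}
)


def expected_string_dtype(target: str) -> str:
    """Return the dtype the exposure test should expect for a dbt target."""
    return "other" if target in TARGETS_REPORTING_OTHER else "string"
```

In the test this would be called as `expected_string_dtype(dbt_project.dbt_runner.target)`, which keeps the target list in one place.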

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3de3486 and 48d3e59.

📒 Files selected for processing (1)
  • integration_tests/tests/test_exposure_schema_validity.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-10T11:29:19.004Z
Learnt from: GuyEshdat
PR: elementary-data/dbt-data-reliability#838
File: integration_tests/tests/dbt_project.py:191-201
Timestamp: 2025-08-10T11:29:19.004Z
Learning: In the Elementary dbt package integration tests, BigQuery works correctly with the default `("database", "schema")` property mapping in the `get_database_and_schema_properties` function. When using `target.database` and `target.schema` in source definitions, BigQuery's dbt adapter handles these references appropriately without requiring special mapping to `project` and `dataset`.

Applied to files:

  • integration_tests/tests/test_exposure_schema_validity.py
🧬 Code Graph Analysis (1)
integration_tests/tests/test_exposure_schema_validity.py (1)
integration_tests/tests/conftest.py (2)
  • dbt_project (88-89)
  • target (93-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (1.8.0, postgres) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, trino) / test
🔇 Additional comments (1)
integration_tests/tests/test_exposure_schema_validity.py (1)

125-134: Resolved: Dremio numeric mapping is correct

The dremio__data_type_list macro’s numeric_list includes both "int" and "integer", so mapping Dremio to "int" in the test aligns perfectly with the adapter. No changes required here.

@GuyEshdat GuyEshdat merged commit 2579747 into master Aug 17, 2025
15 checks passed
@GuyEshdat GuyEshdat deleted the ele-4931-dremio-types-mapping branch August 17, 2025 14:17