Conversation
| - `strategy`: `trigger`, `wait` or `depend`
| - `max_wait_time`: time in seconds to wait for the reflection to be created before moving on to other models
|   - default: `30`
| - `wait_interval`: time in seconds to wait between checks for the reflection to be created
I think you updated this to be called `check_interval` instead.
| computations: List[str], partition_by: List[str], partition_transform: List[str],
| partition_method: str, distribute_by: List[str], localsort_by: List[str],
| arrow_cache: bool) -> None:
| arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Should the connector check for constraints on the values provided here? For instance, a negative or 0 value of `check_interval` might lead to errors or accidental rate limiting by Dremio. (See DC rate limits here)
Will be adding a `> 0` constraint to this.
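A minimal sketch of such a guard, run before any API calls are issued (the function name and messages are illustrative, not the actual dbt-dremio code):

```python
def validate_reflection_config(max_wait_time: int, check_interval: int) -> None:
    # Reject non-positive values up front: check_interval <= 0 would hammer
    # the Dremio API (accidental rate limiting), and max_wait_time <= 0
    # would make the `wait` strategy a silent no-op.
    if check_interval <= 0:
        raise ValueError(f"check_interval must be > 0, got {check_interval}")
    if max_wait_time <= 0:
        raise ValueError(f"max_wait_time must be > 0, got {max_wait_time}")
```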
| "name_reflection_from_alias.sql": name_reflection_from_alias_model,
| "name_reflection_from_filename.sql": name_reflection_from_filename_model,
| "wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection,
| "trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection,
I realize testing `depend` is not that easy, but there must be a way. Is it possible to create a reflection that will never finish materializing and then try to depend on it?
The idea of `depend` is that it will be stuck at the job running that reflection until it is complete. Even if we could make a reflection that would never materialize, we'd just be stuck waiting for it forever.
My idea would have been to assert that the function hasn't returned after a few seconds and then kill the thread. But I don't know how complex we want this to get.
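That test idea could be sketched with a daemon thread that we simply abandon rather than kill (Python has no safe way to kill a thread; a daemon thread dies with the interpreter). The helper name is illustrative:

```python
import threading
import time

def assert_still_blocked(fn, seconds: float) -> None:
    # Run fn in a daemon thread and assert it has NOT returned after
    # `seconds`. The abandoned thread is cleaned up at interpreter exit.
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(timeout=seconds)
    assert worker.is_alive(), f"expected fn to still be running after {seconds}s"
```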
Yes, I agree with Simon that we should have guarantees that it will end. Waiting without proof of a green result may be frustrating for customers.
That would be the `wait` strategy, where the customer sets the maximum time they'd want to wait. Should we remove `depend` in that case? Or have it act like `wait` but throw an error instead of just skipping?
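The two variants could share one polling loop and differ only in what happens at the deadline. A sketch, assuming `get_status` wraps the reflection status call and that `"CAN_ACCELERATE"` marks a ready reflection (both the callback and the status value are assumptions here, not the actual connector API):

```python
import time

def await_reflection(get_status, max_wait_time: int, check_interval: int,
                     strict: bool) -> bool:
    # strict=True models `depend` with a deadline: raise instead of skipping.
    # strict=False models `wait`: give up quietly and return False.
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        if get_status() == "CAN_ACCELERATE":
            return True
        time.sleep(check_interval)
    if strict:
        raise TimeoutError(f"reflection not ready after {max_wait_time}s")
    return False
```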
| def testWaitStrategyTimeoutReflection(self, project):
|     (results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"])
|     assert "did not become available within 1 seconds, skipping wait" in log_output
Can we get a positive test case for the wait strategy as well?
We could, but I was avoiding adding it: a reflection going through is already tested by all the other reflection tests, plus it has the potential to become flaky if it takes longer than expected to materialize.
Without proper mocks, these tests could all eventually become flaky.
howareyouman
left a comment
We should polish the `depend` strategy somehow before the release (or at least create a ticket for it).
| - Added 3 strategies for reflections: `trigger`, `wait` and `depend`
|   - `trigger` keeps the previous behavior for reflections, where the job to create them is triggered and then dbt-dremio moves on to other models
|   - `wait` will wait for the reflection to be created before moving on to other models, up to a `max_wait_time` timeout
Could you please add here what will happen after `max_wait_time` is over?
| computations: List[str], partition_by: List[str], partition_transform: List[str],
| partition_method: str, distribute_by: List[str], localsort_by: List[str],
| arrow_cache: bool) -> None:
| arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Should we define the default values here or in the template? What do you think?
They are being defined in the reflection materialization; `reflection.sql` has the following lines:

    {%- set max_wait_time = config.get('max_wait_time', validator=validation.any[int]) or 30 -%}
    {%- set check_interval = config.get('check_interval', validator=validation.any[int]) or 5 -%}

This is on par with how we do it in other situations.
| rest_client.create_reflection(payload)
| created_reflection = rest_client.create_reflection(payload)
| if reflection_strategy == "wait":
Let's use a match statement here, and put all the possible values in an enum somewhere.

Updated. I saw the validation in the Jinja; let's at least add a match here. if/elif seems like a bad design pattern.
I'm not exactly sure what you're suggesting here?
| elif reflection_strategy == "depend":
|     reflection_id = created_reflection["id"]
| while True:
Should we in this case surface some state of the reflection: how much time is left? What is the status? Is everything ok? Because a `while True` without any status update will be painful for customers.
Maybe we could show the status (the `combinedStatus` given by the API response to our call). Docs here
We could ask our reflections team about it.
I've reached out to them 👍
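A status-reporting loop could look like the sketch below, assuming `get_combined_status` wraps the reflection API call and that `"CAN_ACCELERATE"` is the ready value (both are assumptions pending the reflections team's answer):

```python
import time

def poll_reflection(get_combined_status, max_wait_time: int,
                    check_interval: int, log=print) -> bool:
    # Log combinedStatus and the time remaining on every check, so the user
    # sees progress instead of a silent `while True`.
    deadline = time.monotonic() + max_wait_time
    while True:
        status = get_combined_status()
        remaining = deadline - time.monotonic()
        log(f"reflection combinedStatus={status}, {max(remaining, 0):.0f}s left")
        if status == "CAN_ACCELERATE":
            return True
        if remaining <= 0:
            return False
        time.sleep(check_interval)
```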
Summary
Currently when creating a reflection, dbt marks the job as done as soon as it sends the API call to Dremio. This new option allows users to only continue after the reflection is fully created on Dremio's side.
Description
- `reflection_strategy` with 3 valid values: `trigger`, `wait` and `depend`.
  - `trigger` is the default and keeps the old behavior
  - `wait` will try to wait for the reflection to be live, up to a maximum of `max_wait_time` seconds, `30` being the default value
  - `depend` will not proceed until the reflection is live
- `max_wait_time`: Maximum time in seconds that the `wait` strategy will wait for the reflection to be ready. Default value is `30`
- `check_interval`: Time between the API calls that check if a reflection is ready. Default value is `5`

Test Results
Added 2 extra tests, one for `trigger` and another one for `wait`

Changelog
Related Issue
Suggested in #184