Conversation
| - `strategy`: `trigger`, `wait` or `depend`
| - `max_wait_time`: time in seconds to wait for the reflection to be created before moving on to other models
|   - default: `30`
| - `wait_interval`: time in seconds to wait between checks for the reflection to be created
I think you updated this to be called `check_interval` instead.
| computations: List[str], partition_by: List[str], partition_transform: List[str],
| partition_method: str, distribute_by: List[str], localsort_by: List[str],
| arrow_cache: bool) -> None:
| arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Should the connector check for constraints on the values provided here? For instance, a negative or 0 value of `check_interval` might lead to errors or accidental rate limiting by Dremio. (See DC rate limits here)
Will be adding a `> 0` constraint to this.
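A minimal sketch of such a guard, run before any API calls are issued (the function name and messages are illustrative, not the actual dbt-dremio code):

```python
def validate_reflection_config(max_wait_time: int, check_interval: int) -> None:
    # Reject non-positive values up front: check_interval <= 0 would hammer
    # the Dremio API (accidental rate limiting), and max_wait_time <= 0
    # would make the `wait` strategy a silent no-op.
    if check_interval <= 0:
        raise ValueError(f"check_interval must be > 0, got {check_interval}")
    if max_wait_time <= 0:
        raise ValueError(f"max_wait_time must be > 0, got {max_wait_time}")
```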
| "name_reflection_from_alias.sql": name_reflection_from_alias_model,
| "name_reflection_from_filename.sql": name_reflection_from_filename_model,
| "wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection,
| "trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection,
I realize testing `depend` is not that easy, but there must be a way. Is it possible to create a reflection that will never finish materializing and then try to depend on it?
The idea of `depend` is that it will be stuck at the job running that reflection until it is complete. Even if we could make a reflection that would never materialize, we'd just be stuck waiting for it forever.
My idea would have been to assert that the function hasn't returned after a few seconds and then kill the thread. But I don't know how complex we want this to get.
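That test idea could be sketched with a daemon thread that we simply abandon rather than kill (Python has no safe way to kill a thread; a daemon thread dies with the interpreter). The helper name is illustrative:

```python
import threading
import time

def assert_still_blocked(fn, seconds: float) -> None:
    # Run fn in a daemon thread and assert it has NOT returned after
    # `seconds`. The abandoned thread is cleaned up at interpreter exit.
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(timeout=seconds)
    assert worker.is_alive(), f"expected fn to still be running after {seconds}s"
```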
Yes, I agree with Simon that we should have guarantees that it will end. Waiting without proof of a green result may be frustrating for customers.
That would be the `wait` strategy, where the customer sets the maximum time they'd want to wait. Should we remove `depend` in that case? Or have it act like `wait` but throw an error instead of just skipping?
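The two variants could share one polling loop and differ only in what happens at the deadline. A sketch, assuming `get_status` wraps the reflection status call and that `"CAN_ACCELERATE"` marks a ready reflection (both the callback and the status value are assumptions here, not the actual connector API):

```python
import time

def await_reflection(get_status, max_wait_time: int, check_interval: int,
                     strict: bool) -> bool:
    # strict=True models `depend` with a deadline: raise instead of skipping.
    # strict=False models `wait`: give up quietly and return False.
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        if get_status() == "CAN_ACCELERATE":
            return True
        time.sleep(check_interval)
    if strict:
        raise TimeoutError(f"reflection not ready after {max_wait_time}s")
    return False
```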
| def testWaitStrategyTimeoutReflection(self, project):
|     (results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"])
|     assert "did not become available within 1 seconds, skipping wait" in log_output
Can we get a positive test case for the wait strategy as well?
We could, but I was avoiding adding it: a reflection going through is already tested by all the other reflection tests, plus it has the potential to become flaky if it takes longer than expected to materialize.
Without proper mocks, these tests could all eventually become flaky.
howareyouman
left a comment
We should polish the `depend` strategy somehow before the release (or at least create a ticket for it).
| - Added 3 strategies for reflections: `trigger`, `wait` and `depend`
|   - `trigger` keeps the previous behavior for reflections, where the job to create them is triggered and then dbt-dremio moves on to other models
|   - `wait` will wait for the reflection to be created before moving on to other models, up to a `max_wait_time` timeout
Could you please add here what will happen after `max_wait_time` is over?
| computations: List[str], partition_by: List[str], partition_transform: List[str],
| partition_method: str, distribute_by: List[str], localsort_by: List[str],
| arrow_cache: bool) -> None:
| arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Should we define the default values here or in the template? What do you think?
They are being defined in the reflection materialization; `reflection.sql` has the following lines:

    {%- set max_wait_time = config.get('max_wait_time', validator=validation.any[int]) or 30 -%}
    {%- set check_interval = config.get('check_interval', validator=validation.any[int]) or 5 -%}

This is on par with how we do it in other situations.
| rest_client.create_reflection(payload)
| created_reflection = rest_client.create_reflection(payload)
| if reflection_strategy == "wait":
Let's use a match statement here, and put all the possible values in an enum somewhere.

Updated. I saw the validation in the Jinja; let's at least add a match here. if/elif seems like a bad design pattern.
I'm not exactly sure what you're suggesting here?
| elif reflection_strategy == "depend":
|     reflection_id = created_reflection["id"]
| while True:
Should we in this case surface some state of the reflection: how much time is left? What is the status? Is everything ok? Because a `while True` without any status update will be painful for customers.
Maybe we could show the status (the `combinedStatus` given by the API response to our call). Docs here
We could ask our reflections team about it.
I've reached out to them 👍
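A status-reporting loop could look like the sketch below, assuming `get_combined_status` wraps the reflection API call and that `"CAN_ACCELERATE"` is the ready value (both are assumptions pending the reflections team's answer):

```python
import time

def poll_reflection(get_combined_status, max_wait_time: int,
                    check_interval: int, log=print) -> bool:
    # Log combinedStatus and the time remaining on every check, so the user
    # sees progress instead of a silent `while True`.
    deadline = time.monotonic() + max_wait_time
    while True:
        status = get_combined_status()
        remaining = deadline - time.monotonic()
        log(f"reflection combinedStatus={status}, {max(remaining, 0):.0f}s left")
        if status == "CAN_ACCELERATE":
            return True
        if remaining <= 0:
            return False
        time.sleep(check_interval)
```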
Summary
Currently when creating a reflection, dbt marks the job as done as soon as it sends the API call to Dremio. This new option allows users to only continue after the reflection is fully created on Dremio's side.
Description
- `reflection_strategy` with 3 valid values: `trigger`, `wait` and `depend`.
  - `trigger` is the default and keeps the old behavior
  - `wait` will try to wait for the reflection to be live, up to a maximum of `max_wait_time` seconds, `30` being the default value
  - `depend` will not proceed until the reflection is live
- `max_wait_time`: Maximum time in seconds that the `wait` strategy will wait for the reflection to be ready. Default value is `30`
- `check_interval`: Time between the API calls that check if a reflection is ready. Default value is `5`

Test Results
Added 2 extra tests, one for `trigger` and another one for `wait`

Changelog
Related Issue
Suggested in #184