-
Notifications
You must be signed in to change notification settings - Fork 36
Add strategies to reflections #285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 4 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -326,7 +326,7 @@ def create_reflection(self, name: str, reflection_type: str, anchor: DremioRelat | |
| date_dimensions: List[str], measures: List[str], | ||
| computations: List[str], partition_by: List[str], partition_transform: List[str], | ||
| partition_method: str, distribute_by: List[str], localsort_by: List[str], | ||
| arrow_cache: bool) -> None: | ||
| arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we define the default values here or in the template? What do you think?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They are being defined in the {%- set max_wait_time = config.get('max_wait_time', validator=validation.any[int]) or 30 -%}
{%- set check_interval = config.get('check_interval', validator=validation.any[int]) or 5 -%}This is on pair with how we do it in other situations |
||
| thread_connection = self.get_thread_connection() | ||
| connection = self.open(thread_connection) | ||
| rest_client = connection.handle.get_client() | ||
|
|
@@ -357,13 +357,41 @@ def create_reflection(self, name: str, reflection_type: str, anchor: DremioRelat | |
| if reflection.get("name") == name: | ||
| logger.debug(f"Reflection {name} already exists. Updating it") | ||
| payload["tag"] = reflection.get("tag") | ||
| rest_client.update_reflection(reflection.get("id"), payload) | ||
| created_reflection = rest_client.update_reflection(reflection.get("id"), payload) | ||
| updated = True | ||
| break | ||
|
|
||
| if not updated: | ||
| logger.debug(f"Reflection {name} does not exist. Creating it") | ||
| rest_client.create_reflection(payload) | ||
| created_reflection = rest_client.create_reflection(payload) | ||
|
|
||
| if reflection_strategy == "wait": | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's use match pattern here. And put all possible values somewhere in enum Updated. I saw a validation in the Jinja, let's add match here at least. If/elif seems like a bad design pattern
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not exactly sure if I'm seeing what you're suggesting here? |
||
| reflection_id = created_reflection["id"] | ||
| start_time = time.time() | ||
| end_time = start_time + max_wait_time | ||
|
|
||
| while time.time() < end_time: | ||
| reflection_info = rest_client.get_reflection(reflection_id) | ||
| status = reflection_info["status"]["availability"] | ||
|
|
||
| if status == "AVAILABLE": | ||
| return | ||
|
|
||
| time.sleep(check_interval) | ||
|
|
||
| logger.info(f"Reflection {name} did not become available within {max_wait_time} seconds, skipping wait") | ||
|
|
||
| elif reflection_strategy == "depend": | ||
| reflection_id = created_reflection["id"] | ||
|
|
||
| while True: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we in this case give some state of the reflection - how much time left? What is the status? Is everything ok?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we could show the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could ask the question to our reflections team about it
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've reached out to them 👍 |
||
| reflection_info = rest_client.get_reflection(reflection_id) | ||
| status = reflection_info["status"]["availability"] | ||
|
|
||
| if status == "AVAILABLE": | ||
| return | ||
|
|
||
| time.sleep(check_interval) | ||
|
|
||
| def _make_new_space_json(self, name) -> json: | ||
| python_dict = {"entityType": "space", "name": name} | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,9 +3,10 @@ | |
| from dbt.adapters.dremio import DremioCredentials | ||
| from dbt.adapters.dremio.api.parameters import ParametersBuilder | ||
| from dbt.adapters.dremio.api.authentication import DremioAuthentication | ||
| from dbt.tests.util import run_dbt | ||
| from dbt.tests.util import run_dbt, run_dbt_and_capture | ||
| from pydantic.experimental.pipeline import transform | ||
|
|
||
|
|
||
| view1_model = """ | ||
| SELECT IncidntNum, Category, Descript, DayOfWeek, TO_DATE("SF_incidents2016.json"."Date", 'YYYY-MM-DD', 1) AS "Date", "SF_incidents2016.json"."Time" AS "Time", PdDistrict, Resolution, Address, X, Y, Location, PdId | ||
| FROM Samples."samples.dremio.com"."SF_incidents2016.json" AS "SF_incidents2016.json"; | ||
|
|
@@ -141,6 +142,25 @@ | |
| -- depends_on: {{ ref('view1') }} | ||
| """ | ||
|
|
||
| wait_strategy_timeout_reflection = """ | ||
| {{ config(alias='This will mock a timeout when using wait', | ||
| materialized='reflection', | ||
| reflection_strategy='wait', | ||
| max_wait_time=1, | ||
| display=['Date', 'DayOfWeek', 'PdDistrict', 'Category'], | ||
| reflection_type='raw')}} | ||
| -- depends_on: {{ ref('view1') }} | ||
| """ | ||
|
|
||
| trigger_strategy_timeout_reflection = """ | ||
| {{ config(alias='This will mock a timeout when using trigger', | ||
| materialized='reflection', | ||
| reflection_strategy='trigger', | ||
| max_wait_time=1, | ||
| display=['Date', 'DayOfWeek', 'PdDistrict', 'Category'], | ||
| reflection_type='raw')}} | ||
| -- depends_on: {{ ref('view1') }} | ||
| """ | ||
|
|
||
| class TestReflectionsDremio: | ||
| @pytest.fixture(scope="class") | ||
|
|
@@ -170,6 +190,8 @@ def models(self): | |
| "default_displays_model.sql": default_displays_model, | ||
| "name_reflection_from_alias.sql": name_reflection_from_alias_model, | ||
| "name_reflection_from_filename.sql": name_reflection_from_filename_model, | ||
| "wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection, | ||
| "trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection, | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I realize testing
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The idea of
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My idea would've been to assert that the function hasn't returned after a few seconds and then killing the thread. But I don't know how complex we want this to get
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I agree with Simon that we should have guarantees that it will end. Waiting without a proof for a green result may be frustrating for customers.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would be the |
||
| } | ||
|
|
||
| def _create_path_list(self, database, schema): | ||
|
|
@@ -407,3 +429,11 @@ def testNameReflectionFromFilename(self, project, client): | |
| assert "partitionFields" not in reflection | ||
| assert "sortFields" not in reflection | ||
| assert reflection["partitionDistributionStrategy"] == "STRIPED" | ||
|
|
||
| def testWaitStrategyTimeoutReflection(self, project): | ||
| (results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"]) | ||
| assert "did not become available within 1 seconds, skipping wait" in log_output | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we get a positive test case for the wait strategy as well?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could but I was avoiding adding it as a reflection going through is already tested by all the other reflection tests + it has the potential to become flaky if it takes longer than expected to be materialized
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's fair
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Without a proper mocks these tests could be all eventually flaky |
||
|
|
||
| def testTriggerStrategyTimeoutReflection(self, project): | ||
bcmeireles marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| (results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "trigger_strategy_timeout_reflection"]) | ||
| assert "did not become available within 1 seconds, skipping wait" not in log_output | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the connector check for constrains of the values provided here? For instance, a negative or 0 value of
check_intervalmight lead to errors or accidental rate limiting by Dremio. (See DC rate limits here)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be adding a
> 0constraint to this