-
Notifications
You must be signed in to change notification settings - Fork 4
WIP: Supports feature parity with pipeline & output stream spec #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Examples are all updated - Initial resource creation for the examples works - The change detection & update flow _seems_ to work as it did before - Still WIP, still hacky, needs test updates
| { | ||
| "name": "timestamp", | ||
| "expression": "`timestamp` - INTERVAL '0.001' SECOND" | ||
| output_stream={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! looking to align on the suggestion below before reviewing the rest
I would prefer if the config interface were a bit less heavy-handed / verbose. something simple like:
config(
# same as before
primary_keys=["col_1", "col_2"],
# same as before
watermarks=...,
# less verbose
input_stream_start_positions={"stream_1": "earliest", ...},
# eventually can be replaced with auto-scale up/down
pipeline_resources={"size": "M", "num_tasks": ...},
)what do you think of something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's definitely a tradeoff here. We can improve the convenience at the "call site" here for now by jumping over some of the levels in the config and providing top-level fields that configure the things you're interested in here.
My hesitation for that was related to the maintenance concerns I mentioned in the description. If we have this support the same format as our declarative specs, we can automatically support anything the API supports for these spec versions, since we're just shoveling YAML around in the client.
If we go the more convenient route, then we've got to have code in the DBT adapter to plumb anything new of interest through this interface. That's primary a concern for us in maintaining all of our clients as we add new features, but if we're slow to uptake then obviously that affects users like you as well.
I wonder if there is a "have your cake and eat it too" solution, where you could supply a more convenient configuration, but there's always an escape hatch to provide the spec partials directly.
Let me just pause before this becomes a 5-paragraph essay (too late :P) Thoughts on ^?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to capture what we discussed on Zoom (please correct me if I misquote):
In the short term, we're aligned with using the Declarative spec YAML to achieve the feature parity.
There may be opportunities to simplify the interface down the line. We also noted that there may be reasons for consumers have a wrapper for our config anyway, and any conveniences could be baked into that adapter on the consumer side.
If some of those things end up being general & useful enough, we can see about porting it over to the DBT adapter. We'd still aim to keep the escape hatch of specifying generic apply-formatted YAML to ensure all functionality is accessible (if somewhat inconvenient).
a55c0ed to
9a3cfd9
Compare
| return response.json() | ||
| else: | ||
| raise_api_exception(response.status_code, response.json()) | ||
| assert False, "unreachable" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just squelching some pyright complaints about missing returns here.
dbt/adapters/decodable/impl.py
Outdated
| output_stream: Dict[str, Any], | ||
| ) -> bool: | ||
| client = self._control_plane_client() | ||
| for resource_result in client.apply(self.generate_declarative_yaml(sql, relation, pipeline, output_stream), dry_run=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I've moved us over to the apply endpoint w/ dry_run=true for change detection.
This moves more logic out of DBT, and will correctly identify any changes anywhere in the spec.
Ultimately I wonder if we can do away with the change detection and just go ahead and apply changes. Right now all changes are a delete & recreate, so I don't want to remove the change detection until we're sure we have a good UX around making in-place edits. Otherwise we'd be deleting the world on every dbt run and, well, I doubt anybody wants that.
|
Status on this: Well, first, apologies for the barrage of commits failing CICD yesterday. I'm still working on getting my yellow belt in Python. For some reason the type checker gives me very different results locally as compared to CI. I'm sure it's something I have misconfigured. I'll try to make it happy today/tomorrow (lots of meetings, but I think the end is near). Anyway, as of yesterday I have the change detection working for the new configuration options. In my testing, it seems to reliably flag changes for any setting I tweak, and none when things are unchanged. This was the main missing piece of this PR to now. As mentioned above, I think we may be able to eliminate this step altogether, but I'll need some time to vet this. @rob-apella You could try this out now in a dev environment if you're feeling adventurous. Otherwise I'll reach out again once I get CICD and all of my testing in order. I'd love to hear any feedback you all have on the new interface. If this works well, it might be the last breaking change (functionality wise) that stands in the way of an official v2 release 🤔 Might need the dbt version upgrade as well, I suppose. Edit: Do to a schedule change I've been on support rotation the past week, and haven't had a chance to get back to this. I should be free to do so by tomorrow. |
|
Closing in favor of the cleaned-up #45 |
Draft for supporting all configuration properties on DBT models (for both the output stream and the pipeline itself)
DEPRECATED: See the new PR here: #45