Skip to content

Conversation

@jbreeden
Copy link
Contributor

@jbreeden jbreeden commented Jan 27, 2025

Draft for supporting all configuration properties on DBT models (for both the output stream and the pipeline itself)

DEPRECATED: See the new PR here: #45

- Examples are all updated
- Initial resource creation for the examples works
- The change detection & update flow _seems_ to work as it did before
- Still WIP, still hacky, needs test updates
{
"name": "timestamp",
"expression": "`timestamp` - INTERVAL '0.001' SECOND"
output_stream={
Copy link
Contributor

@rob-apella rob-apella Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! looking to align on the suggestion below before reviewing the rest

I would prefer if the config interface were a bit less heavy-handed / verbose. something simple like:

config(
  # same as before
  primary_keys=["col_1", "col_2"],

  # same as before
  watermarks=..., 

  # less verbose
  input_stream_start_positions={"stream_1": "earliest", ...},
  
  # eventually can be replaced with auto-scale up/down
  pipeline_resources={"size": "M", "num_tasks": ...},
)

what do you think of something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's definitely a tradeoff here. We can improve the convenience at the "call site" here for now by jumping over some of the levels in the config and providing top-level fields that configure the things you're interested in here.

My hesitation for that was related to the maintenance concerns I mentioned in the description. If we have this support the same format as our declarative specs, we can automatically support anything the API supports for these spec versions, since we're just shoveling YAML around in the client.

If we go the more convenient route, then we've got to have code in the DBT adapter to plumb anything new of interest through this interface. That's primary a concern for us in maintaining all of our clients as we add new features, but if we're slow to uptake then obviously that affects users like you as well.

I wonder if there is a "have your cake and eat it too" solution, where you could supply a more convenient configuration, but there's always an escape hatch to provide the spec partials directly.

Let me just pause before this becomes a 5-paragraph essay (too late :P) Thoughts on ^?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to capture what we discussed on Zoom (please correct me if I misquote):

In the short term, we're aligned with using the Declarative spec YAML to achieve the feature parity.

There may be opportunities to simplify the interface down the line. We also noted that there may be reasons for consumers have a wrapper for our config anyway, and any conveniences could be baked into that adapter on the consumer side.

If some of those things end up being general & useful enough, we can see about porting it over to the DBT adapter. We'd still aim to keep the escape hatch of specifying generic apply-formatted YAML to ensure all functionality is accessible (if somewhat inconvenient).

@jbreeden jbreeden force-pushed the DE-8397-initial_positions branch from a55c0ed to 9a3cfd9 Compare February 4, 2025 18:22
return response.json()
else:
raise_api_exception(response.status_code, response.json())
assert False, "unreachable"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just squelching some pyright complaints about missing returns here.

output_stream: Dict[str, Any],
) -> bool:
client = self._control_plane_client()
for resource_result in client.apply(self.generate_declarative_yaml(sql, relation, pipeline, output_stream), dry_run=True):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I've moved us over to the apply endpoint w/ dry_run=true for change detection.

This moves more logic out of DBT, and will correctly identify any changes anywhere in the spec.

Ultimately I wonder if we can do away with the change detection and just go ahead and apply changes. Right now all changes are a delete & recreate, so I don't want to remove the change detection until we're sure we have a good UX around making in-place edits. Otherwise we'd be deleting the world on every dbt run and, well, I doubt anybody wants that.

@jbreeden
Copy link
Contributor Author

jbreeden commented Feb 5, 2025

Status on this:

Well, first, apologies for the barrage of commits failing CICD yesterday. I'm still working on getting my yellow belt in Python. For some reason the type checker gives me very different results locally as compared to CI. I'm sure it's something I have misconfigured. I'll try to make it happy today/tomorrow (lots of meetings, but I think the end is near).

Anyway, as of yesterday I have the change detection working for the new configuration options. In my testing, it seems to reliably flag changes for any setting I tweak, and none when things are unchanged. This was the main missing piece of this PR to now.

As mentioned above, I think we may be able to eliminate this step altogether, but I'll need some time to vet this.

@rob-apella You could try this out now in a dev environment if you're feeling adventurous. Otherwise I'll reach out again once I get CICD and all of my testing in order. I'd love to hear any feedback you all have on the new interface. If this works well, it might be the last breaking change (functionality wise) that stands in the way of an official v2 release 🤔 Might need the dbt version upgrade as well, I suppose.

Edit: Do to a schedule change I've been on support rotation the past week, and haven't had a chance to get back to this. I should be free to do so by tomorrow.

@jbreeden
Copy link
Contributor Author

Closing in favor of the cleaned-up #45

@jbreeden jbreeden closed this Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants