Speeding Up Repeated DBT Builds #2123
alexrichey started this conversation in Ideas
When we're building a DBT project and running repeated builds in a schema, we're usually loading the same data, repeating the same calculations, etc. I think we could enable caching, similar to the local development experience, with two components:
1. Use state to detect changed models
When developing locally, you can use dbt's state-based selection: it detects the models that changed since the last run, then runs all downstream models.
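A minimal sketch of that local workflow (the artifact path is a placeholder; `state:modified+` selects modified models plus everything downstream of them):

```sh
# Compare against the manifest from the previous run and rebuild only
# the models whose definitions changed, plus all of their descendants.
dbt build --select state:modified+ --state ./previous-run-artifacts/
```

Note that `--state` points at a directory containing the previous run's manifest.json, not at the file itself.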
The build will output a manifest.json, and we could store it in the postgres schema for the build as a JSON document, then download it before we kick off the next DBT build; a rough sketch of that round trip is below.
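This assumes psql is available in the build environment; the schema, table, and column names are made up for illustration:

```sh
# After a successful build: stash target/manifest.json in the build's schema.
psql "$DATABASE_URL" <<'SQL'
CREATE TABLE IF NOT EXISTS my_build_schema.dbt_build_state (
    saved_at timestamptz DEFAULT now(),
    manifest jsonb
);
-- Read the artifact into a psql variable and insert it as jsonb.
\set manifest `cat target/manifest.json`
INSERT INTO my_build_schema.dbt_build_state (manifest) VALUES (:'manifest'::jsonb);
SQL

# Before the next build: pull the latest manifest back down for --state.
mkdir -p previous-run-artifacts
psql "$DATABASE_URL" -At -c \
  "SELECT manifest FROM my_build_schema.dbt_build_state ORDER BY saved_at DESC LIMIT 1" \
  > previous-run-artifacts/manifest.json
```

(manifest.json can get large, so in practice we might want to compress it or keep it in object storage instead, but the idea is the same.)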
2. Detecting changed data
However, this misses a very important case: when your input data changes. DBT doesn't really concern itself with changed data - it cares about changed models. So firstly... maybe we just don't attempt to solve this? In the build, we could just detect when a new recipe file is pushed, and in that case just don't allow the `--state manifest.json` flag (i.e. fall back to a full build).

But if we wanted to do this, one approach is to make the model code depend on the input data itself (see the sketch below). Then whenever `my_recipe_dataset` changed, the actual compiled model code would change as well... Combined with manifest.json, this would allow you to target models whose input dataset was changed.
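One way to get that effect, sketched under the assumption that the recipe dataset lives as a file in the repo and feeds a single staging model (every path and name below is hypothetical):

```sh
# Fingerprint the recipe dataset and stamp the hash into the model that reads
# it, so a data change shows up to state:modified as a model change.
RECIPE_HASH=$(sha256sum seeds/my_recipe_dataset.csv | cut -d ' ' -f 1)

# Maintain a one-line fingerprint comment at the top of the model file,
# e.g. "-- recipe_fingerprint: <hash>", and rewrite it before each build.
sed -i "s/^-- recipe_fingerprint: .*/-- recipe_fingerprint: ${RECIPE_HASH}/" \
  models/staging/stg_my_recipe_dataset.sql

# A state-based build (as in section 1) now picks up this model, and
# everything downstream of it, whenever the underlying data changed.
dbt build --select state:modified+ --state ./previous-run-artifacts/
```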
Conclusion
It's not a burning issue, and caching is complicated. I think enabling 1) would be straightforward and helpful.