Description
User story / feature request
Please describe your need, outlining the key users, the feature being requested, and the goal that the feature will facilitate. For example: As a [user or stakeholder type], I want [software feature] so that [some business value]
As an analytics engineer, I want to understand the purpose of the different CI/CD automations that affect the deployment of dbt artifacts and changes.
Specifically, I am confused about these three actions:
- Build dbt image: https://github.com/cal-itp/data-infra/actions/runs/22956890742/workflow
- Deploy dbt: https://github.com/cal-itp/data-infra/blob/main/.github/workflows/deploy-dbt.yml
- Apply Terraform for DAG and warehouse changes: https://github.com/cal-itp/data-infra/blob/main/.github/workflows/composer-apply-files.yml
Questions:
- Why do we need a dbt image? Is that actually used for any deployment?
- What is the intended relationship between the second two -- it seems like the locations that dbt is deployed to (`calitp-dbt-artifacts` bucket) do not correspond to the Terraform code that is deployed (in `calitp-composer` bucket)?
- Why do we run a full refresh as part of CI? Historically, we would not run a full refresh on CI, only manually after merge if it was needed. See for ex: https://github.com/cal-itp/data-infra/actions/runs/22956890726/job/66636950226 -- ran a full refresh of an 8TB model, it was not needed
See the Notes section for more details on the issue that arose with #4894.
Acceptance Criteria
Please enter something that can be verified to show that this user story is satisfied. For example: I can join table X with table Y. or Column A appears in table Z in Metabase.
- Clarify ordering/dependencies -- does something need to be refactored to prevent a race condition here to ensure that the code is deployed as intended?
- Ensure that the version of dbt on the branch is compiled and deployed to Composer
- Consider deprecating any step that is not needed
- Consider stopping full refresh by default
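One way to picture "no full refresh by default" is a small helper that assembles the dbt command and only adds `--full-refresh` when it is explicitly requested. This is a rough sketch, not the current workflow: the function name `build_dbt_command` and its parameters are hypothetical, and it assumes CI invokes `dbt run` (the real workflow may use a different subcommand or wrapper).

```python
from typing import List, Optional


def build_dbt_command(
    select: Optional[str] = None,
    full_refresh: bool = False,
) -> List[str]:
    """Build a `dbt run` invocation where a full refresh is strictly opt-in.

    Hypothetical helper for illustration; not part of the data-infra repo.
    """
    command = ["dbt", "run"]
    if select:
        # Limit the run to the selected models.
        command += ["--select", select]
    if full_refresh:
        # Only appended when explicitly requested, e.g. by a manual trigger.
        command.append("--full-refresh")
    return command
```

With this shape, a regular CI run would call `build_dbt_command()` and a manual post-merge run could call `build_dbt_command(select="dim_stop_times_orig", full_refresh=True)` when a rebuild is actually needed.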
Notes
Please enter any additional information that will facilitate the completion of this ticket. For example: Are there any constraints not mentioned above? Are there any alternatives you have considered?
Issues specifically arose with the merging of #4894.
GitHub Actions:
- Deploy dbt - run: https://github.com/cal-itp/data-infra/actions/runs/22956890726, results in this manifest file (in GCS) which correctly contains the new `dim_stop_times_orig`
- Apply Terraform for DAG and warehouse changes - run: https://github.com/cal-itp/data-infra/actions/runs/22956890753/job/66636106172, results in this manifest (saved in Sharepoint) which does not include the new `stop_times_orig` file -- and then `dim_stop_times_orig` was not reflected in `dbt_all` run on 3/13.
It seems like there was some kind of race condition or ordering issue. The updated SQL was added to the deployed warehouse bucket but not to the deployed manifest file, and that seemingly resulted in it not getting added to `dbt_all`. (This is also confusing to me: it seems wrong that the manifest in `calitp-composer/data/warehouse/target` does not correspond to what you'd get if you ran `dbt compile` on the contents of that folder.)
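A check like the following could catch this kind of mismatch after deployment. It is only a sketch under stated assumptions: the function name `models_missing_from_manifest` is hypothetical, it assumes the warehouse folder and its `manifest.json` have been copied locally, that models live under a `models/` directory, and that each model is named after its `.sql` file. It relies on the manifest's top-level `nodes` mapping, where each entry carries `resource_type` and `name`.

```python
import json
from pathlib import Path
from typing import Set


def models_missing_from_manifest(project_dir: str, manifest_path: str) -> Set[str]:
    """Return model names that have a .sql file on disk but no node in the manifest.

    Hypothetical helper for illustration; not part of the data-infra repo.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    manifest_models = {
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }
    # Assumption: models live under <project_dir>/models, one model per .sql file.
    sql_models = {path.stem for path in Path(project_dir, "models").rglob("*.sql")}
    return sql_models - manifest_models
```

If the deployed manifest were stale in the way described above, the returned set would be expected to contain `dim_stop_times_orig`; an empty set would mean every model file has a matching manifest node.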