Clarify CICD GitHub Actions for dbt deployment -- possible race condition? #4929

@lauriemerrell

User story / feature request

Please describe your need, outlining the key users, the feature being requested, and the goal that the feature will facilitate. For example: As a [user or stakeholder type], I want [software feature] so that [some business value]

As an analytics engineer, I want to understand the purpose of the different CI/CD automations that affect the deployment of dbt artifacts and changes.

Specifically, I am confused about these three actions:

Questions:

  1. Why do we need a dbt image? Is that actually used for any deployment?
  2. What is the intended relationship between the second two -- it seems like the locations that dbt is deployed to (calitp-dbt-artifacts bucket) do not correspond to the Terraform code that is deployed (in calitp-composer bucket)?
  3. Why do we run a full refresh as part of CI? Historically, we would not run a full refresh on CI, only manually after merge if it was needed. See, for example, https://github.com/cal-itp/data-infra/actions/runs/22956890726/job/66636950226, which ran an unnecessary full refresh of an 8TB model.

See notes section for more details on issue that arose with #4894

Acceptance Criteria

Please enter something that can be verified to show that this user story is satisfied. For example: I can join table X with table Y. or Column A appears in table Z in Metabase.

  • Clarify ordering/dependencies -- does something need to be refactored to prevent a race condition here to ensure that the code is deployed as intended?
  • Ensure that the version of dbt on the branch is compiled and deployed to Composer
  • Consider deprecating any step that is not needed
  • Consider stopping full refresh by default

Notes

Please enter any additional information that will facilitate the completion of this ticket. For example: Are there any constraints not mentioned above? Are there any alternatives you have considered?

Issues specifically arose with the merging of #4894.

GitHub Actions:

It seems like there was some kind of race condition or ordering issue. The updated SQL was added to the deployed warehouse bucket but was not reflected in the deployed manifest file, which seemingly resulted in it not being added to dbt_all. (This is also confusing to me: it seems wrong that the manifest in calitp-composer/data/warehouse/target does not correspond to what you'd get if you ran dbt compile on the contents of that folder.)
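To make the mismatch concrete, here is a minimal sketch of the kind of check that could flag it. This is not something that exists in the repo today. The function name `models_missing_from_manifest` is hypothetical, and the sketch assumes a local copy of the deployed warehouse folder with model SQL under a `models` directory and a manifest whose `nodes` entries carry `resource_type` and `original_file_path`.

```python
from pathlib import Path


def models_missing_from_manifest(manifest: dict, project_dir: Path) -> set[str]:
    """Return model .sql paths (relative to project_dir) that the manifest does not list."""
    # Paths recorded in the manifest for model nodes (assumed relative to the project root).
    in_manifest = {
        Path(node["original_file_path"]).as_posix()
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }
    # Model SQL files actually present in the folder (assumes a "models" directory).
    on_disk = {
        p.relative_to(project_dir).as_posix()
        for p in (project_dir / "models").rglob("*.sql")
    }
    return on_disk - in_manifest
```

Loading `target/manifest.json` with `json.loads` and passing it in alongside the folder path would return the SQL files that were deployed but are absent from the manifest. A non-empty result after a deploy would point to the ordering problem described above.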
