abstract out dbt deps/clean steps in orchestration#1324

Merged
Ishankoradia merged 3 commits into main from abstract-out-dbt-deps-clean-from-orchestration
May 1, 2026

Conversation

@Ishankoradia Ishankoradia commented Apr 30, 2026

Summary by CodeRabbit

  • New Features

    • dbt-clean and dbt-deps tasks are now automatically managed and injected into pipelines containing DBT tasks, reducing manual configuration.
  • Chores

    • Added a management utility to backfill auto-managed tasks in existing pipelines, with optional dry-run mode for preview.

coderabbitai Bot commented Apr 30, 2026

Warning

Rate limit exceeded

@Ishankoradia has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 41 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8dcdf56f-3ef2-464a-95c6-851e5ca95d31

📥 Commits

Reviewing files that changed from the base of the PR and between 77b9674 and dd0a2fd.

📒 Files selected for processing (1)
  • ddpui/management/commands/migrate_org_queue.py

Walkthrough

This PR automates the management of DBT cleanup tasks (dbt-clean and dbt-deps) within pipeline orchestration. These tasks are now filtered from user-specified transform pipelines, automatically created if missing, and injected before CLI DBT tasks during pipeline construction. A backfill management command enables retroactive updates to existing pipelines.

Changes

Cohort / File(s) — Summary

  • Constants & Defaults (ddpui/utils/constants.py): Removed TASK_DBTCLEAN and TASK_DBTDEPS from the default transform pipeline task list, designating them as auto-managed rather than user-configurable.
  • API Task Retrieval (ddpui/api/orgtask_api.py): Extended get_prefect_transformation_tasks to filter auto-managed DBT task slugs (dbt-clean, dbt-deps) when exclude_git=True, alongside the existing git task exclusion.
  • Core Orchestration (ddpui/core/orchestrate/pipeline_service.py): Modified PipelineService to filter auto-managed DBT tasks out of user inputs, auto-inject dbt-clean and dbt-deps tasks before CLI DBT tasks, and added helper methods (_get_or_create_dbt_clean_orgtask, _get_or_create_dbt_deps_orgtask) to retrieve or create the required org tasks.
  • Backfill Management Command (ddpui/management/commands/backfill_auto_managed_tasks.py): New Django command to scan existing orchestrate pipelines per org, identify those with transform tasks, and update pipelines missing the auto-managed dbt-clean or dbt-deps steps, with dry-run support and schedule-reset logic.
  • API Tests (ddpui/tests/api_tests/test_pipeline_api.py): Updated test fixtures and assertions to exclude dbt-clean and dbt-deps from manual transform task specifications, verify auto-injection of these tasks, and adjust sequence counts to account for the additional auto-managed tasks.
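The backfill decision the new command makes can be sketched as a pure function. This is illustrative only: the function names, the dict-of-slug-sets input shape, and the dry-run reporting style are assumptions, not the actual ddpui implementation.

```python
# Illustrative sketch of the backfill decision, NOT the actual ddpui command:
# given each pipeline's existing task slugs, report which auto-managed
# steps would need to be added, honoring a dry-run flag.
AUTO_MANAGED = ("dbt-clean", "dbt-deps")

def missing_auto_tasks(existing_slugs):
    """Return the auto-managed slugs a pipeline still lacks."""
    return [slug for slug in AUTO_MANAGED if slug not in existing_slugs]

def backfill_plan(pipelines, dry_run=True):
    """Map each pipeline that needs changes to its missing tasks.

    With dry_run=True this only reports; a real command would then call
    the pipeline-update service for each entry in the plan.
    """
    plan = {}
    for name, slugs in pipelines.items():
        missing = missing_auto_tasks(slugs)
        if missing:
            plan[name] = missing
            verb = "[dry-run] would add" if dry_run else "adding"
            print(f"{verb} {missing} to {name}")
    return plan

plan = backfill_plan({
    "daily-transform": {"dbt-run"},
    "hourly-transform": {"dbt-clean", "dbt-deps", "dbt-run"},
})
# plan == {"daily-transform": ["dbt-clean", "dbt-deps"]}
```

A real Django management command would wrap this in `BaseCommand.handle()` and expose the flag via `add_arguments(parser)` with `--dry-run`, as the command summary above describes.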

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant API as API Layer
    participant PipelineService as PipelineService
    participant OrgTaskDB as OrgTask DB
    participant Prefect as Prefect Backend
    
    User->>API: Submit pipeline with user transform tasks (exclude dbt-clean/deps)
    API->>PipelineService: Update pipeline with transform tasks
    PipelineService->>PipelineService: Filter out auto-managed DBT tasks
    PipelineService->>OrgTaskDB: Check for existing dbt-clean task
    alt dbt-clean missing
        OrgTaskDB-->>PipelineService: Not found
        PipelineService->>OrgTaskDB: Create dbt-clean task
    else dbt-clean exists
        OrgTaskDB-->>PipelineService: Return existing task
    end
    PipelineService->>OrgTaskDB: Check for existing dbt-deps task
    alt dbt-deps missing
        OrgTaskDB-->>PipelineService: Not found
        PipelineService->>OrgTaskDB: Create dbt-deps task
    else dbt-deps exists
        OrgTaskDB-->>PipelineService: Return existing task
    end
    PipelineService->>PipelineService: Build task sequence: [dbt-clean, dbt-deps, ...user DBT tasks]
    PipelineService->>Prefect: Deploy updated pipeline
    Prefect-->>API: Deployment confirmation
    API-->>User: Success response
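The sequence above boils down to a filter-then-inject step. Here is a minimal sketch of that behaviour; the slug constants, the task-dict shape, and the renumbering are assumptions for illustration, not the actual pipeline_service.py code.

```python
# Minimal sketch of the filter-then-inject behaviour described in the
# walkthrough; constants and the task-dict shape are assumptions, not
# the actual ddpui/core/orchestrate/pipeline_service.py implementation.
TASK_DBTCLEAN = "dbt-clean"
TASK_DBTDEPS = "dbt-deps"
AUTO_MANAGED_SLUGS = {TASK_DBTCLEAN, TASK_DBTDEPS}

def build_task_sequence(user_tasks):
    """Drop any auto-managed slugs a user may have supplied manually,
    then inject dbt-clean and dbt-deps ahead of the remaining DBT tasks."""
    filtered = [t for t in user_tasks if t["slug"] not in AUTO_MANAGED_SLUGS]
    injected = [{"slug": TASK_DBTCLEAN}, {"slug": TASK_DBTDEPS}]
    # Renumber the sequence so the injected tasks always run first
    return [dict(t, seq=i) for i, t in enumerate(injected + filtered, start=1)]

seq = build_task_sequence([{"slug": "dbt-clean"}, {"slug": "dbt-run"}])
# seq == [{"slug": "dbt-clean", "seq": 1}, {"slug": "dbt-deps", "seq": 2},
#         {"slug": "dbt-run", "seq": 3}]
```

This also shows why the test fixtures were updated: a user-supplied dbt-clean is silently deduplicated rather than appearing twice in the deployed sequence.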

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: ui4t git integration #1206: Modifies task slug handling and auto-managed DBT task identifiers in the same API layer, directly relevant to task filtering logic.
  • Remote infra git clone #1274: Alters orchestration constants and pipeline construction to manage system-controlled transform tasks, overlapping with this PR's auto-managed task approach.

Suggested reviewers

  • himanshudube97
  • siddhant3030
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title accurately describes the main change: abstracting dbt-clean and dbt-deps steps by making them auto-managed instead of user-included tasks in orchestration pipelines.
  • Docstring Coverage (✅ Passed): Docstring coverage is 83.33%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


sentry Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 82.92683% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.42%. Comparing base (a31afa7) to head (dd0a2fd).

Files with missing lines | Patch % | Missing lines
ddpui/core/orchestrate/pipeline_service.py | 84.21% | 6 Missing ⚠️
ddpui/api/orgtask_api.py | 66.66% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1324      +/-   ##
==========================================
+ Coverage   58.37%   58.42%   +0.05%     
==========================================
  Files         132      132              
  Lines       15615    15652      +37     
==========================================
+ Hits         9115     9145      +30     
- Misses       6500     6507       +7     

☔ View full report in Codecov by Sentry.

coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ddpui/core/orchestrate/pipeline_service.py (1)

172-185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persisting auto-managed DBT tasks here leaks hidden UUIDs into pipeline details.

all_orgtasks is later remapped into DataflowOrgTask, and PipelineService.get_pipeline_details() still returns every DBT/DBTCLOUD mapping as transformTasks. After this change, edit/read responses will include dbt-clean/dbt-deps UUIDs even though ddpui/api/orgtask_api.py:get_prefect_transformation_tasks(..., exclude_git=True) no longer exposes them. That makes the pipeline-details payload non-round-trippable for the task picker and can break edit flows. Please filter the auto-managed slugs back out when serializing transformTasks.

Possible follow-up outside this range
@@
-        transform_tasks = [
+        auto_managed_task_slugs = {TASK_DBTCLEAN, TASK_DBTDEPS}
+        transform_tasks = [
             {"uuid": dataflow_orgtask.orgtask.uuid, "seq": dataflow_orgtask.seq}
             for dataflow_orgtask in DataflowOrgTask.objects.filter(
                 dataflow=org_data_flow,
                 orgtask__task__type__in=[TaskType.DBT, TaskType.DBTCLOUD],
             )
+            .exclude(orgtask__task__slug__in=auto_managed_task_slugs)
             .all()
             .order_by("seq")
         ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ddpui/core/orchestrate/pipeline_service.py` around lines 172 - 185, The
pipeline is currently including auto-managed DBT tasks (from
auto_managed_dbt_orgtasks / all_orgtasks) into map_org_tasks so
PipelineService.get_pipeline_details() serializes those hidden UUIDs into
transformTasks; fix by filtering out auto-managed DBT tasks when preparing the
mapping/serialization: when you assign map_org_tasks (or right before
serializing transformTasks in PipelineService.get_pipeline_details), remove any
org tasks that match the auto-managed DBT slugs (e.g., "dbt-clean","dbt-deps")
or have the auto-managed marker used in DataflowOrgTask (or compare against the
auto_managed_dbt_orgtasks list) so that transformTasks only includes
user-visible tasks and the payload remains round-trippable with
orgtask_api.get_prefect_transformation_tasks(..., exclude_git=True).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c343216c-0561-4214-b10e-3ac332b846d7

📥 Commits

Reviewing files that changed from the base of the PR and between a31afa7 and 77b9674.

📒 Files selected for processing (5)
  • ddpui/api/orgtask_api.py
  • ddpui/core/orchestrate/pipeline_service.py
  • ddpui/management/commands/backfill_auto_managed_tasks.py
  • ddpui/tests/api_tests/test_pipeline_api.py
  • ddpui/utils/constants.py
💤 Files with no reviewable changes (1)
  • ddpui/utils/constants.py

Comment on lines +90 to +103
            # Check if dbt-clean and dbt-deps are already present
            has_dbt_clean = DataflowOrgTask.objects.filter(
                dataflow=dataflow, orgtask__task__slug=TASK_DBTCLEAN
            ).exists()
            has_dbt_deps = DataflowOrgTask.objects.filter(
                dataflow=dataflow, orgtask__task__slug=TASK_DBTDEPS
            ).exists()

            if has_dbt_clean and has_dbt_deps:
                self.stdout.write(
                    f"  → Skipping {dataflow.deployment_name} (already has dbt-clean and dbt-deps)"
                )
                skipped += 1
                continue

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This skip condition misses pipelines that only lack the auto-managed git step.

A legacy CLI DBT pipeline that already has dbt-clean/dbt-deps but is missing its git pull/clone mapping will be skipped here, even though PipelineService.update_pipeline() would add that missing step. That leaves part of the fleet unbackfilled.

Suggested fix
             has_dbt_clean = DataflowOrgTask.objects.filter(
                 dataflow=dataflow, orgtask__task__slug=TASK_DBTCLEAN
             ).exists()
             has_dbt_deps = DataflowOrgTask.objects.filter(
                 dataflow=dataflow, orgtask__task__slug=TASK_DBTDEPS
             ).exists()
+            has_git_task = DataflowOrgTask.objects.filter(
+                dataflow=dataflow, orgtask__task__type=TaskType.GIT
+            ).exists()

-            if has_dbt_clean and has_dbt_deps:
+            if has_git_task and has_dbt_clean and has_dbt_deps:
                 self.stdout.write(
-                    f"  → Skipping {dataflow.deployment_name} (already has dbt-clean and dbt-deps)"
+                    f"  → Skipping {dataflow.deployment_name} "
+                    f"(already has git + dbt-clean + dbt-deps)"
                 )
                 skipped += 1
                 continue

             missing = []
+            if not has_git_task:
+                missing.append("git")
             if not has_dbt_clean:
                 missing.append("dbt-clean")
             if not has_dbt_deps:
                 missing.append("dbt-deps")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ddpui/management/commands/backfill_auto_managed_tasks.py` around lines 90 -
103, The current skip logic checks only for TASK_DBTCLEAN and TASK_DBTDEPS using
DataflowOrgTask and will skip pipelines that still lack the auto-managed git
step; update the condition so we only skip when dbt-clean, dbt-deps AND the
auto-managed git mapping are present. Concretely, augment the exists check
(DataflowOrgTask.objects.filter(...).exists()) to also verify the git/orgtask
mapping (or the relevant git task slug) is present, or alternatively remove the
early continue and always call PipelineService.update_pipeline(dataflow, ...)
for pipelines that have dbt-clean and/or dbt-deps so update_pipeline can add the
missing git step; reference DataflowOrgTask, TASK_DBTCLEAN, TASK_DBTDEPS and
PipelineService.update_pipeline when making the change.

Comment on lines +158 to +164
        # Toggle schedule inactive → active to clear pre-scheduled runs.
        # Prefect schedules runs 1-2 days in advance; those won't pick up the
        # updated deployment params unless the schedule is reset.
        # Only do this for pipelines that have an active schedule.
        if dataflow.cron and pipeline_details.get("isScheduleActive", False):
            PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "inactive")
            PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "active")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

A failure between inactive and active leaves the pipeline paused.

update_pipeline() has already succeeded by this point. If the second set_pipeline_schedule(..., "active") call errors, the command reports a failure but the schedule stays disabled. This needs a best-effort restore path so a partial backfill does not silently turn off production schedules.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ddpui/management/commands/backfill_auto_managed_tasks.py` around lines 158 -
164, The current sequence toggles the Prefect schedule inactive→active via
PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "inactive")
and then "active" which can leave the pipeline paused if the second call fails;
modify the block so that after making the schedule inactive you attempt to set
it back to "active" inside a guarded retry/restore path (try/except/finally):
call PipelineService.set_pipeline_schedule(...) to reactivate and on any
exception immediately log the error with context (include dataflow.deployment_id
and org), perform a best-effort retry or a compensating call to re-enable the
schedule (e.g., another set_pipeline_schedule(..., "active") attempt) and
surface failure but avoid leaving the schedule disabled; ensure this logic is
colocated with the existing update_pipeline() success flow so update_pipeline()
remains committed and schedules are restored on failure.
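One possible shape for the guarded restore this comment asks for, sketched against a stub service so it is self-contained. The retry count, the assumption that set_pipeline_schedule raises on failure, and the FlakyService stub are all hypothetical; the real PipelineService API may behave differently.

```python
# Sketch of a guarded restore path for the inactive → active toggle.
# PipelineService is replaced by a stub here; the real ddpui service
# and its error behaviour may differ.
import logging

logger = logging.getLogger(__name__)

def reset_schedule(service, org, deployment_id, retries=2):
    """Toggle a schedule inactive → active, retrying the reactivation so a
    transient failure does not leave the deployment paused."""
    service.set_pipeline_schedule(org, deployment_id, "inactive")
    for _ in range(retries):
        try:
            service.set_pipeline_schedule(org, deployment_id, "active")
            return True
        except Exception as err:  # best-effort restore, so log and retry
            logger.error(
                "failed to reactivate %s for org %s: %s", deployment_id, org, err
            )
    return False  # caller should surface the still-paused deployment

# Tiny stub that fails once on "active", then succeeds, exercising the retry
class FlakyService:
    def __init__(self):
        self.calls = 0

    def set_pipeline_schedule(self, org, deployment_id, status):
        self.calls += 1
        if status == "active" and self.calls == 2:
            raise RuntimeError("transient Prefect error")

svc = FlakyService()
restored = reset_schedule(svc, "org1", "dep1")
# restored is True: the first "active" attempt fails, the retry succeeds
```

Returning False instead of raising keeps the backfill loop running across the fleet while still letting the command report which deployments were left paused.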

@Ishankoradia Ishankoradia merged commit 9c957ab into main May 1, 2026
4 of 5 checks passed
@Ishankoradia Ishankoradia deleted the abstract-out-dbt-deps-clean-from-orchestration branch May 1, 2026 07:47