abstract out dbt deps/clean steps in orchestration #1324
Conversation
Walkthrough
This PR automates the management of the DBT cleanup tasks (dbt-clean and dbt-deps) in pipeline orchestration.
Sequence Diagram(s)
sequenceDiagram
actor User
participant API as API Layer
participant PipelineService as PipelineService
participant OrgTaskDB as OrgTask DB
participant Prefect as Prefect Backend
User->>API: Submit pipeline with user transform tasks (exclude dbt-clean/deps)
API->>PipelineService: Update pipeline with transform tasks
PipelineService->>PipelineService: Filter out auto-managed DBT tasks
PipelineService->>OrgTaskDB: Check for existing dbt-clean task
alt dbt-clean missing
OrgTaskDB-->>PipelineService: Not found
PipelineService->>OrgTaskDB: Create dbt-clean task
else dbt-clean exists
OrgTaskDB-->>PipelineService: Return existing task
end
PipelineService->>OrgTaskDB: Check for existing dbt-deps task
alt dbt-deps missing
OrgTaskDB-->>PipelineService: Not found
PipelineService->>OrgTaskDB: Create dbt-deps task
else dbt-deps exists
OrgTaskDB-->>PipelineService: Return existing task
end
PipelineService->>PipelineService: Build task sequence: [dbt-clean, dbt-deps, ...user DBT tasks]
PipelineService->>Prefect: Deploy updated pipeline
Prefect-->>API: Deployment confirmation
API-->>User: Success response
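In short, the flow is a get-or-create on the two auto-managed OrgTasks followed by prepending them to the user's transform sequence. The following is a minimal sketch of that shape only, not the PR's actual `PipelineService` code: the model import paths and the helper names (`ensure_auto_managed_orgtasks`, `build_transform_sequence`) are assumptions for illustration; only the `TASK_DBTCLEAN`/`TASK_DBTDEPS` constants and the `OrgTask` model name come from the PR itself.

```python
# Illustrative sketch only — import paths and helper names are assumptions;
# the real PipelineService implementation may differ.
from ddpui.models.tasks import OrgTask, Task  # assumed module path
from ddpui.utils.constants import TASK_DBTCLEAN, TASK_DBTDEPS

AUTO_MANAGED_SLUGS = (TASK_DBTCLEAN, TASK_DBTDEPS)


def ensure_auto_managed_orgtasks(org):
    """Get-or-create the dbt-clean and dbt-deps OrgTasks for this org."""
    orgtasks = []
    for slug in AUTO_MANAGED_SLUGS:
        task = Task.objects.get(slug=slug)
        orgtask, _created = OrgTask.objects.get_or_create(org=org, task=task)
        orgtasks.append(orgtask)
    return orgtasks


def build_transform_sequence(org, requested_orgtasks):
    """Drop any auto-managed tasks the client sent, then prepend them
    in a fixed order: dbt-clean, dbt-deps, user DBT tasks."""
    user_tasks = [
        orgtask
        for orgtask in requested_orgtasks
        if orgtask.task.slug not in AUTO_MANAGED_SLUGS
    ]
    return ensure_auto_managed_orgtasks(org) + user_tasks
```

The fixed ordering mirrors the diagram: cleanup and dependency installation always run before any user-selected DBT task, regardless of what the client submitted.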
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Codecov Report
❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1324      +/-   ##
==========================================
+ Coverage   58.37%   58.42%   +0.05%
==========================================
  Files         132      132
  Lines       15615    15652      +37
==========================================
+ Hits         9115     9145      +30
- Misses       6500     6507       +7

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ddpui/core/orchestrate/pipeline_service.py (1)
172-185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Persisting auto-managed DBT tasks here leaks hidden UUIDs into pipeline details.
`all_orgtasks` is later remapped into `DataflowOrgTask`, and `PipelineService.get_pipeline_details()` still returns every DBT/DBTCLOUD mapping as `transformTasks`. After this change, edit/read responses will include `dbt-clean`/`dbt-deps` UUIDs even though `ddpui/api/orgtask_api.py:get_prefect_transformation_tasks(..., exclude_git=True)` no longer exposes them. That makes the pipeline-details payload non-round-trippable for the task picker and can break edit flows. Please filter the auto-managed slugs back out when serializing `transformTasks`.
Possible follow-up outside this range:
@@
-    transform_tasks = [
+    auto_managed_task_slugs = {TASK_DBTCLEAN, TASK_DBTDEPS}
+    transform_tasks = [
         {"uuid": dataflow_orgtask.orgtask.uuid, "seq": dataflow_orgtask.seq}
         for dataflow_orgtask in DataflowOrgTask.objects.filter(
             dataflow=org_data_flow,
             orgtask__task__type__in=[TaskType.DBT, TaskType.DBTCLOUD],
         )
+        .exclude(orgtask__task__slug__in=auto_managed_task_slugs)
         .all()
         .order_by("seq")
     ]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ddpui/core/orchestrate/pipeline_service.py` around lines 172 - 185, The pipeline is currently including auto-managed DBT tasks (from auto_managed_dbt_orgtasks / all_orgtasks) into map_org_tasks so PipelineService.get_pipeline_details() serializes those hidden UUIDs into transformTasks; fix by filtering out auto-managed DBT tasks when preparing the mapping/serialization: when you assign map_org_tasks (or right before serializing transformTasks in PipelineService.get_pipeline_details), remove any org tasks that match the auto-managed DBT slugs (e.g., "dbt-clean","dbt-deps") or have the auto-managed marker used in DataflowOrgTask (or compare against the auto_managed_dbt_orgtasks list) so that transformTasks only includes user-visible tasks and the payload remains round-trippable with orgtask_api.get_prefect_transformation_tasks(..., exclude_git=True).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c343216c-0561-4214-b10e-3ac332b846d7
📒 Files selected for processing (5)
- ddpui/api/orgtask_api.py
- ddpui/core/orchestrate/pipeline_service.py
- ddpui/management/commands/backfill_auto_managed_tasks.py
- ddpui/tests/api_tests/test_pipeline_api.py
- ddpui/utils/constants.py
💤 Files with no reviewable changes (1)
- ddpui/utils/constants.py
# Check if dbt-clean and dbt-deps are already present
has_dbt_clean = DataflowOrgTask.objects.filter(
    dataflow=dataflow, orgtask__task__slug=TASK_DBTCLEAN
).exists()
has_dbt_deps = DataflowOrgTask.objects.filter(
    dataflow=dataflow, orgtask__task__slug=TASK_DBTDEPS
).exists()

if has_dbt_clean and has_dbt_deps:
    self.stdout.write(
        f" → Skipping {dataflow.deployment_name} (already has dbt-clean and dbt-deps)"
    )
    skipped += 1
    continue
This skip condition misses pipelines that only lack the auto-managed git step.
A legacy CLI DBT pipeline that already has dbt-clean/dbt-deps but is missing its git pull/clone mapping will be skipped here, even though PipelineService.update_pipeline() would add that missing step. That leaves part of the fleet unbackfilled.
Suggested fix
 has_dbt_clean = DataflowOrgTask.objects.filter(
     dataflow=dataflow, orgtask__task__slug=TASK_DBTCLEAN
 ).exists()
 has_dbt_deps = DataflowOrgTask.objects.filter(
     dataflow=dataflow, orgtask__task__slug=TASK_DBTDEPS
 ).exists()
+has_git_task = DataflowOrgTask.objects.filter(
+    dataflow=dataflow, orgtask__task__type=TaskType.GIT
+).exists()

-if has_dbt_clean and has_dbt_deps:
+if has_git_task and has_dbt_clean and has_dbt_deps:
     self.stdout.write(
-        f" → Skipping {dataflow.deployment_name} (already has dbt-clean and dbt-deps)"
+        f" → Skipping {dataflow.deployment_name} "
+        f"(already has git + dbt-clean + dbt-deps)"
     )
     skipped += 1
     continue

 missing = []
+if not has_git_task:
+    missing.append("git")
 if not has_dbt_clean:
     missing.append("dbt-clean")
 if not has_dbt_deps:
     missing.append("dbt-deps")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ddpui/management/commands/backfill_auto_managed_tasks.py` around lines 90 -
103, The current skip logic checks only for TASK_DBTCLEAN and TASK_DBTDEPS using
DataflowOrgTask and will skip pipelines that still lack the auto-managed git
step; update the condition so we only skip when dbt-clean, dbt-deps AND the
auto-managed git mapping are present. Concretely, augment the exists check
(DataflowOrgTask.objects.filter(...).exists()) to also verify the git/orgtask
mapping (or the relevant git task slug) is present, or alternatively remove the
early continue and always call PipelineService.update_pipeline(dataflow, ...)
for pipelines that have dbt-clean and/or dbt-deps so update_pipeline can add the
missing git step; reference DataflowOrgTask, TASK_DBTCLEAN, TASK_DBTDEPS and
PipelineService.update_pipeline when making the change.
# Toggle schedule inactive → active to clear pre-scheduled runs.
# Prefect schedules runs 1-2 days in advance; those won't pick up the
# updated deployment params unless the schedule is reset.
# Only do this for pipelines that have an active schedule.
if dataflow.cron and pipeline_details.get("isScheduleActive", False):
    PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "inactive")
    PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "active")
A failure between inactive and active leaves the pipeline paused.
update_pipeline() has already succeeded by this point. If the second set_pipeline_schedule(..., "active") call errors, the command reports a failure but the schedule stays disabled. This needs a best-effort restore path so a partial backfill does not silently turn off production schedules.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ddpui/management/commands/backfill_auto_managed_tasks.py` around lines 158 -
164, The current sequence toggles the Prefect schedule inactive→active via
PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "inactive")
and then "active" which can leave the pipeline paused if the second call fails;
modify the block so that after making the schedule inactive you attempt to set
it back to "active" inside a guarded retry/restore path (try/except/finally):
call PipelineService.set_pipeline_schedule(...) to reactivate and on any
exception immediately log the error with context (include dataflow.deployment_id
and org), perform a best-effort retry or a compensating call to re-enable the
schedule (e.g., another set_pipeline_schedule(..., "active") attempt) and
surface failure but avoid leaving the schedule disabled; ensure this logic is
colocated with the existing update_pipeline() success flow so update_pipeline()
remains committed and schedules are restored on failure.
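For reference, one shape the guarded restore could take. This is a minimal sketch under the assumptions visible in the diff above (`dataflow`, `pipeline_details`, `org`, and the `PipelineService.set_pipeline_schedule(org, deployment_id, state)` signature); the `logger` name and the single-retry policy are illustrative choices, not the command's actual code:

```python
# Sketch of a guarded restore path; `logger` and the retry policy are
# illustrative assumptions, not the command's actual implementation.
if dataflow.cron and pipeline_details.get("isScheduleActive", False):
    PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "inactive")
    try:
        PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "active")
    except Exception as first_err:
        logger.error(
            "failed to reactivate schedule for deployment %s: %s",
            dataflow.deployment_id,
            first_err,
        )
        # Best-effort compensating retry so a partial backfill does not
        # silently leave a production schedule paused.
        try:
            PipelineService.set_pipeline_schedule(org, dataflow.deployment_id, "active")
        except Exception:
            self.stdout.write(
                f" !! schedule left inactive for {dataflow.deployment_name}; "
                "re-enable it manually"
            )
            raise
```

The key property is that `update_pipeline()` has already committed by this point, so the restore path only compensates for the schedule toggle and still surfaces the failure to the operator.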
Summary by CodeRabbit
New Features
Chores