Skip to content

Conversation

@jason-lynch
Copy link
Member

@jason-lynch jason-lynch commented Jan 14, 2026

Summary

Updates the remove host workflow to track its status with a host-scoped task. This task is also now returned from the remove-host API endpoint.

Testing

Start the servers from this branch, then:

# resync your api client
cp1-api-sync

# initialize the cluster
cp-init

# stop host-2
cp-docker-exec stop host-2

# remove host-2
cp1-req remove-host host-2

# make sure to include the --force flag if there are any databases on host-2
cp1-req remove-host host-2 --force

# you should get a task ID in the output. now you can try all the host endpoints
cp1-req list-host-tasks host-2
cp1-req get-host-task host-2 <task ID>
cp1-req get-host-task-log host-2 <task ID>

# you should also see the host task in the list-tasks output
cp1-req list-tasks
cp1-req list-tasks --scope host
cp1-req list-tasks --scope host --entity-id host-2

PLAT-347

Summary by CodeRabbit

  • New Features

    • Remove-host API response now includes a task field to track the overall progress of the removal operation.
  • Improvements

    • Enhanced task tracking with parent-child relationships between host removal and database update tasks.
    • Improved error handling and cancellation management during host removal operations.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

The pull request introduces a task-based tracking mechanism for the remove-host API operation. A new Task field is added to RemoveHostResponse, and the RemoveHost workflow is refactored to create a host-level removal task alongside per-database update tasks. The workflow implementation now includes explicit cancellation handling, error recovery, and ordered task state transitions.

Changes

Cohort / File(s) Summary
API Schema
api/apiv1/design/host.go
Added task attribute (type Task) to RemoveHostResponse; updated Required fields to include both "task" and "update_database_tasks".
Changelog
changes/unreleased/Changed-20260114-163406.yaml
New changelog entry documenting that remove-host API response now includes a task field for tracking overall operation progress.
Test Suite
clustertest/cluster_test.go
Added nil-check for resp.Task in Remove; added synchronous wait for host task completion via c.client.WaitForHostTask() before cleanup.
API Handler
server/internal/api/apiv1/post_init_handlers.go
Changed prepareDatabaseUpdates() signature from returning []*workflows.UpdateDatabaseInput to []*database.Spec; inlined database removal validation and refactored to use RemoveHost() result mapping for task and database tasks.
Workflow Orchestration
server/internal/workflows/service.go
Introduced RemoveHostTasks struct to group host task with database tasks; refactored RemoveHost() to create host-level task, per-database tasks with parent linkage, and return multi-task result instead of task list.
Workflow Implementation
server/internal/workflows/remove_host.go
Reworked workflow with deferred cancellation detection, centralized error handler, task state initialization, and new removeHostFromDatabases() helper for parallel database sub-workflow orchestration; removed DatabaseTaskIDs field from RemoveHostInput.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as API Handler
    participant WfService as Workflow Service
    participant Workflow as RemoveHost Workflow
    participant Activity as Activity Layer

    Client->>API: POST /remove-host
    API->>WfService: RemoveHost(hostID, specs...)
    
    WfService->>WfService: Create host task
    WfService->>WfService: Create per-database tasks (linked to host task)
    WfService->>Workflow: Execute RemoveHost workflow
    
    Workflow->>Workflow: UpdateTask (mark started)
    Workflow->>Workflow: logTaskEvent (removal starting)
    
    Workflow->>Workflow: removeHostFromDatabases()
    Workflow->>Workflow: Execute parallel sub-workflows
    Workflow->>Workflow: logTaskEvent per database (success/failure)
    
    Workflow->>Activity: Remove host (activity execution)
    Activity-->>Workflow: Completion
    
    Workflow->>Workflow: UpdateTask (mark complete)
    Workflow->>Workflow: logTaskEvent (removal success)
    
    Workflow-->>WfService: RemoveHostTasks {host task, database tasks}
    WfService-->>API: RemoveHostTasks
    API-->>Client: RemoveHostResponse {task, update_database_tasks}
    
    Client->>API: Poll task status
    API-->>Client: Task status & progress
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hop, hop! With tasks aligned,
The host removal's task is signed,
Each database dances in the flow,
While progress tales do gently grow,
All tracked through workflows, intertwined!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: return host task from remove-host' clearly summarizes the main change: returning a host task from the remove-host workflow/API endpoint.
Description check ✅ Passed The PR description covers the summary, testing steps, and issue reference, though checklist items are not explicitly marked and documentation/breaking changes sections are omitted.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from 4c7d9b7 to f43d9aa Compare January 15, 2026 13:51
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from ff49726 to d2d01f2 Compare January 15, 2026 13:52
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from f43d9aa to 85c8902 Compare January 15, 2026 14:00
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from d2d01f2 to 0e390fb Compare January 15, 2026 14:00
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from 85c8902 to f5b0ba8 Compare January 15, 2026 14:03
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from 0e390fb to a96f2df Compare January 15, 2026 14:04
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from f5b0ba8 to e2c5ecc Compare January 15, 2026 17:36
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from a96f2df to bfec708 Compare January 15, 2026 17:36
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch 2 times, most recently from ff7f98c to 57c9324 Compare January 15, 2026 21:50
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from bfec708 to f1e442e Compare January 15, 2026 21:53
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from 57c9324 to bdd4154 Compare January 15, 2026 21:56
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from f1e442e to 0fc3bee Compare January 15, 2026 21:56
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from bdd4154 to e77c094 Compare January 15, 2026 23:03
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from 0fc3bee to b59faae Compare January 15, 2026 23:04
@jason-lynch
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server/internal/workflows/service.go`:
- Around line 447-456: The CreateWorkflowInstance result is ignored so
hostTask.WorkflowInstanceID and hostTask.WorkflowExecutionID are never set;
update the call to s.client.CreateWorkflowInstance (used with
s.workflows.RemoveHost and RemoveHostInput) to capture the returned instance,
then assign instance.InstanceID and instance.ExecutionID (or the actual field
names returned by CreateWorkflowInstance) into hostTask.WorkflowInstanceID and
hostTask.WorkflowExecutionID before returning; mirror the pattern used in
createWorkflow at lines where workflow instance fields are populated so
subsequent abortTasks/status tracking can correlate the hostTask to its
workflow.
🧹 Nitpick comments (4)
server/internal/workflows/service.go (1)

430-431: Consider consolidating abort calls.

The two abortTasks calls could be combined into one to reduce verbosity.

♻️ Suggested refactor
-		s.abortTasks(ctx, hostTask)
-		s.abortTasks(ctx, databaseTasks...)
+		s.abortTasks(ctx, append([]*task.Task{hostTask}, databaseTasks...)...)
server/internal/workflows/remove_host.go (3)

37-49: Consider logging when task update fails in error handler.

Line 46 silently discards the error from updateTask. If this call fails, the task could remain in "started" state while the workflow has actually failed, leading to inconsistent task state that clients may observe.

♻️ Suggested improvement
-		_ = w.updateTask(ctx, logger, updateTaskInput)
+		if updateErr := w.updateTask(ctx, logger, updateTaskInput); updateErr != nil {
+			logger.With("update_error", updateErr).Error("failed to mark task as failed")
+		}

83-85: Consider reordering the "removing host" log message.

The "removing host" log entry is added after all database updates have completed. From a user's perspective, the task log would show:

  1. Database update messages...
  2. "removing host 'X'"
  3. "successfully removed host 'X'"

It might be clearer to log "removing host" before the database updates begin, reflecting the overall operation start rather than just the final host removal activity.


120-121: Consider nil check for Spec field.

dbInput.Spec is a pointer that's dereferenced without a nil check. While the service layer likely validates this, a nil Spec would cause a panic here.

♻️ Defensive nil check
 for i, dbInput := range inputs {
+	if dbInput.Spec == nil {
+		return fmt.Errorf("update database input at index %d has nil spec", i)
+	}
 	logger.Info("creating update database sub-workflow", "database_id", dbInput.Spec.DatabaseID)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e77c094 and b59faae.

⛔ Files ignored due to path filters (7)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
📒 Files selected for processing (6)
  • api/apiv1/design/host.go
  • changes/unreleased/Changed-20260114-163406.yaml
  • clustertest/cluster_test.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/workflows/remove_host.go
  • server/internal/workflows/service.go
🧰 Additional context used
📓 Path-based instructions (5)
api/apiv1/design/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

API endpoints should be defined using Goa's DSL in api/apiv1/design/ with separate files for domain-specific types (database.go, host.go, cluster.go), then regenerated with make -C api generate

Files:

  • api/apiv1/design/host.go
server/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/**/*.go: Use samber/do injector for dependency injection; each package should have a Provide() function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa

Files:

  • server/internal/workflows/service.go
  • server/internal/workflows/remove_host.go
  • server/internal/api/apiv1/post_init_handlers.go
server/internal/workflows/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/internal/workflows/**/*.go: Workflows represent long-running operations and should persist state to etcd for durability and resumability using the cschleiden/go-workflows framework
Workflow definitions should be placed in server/internal/workflows/ and use activities from server/internal/workflows/activities/, registered in the Activities.Register method

Files:

  • server/internal/workflows/service.go
  • server/internal/workflows/remove_host.go
clustertest/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Cluster integration tests should use build tag //go:build cluster_test and use testcontainers for Docker-based test environments

Files:

  • clustertest/cluster_test.go
server/internal/api/apiv1/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Implement generated Goa service interface methods in server/internal/api/apiv1/ after regenerating code

Files:

  • server/internal/api/apiv1/post_init_handlers.go
🧠 Learnings (4)
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to clustertest/**/*.go : Cluster integration tests should use build tag `//go:build cluster_test` and use testcontainers for Docker-based test environments

Applied to files:

  • clustertest/cluster_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework

Applied to files:

  • server/internal/workflows/remove_host.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflow definitions should be placed in `server/internal/workflows/` and use activities from `server/internal/workflows/activities/`, registered in the `Activities.Register` method

Applied to files:

  • server/internal/workflows/remove_host.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/orchestrator/swarm/**/*.go : Docker Swarm integration should use services for Postgres instances, overlay networks for database isolation, and bind mounts for configuration and data directories

Applied to files:

  • server/internal/api/apiv1/post_init_handlers.go
🧬 Code graph analysis (5)
api/apiv1/design/host.go (2)
api/apiv1/design/task.go (1)
  • Task (7-79)
server/internal/task/task.go (1)
  • Task (67-83)
server/internal/workflows/service.go (4)
server/internal/task/task.go (3)
  • Task (67-83)
  • Scope (13-13)
  • Type (24-24)
server/internal/workflows/update_database.go (1)
  • UpdateDatabaseInput (17-22)
server/internal/utils/utils.go (1)
  • HostQueue (139-141)
server/internal/workflows/remove_host.go (1)
  • RemoveHostInput (14-18)
clustertest/cluster_test.go (1)
api/apiv1/gen/control_plane/service.go (3)
  • Task (940-967)
  • GetHostTaskPayload (533-538)
  • Identifier (586-586)
server/internal/workflows/remove_host.go (7)
server/internal/task/task.go (3)
  • ScopeHost (21-21)
  • Scope (13-13)
  • UpdateOptions (165-174)
server/internal/workflows/activities/update_task.go (1)
  • UpdateTaskInput (16-21)
server/internal/task/service.go (1)
  • LogEntry (103-107)
server/internal/workflows/activities/remove_host.go (2)
  • RemoveHostInput (17-19)
  • RemoveHostOutput (21-21)
server/internal/workflows/activities/activities.go (1)
  • Activities (14-19)
server/internal/workflows/update_database.go (2)
  • UpdateDatabaseInput (17-22)
  • UpdateDatabaseOutput (24-26)
server/internal/database/spec.go (1)
  • Spec (252-267)
server/internal/api/apiv1/post_init_handlers.go (5)
api/apiv1/design/host.go (1)
  • RemoveHostResponse (255-264)
api/apiv1/gen/control_plane/service.go (3)
  • RemoveHostResponse (761-766)
  • Task (940-967)
  • Database (307-322)
api/apiv1/design/task.go (1)
  • Task (7-79)
server/internal/task/task.go (1)
  • Task (67-83)
server/internal/database/spec.go (1)
  • Spec (252-267)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (11)
changes/unreleased/Changed-20260114-163406.yaml (1)

1-3: LGTM!

The changelog entry accurately documents the new task field in the remove-host API response for tracking the overall host removal operation.

server/internal/workflows/service.go (2)

404-407: LGTM!

Clean struct definition for grouping the host removal task with its associated database update tasks.


419-441: LGTM!

The loop correctly creates database tasks with proper parent-child linking via ParentID, and constructs UpdateDatabaseInput entries with the host to be removed.

server/internal/api/apiv1/post_init_handlers.go (2)

203-216: LGTM!

The handler correctly delegates task creation to the workflow service and maps the RemoveHostTasks output to the API response.


247-263: Consider rollback if workflow creation fails after database state changes.

Databases are updated to DatabaseStateModifying (line 254) before the workflow is created. If RemoveHost fails downstream, these databases remain in the Modifying state with no rollback.

Other handlers (e.g., UpdateDatabase at lines 388-401) implement rollback on workflow failure. Consider whether similar protection is warranted here, or document why it's acceptable for this flow.

api/apiv1/design/host.go (1)

255-264: LGTM!

The API design correctly adds the task attribute to RemoveHostResponse with appropriate description and required fields. This is an additive, non-breaking change that enables clients to track the overall host removal operation.

clustertest/cluster_test.go (1)

104-112: LGTM!

Good additions:

  • Nil checks provide clear failure messages if the response structure is incorrect
  • WaitForHostTask ensures the test synchronizes with workflow completion before cleaning up local state, preventing race conditions

The test correctly uses the new Task field from the response to track the host removal operation. Based on learnings, the file correctly uses the //go:build cluster_test build tag.

server/internal/workflows/remove_host.go (4)

28-35: LGTM! Good cancellation handling pattern.

Using workflow.NewDisconnectedContext(ctx) for cleanup operations after cancellation is the correct approach with go-workflows, ensuring the cleanup activities can complete even when the parent context is canceled.


51-64: LGTM!

The curried logTaskEvent function with non-fatal error handling is well-documented and appropriate for supplementary logging that shouldn't block the main workflow.


135-148: Early return leaves remaining sub-workflows unobserved.

When a sub-workflow fails, this returns immediately without waiting for or canceling the remaining futures. Those sub-workflows will continue executing independently, potentially leading to partial completion (some databases updated, others not).

If the UpdateDatabase sub-workflows are idempotent, this may be acceptable since a retry would handle incomplete work. However, consider whether you want to:

  1. Wait for all futures and aggregate errors, or
  2. Explicitly cancel remaining sub-workflows on failure

Verify whether partial completion is acceptable for your use case and whether UpdateDatabase sub-workflows are idempotent.


22-26: LGTM! Well-structured workflow implementation.

The workflow correctly follows the go-workflows patterns with proper context handling, structured logging with slog, and clear separation between activities and workflow logic. The task lifecycle management (start → processing → complete/fail) is well-organized.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from b59faae to 9d79fdf Compare January 15, 2026 23:29
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from e77c094 to 61a0c96 Compare January 16, 2026 14:23
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from 9d79fdf to cd8fd02 Compare January 16, 2026 14:24
Updates the remove host workflow to track its status with a host-scoped
task. This task is also now returned from the remove-host API endpoint.

PLAT-347
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/remove-host-task branch from cd8fd02 to f456234 Compare January 16, 2026 21:51
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/new-task-endpoints branch from 61a0c96 to cf8c511 Compare January 16, 2026 21:51
Copy link
Contributor

@rshoemaker rshoemaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants