Add WithLongRunningSteps option for handlers that run for extended periods #26

Open
sosiska wants to merge 1 commit into floxy-project:main from sosiska:feature/long-running-steps

sosiska commented Feb 2, 2026

Problem

When Floxy runs handlers that take a long time to execute (20-80 minutes), the entire ExecuteNext operation, handler included, runs within a single database transaction. This causes:

  1. Connection pool exhaustion: each worker holds a DB connection for the entire handler duration
  2. Invisible "running" status: under PostgreSQL READ COMMITTED, other connections keep seeing the step as pending until the handler completes and the transaction commits
  3. Scalability issues: with 200+ workers and long handlers, every busy worker pins a connection, so the pool runs out

Solution

Add a WithLongRunningSteps() engine option that splits step execution into separate transactions:

engine := NewEngine(pool, WithLongRunningSteps())

How it works:

  1. Transaction 1: Dequeue step, update status to running, remove from queue, commit
    • Status is immediately visible to other connections
    • DB connection is released
  2. Outside transaction: Execute handler (can take minutes/hours)
  3. Transaction 2: Record result (success/failure), enqueue next steps
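
The three phases above can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: Store, Claim, Finish, StatusOf and ExecuteNextSplit are hypothetical names, the "completed" and "failed" status values are assumed, the database is replaced by an in-memory stand-in whose methods each model one short committed transaction, and retries are left out.

```go
package main

import "sync"

// Status is a hypothetical step status type. "pending" and "running" come
// from the description above; "completed" and "failed" are assumed names.
type Status string

const (
	StatusPending   Status = "pending"
	StatusRunning   Status = "running"
	StatusCompleted Status = "completed"
	StatusFailed    Status = "failed"
)

// Store is an in-memory stand-in for the database. Each method models one
// short transaction that has committed by the time it returns.
type Store struct {
	mu     sync.Mutex
	queue  []int          // IDs of steps waiting to run
	status map[int]Status // current status of every known step
}

// NewStore returns a store with the given steps queued as pending.
func NewStore(ids ...int) *Store {
	s := &Store{status: make(map[int]Status)}
	for _, id := range ids {
		s.queue = append(s.queue, id)
		s.status[id] = StatusPending
	}
	return s
}

// Claim models transaction 1: dequeue one step, mark it running, commit.
func (s *Store) Claim() (int, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.queue) == 0 {
		return 0, false
	}
	id := s.queue[0]
	s.queue = s.queue[1:]
	s.status[id] = StatusRunning
	return id, true
}

// Finish models transaction 2: record the result and, on success, enqueue
// the follow-up steps.
func (s *Store) Finish(id int, handlerErr error, next ...int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if handlerErr != nil {
		s.status[id] = StatusFailed
		return
	}
	s.status[id] = StatusCompleted
	for _, n := range next {
		s.queue = append(s.queue, n)
		s.status[n] = StatusPending
	}
}

// StatusOf reads a step's status the way another connection would.
func (s *Store) StatusOf(id int) Status {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.status[id]
}

// ExecuteNextSplit runs one step in split-transaction mode. It reports
// whether a step was found, plus the handler's error, if any.
func ExecuteNextSplit(s *Store, handler func(id int) error, next ...int) (bool, error) {
	id, ok := s.Claim() // transaction 1 has committed here
	if !ok {
		return false, nil // empty queue
	}
	err := handler(id)         // no transaction or lock is held while this runs
	s.Finish(id, err, next...) // transaction 2
	return true, err
}
```

Because Claim returns before the handler starts, a concurrent StatusOf call already sees "running" during the handler, which is the visibility property the split is meant to provide.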

Scope:

  • Task steps: use split-transaction mode
  • Non-Task steps (Fork, Join, Condition, SavePoint, Human): use the normal single-transaction mode (they are fast)
  • Compensation steps: use the normal single-transaction mode
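
The scope rules above boil down to a single predicate. The sketch below assumes a hypothetical StepType enum and a hypothetical UseSplitTransactions helper; the engine's real type names and string values may differ.

```go
package main

// StepType is a hypothetical enum; the names follow the step kinds listed
// above, and the string values are assumptions.
type StepType string

const (
	StepTask      StepType = "task"
	StepFork      StepType = "fork"
	StepJoin      StepType = "join"
	StepCondition StepType = "condition"
	StepSavePoint StepType = "save_point"
	StepHuman     StepType = "human"
)

// UseSplitTransactions reports whether a step should run in
// split-transaction mode: only when the option is enabled, the step is a
// Task step, and it is not a compensation step.
func UseSplitTransactions(longRunningEnabled bool, t StepType, isCompensation bool) bool {
	return longRunningEnabled && t == StepTask && !isCompensation
}
```

Everything for which this returns false would fall through to the existing single-transaction path.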

Trade-offs

  • If a worker crashes mid-handler, the step stays in running status and, having already been removed from the queue, is not picked up again; users should implement a recovery mechanism
  • Handlers should be idempotent or use StepContext.IdempotencyKey()
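
One possible shape for such a recovery mechanism is a periodic sweep that flags steps stuck in running for longer than any handler is expected to take. The sketch below is an assumption, not part of this PR: RunningStep, StaleAfter and StaleSteps are hypothetical names, and the two-hour threshold is only an example chosen to sit above the 20-80 minute handlers mentioned earlier.

```go
package main

import "time"

// StaleAfter is a hypothetical threshold; pick a value comfortably above
// the longest expected handler duration.
const StaleAfter = 2 * time.Hour

// RunningStep is a hypothetical view of a step in running status.
// RunningFor could be computed in the query that loads the steps, for
// example as the current time minus the step's start time.
type RunningStep struct {
	ID         int
	RunningFor time.Duration
}

// StaleSteps returns the IDs of steps that have been running longer than
// threshold, in input order. The caller decides what to do with them, for
// example re-enqueue them or mark them failed.
func StaleSteps(steps []RunningStep, threshold time.Duration) []int {
	var stale []int
	for _, st := range steps {
		if st.RunningFor > threshold {
			stale = append(stale, st.ID)
		}
	}
	return stale
}
```

Re-running a flagged step is only safe if its handler is idempotent, which is why the two trade-offs above go together.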

Changes

  • engine_opts.go: add WithLongRunningSteps() option
  • engine.go: add executeNextLongRunning() method and longRunningStepContext struct
  • engine_long_running_test.go: tests for the new mode
  • docs/ENGINE_SPEC.md: documentation

Testing

All existing tests pass. Added new tests covering:

  • Empty queue handling
  • Successful task step execution
  • Failed task step with retries exhausted
  • Non-task steps using normal execution
  • Missing handler rescheduling
  • DLQ instance skipping

…riods

This mode splits step execution into separate transactions:
- Transaction 1: dequeue + update to "running" + commit (status immediately visible)
- Handler execution outside transaction (DB connection released)
- Transaction 2: record result

Solves connection pool exhaustion and invisible "running" status for long handlers.