Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

Copy link
Contributor Author

MasterPtato commented Jan 9, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 9, 2026

PR Review: fix: clear all pending signals after workflow complete

Summary

This PR adds logic to clear all pending signals when a workflow completes, preventing signal leaks that could accumulate over time. The implementation adds a new subspace key variant to query all pending signals for a workflow regardless of signal name.


Code Quality & Best Practices

✅ Strengths:

  1. Good use of structured logging: Properly uses tracing::debug!(count=%pending_signal_cleared_count, ...) following the project's logging conventions
  2. Consistent error handling: Uses Result types and proper error propagation
  3. Transaction safety: Adds conflict key for read consistency (tx.add_conflict_key)
  4. Timeout protection: Implements EARLY_TXN_TIMEOUT (2.5s) to prevent long-running transactions
  5. Metric updates: Properly decrements the SignalPending gauge metric for each cleared signal
  6. Code organization: Clean refactoring of the transaction code with better use of helper methods like tx.read_entry, tx.delete, and tx.write

⚠️ Areas for improvement:

  1. Timeout handling could be more robust (engine/packages/gasoline/src/db/kv/mod.rs:1852-1858):

    • When timeout occurs, the function logs a warning and breaks, but the transaction still commits
    • This means some signals might not be cleared if there are many pending signals
    • Consider: Should this be an error instead? Or should we track which signals were cleared?
  2. Missing documentation: The new workflow_subspace method (engine/packages/gasoline/src/db/kv/keys/workflow.rs:808-810) lacks doc comments explaining its purpose

  3. Metric update pattern (engine/packages/gasoline/src/db/kv/mod.rs:1871-1877):

    • The update_metric call passes Some(...) for previous and None for current to decrement the metric
    • This is intentional but unusual - a comment explaining this pattern would help future readers

Potential Bugs & Issues

🟡 Moderate:

  1. Partial cleanup on timeout: If the timeout is hit, some signals won't be cleared but the workflow still completes. This could lead to:

    • Orphaned signal data in the database
    • Incorrect metric counts
    • Recommendation: Log the workflow_id with the timeout warning for debugging
  2. Unused serde dependency (workflow-worker/Cargo.toml):

    • Added serde dependency but it's not used in the diff
    • Verify this is actually needed or if it was accidentally included

Performance Considerations

✅ Good:

  1. Uses streaming with StreamingMode::WantAll for efficient batch processing
  2. Timeout prevents unbounded transaction duration (2.5s max)
  3. Only logs when signals are actually cleared

⚠️ Concerns:

  1. Large transaction potential: For workflows with many pending signals, this could create a large transaction

    • Each signal: read entry + add conflict key + write ack timestamp + update metric + delete pending signal
    • The 2.5s timeout provides protection, but consider if batching across multiple transactions would be better for very high counts
  2. Sequential processing: Signals are processed one at a time in the loop

    • For most cases this is fine, but for workflows with hundreds of signals it could be slow
    • Current timeout protection mitigates this concern

Security Concerns

✅ No security issues identified

The changes follow existing security patterns:

  • Proper transaction isolation (Serializable)
  • Adds conflict keys for consistency
  • No user input processing

Test Coverage

🔴 Missing test coverage:

The PR adds significant new functionality but doesn't include tests. Strongly recommend adding tests for:

  1. Basic case: Workflow completes with pending signals

    • Create a workflow
    • Send signals that aren't consumed
    • Complete the workflow
    • Verify pending signals are cleared and ack'd
    • Verify metrics updated correctly
  2. Edge case: Workflow completes with no pending signals

    • Verify no errors occur
    • Verify count is 0
  3. Edge case: Many pending signals (if feasible to test timeout)

Suggested test location: engine/packages/gasoline/tests/workflows/signal_test.rs


Additional Observations

Minor items:

  1. Removed unused imports in runner.rs and runner2.rs (metrics) - good cleanup ✅

  2. Log level change in universalpubsub (debug → trace):

    • Changed "attempting to get connection for publish" from debug to trace
    • Seems unrelated to the PR - intentional or accidental inclusion?
  3. Cargo.toml alphabetization in workflow-worker - good cleanup ✅


Recommendations

High Priority:

  1. Add test coverage for pending signal cleanup (at minimum, basic case + no signals case)
  2. ⚠️ Verify serde dependency is actually needed in workflow-worker

Medium Priority:
3. 📝 Add doc comment to workflow_subspace method explaining when to use it
4. 📝 Add inline comment explaining the metric decrement pattern
5. 📊 Include workflow_id in timeout warning log for debugging

Low Priority:
6. 🧹 Clarify if universalpubsub log level change is intentional or should be separate PR


Conclusion

This is a solid bug fix that addresses a real issue (signal leaks). The implementation:

  • ✅ Follows project conventions well
  • ✅ Includes proper error handling and timeout protection
  • ✅ Updates metrics correctly
  • ⚠️ Needs test coverage
  • ⚠️ Has minor documentation gaps

The core logic is sound and the flexible subspace key design is well-thought-out.

Status: ⚠️ Approve with recommendations - please add test coverage before merging. The serde dependency and doc comments can be addressed in follow-up if needed.


Review updated by Claude Code - 2026-01-10

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 9, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3798

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3798

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3798

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3798

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3798

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3798

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3798

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3798

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3798

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3798

commit: 270fad1

@MasterPtato MasterPtato force-pushed the 01-08-fix_clear_all_pending_signals_after_workflow_complete branch from 865be47 to 270fad1 Compare January 10, 2026 02:00
@MasterPtato MasterPtato force-pushed the 01-08-fix_remove_pending_actors_metric_fix_actor_error_tracker_engine_runner_error_print branch from 1ba3107 to 22e1cee Compare January 10, 2026 02:00
@vercel
Copy link

vercel bot commented Jan 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
rivet-cloud Ready Ready Preview, Comment Jan 10, 2026 2:02am
rivetkit-serverless Error Error Jan 10, 2026 2:02am
2 Skipped Deployments
Project Deployment Review Updated (UTC)
rivet-inspector Ignored Ignored Preview Jan 10, 2026 2:02am
rivet-site Ignored Ignored Preview Jan 10, 2026 2:02am

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants