[πŸ› BUG]: Temporal Workflow events are lost after restarting PHP workerΒ #2203

@roxblnfk

Description

No duplicates 🥲.

  • I have searched for a similar issue in our bug tracker and didn't find any solutions.

What happened?

Workflow events are lost immediately after restarting the PHP worker.

The test for reproduction is in temporalio/roadrunner-temporal#667.
In the RoadRunner tests, I see this error:

{"message":"static_pool_exec: Workers watcher stopped:\n\tstatic_pool_exec:\n\tworker_watcher_get_free_worker","source":"GoSDK","stackTrace":"process event for default [panic]:\ngithub.com/temporalio/roadrunner-temporal/v5/aggregatedpool.(*Workflow).OnWorkflowTaskStarted(0xc0006a40e0, 0x3?)\n\td:/git/temporal/temporalio-roadrunner-temporal/aggregatedpool/workflow.go:312 +0x3e5\ngo.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc000b02798, 0xc000022b40, 0x3?, 0x0)\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_event_handlers.go:1204 +0x30a\ngo.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc000c19c20, 0xc00079be00)\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_task_handlers.go:1197 +0x1a8a\ngo.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc00034e900, 0xc00079be00, 0xc000c19c20, 0xc00102efc0)\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_task_handlers.go:934 +0x59e\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0002ce000, 0xc00079be00)\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_task_pollers.go:402 +0x3db\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0002ce000, {0x17b04a0, 0xc00079be00})\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_task_pollers.go:350 +0x205\ngo.temporal.io/sdk/internal.(*baseWorker).processTaskAsync.func1()\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_worker_base.go:440 +0x12f\ncreated by go.temporal.io/sdk/internal.(*baseWorker).processTaskAsync in goroutine 228\n\tC:/Users/roxbl/go/pkg/mod/go.temporal.io/sdk@v1.35.0/internal/internal_worker_base.go:419 +0x8c","applicationFailureInfo":{"type":"PanicError","nonRetryable":true}}

In the PHP tests, the error is slightly different, but events are also lost:

{
  "message": "BadCancelTimerAttributes: invalid history builder state for action: add-timer-canceled-event, TimerID: 5",
  "serverFailureInfo": {}
}

After some tests, I found that:

  • The PHP worker restarts.
  • The new worker receives events that were sent to the old worker.
  • The PHP SDK responds with Failure: Workflow with the specified run identifier "4bc0a3d6-9ea2-4fc5-b232-823611320f4e" not found (RR_ID is used).
  • The plugin records it as just a Workflow Task Completed event.

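During the next replay of the Workflow, the PHP SDK no longer sends a Failure. It sends the event that a correct execution of the Workflow Task would have produced (such as Cancel Timer or Side Effect). Because the history only contains a plain Workflow Task Completed for that task, the replay does not match it, and this causes a determinism error.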
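To make the sequence concrete, here is a minimal Go sketch of the failure mode as I understand it. It is a toy model, not the plugin's code: `WorkerResponse`, `HistoryEntry`, `recordLenient`, `recordStrict`, and `checkReplay` are all hypothetical names invented for illustration, and the command names are plain strings. `recordLenient` mirrors the behavior described above (a Failure response is stored as a completed task), while `recordStrict` shows one possible alternative in which the failure is surfaced so the task is not written to history.

```go
package main

import (
	"errors"
	"fmt"
)

// Command is a hypothetical stand-in for something a workflow emits
// during a Workflow Task (for example "CancelTimer").
type Command string

// WorkerResponse is a hypothetical model of what the PHP worker returns
// for one Workflow Task: either commands or a failure message.
type WorkerResponse struct {
	Commands []Command
	Failure  string // non-empty when the worker could not process the task
}

// HistoryEntry models one recorded Workflow Task result.
type HistoryEntry struct {
	Completed bool
	Commands  []Command
}

// recordLenient mimics the behavior described in this report: a failure
// response is still recorded as a completed task with no commands.
func recordLenient(r WorkerResponse) HistoryEntry {
	return HistoryEntry{Completed: true, Commands: r.Commands}
}

// recordStrict is one possible alternative (an assumption, not the actual
// fix): surface the failure instead of writing the task to history.
func recordStrict(r WorkerResponse) (HistoryEntry, error) {
	if r.Failure != "" {
		return HistoryEntry{}, errors.New("workflow task failed: " + r.Failure)
	}
	return HistoryEntry{Completed: true, Commands: r.Commands}, nil
}

// checkReplay compares the commands produced on replay with the recorded
// ones and reports any mismatch as a non-determinism error.
func checkReplay(recorded HistoryEntry, replayed []Command) error {
	if len(recorded.Commands) != len(replayed) {
		return fmt.Errorf("non-deterministic: recorded %d command(s), replay produced %d",
			len(recorded.Commands), len(replayed))
	}
	for i := range replayed {
		if recorded.Commands[i] != replayed[i] {
			return fmt.Errorf("non-deterministic: command %d is %q, recorded %q",
				i, replayed[i], recorded.Commands[i])
		}
	}
	return nil
}

func main() {
	// The restarted worker does not know the run and answers with a failure.
	afterRestart := WorkerResponse{Failure: "workflow with the specified run identifier not found"}

	// Lenient recording stores a completed task without commands.
	entry := recordLenient(afterRestart)

	// On replay, the workflow code runs normally and emits a command.
	replayed := []Command{"CancelTimer"}
	fmt.Println("lenient replay check:", checkReplay(entry, replayed))

	// Strict recording refuses to store the failed task at all.
	_, err := recordStrict(afterRestart)
	fmt.Println("strict record:", err)
}
```

In this model the mismatch only appears on replay, which matches the observation that the error shows up later than the restart itself.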

Version (rr --version)

RR 2025.1.2
PHP SDK 2.15.0

How to reproduce the issue?

There is a test case: temporalio/roadrunner-temporal#667

Relevant log output
