Labels
type/regression: Regression from previous behavior (a specific type of bug)
Description
Pre-requisites
- I have double-checked my configuration
- I have tested with the `:latest` image tag (i.e. `quay.io/argoproj/workflow-controller:latest`) and can confirm the issue still exists on `:latest`. If not, I have explained why, in detail, in my description below.
- I have searched existing issues and could not find a match for this bug
- I'd like to contribute the fix myself (see contributing guide)
What happened? What did you expect to happen?
Note: This is not deterministically reproducible because it is a concurrency issue.

The controller sometimes panics due to concurrent map writes when a semaphore is used. This is suspected to be a regression caused by #14321. That PR switched to a read-write mutex and uses `RLock` instead of `Lock`. However, `release()` deletes the key from the map when it releases the lock, and a delete is a write operation. Because any number of goroutines can hold a read lock at the same time, two concurrent `release()` calls can write to the map simultaneously, which crashes the controller. The suggested fix is to switch back to an exclusive lock.
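A minimal sketch of the suggested fix, assuming a simplified semaphore whose holders are tracked in a map keyed by a string. The type `prioritySemaphoreSketch`, the field `lockHolder` and the method `holders()` are hypothetical and are not the actual argo-workflows code. `release()` takes the exclusive `Lock` because `delete` mutates the map, while read-only accessors can keep the shared `RLock`.

```go
package main

import (
	"fmt"
	"sync"
)

// prioritySemaphoreSketch is a hypothetical, heavily simplified stand-in for
// the controller's semaphore; only the map of lock holders is modelled.
type prioritySemaphoreSketch struct {
	mu         sync.RWMutex        // read-write mutex, as introduced by the regression
	lockHolder map[string]struct{} // hypothetical field: keys of current holders
}

// release removes key from the holder map. Deleting from a map is a write,
// so it must take the exclusive Lock, not RLock.
func (s *prioritySemaphoreSketch) release(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.lockHolder[key]; !ok {
		return false // key was not held
	}
	delete(s.lockHolder, key)
	return true
}

// holders only reads the map, so a shared RLock is sufficient here.
func (s *prioritySemaphoreSketch) holders() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.lockHolder)
}

func main() {
	s := &prioritySemaphoreSketch{lockHolder: map[string]struct{}{}}
	for i := 0; i < 1000; i++ {
		s.lockHolder[fmt.Sprintf("wf/node-%d", i)] = struct{}{}
	}

	// Release every key from its own goroutine, mimicking concurrent workers.
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.release(fmt.Sprintf("wf/node-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("remaining holders:", s.holders())
}
```

Running it prints `remaining holders: 0` without crashing. The sketch keeps the `RWMutex` so read-only paths can still share the lock; replacing the field with a plain `sync.Mutex` would be equally correct, at the cost of serialising reads as well.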
Version(s)
v3.7.3
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: synchronization-tmpl-level-
  labels:
    workflows.argoproj.io/no-test: "environment"
spec:
  entrypoint: synchronization-tmpl-level-example
  templates:
  - name: synchronization-tmpl-level-example
    steps:
    - - name: synchronization-acquire-lock
        template: acquire-lock
        arguments:
          parameters:
          - name: seconds
            value: "{{item}}"
        withParam: '["1","2","3","4","5"]'
  - name: acquire-lock
    synchronization:
      semaphores: # v3.6 and after
        - configMapKeyRef:
            name: my-config
            key: template
    container:
      image: alpine:3.23
      command: [sh, -c]
      args: ["sleep 10; echo acquired lock"]
```

Logs from the workflow controller
```
goroutine 447 [running]:
internal/runtime/maps.fatal({0x2fdb7b7?, 0xc006f84a00?})
	/usr/local/go/src/runtime/panic.go:1058 +0x18
github.com/argoproj/argo-workflows/v3/workflow/sync.(*prioritySemaphore).release(0xc00044d680, {0xc0380a9180, 0x9b})
	/go/src/github.com/argoproj/argo-workflows/workflow/sync/semaphore.go:101 +0x5c
github.com/argoproj/argo-workflows/v3/workflow/sync.(*Manager).Release(0xc0004ff180, {0x3440998, 0xc02aeda810}, 0xc004951688, {0xc03b5056d0?, 0xc005419500?}, 0xc03a7f6000)
	/go/src/github.com/argoproj/argo-workflows/workflow/sync/sync_manager.go:478 +0x250
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeTemplate(0xc007a5da40, {0x3440998, 0xc02aeda810}, {0xc01249c460, 0x4d}, {0x3447f60, 0xc009aa8900}, 0xc01704da40, {{0x0, 0x0, ...}, ...}, ...)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2047 +0x4bc8
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeStepGroup(0xc007a5da40, {0x3440998, 0xc02aeda810}, {0xc009aa83c0, 0x1, 0x1}, {0xc03b505f90, 0x42}, 0xc02d843f80)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/steps.go:285 +0x567
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeSteps(0xc007a5da40, {0x3440998, 0xc02aeda810}, {0xc00c2fa000, 0x3f}, 0xc01704da40, {0xc03b505d60, 0x45}, 0xc00541d8c8, {0x3447f60, ...}, ...)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/steps.go:110 +0x685
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeTemplate(0xc007a5da40, {0x3440998, 0xc02aeda810}, {0xc00c2fa000, 0x3f}, {0x3447f60, 0xc007a5de00}, 0xc01704da00, {{0xc035aaaa80, 0x1, ...}, ...}, ...)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2294 +0x3428
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate(0xc007a5da40, {0x3440998, 0xc02aeda810})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:368 +0x1d6b
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem(0xc0006eb708, {0x34409d0, 0xc00044cd70})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:766 +0x728
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker(0xc0006eb708, {0x34409d0, 0xc00044cd70})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:677 +0x9e
k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1({0x34409d0?, 0xc00044cd70?}, 0xc001d0c000?)
	/go/pkg/mod/k8s.io/apimachinery@v0.33.1/pkg/util/wait/backoff.go:255 +0x51
k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext({0x34409d0, 0xc00044cd70}, 0xc002ae4a20, {0x3404760, 0xc001d0c000}, 0x1)
	/go/pkg/mod/k8s.io/apimachinery@v0.33.1/pkg/util/wait/backoff.go:256 +0xe5
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x34409d0, 0xc00044cd70}, 0xc002ae4a20, 0x3b9aca00, 0x0, 0x1)
	/go/pkg/mod/k8s.io/apimachinery@v0.33.1/pkg/util/wait/backoff.go:223 +0x8f
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
	/go/pkg/mod/k8s.io/apimachinery@v0.33.1/pkg/util/wait/backoff.go:172
created by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run in goroutine 202
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:382 +0x19bc
```
Logs from your workflow's wait container
N/A