Race condition in Global Concurrency Limits - recommended protection strategy? #20520
oleksandr-ieremchuk asked this question in Q&A
Environment
Situation
We have multiple long-running workflow deployments for each tenant that must NOT run concurrently due to:
We use Global Concurrency Limits (GCL) with limit=1 to enforce this:
Problem
We observed two flows simultaneously acquiring slots from the same GCL despite limit=1.
Timeline from PostgreSQL flow_run table:
When Flow A released its slot at 18:35:56, both Flow B and Flow C (which had been waiting) acquired the slot simultaneously, violating limit=1.
Database evidence:
Analysis
Looking at Prefect source code:
1. Orchestration rules (deployment/task concurrency) - PROTECTED:
bulk_increment_active_slots
test_concurrent_reacquisition_only_one_succeeds validate protection
2. HTTP API (/v2/concurrency_limits/increment-with-lease) - NOT PROTECTED:
active_slots=0 before any commits
SELECT FOR UPDATE or similar locking
Questions
Is this expected behavior? Should Global Concurrency Limits be considered "best-effort" rather than strict guarantees?
What's the recommended approach for strict enforcement across multiple deployments for the same resource/tenant?
Should we use deployment-level concurrency instead? Our use case: multiple deployment types per tenant, need coordination between them.
Is atomic protection planned for HTTP API-based GCL, or is there an architectural reason not to add it?
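To make the suspected race from the analysis above concrete, here is a minimal sketch of the two patterns: a read-then-write slot check versus a single conditional UPDATE. This is not Prefect's code. The table `gcl`, its columns, and every function name are hypothetical, and SQLite stands in for PostgreSQL so the example is self-contained. The interleaving is simulated in one thread so that it is deterministic.

```python
import sqlite3


def make_db(limit=1):
    """Create an in-memory table holding one hypothetical concurrency limit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gcl (name TEXT PRIMARY KEY, slot_limit INTEGER, active_slots INTEGER)"
    )
    conn.execute("INSERT INTO gcl VALUES ('tenant-a', ?, 0)", (limit,))
    conn.commit()
    return conn


def read_slots(conn, name):
    """Return (slot_limit, active_slots) for the named limit."""
    return conn.execute(
        "SELECT slot_limit, active_slots FROM gcl WHERE name = ?", (name,)
    ).fetchone()


def write_slots(conn, name, value):
    """Overwrite active_slots with a value computed by the caller."""
    conn.execute("UPDATE gcl SET active_slots = ? WHERE name = ?", (value, name))
    conn.commit()


def racy_acquire_pair(conn, name):
    """Simulate two waiters (B and C) that both read before either one writes."""
    limit_b, active_b = read_slots(conn, name)  # B sees active_slots=0
    limit_c, active_c = read_slots(conn, name)  # C also sees active_slots=0
    got_b = active_b < limit_b
    if got_b:
        write_slots(conn, name, active_b + 1)
    got_c = active_c < limit_c  # decision based on C's stale read
    if got_c:
        write_slots(conn, name, active_c + 1)
    return got_b, got_c


def atomic_acquire(conn, name):
    """Check and increment in one statement; the WHERE clause enforces the limit."""
    cur = conn.execute(
        "UPDATE gcl SET active_slots = active_slots + 1 "
        "WHERE name = ? AND active_slots < slot_limit",
        (name,),
    )
    conn.commit()
    return cur.rowcount == 1  # one row updated means the slot was acquired


if __name__ == "__main__":
    print("read-then-write:", racy_acquire_pair(make_db(), "tenant-a"))
    conn = make_db()
    print("conditional update:", atomic_acquire(conn, "tenant-a"), atomic_acquire(conn, "tenant-a"))
```

In the read-then-write version both waiters conclude that they hold the slot, which matches what we saw for Flow B and Flow C. In the conditional-update version the second caller updates zero rows and is refused. Whether the HTTP endpoint actually follows the first pattern is our reading of the source, not something we have confirmed.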