Description
We experienced an intermittent issue with Bull 4.x where jobs are marked as stalled without transitioning to failed, and workers appear to stop processing jobs when concurrency > 1.
The same codebase works correctly with concurrency = 1.
After extensive comparison, the issue was traced to a minor version difference in ioredis, even though the Bull version and Node.js version were identical.
This issue may be relevant for other users running Bull 4.x in production environments (Kubernetes / Docker / Redis clusters).
Environment
Bull: 4.16.5
Node.js: v20.19.0
Observed Behavior
With concurrency = 1:
- Jobs are processed normally
- No stalled events

With concurrency = 3:
- Jobs enter active
- After some time, jobs are marked as stalled
- No failed event is emitted
- The worker stops processing further jobs ("silent stall")
- No application-level errors are logged
This behavior is intermittent but reproducible under load.
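For anyone trying to confirm the same symptom, here is a minimal sketch of a watcher that records jobs reported as stalled and never followed by a completed or failed event. It relies only on Bull's `stalled`, `completed`, and `failed` queue events; the helper name `trackSilentStalls` and its parameters are hypothetical, and the usage part substitutes a plain `EventEmitter` for a real queue so it runs without Redis.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: attaches listeners to a Bull queue (or any emitter
// that fires the same events) and remembers jobs that stalled without
// reaching an outcome afterwards.
function trackSilentStalls(queue, log = console.warn) {
  const unresolved = new Map(); // job id -> time the stalled event was seen

  queue.on('stalled', (job) => {
    unresolved.set(String(job.id), Date.now());
    log(`job ${job.id} stalled`);
  });

  // A completed or failed event means the job did reach an outcome.
  const resolve = (job) => unresolved.delete(String(job.id));
  queue.on('completed', resolve);
  queue.on('failed', resolve);

  // Returns ids of jobs that stalled at least maxAgeMs ago with no outcome.
  return function silentStalls(maxAgeMs = 0, now = Date.now()) {
    return [...unresolved]
      .filter(([, since]) => now - since >= maxAgeMs)
      .map(([id]) => id);
  };
}

// Usage with a stand-in emitter; in a worker this would be the Bull queue.
const queue = new EventEmitter();
const silentStalls = trackSilentStalls(queue, () => {});
queue.emit('stalled', { id: 1 });
queue.emit('stalled', { id: 2 });
queue.emit('failed', { id: 2 }, new Error('boom'));
console.log(silentStalls()); // job 1 is still unresolved, job 2 is not
```

Polling `silentStalls()` with a threshold (for example, a few multiples of the stall interval) would give a signal even when nothing is logged at the application level.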
Key Finding
Two independent workers, built from the same monorepo and running the same Bull and Node.js versions, exhibited different runtime behavior due to a minor version difference in the Redis client (ioredis).
| Worker | Bull | ioredis |
| --- | --- | --- |
| Worker A (stable) | 4.16.5 | 5.8.2 |
| Worker B (unstable) | 4.16.5 | 5.9.0 |
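The issue was not related to application logic, Redis infrastructure, or Bull configuration. It came from dependency resolution drift: the two workers resolved different ioredis versions (5.8.2 vs. 5.9.0) despite being built from the same codebase.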
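A drift like this can be caught mechanically. The sketch below compares resolved versions across workers and reports any package that differs. The function name `findVersionDrift` and the input shape are assumptions for illustration; the version map would have to be collected separately, for instance from each worker's lockfile or from `npm ls ioredis` run in each build.

```javascript
// Hypothetical helper: reports every package whose resolved version differs
// between workers. `resolved` maps worker name -> { packageName: version }.
function findVersionDrift(resolved) {
  // Regroup the input as package name -> { workerName: version }.
  const byPackage = new Map();
  for (const [worker, deps] of Object.entries(resolved)) {
    for (const [name, version] of Object.entries(deps)) {
      if (!byPackage.has(name)) byPackage.set(name, {});
      byPackage.get(name)[worker] = version;
    }
  }

  // Keep only packages for which more than one distinct version was seen.
  const drift = {};
  for (const [name, versions] of byPackage) {
    if (new Set(Object.values(versions)).size > 1) drift[name] = versions;
  }
  return drift;
}

// Usage with the versions from the table above.
const drift = findVersionDrift({
  'worker-a': { bull: '4.16.5', ioredis: '5.8.2' },
  'worker-b': { bull: '4.16.5', ioredis: '5.9.0' },
});
console.log(drift); // only ioredis is reported
```

Running a check of this kind in CI would flag the mismatch before two builds of the same monorepo diverge at runtime.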