Storage worker shuts down unexpectedly #7968

@bisgaard-itis

Is there an existing issue for this?

  • I have searched the existing issues

Which deploy/s?

No response

Current Behavior

While running a load test (uploading 100 files concurrently through the api-server), the storage worker started failing. Here are the logs from one of the failing containers:

Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
INFO: [entrypoint.sh]  Entrypoint for stage production ...
INFO: [entrypoint.sh]  User :uid=0(root) gid=0(root) groups=0(root)
INFO: [entrypoint.sh]  Workdir : /home/scu
INFO: [entrypoint.sh]  User : uid=8004(scu) gid=8004(scu) groups=8004(scu),100(users)
INFO: [entrypoint.sh]  python : /home/scu/.venv/bin/python
INFO: [entrypoint.sh]  pip : /usr/local/bin/pip
INFO: [entrypoint.sh]  Starting /bin/sh
services/storage/docker/boot.sh ...
  scu rights    : uid=8004(scu) gid=8004(scu) groups=8004(scu),100(users)
  local dir : total 12
drwx------ 1 scu  scu    22 Jun 24 06:13 .
drwxr-xr-x 1 root root   17 Jun 23 17:58 ..
-rw-r--r-- 1 scu  scu   220 Jun 23 17:58 .bash_logout
-rw-r--r-- 1 scu  scu  3526 Jun 23 17:58 .bashrc
-rw-r--r-- 1 scu  scu   807 Jun 23 17:58 .profile
drwxr-xr-x 5 scu  scu   125 Jun 24 06:13 .venv
drwxr-xr-x 3 scu  scu    21 Jun 24 06:13 services
INFO: [boot.sh]  Booting in production mode ...
INFO: [boot.sh]  User :uid=8004(scu) gid=8004(scu) groups=8004(scu),100(users)
INFO: [boot.sh]  Workdir : /home/scu
INFO: [boot.sh]  Log-level app/server: WARNING/warning
log_level=WARNING | log_timestamp=2025-06-24 06:45:16,530 | log_source=servicelib.fastapi.cancellation_middleware:__init__(50) | log_uid=None | log_oec=None| log_trace_id=0 | log_span_id=0 | log_resource.service.name= | log_trace_sampled=False] | log_msg=CancellationMiddleware is in use, in case of client disconection, FastAPI BackgroundTasks will be cancelled too!
[2025-06-24 06:45:16,813: WARNING/MainProcess] 
Setting ssl_cert_reqs=CERT_NONE when connecting to redis means that celery will not validate the identity of the redis broker when connecting. This leaves you vulnerable to man in the middle attacks.
 
 -------------- celery@sto-worker-devSim1-1-g2v3tw9t5g59ersqieyzq9cd0 v5.5.2 (immunity)
--- ***** ----- 
-- ******* ---- Linux-6.8.0-1029-aws-x86_64-with-glibc2.36 2025-06-24 06:45:16
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         __main__:0x734a3f158fd0
- ** ---------- .> transport:   amqps://scu:**@b-9e1a2b3b-4b23-4c5d-9da6-135d28d6bdf0.mq.us-east-1.amazonaws.com:5671//
- ** ---------- .> results:     rediss://admin:**@master.osparc-dev-redis-replication-group.zesahi.use1.cache.amazonaws.com:6379/9
- *** --- * --- .> concurrency: 100 (thread)
-- ******* ---- .> task events: ON
--- ***** ----- 
 -------------- [queues]
                .> default          exchange=default(direct) key=default
                
[2025-06-24 06:45:17,606: WARNING/MainProcess] 
  ____  _                                __        __         _
 / ___|| |_ ___  _ __ __ _  __ _  ___    \ \      / /__  _ __| | _____ _ __
 \___ \| __/ _ \| '__/ _` |/ _` |/ _ \____\ \ /\ / / _ \| '__| |/ / _ \ '__|
  ___) | || (_) | | | (_| | (_| |  __/_____\ V  V / (_) | |  |   <  __/ |
 |____/ \__\___/|_|  \__,_|\__, |\___|      \_/\_/ \___/|_|  |_|\_\___|_|
                           |___/                                                v0.7.0
worker: Warm shutdown (MainProcess)
[2025-06-24 08:16:58,547: ERROR/MainProcess] Signal handler <function on_worker_shutdown at 0x734a40f94400> raised: RuntimeError('This event loop is already running')
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
           ^^^^^^^^^^^^^^^^
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py", line 341, in start
    blueprint.start(self)
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py", line 772, in start
    c.loop(*c.loop_args())
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/loops.py", line 86, in asynloop
    state.maybe_shutdown()
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/state.py", line 93, in maybe_shutdown
    raise WorkerShutdown(should_stop)
celery.exceptions.WorkerShutdown: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.11/site-packages/celery/utils/dispatch/signal.py", line 280, in send
    response = receiver(signal=self, sender=sender, **named)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scu/.venv/lib/python3.11/site-packages/celery_library/signals.py", line 66, in on_worker_shutdown
    app_server.event_loop.run_until_complete(app_server.shutdown())
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 630, in run_until_complete
    self._check_running()
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 589, in _check_running
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
[2025-06-24 08:16:58,845: WARNING/MainProcess] sys:1: RuntimeWarning: coroutine 'BaseAppServer.shutdown' was never awaited
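
The traceback above shows `run_until_complete` being called on an event loop that is already running. The sketch below is a minimal reproduction of that failure mode and one possible workaround. It assumes the loop is running in a different thread from the one executing the signal handler, which the logs do not confirm. `FakeAppServer`, `naive_shutdown`, and `threadsafe_shutdown` are hypothetical names for illustration, not the actual `celery_library` code.

```python
import asyncio
import threading


class FakeAppServer:
    """Hypothetical stand-in for the app server seen in the traceback."""

    def __init__(self) -> None:
        self.event_loop = asyncio.new_event_loop()
        self.shutdown_called = False

    async def shutdown(self) -> None:
        self.shutdown_called = True


def naive_shutdown(app_server: FakeAppServer) -> None:
    # Mirrors the failing call: run_until_complete refuses to start
    # a loop that is already running.
    coro = app_server.shutdown()
    try:
        app_server.event_loop.run_until_complete(coro)
    except RuntimeError:
        coro.close()  # avoids the "coroutine ... was never awaited" warning
        raise


def threadsafe_shutdown(app_server: FakeAppServer, timeout: float = 10.0) -> None:
    # Assumption: the loop runs in another thread, so the coroutine can be
    # submitted to it and the caller blocks until it finishes.
    future = asyncio.run_coroutine_threadsafe(
        app_server.shutdown(), app_server.event_loop
    )
    future.result(timeout=timeout)


def demo() -> tuple[str, bool]:
    server = FakeAppServer()
    loop = server.event_loop
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    # Block until the loop is actually running in the background thread.
    asyncio.run_coroutine_threadsafe(asyncio.sleep(0), loop).result(timeout=5)

    error_message = ""
    try:
        naive_shutdown(server)
    except RuntimeError as exc:
        error_message = str(exc)

    threadsafe_shutdown(server)

    # Stop and close the background loop.
    loop.call_soon_threadsafe(loop.stop)
    thread.join(timeout=5)
    loop.close()
    return error_message, server.shutdown_called
```

Calling `demo()` returns the error message raised by the naive call together with a flag showing whether the thread-safe variant ran `shutdown()`. If the handler actually runs on the same thread as the loop, blocking on `future.result()` would deadlock, so this approach only applies under the stated assumption.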

Here is the script I use to reproduce this:
file.zip

The API keys must be available in the environment. I run the test with `seq 100 | parallel -j4 python test.py`.

Expected Behavior

No response

Steps To Reproduce

No response

Anything else?

No response

Metadata

Labels

  • a:celery-library
  • a:storage (issue related to storage service)
  • bug (buggy, it does not work as expected)
