Labels
a:celery-library, a:storage (issue related to storage service), bug (buggy, it does not work as expected)
Description
Is there an existing issue for this?
- I have searched the existing issues
Which deploy/s?
No response
Current Behavior
While running a load test (uploading 100 files concurrently through the api-server), the storage worker started failing. These are the logs from one of the failing containers:
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
INFO: [entrypoint.sh] Entrypoint for stage production ...
INFO: [entrypoint.sh] User :uid=0(root) gid=0(root) groups=0(root)
INFO: [entrypoint.sh] Workdir : /home/scu
INFO: [entrypoint.sh] User : uid=8004(scu) gid=8004(scu) groups=8004(scu),100(users)
INFO: [entrypoint.sh] python : /home/scu/.venv/bin/python
INFO: [entrypoint.sh] pip : /usr/local/bin/pip
INFO: [entrypoint.sh] Starting /bin/sh
services/storage/docker/boot.sh ...
scu rights : uid=8004(scu) gid=8004(scu) groups=8004(scu),100(users)
local dir : total 12
drwx------ 1 scu scu 22 Jun 24 06:13 .
drwxr-xr-x 1 root root 17 Jun 23 17:58 ..
-rw-r--r-- 1 scu scu 220 Jun 23 17:58 .bash_logout
-rw-r--r-- 1 scu scu 3526 Jun 23 17:58 .bashrc
-rw-r--r-- 1 scu scu 807 Jun 23 17:58 .profile
drwxr-xr-x 5 scu scu 125 Jun 24 06:13 .venv
drwxr-xr-x 3 scu scu 21 Jun 24 06:13 services
INFO: [boot.sh] Booting in production mode ...
INFO: [boot.sh] User :uid=8004(scu) gid=8004(scu) groups=8004(scu),100(users)
INFO: [boot.sh] Workdir : /home/scu
INFO: [boot.sh] Log-level app/server: WARNING/warning
log_level=WARNING | log_timestamp=2025-06-24 06:45:16,530 | log_source=servicelib.fastapi.cancellation_middleware:__init__(50) | log_uid=None | log_oec=None| log_trace_id=0 | log_span_id=0 | log_resource.service.name= | log_trace_sampled=False] | log_msg=CancellationMiddleware is in use, in case of client disconection, FastAPI BackgroundTasks will be cancelled too!
[2025-06-24 06:45:16,813: WARNING/MainProcess]
Setting ssl_cert_reqs=CERT_NONE when connecting to redis means that celery will not validate the identity of the redis broker when connecting. This leaves you vulnerable to man in the middle attacks.
-------------- celery@sto-worker-devSim1-1-g2v3tw9t5g59ersqieyzq9cd0 v5.5.2 (immunity)
--- ***** -----
-- ******* ---- Linux-6.8.0-1029-aws-x86_64-with-glibc2.36 2025-06-24 06:45:16
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: __main__:0x734a3f158fd0
- ** ---------- .> transport: amqps://scu:**@b-9e1a2b3b-4b23-4c5d-9da6-135d28d6bdf0.mq.us-east-1.amazonaws.com:5671//
- ** ---------- .> results: rediss://admin:**@master.osparc-dev-redis-replication-group.zesahi.use1.cache.amazonaws.com:6379/9
- *** --- * --- .> concurrency: 100 (thread)
-- ******* ---- .> task events: ON
--- ***** -----
-------------- [queues]
.> default exchange=default(direct) key=default
[2025-06-24 06:45:17,606: WARNING/MainProcess]
____ _ __ __ _
/ ___|| |_ ___ _ __ __ _ __ _ ___ \ \ / /__ _ __| | _____ _ __
\___ \| __/ _ \| '__/ _` |/ _` |/ _ \____\ \ /\ / / _ \| '__| |/ / _ \ '__|
___) | || (_) | | | (_| | (_| | __/_____\ V V / (_) | | | < __/ |
|____/ \__\___/|_| \__,_|\__, |\___| \_/\_/ \___/|_| |_|\_\___|_|
|___/ v0.7.0
worker: Warm shutdown (MainProcess)
[2025-06-24 08:16:58,547: ERROR/MainProcess] Signal handler <function on_worker_shutdown at 0x734a40f94400> raised: RuntimeError('This event loop is already running')
Traceback (most recent call last):
File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/home/scu/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/home/scu/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
^^^^^^^^^^^^^^^^
File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py", line 341, in start
blueprint.start(self)
File "/home/scu/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py", line 772, in start
c.loop(*c.loop_args())
File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/loops.py", line 86, in asynloop
state.maybe_shutdown()
File "/home/scu/.venv/lib/python3.11/site-packages/celery/worker/state.py", line 93, in maybe_shutdown
raise WorkerShutdown(should_stop)
celery.exceptions.WorkerShutdown: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/scu/.venv/lib/python3.11/site-packages/celery/utils/dispatch/signal.py", line 280, in send
response = receiver(signal=self, sender=sender, **named)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/scu/.venv/lib/python3.11/site-packages/celery_library/signals.py", line 66, in on_worker_shutdown
app_server.event_loop.run_until_complete(app_server.shutdown())
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 630, in run_until_complete
self._check_running()
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 589, in _check_running
raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
[2025-06-24 08:16:58,845: WARNING/MainProcess] sys:1: RuntimeWarning: coroutine 'BaseAppServer.shutdown' was never awaited
Here is the script I use to reproduce this:
file.zip
The API keys must be set in the environment. I run the test with: seq 100 | parallel -j4 python test.py
Expected Behavior
No response
Steps To Reproduce
No response
Anything else?
No response