We found the following situation:
- the fastapi-app inside staging-simcore_staging_sto-worker kept failing its on_startup event because it could not connect to postgres
- the swarm manager observes that staging-simcore_staging_sto-worker is healthy, so it does not restart it
- the service staging-simcore_staging_sto-worker is in reality not functional, since it cannot connect to the database
Suggestions:
- A mechanism that reflects the health of the spawned fastapi-app in the staging-simcore_staging_sto-worker healthcheck
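
One possible shape for such a mechanism, as a rough sketch only: the fastapi-app answers on a health endpoint with 503 until its on_startup connection to postgres has succeeded, and the container healthcheck runs a small probe against that endpoint. The endpoint path, the port, and the names `health_status_code` and `probe` are assumptions made for illustration; they are not taken from the actual service. The sketch uses only the Python standard library.

```python
import urllib.request


def health_status_code(db_connected: bool) -> int:
    """App side (hypothetical): status the health endpoint should return.

    200 once the postgres connection is established, 503 otherwise.
    """
    return 200 if db_connected else 503


def probe(url: str, timeout: float = 2.0) -> int:
    """Healthcheck side: return 0 if `url` answers with HTTP 2xx, else 1.

    The return value is meant to be used as the process exit code,
    since a Docker healthcheck treats 0 as healthy and 1 as unhealthy.
    """
    # Bypass any proxy settings from the environment: the probe targets
    # the app running inside the same container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (OSError, ValueError):
        # Covers connection refused, timeouts, HTTP 4xx/5xx
        # (urllib raises HTTPError, a subclass of OSError),
        # and malformed URLs.
        return 1


# Assumed usage inside the container healthcheck command:
#   import sys; sys.exit(probe("http://localhost:8000/health"))
```

With something along these lines, a failed on_startup connection would make the probe exit with 1, the swarm manager would eventually mark the task as unhealthy, and it could restart it instead of leaving a non-functional service running.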