Commit 404dff8

Missing tips for the docs (#1066)
* docs: note about how to shut down the ui-dev container
* docs: troubleshooting tips for RabbitMQ

1 parent 00d71ca commit 404dff8

2 files changed: +72 -0 lines changed

README.md

Lines changed: 5 additions & 0 deletions
@@ -46,6 +46,11 @@ Antenna uses [Docker](https://docs.docker.com/get-docker/) & [Docker Compose](ht
 
 # To stream the logs
 docker compose logs -f django celeryworker ui-dev
+
+# To stop the ui-dev container, you must specify the profile when running `down` or `stop`
+docker compose --profile ui-dev down
+# Or!
+docker compose --profile "*" down
 ```
 _**Note that this will create a `ui/node_modules` folder if one does not exist yet. This folder is created by the mounting of the `/ui` folder
 for the `ui-dev` service, and is written by a `root` user.
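
For context on the lines added above: services gated behind a Compose profile are ignored by a bare `docker compose down`, which is why the profile flag is required. A minimal sketch of the full start/stop cycle, assuming `ui-dev` is the only profiled service in this repo's `docker-compose.yml`:

```bash
# Start the stack, including the profiled ui-dev service
docker compose --profile ui-dev up -d

# A bare "down" stops only the un-profiled services; ui-dev keeps running
docker compose down

# Pass the same profile (or the "*" wildcard) so ui-dev is torn down too
docker compose --profile ui-dev down
```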

docs/WORKER_MONITORING.md

Lines changed: 67 additions & 0 deletions
@@ -140,6 +140,73 @@ Access at: http://localhost:15672
 
 ## Troubleshooting Common Issues
 
+### Workers appear connected but tasks don't execute
+
+**Symptoms:**
+- Worker logs show "Connected to amqp://..." and "celery@... ready"
+- `celery inspect` times out: "No nodes replied within time constraint"
+- Flower shows "no workers connected"
+- Task publishing hangs indefinitely
+- RabbitMQ UI shows connections in "blocked" state
+
+**Possible cause: RabbitMQ Disk Space Alarm**
+
+When RabbitMQ runs low on disk space, it triggers an alarm and **blocks ALL connections** from publishing or consuming. This alarm is not prominently displayed in standard monitoring.
+
+**Diagnosis:**
+
+1. Check RabbitMQ Management UI (http://rabbitmq-server:15672) → Connections tab
+   - Look for State = "blocked" or "blocking"
+
+2. Check for active alarms on the RabbitMQ server:
+   ```bash
+   rabbitmqctl list_alarms
+   # Note: "rabbitmqctl status | grep alarms" is unreliable
+   ```
+
+3. Check disk space:
+   ```bash
+   df -h
+   ```
+
+4. Check RabbitMQ logs:
+   ```bash
+   journalctl -u rabbitmq-server -n 100 | grep -i "alarm\|block"
+   ```
+
+**Resolution:**
+
+1. Free up disk space on the RabbitMQ server
+2. Verify the alarm cleared: `rabbitmqctl list_alarms`
+3. Adjust the disk limit if needed: `rabbitmqctl set_disk_free_limit 5GB`
+4. Restart RabbitMQ: `systemctl restart rabbitmq-server`
+5. Restart workers: `docker compose restart celeryworker`
+
+**Prevention:**
+- Monitor disk space on the RabbitMQ server (alert at 80% usage)
+- Set a reasonable disk free limit: `rabbitmqctl set_disk_free_limit 5GB`
+- Configure log rotation for RabbitMQ logs
+- Purge stale queues regularly (see below)
+
+### Stale worker queues breaking celery inspect
+
+**Symptoms:**
+- `celery inspect` times out even after fixing RabbitMQ issues
+- Multiple `celery@<old-container-id>.celery.pidbox` queues in RabbitMQ
+
+**Cause:**
+Worker restarts create new pidbox control queues, but the old ones persist. `celery inspect` broadcasts to ALL of them and waits, timing out on dead workers.
+
+**Resolution:**
+1. Go to RabbitMQ Management UI → Queues
+2. Delete old `celery@<old-container-id>.celery.pidbox` queues
+3. Keep only the current worker's pidbox queue
+
+**Alternative:** Target a specific worker:
+```bash
+celery -A config.celery_app inspect stats -d celery@<current-worker-id>
+```
+
 ### Worker keeps restarting every 100 tasks
 
 **This is normal behavior** with `--max-tasks-per-child=100`.
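
As a supplement to the prevention list in the diff above, here is a rough, cron-able disk check. The 80% threshold comes from that list; the data-directory path and the `rabbitmq.conf` key mentioned in the comments are assumptions, not part of this commit:

```bash
#!/usr/bin/env bash
# Warn before RabbitMQ's disk alarm kicks in and blocks all publishers.
# To make the limit survive restarts (instead of the runtime
# `rabbitmqctl set_disk_free_limit`), the equivalent rabbitmq.conf setting
# is assumed to be: disk_free_limit.absolute = 5GB

# Percentage used on the partition holding the RabbitMQ data dir (path assumed)
USAGE=$(df --output=pcent /var/lib/rabbitmq | tail -1 | tr -dc '0-9')

if [ "${USAGE:-0}" -ge 80 ]; then
    echo "RabbitMQ disk at ${USAGE}% used - approaching the disk free limit alarm" >&2
    exit 1
fi
```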

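For the stale pidbox queues, the Management UI steps above can also be scripted. A sketch, assuming a RabbitMQ version that ships `rabbitmqctl delete_queue`, and using `celery@<current-worker-id>` as a placeholder for the live worker's node name:

```bash
# Inspect first: list the pidbox control queues left behind by old workers
rabbitmqctl list_queues name | grep '^celery@.*\.celery\.pidbox$'

# Placeholder: the node name of the worker that is currently running
CURRENT='celery@<current-worker-id>'

# Delete every pidbox queue that does not belong to the current worker
rabbitmqctl list_queues name \
  | grep '^celery@.*\.celery\.pidbox$' \
  | grep -vF "$CURRENT" \
  | while read -r queue; do
      rabbitmqctl delete_queue "$queue"
    done
```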