Description
Problem
We noticed unusually high Redis memory usage on our internal Railway instance - 2 GB for a deployment serving only ~20 users with ~6M data points.
Investigation revealed that 22 orphaned *.reply.celery.pidbox keys were responsible for nearly all of it, each holding ~112 MB of base64-encoded Apple Health upload payloads that were never cleaned up.
These keys are created by Celery's remote control mechanism (pidbox). Normally workers delete them after reading the reply, but if a worker crashes or restarts, the keys remain forever with TTL -1.
Reproduction steps
Check overall memory usage:
redis-cli -u $REDIS_URL INFO memory | grep used_memory_human
# used_memory_human:1.97G
Find large keys using a Lua scan (keys > 100KB with size in MB):
redis-cli -u $REDIS_URL EVAL "
local cursor = '0'
local big = {}
local total = 0
repeat
  local result = redis.call('SCAN', cursor, 'COUNT', 500)
  cursor = result[1]
  for i, key in ipairs(result[2]) do
    local mem = redis.call('MEMORY', 'USAGE', key)
    if mem > 100000 then
      local mb = string.format('%.1f', mem / 1048576)
      total = total + mem
      table.insert(big, key .. ' | ' .. mb .. ' MB | ttl:' .. redis.call('TTL', key))
    end
  end
until cursor == '0'
table.insert(big, 'TOTAL: ' .. string.format('%.1f', total / 1048576) .. ' MB across ' .. #big .. ' keys')
return big
" 0
Results on our Railway instance - 22 pidbox keys, all with ttl:-1:
0822df03-...reply.celery.pidbox | 112.0 MB | ttl:-1
b6d13c71-...reply.celery.pidbox | 112.0 MB | ttl:-1
67c5f659-...reply.celery.pidbox | 32.0 MB | ttl:-1
... (22 keys total, TOTAL: 2015.8 MB)
All other keys (5,268 celery-task-meta-* + 243 garmin:*) total < 5 MB combined.
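Redis runs a Lua script atomically, so the scan above blocks other clients until it has walked the whole keyspace. On a busier instance, the same report could be produced client-side with incremental SCAN calls. The following is a minimal sketch, not part of the project: `find_big_keys`, `format_report`, and `BigKey` are hypothetical names, and `client` is assumed to be a redis-py client (or anything exposing its `scan_iter`, `memory_usage`, and `ttl` methods).

```python
from typing import Iterator, NamedTuple


class BigKey(NamedTuple):
    name: str
    size_bytes: int
    ttl: int  # -1 means the key has no expiry


def find_big_keys(client, min_bytes: int = 100_000, batch: int = 500) -> Iterator[BigKey]:
    """Yield keys larger than min_bytes, scanning the keyspace incrementally.

    `client` is assumed to expose redis-py's scan_iter, memory_usage and ttl.
    """
    for raw in client.scan_iter(count=batch):
        size = client.memory_usage(raw)
        if size is None:
            # The key disappeared between SCAN and MEMORY USAGE; skip it.
            continue
        if size > min_bytes:
            name = raw.decode() if isinstance(raw, bytes) else raw
            yield BigKey(name, size, client.ttl(raw))


def format_report(keys: list) -> list:
    """Render the same 'key | size | ttl' lines as the Lua script, plus a total."""
    lines = [f"{k.name} | {k.size_bytes / 1048576:.1f} MB | ttl:{k.ttl}" for k in keys]
    total = sum(k.size_bytes for k in keys)
    lines.append(f"TOTAL: {total / 1048576:.1f} MB across {len(keys)} keys")
    return lines


# Usage (assumes the third-party `redis` package and a REDIS_URL env var):
#   import os, redis
#   client = redis.Redis.from_url(os.environ["REDIS_URL"])
#   print("\n".join(format_report(list(find_big_keys(client)))))
```

Unlike the Lua version, this is not a consistent snapshot: keys can appear or vanish while the scan is running, which is why a missing size is skipped rather than treated as an error.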
Root cause
control_queue_ttl and control_queue_expires are not configured in backend/app/integrations/celery/core.py, so pidbox reply keys have no automatic expiry.
Fix
- Add to celery_app.conf.update() in core.py:
  control_queue_ttl=300,
  control_queue_expires=300,
- One-time cleanup of existing keys:
redis-cli -u $REDIS_URL EVAL "
local cursor = '0'
local deleted = 0
repeat
  local result = redis.call('SCAN', cursor, 'MATCH', '*.reply.celery.pidbox', 'COUNT', 100)
  cursor = result[1]
  for i, key in ipairs(result[2]) do
    redis.call('DEL', key)
    deleted = deleted + 1
  end
until cursor == '0'
return 'Deleted ' .. deleted .. ' keys'
" 0
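For anyone who would rather preview the cleanup before running it, here is a rough Python equivalent with a dry-run mode. It is a sketch under assumptions, not the project's code: `cleanup_pidbox_replies` is a hypothetical helper, and `client` is assumed to be a redis-py client exposing `scan_iter`, `ttl`, and `unlink`. It differs from the Lua script in two ways: it only touches reply keys that have no expiry (TTL -1), and it uses UNLINK instead of DEL so that the ~100 MB values are freed in the background.

```python
PIDBOX_REPLY_PATTERN = "*.reply.celery.pidbox"


def cleanup_pidbox_replies(client, dry_run: bool = True, batch: int = 100) -> list:
    """Remove pidbox reply keys that never expire.

    Returns the names of the keys that were removed, or that would be
    removed when dry_run is True. `client` is assumed to expose redis-py's
    scan_iter, ttl and unlink.
    """
    removed = []
    for raw in client.scan_iter(match=PIDBOX_REPLY_PATTERN, count=batch):
        if client.ttl(raw) != -1:
            # The key already has an expiry (or no longer exists); leave it.
            continue
        if not dry_run:
            client.unlink(raw)  # non-blocking delete, available since Redis 4.0
        removed.append(raw.decode() if isinstance(raw, bytes) else raw)
    return removed


# Usage (assumes the third-party `redis` package and a REDIS_URL env var):
#   import os, redis
#   client = redis.Redis.from_url(os.environ["REDIS_URL"])
#   print(cleanup_pidbox_replies(client))                 # preview only
#   print(cleanup_pidbox_replies(client, dry_run=False))  # actually remove
```

Running it once with the default `dry_run=True` shows which keys would go, which is a cheap check that the pattern matches only orphaned replies.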