Bug Report
Disclosure: AI used in log collection and report creation.
Problem
The warrior container crashes with "exec /usr/local/bin/python: resource temporarily unavailable" errors and KeyboardInterrupt exceptions when wget processes accumulate too many open file descriptors.
Environment
- OS: Fedora (latest)
- Docker: docker-compose
- System Resources: 135GB RAM, 24 CPU cores (resources not the issue)
- Project: goo-gl archiving
Symptoms
- wget-at processes accumulate 400+ file descriptors
- Container becomes unable to spawn new Python processes
- Repeated crashes and restart loops
- systemd "Failed to allocate manager object: Too many open files" errors
Example Process State Before Crash
PID FD_COUNT COMMAND
513163 413 wget-at
514537 91 wget-at
Detailed Logs
Initial file descriptor exhaustion detection:
# lsof command showing stuck processes
sudo lsof | awk '{print $2}' | sort | uniq -c | sort -nr | head -10
513163 413 wget-at
514537 91 wget-at
17252 228 gnome-shell
15637 219 dbus-broker
20350 218 firefox
System limits at time of failure:
# System file descriptor usage
cat /proc/sys/fs/file-nr
20326 0 9223372036854775807
# User process limits
ulimit -n
1024
Systemd error message:
Broadcast message from systemd-journald@fedora-desktop (Sun 2025-08-10 01:04:00 CDT):
systemd[511381]: Failed to allocate manager object: Too many open files
wget-at process details:
ps -fp 513163 514537
UID PID PPID C STIME TTY STAT TIME CMD
ryana 514537 485560 3 01:05 ? S 0:01 /home/warrior/data/wget-at -U Mozilla/5.0 (X11; Linux i686; rv:124.0) Gecko/20100101 Firefox/124.0 -nv --no-cookies --host-lookups dns --hosts-file /dev/null --resolvconf-file /dev/null --dns-servers 9.9.9.10,149.112.112.10,2620:fe::10,2620:fe::fe:10 --reject-reserved-subnets --prefer-family IPv4 --content-on-error --lua-script goo-gl.lua -o /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/wget.log --no-check-certificate --output-document /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/wget.tmp --truncate-output -e robots=off --recursive --level=inf --no-parent --page-requisites --timeout 30 --connect-timeout 1 --tries inf --domains goo.gl --span-hosts --waitretry 30 --warc-file /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/goo-gl-142acab49cc5bab54a4d8745ca7f36c4de15460c-20250810-060548 --warc-header operator: Archive Team --warc-header x-wget-at-project-version: 20250805.03 --warc-header x-wget-at-project-name: goo-gl --warc-dedup-url-agnostic --warc-compression-use-zstd --warc-zstd-dict-no-include --warc-zstd-dict /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/zstdict
Complete docker-compose logs during crash:
archiveTeamWarrior | 2025-08-10 03:33:41,415 - seesaw.warrior - DEBUG - Check project has update goo-gl
archiveTeamWarrior | 2025-08-10 03:33:41,419 - seesaw.warrior - DEBUG - git fetch
archiveTeamWarrior | 2025-08-10 03:33:41,887 - seesaw.warrior - DEBUG - False
archiveTeamWarrior | 2025-08-10 03:33:41,963 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior | 2025-08-10 03:43:41,374 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior | 2025-08-10 03:43:41,374 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior | 2025-08-10 03:43:41,941 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior | 2025-08-10 03:53:41,374 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior | 2025-08-10 03:53:41,375 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior | 2025-08-10 03:53:41,965 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior | 2025-08-10 05:53:41,375 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior | 2025-08-10 05:53:41,375 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior | 2025-08-10 05:53:41,986 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior | Traceback (most recent call last):
archiveTeamWarrior | File "/home/warrior/start.py", line 23, in <module>
archiveTeamWarrior | subprocess.run([
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 507, in run
archiveTeamWarrior | stdout, stderr = process.communicate(input, timeout=timeout)
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1126, in communicate
archiveTeamWarrior | self.wait()
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1189, in wait
archiveTeamWarrior | return self._wait(timeout=timeout)
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1933, in _wait
archiveTeamWarrior | (pid, sts) = self._try_wait(0)
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1891, in _try_wait
archiveTeamWarrior | (pid, sts) = os.waitpid(self.pid, wait_flags)
archiveTeamWarrior | KeyboardInterrupt
archiveTeamWarrior | 2025-08-10 05:54:50,011 - root - INFO - Logging to /home/warrior/data/warrior.log
archiveTeamWarrior | 2025-08-10 05:54:50,017 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior | 2025-08-10 05:54:50,017 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior | 2025-08-10 05:54:50,532 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior | 2025-08-10 05:54:50,532 - seesaw.warrior - DEBUG - Start selected project goo-gl (reinstall=False)
archiveTeamWarrior | 2025-08-10 05:54:50,532 - seesaw.warrior - DEBUG - Install project goo-gl
archiveTeamWarrior | 2025-08-10 05:54:50,537 - seesaw.warrior - DEBUG - git pull from https://github.com/ArchiveTeam/goo-gl-grab
archiveTeamWarrior | 2025-08-10 05:54:51,018 - seesaw.warrior - DEBUG - git operation: Already up to date.
archiveTeamWarrior |
archiveTeamWarrior | 2025-08-10 05:54:51,019 - seesaw.warrior - DEBUG - Install complete Already up to date.
archiveTeamWarrior |
archiveTeamWarrior | 2025-08-10 05:54:51,020 - seesaw.warrior - DEBUG - Result of the install process: True
archiveTeamWarrior | 2025-08-10 05:54:51,020 - seesaw.warrior - DEBUG - Clone project goo-gl /home/warrior/projects/goo-gl
archiveTeamWarrior | 2025-08-10 05:54:51,026 - seesaw.warrior - DEBUG - Cloning version c6a188f
archiveTeamWarrior | 2025-08-10 05:54:51,027 - seesaw.warrior - DEBUG - Load pipeline /home/warrior/data/projects/goo-gl-c6a188f/pipeline.py
archiveTeamWarrior | 2025-08-10 05:54:51,027 - seesaw.warrior - DEBUG - Pipeline has been read. Begin ConfigValue collection
archiveTeamWarrior | 2025-08-10 05:54:51,028 - seesaw.warrior - DEBUG - Executing pipeline
archiveTeamWarrior | 2025-08-10 05:54:51,275 - seesaw.warrior - DEBUG - Stopped ConfigValue collecting
archiveTeamWarrior | 2025-08-10 05:54:51,277 - seesaw.warrior - INFO - Project goo-gl installed
archiveTeamWarrior | Traceback (most recent call last):
archiveTeamWarrior | File "/home/warrior/start.py", line 23, in <module>
archiveTeamWarrior | subprocess.run([
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 507, in run
archiveTeamWarrior | stdout, stderr = process.communicate(input, timeout=timeout)
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1126, in communicate
archiveTeamWarrior | self.wait()
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1189, in wait
archiveTeamWarrior | return self._wait(timeout=timeout)
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1933, in _wait
archiveTeamWarrior | (pid, sts) = self._try_wait(0)
archiveTeamWarrior | File "/usr/local/lib/python3.9/subprocess.py", line 1891, in _try_wait
archiveTeamWarrior | (pid, sts) = os.waitpid(self.pid, wait_flags)
archiveTeamWarrior | KeyboardInterrupt
archiveTeamWarrior | exec /usr/local/bin/python: resource temporarily unavailable
archiveTeamWarrior | exec /usr/local/bin/python: resource temporarily unavailable
archiveTeamWarrior exited with code 0
archiveTeamWarrior exited with code 0
archiveTeamWarrior | exec /usr/local/bin/python: resource temporarily unavailable
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
Docker container status during crash:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3ff773a44a03 atdr.meo.ws/archiveteam/warrior-dockerfile "python start.py" 6 months ago Up 12 minutes (healthy) 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp archiveTeamWarrior
lsof output showing file descriptor usage by wget-at:
sudo lsof -p 514537 | head -20
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
wget-at 514537 ryana cwd DIR 0,94 130 2232285198 /home/warrior/data/projects/goo-gl-c6a188f
wget-at 514537 ryana rtd DIR 0,94 32 27824540 /
wget-at 514537 ryana txt REG 0,94 1790000 27821535 /usr/local/bin/wget-lua
wget-at 514537 ryana mem REG 0,94 27028 27807598 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
wget-at 514537 ryana mem REG 0,94 78768 27821685 /usr/local/lib/lua/5.1/utf8.so
wget-at 514537 ryana mem REG 0,94 43048 27821671 /usr/local/lib/lua/5.1/cjson.so
wget-at 514537 ryana mem REG 0,94 102848 27821684 /usr/local/lib/lua/5.1/ssl.so
wget-at 514537 ryana mem REG 0,94 21496 27821679 /usr/local/lib/lua/5.1/mime/core.so
wget-at 514537 ryana mem REG 0,94 79808 27821681 /usr/local/lib/lua/5.1/socket/core.so
Root Cause
The container lacks proper resource limits, particularly:
- File descriptor limits (ulimit -n)
- Process limits (ulimit -u)
- Memory limits for runaway processes
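To confirm this diagnosis on a live system, per-process descriptor counts can be read directly from /proc. A minimal sketch, assuming a Linux host with procfs and the wget-at process name from this report:

```shell
#!/bin/sh
# count_fds PID: print the number of open file descriptors for that PID,
# by counting the entries under /proc/PID/fd.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print "PID FD_COUNT" for every wget-at process currently running.
if command -v pgrep >/dev/null 2>&1; then
  for pid in $(pgrep wget-at); do
    printf '%s %s\n' "$pid" "$(count_fds "$pid")"
  done
fi
```

This is the same information the lsof pipeline above aggregates, without needing lsof installed.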
Suggested Fix
The Dockerfile or documentation should include recommended docker-compose resource limits:
archiveTeamWarrior:
  # ... existing config ...
  mem_limit: 4g
  ulimits:
    nofile:
      soft: 4096
      hard: 8192
    nproc: 1024
  environment:
    - WGET_ARGS=--timeout=30 --tries=3
Workaround
Adding resource limits to docker-compose.yml resolves the issue. The container runs stably with appropriate ulimits.
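One way to verify the limits took effect is to read them back from inside the container, e.g. docker exec archiveTeamWarrior sh -c 'ulimit -Sn; ulimit -Hn' (container name from this report). A standalone sketch of the same check; the exact values will vary per host:

```shell
# Print the soft and hard open-file limits of the current shell.
# Run inside the container, these should match the compose ulimits
# (soft 4096, hard 8192 in the suggested config above).
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"
```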
Impact
- Affects long-running archival tasks
- Can cause data loss if WARC files are incomplete
- Wastes computational resources on crash/restart loops
- May affect other containers on the same host due to file descriptor exhaustion
Proposed Solutions
- Add default resource limits to the official docker-compose examples
- Document recommended limits in README
- Add health checks to detect stuck wget processes
- Implement cleanup logic for hung wget processes
- Add monitoring for file descriptor usage
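The cleanup idea above could look roughly like the following. This is a hedged sketch of proposed behavior, not existing warrior code; the threshold of 300 and the wget-at process name are assumptions taken from the figures in this report:

```shell
#!/bin/sh
# Proposed cleanup sketch: kill wget-at processes whose open-FD count
# exceeds a threshold. THRESHOLD (default 300) is an assumed value.
THRESHOLD=${THRESHOLD:-300}

# over_threshold PID LIMIT: succeed if PID holds more than LIMIT FDs.
over_threshold() {
  count=$(ls "/proc/$1/fd" 2>/dev/null | wc -l)
  [ "$count" -gt "$2" ]
}

if command -v pgrep >/dev/null 2>&1; then
  for pid in $(pgrep wget-at); do
    if over_threshold "$pid" "$THRESHOLD"; then
      echo "killing $pid: FD count exceeds $THRESHOLD"
      kill "$pid"
    fi
  done
fi
```

Run periodically (e.g. from cron or a health-check loop), this would prevent a single stuck wget-at from exhausting the container's descriptor budget.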
Disclosure: This issue was identified and documented through a collaborative debugging session between a human user and Claude Sonnet 4 (Anthropic AI). The logs and technical details are authentic system output. The analysis, suggested solutions, and issue formatting received AI assistance. I used AI because the AI knows more than I do. It was not laziness but trust.
Bug Closure: Maybe this is a one-off error. This is just what I noticed on my PC. You have a lot going on. Feel free to close this issue with close/skip.
Warm regards,
Ryan