
Container crashes with "resource temporarily unavailable" due to file descriptor exhaustion #94

@RyansOpenSourceRice

Description


Bug Report

Disclosure: AI used in log collection and report creation.

Problem

The warrior container crashes with "exec /usr/local/bin/python: resource temporarily unavailable" and KeyboardInterrupt tracebacks once wget-at processes accumulate too many open file descriptors.

Environment

  • OS: Fedora (latest)
  • Docker: docker-compose
  • System Resources: 135GB RAM, 24 CPU cores (resources not the issue)
  • Project: goo-gl archiving

Symptoms

  1. wget-at processes accumulate 400+ file descriptors
  2. Container becomes unable to spawn new Python processes
  3. Repeated crashes and restart loops
  4. systemd errors on the host: Failed to allocate manager object: Too many open files

Example Process State Before Crash

PID    FD_COUNT  COMMAND
513163 413       wget-at
514537 91        wget-at

Detailed Logs

Initial file descriptor exhaustion detection:

# Count open file descriptors per process (columns: PID, FD count, command)
sudo lsof | awk '{print $2, $1}' | sort | uniq -c | sort -nr | head -10 | awk '{print $2, $1, $3}'
513163 413 wget-at
514537 91 wget-at
17252 228 gnome-shell
15637 219 dbus-broker
20350 218 firefox

System limits at time of failure:

# System file descriptor usage
cat /proc/sys/fs/file-nr
20326	0	9223372036854775807

# User process limits
ulimit -n
1024
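The 1024 soft limit shown above can also be confirmed from inside a running process; a minimal Python check (the equivalent of ulimit -n), using only the standard-library resource module:

```python
import resource

# RLIMIT_NOFILE is the per-process open-file-descriptor limit that
# `ulimit -n` reports: `soft` is the enforced value, `hard` is the
# ceiling an unprivileged process may raise `soft` up to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")
```

Running this inside the warrior container would show exactly which limit wget-at's descriptors are counted against.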

Systemd error message:

Broadcast message from systemd-journald@fedora-desktop (Sun 2025-08-10 01:04:00 CDT):
systemd[511381]: Failed to allocate manager object: Too many open files

wget-at process details:

ps -fp 513163 514537
UID          PID    PPID  C STIME TTY      STAT   TIME CMD
ryana     514537  485560  3 01:05 ?        S      0:01 /home/warrior/data/wget-at -U Mozilla/5.0 (X11; Linux i686; rv:124.0) Gecko/20100101 Firefox/124.0 -nv --no-cookies --host-lookups dns --hosts-file /dev/null --resolvconf-file /dev/null --dns-servers 9.9.9.10,149.112.112.10,2620:fe::10,2620:fe::fe:10 --reject-reserved-subnets --prefer-family IPv4 --content-on-error --lua-script goo-gl.lua -o /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/wget.log --no-check-certificate --output-document /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/wget.tmp --truncate-output -e robots=off --recursive --level=inf --no-parent --page-requisites --timeout 30 --connect-timeout 1 --tries inf --domains goo.gl --span-hosts --waitretry 30 --warc-file /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/goo-gl-142acab49cc5bab54a4d8745ca7f36c4de15460c-20250810-060548 --warc-header operator: Archive Team --warc-header x-wget-at-project-version: 20250805.03 --warc-header x-wget-at-project-name: goo-gl --warc-dedup-url-agnostic --warc-compression-use-zstd --warc-zstd-dict-no-include --warc-zstd-dict /home/warrior/data/projects/goo-gl-c6a188f/data/1754805948ee818c98753d1677-17/142acab49cc5bab54a4d8745ca7f36c4de15460c/zstdict

Complete docker-compose logs during crash:

archiveTeamWarrior  | 2025-08-10 03:33:41,415 - seesaw.warrior - DEBUG - Check project has update goo-gl
archiveTeamWarrior  | 2025-08-10 03:33:41,419 - seesaw.warrior - DEBUG - git fetch
archiveTeamWarrior  | 2025-08-10 03:33:41,887 - seesaw.warrior - DEBUG - False
archiveTeamWarrior  | 2025-08-10 03:33:41,963 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior  | 2025-08-10 03:43:41,374 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior  | 2025-08-10 03:43:41,374 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior  | 2025-08-10 03:43:41,941 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior  | 2025-08-10 03:53:41,374 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior  | 2025-08-10 03:53:41,375 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior  | 2025-08-10 03:53:41,965 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior  | 2025-08-10 05:53:41,375 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior  | 2025-08-10 05:53:41,375 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior  | 2025-08-10 05:53:41,986 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior  | Traceback (most recent call last):
archiveTeamWarrior  |   File "/home/warrior/start.py", line 23, in <module>
archiveTeamWarrior  |     subprocess.run([
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 507, in run
archiveTeamWarrior  |     stdout, stderr = process.communicate(input, timeout=timeout)
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1126, in communicate
archiveTeamWarrior  |     self.wait()
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1189, in wait
archiveTeamWarrior  |     return self._wait(timeout=timeout)
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1933, in _wait
archiveTeamWarrior  |     (pid, sts) = self._try_wait(0)
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1891, in _try_wait
archiveTeamWarrior  |     (pid, sts) = os.waitpid(self.pid, wait_flags)
archiveTeamWarrior  | KeyboardInterrupt
archiveTeamWarrior  | 2025-08-10 05:54:50,011 - root - INFO - Logging to /home/warrior/data/warrior.log
archiveTeamWarrior  | 2025-08-10 05:54:50,017 - seesaw.warrior - DEBUG - Update warrior hq.
archiveTeamWarrior  | 2025-08-10 05:54:50,017 - seesaw.warrior - DEBUG - Warrior ID ''.
archiveTeamWarrior  | 2025-08-10 05:54:50,532 - seesaw.warrior - DEBUG - Select project goo-gl
archiveTeamWarrior  | 2025-08-10 05:54:50,532 - seesaw.warrior - DEBUG - Start selected project goo-gl (reinstall=False)
archiveTeamWarrior  | 2025-08-10 05:54:50,532 - seesaw.warrior - DEBUG - Install project goo-gl
archiveTeamWarrior  | 2025-08-10 05:54:50,537 - seesaw.warrior - DEBUG - git pull from https://github.com/ArchiveTeam/goo-gl-grab
archiveTeamWarrior  | 2025-08-10 05:54:51,018 - seesaw.warrior - DEBUG - git operation: Already up to date.
archiveTeamWarrior  | 
archiveTeamWarrior  | 2025-08-10 05:54:51,019 - seesaw.warrior - DEBUG - Install complete Already up to date.
archiveTeamWarrior  | 
archiveTeamWarrior  | 2025-08-10 05:54:51,020 - seesaw.warrior - DEBUG - Result of the install process: True
archiveTeamWarrior  | 2025-08-10 05:54:51,020 - seesaw.warrior - DEBUG - Clone project goo-gl /home/warrior/projects/goo-gl
archiveTeamWarrior  | 2025-08-10 05:54:51,026 - seesaw.warrior - DEBUG - Cloning version c6a188f
archiveTeamWarrior  | 2025-08-10 05:54:51,027 - seesaw.warrior - DEBUG - Load pipeline /home/warrior/data/projects/goo-gl-c6a188f/pipeline.py
archiveTeamWarrior  | 2025-08-10 05:54:51,027 - seesaw.warrior - DEBUG - Pipeline has been read. Begin ConfigValue collection
archiveTeamWarrior  | 2025-08-10 05:54:51,028 - seesaw.warrior - DEBUG - Executing pipeline
archiveTeamWarrior  | 2025-08-10 05:54:51,275 - seesaw.warrior - DEBUG - Stopped ConfigValue collecting
archiveTeamWarrior  | 2025-08-10 05:54:51,277 - seesaw.warrior - INFO - Project goo-gl installed
archiveTeamWarrior  | Traceback (most recent call last):
archiveTeamWarrior  |   File "/home/warrior/start.py", line 23, in <module>
archiveTeamWarrior  |     subprocess.run([
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 507, in run
archiveTeamWarrior  |     stdout, stderr = process.communicate(input, timeout=timeout)
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1126, in communicate
archiveTeamWarrior  |     self.wait()
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1189, in wait
archiveTeamWarrior  |     return self._wait(timeout=timeout)
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1933, in _wait
archiveTeamWarrior  |     (pid, sts) = self._try_wait(0)
archiveTeamWarrior  |   File "/usr/local/lib/python3.9/subprocess.py", line 1891, in _try_wait
archiveTeamWarrior  |     (pid, sts) = os.waitpid(self.pid, wait_flags)
archiveTeamWarrior  | KeyboardInterrupt
archiveTeamWarrior  | exec /usr/local/bin/python: resource temporarily unavailable
archiveTeamWarrior  | exec /usr/local/bin/python: resource temporarily unavailable
archiveTeamWarrior exited with code 0
archiveTeamWarrior exited with code 0
archiveTeamWarrior  | exec /usr/local/bin/python: resource temporarily unavailable
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255
archiveTeamWarrior exited with code 255

Docker container status during crash:

docker ps
CONTAINER ID   IMAGE                                       COMMAND           CREATED        STATUS                 PORTS                               NAMES
3ff773a44a03   atdr.meo.ws/archiveteam/warrior-dockerfile   "python start.py"   6 months ago   Up 12 minutes (healthy)   0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp   archiveTeamWarrior

lsof output showing file descriptor usage by wget-at:

sudo lsof -p 514537 | head -20
COMMAND    PID  USER   FD   TYPE DEVICE SIZE/OFF       NODE NAME
wget-at 514537 ryana  cwd    DIR   0,94      130 2232285198 /home/warrior/data/projects/goo-gl-c6a188f
wget-at 514537 ryana  rtd    DIR   0,94       32   27824540 /
wget-at 514537 ryana  txt    REG   0,94  1790000   27821535 /usr/local/bin/wget-lua
wget-at 514537 ryana  mem    REG   0,94    27028   27807598 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
wget-at 514537 ryana  mem    REG   0,94    78768   27821685 /usr/local/lib/lua/5.1/utf8.so
wget-at 514537 ryana  mem    REG   0,94    43048   27821671 /usr/local/lib/lua/5.1/cjson.so
wget-at 514537 ryana  mem    REG   0,94   102848   27821684 /usr/local/lib/lua/5.1/ssl.so
wget-at 514537 ryana  mem    REG   0,94    21496   27821679 /usr/local/lib/lua/5.1/mime/core.so
wget-at 514537 ryana  mem    REG   0,94    79808   27821681 /usr/local/lib/lua/5.1/socket/core.so

Root Cause

The container lacks proper resource limits, particularly:

  • File descriptor limits (ulimit -n)
  • Process limits (ulimit -u)
  • Memory limits for runaway processes
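Beyond docker-compose limits, the launcher itself could cap descriptors for its children. A hedged sketch (illustrative only, not the actual start.py; cap_fds and the 4096 cap are assumptions) of how a subprocess.run-based launcher can lower the soft nofile limit just before exec, so a runaway wget-at fails fast instead of starving the whole container:

```python
import resource
import subprocess

def cap_fds():
    # Runs in the child between fork and exec (POSIX only, and not
    # safe in multi-threaded parents). Lowers the soft nofile limit
    # so one runaway child cannot exhaust descriptors for everyone.
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft_cap = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_cap, hard))

# Hypothetical invocation mirroring start.py's subprocess.run call:
subprocess.run(["echo", "pipeline would run here"], preexec_fn=cap_fds)
```

The same effect is available declaratively via the docker-compose ulimits suggested below, which is the less invasive fix.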

Suggested Fix

The Dockerfile or documentation should recommend docker-compose resource limits, for example:

archiveTeamWarrior:
  # ... existing config ...
  mem_limit: 4g
  ulimits:
    nofile:
      soft: 4096
      hard: 8192
    nproc: 1024
  environment:
    - WGET_ARGS=--timeout=30 --tries=3

Workaround

Adding resource limits to docker-compose.yml resolves the issue. The container runs stably with appropriate ulimits.

Impact

  • Affects long-running archival tasks
  • Can cause data loss if WARC files are incomplete
  • Wastes computational resources on crash/restart loops
  • May affect other containers on the same host due to file descriptor exhaustion

Proposed Solutions

  1. Add default resource limits to the official docker-compose examples
  2. Document recommended limits in README
  3. Add health checks to detect stuck wget processes
  4. Implement cleanup logic for hung wget processes
  5. Add monitoring for file descriptor usage
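For points 3-5, a minimal monitoring sketch (assuming Linux /proc; the fd_count/flag_heavy names and the 400-descriptor threshold are illustrative, chosen to match the counts observed above):

```python
import os

def fd_count(pid: int) -> int:
    """Count open file descriptors for a PID via /proc (Linux only)."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except (FileNotFoundError, PermissionError):
        return 0  # process exited, or we lack permission to inspect it

def flag_heavy(name_fragment: str, threshold: int = 400):
    """Yield (pid, fd_count) for matching processes above the threshold."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # raced with process exit
        if name_fragment in comm:
            n = fd_count(pid)
            if n >= threshold:
                yield pid, n

if __name__ == "__main__":
    for pid, n in flag_heavy("wget-at"):
        print(f"PID {pid}: {n} open fds")
```

A health check could run this periodically and kill (or just report) wget-at processes that cross the threshold, breaking the crash/restart loop before descriptors run out.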

Disclosure: This issue was identified and documented through a collaborative debugging session between a human user and Claude Sonnet 4 (Anthropic AI). The logs and technical details are authentic system output. The analysis, suggested solutions, and issue formatting received AI assistance. I used AI because it knows more than I do; not out of laziness, but out of trust.

Bug Closure: Maybe this is a one-off error. This is just what I noticed on my PC. You have a lot going on. Feel free to close this issue with close/skip.

Warm regards,
Ryan
