
Permission denied when extracting .apkg to /tmp/pixell_packages - agent crashes on startup #2

@syumpx


Summary

Agent deployment succeeds, but the agent process crashes immediately with a permission error when it tries to load the extracted package from /tmp/pixell_packages/.

Environment

  • PAR Version: not stated (running on EC2 instance i-09dcb7f387166efd0)
  • Deployment Mode: EC2 multi-agent (supervisor-based)
  • Affected Agent: ed8784f3-b602-481c-8701-3b6406c8fd98
  • Package: paf-core-agent@1.0.0

Symptoms

  1. Agent is deployed successfully via the supervisor API
  2. Agent process starts but crashes within 1 second
  3. Process becomes a zombie (PID 383582 shows as <defunct>)
  4. No service is listening on the allocated ports (60001, 63001, 65001)
  5. ALB health checks fail with StatusCode.UNAVAILABLE
  6. Client connections fail with an UNAVAILABLE error

Root Cause

When PAR extracts .apkg files to /tmp/pixell_packages/{package_name}@{version}, the directory is created with incorrect permissions that prevent the agent's Linux user from accessing it.

Evidence from Agent Logs

{"event": "Starting three-surface runtime", "ports": {"rest": 63001, "a2a": 60001, "ui": 65001}}
{"event": "Loading agent package", "path": "/var/lib/pixell/packages/5de43fee45d0e06f.apkg"}
{"event": "Loading package"}
{"error": "[Errno 13] Permission denied: '/tmp/pixell_packages/paf-core-agent@1.0.0'"}
{"error": "Failed to load package: [Errno 13] Permission denied: '/tmp/pixell_packages/paf-core-agent@1.0.0'"}
{"event": "Shutting down three-surface runtime"}
{"event": "Runtime shutdown complete"}

Directory Permissions Investigation

Working agent directory:

drwx------. 8 agent_8c82966883524dad_4906eeb7 root 280 Oct 17 13:59 vivid-commenter@1.0.2

The failing agent's directory was likely created with wrong ownership (e.g., owned by root instead of the agent's Linux user).
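To confirm the ownership hypothesis on the instance, the failing directory can be compared against the working one. Below is a minimal diagnostic sketch; `describe_access` is a hypothetical helper, not part of PAR. It reports the owner and mode of a directory and whether a given Linux user could list and enter it, judged only from the owner/group/other mode bits. It ignores POSIX ACLs and SELinux (the trailing `.` in `drwx------.` means an SELinux context is present), so a `True` result does not fully rule out a denial.

```python
import os
import pwd
import stat


def describe_access(path: str, username: str) -> dict:
    """Report the owner and mode of `path` and whether `username` could list and enter it.

    Only the classic owner/group/other mode bits are evaluated; POSIX ACLs and
    SELinux policy are ignored, so this can report access that is still denied.
    """
    st = os.stat(path)
    user = pwd.getpwnam(username)
    # Primary group plus all supplementary groups of the user.
    gids = set(os.getgrouplist(username, user.pw_gid))

    if user.pw_uid == 0:
        can_enter = True  # root bypasses the mode bits
    elif st.st_uid == user.pw_uid:
        # If the user owns the directory, only the owner bits apply.
        can_enter = bool(st.st_mode & stat.S_IRUSR) and bool(st.st_mode & stat.S_IXUSR)
    elif st.st_gid in gids:
        can_enter = bool(st.st_mode & stat.S_IRGRP) and bool(st.st_mode & stat.S_IXGRP)
    else:
        can_enter = bool(st.st_mode & stat.S_IROTH) and bool(st.st_mode & stat.S_IXOTH)

    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)

    return {"owner": owner, "mode": stat.filemode(st.st_mode), "can_enter": can_enter}
```

Running it against /tmp/pixell_packages/paf-core-agent@1.0.0 with agent_8c82966883524dad_5pwbelmv, and against vivid-commenter@1.0.2 with agent_8c82966883524dad_4906eeb7, should show where the two directories differ.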

Process State

# Agent shows as running in supervisor
curl http://localhost:9000/agents
# {"agent_app_id": "ed8784f3-b602-481c-8701-3b6406c8fd98", "status": "running", "pid": 383582, ...}

# But the process is actually a zombie
ps aux | grep 383582
# agent_8+ 383582 0.0 0.0 0 0 ? Z 00:18 0:00 [python3.11] <defunct>

# No service listening on ports
netstat -tlnp | grep -E ':(60001|63001|65001)'
# (no output - ports not listening)
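The same two checks can be done programmatically, which is roughly what the supervisor would need in order to stop reporting a zombie as "running". This is a minimal sketch assuming a Linux host (it reads /proc/<pid>/stat) and that the agent binds its ports on an interface reachable via 127.0.0.1; `process_state`, `is_zombie` and `port_is_listening` are hypothetical helper names.

```python
import socket


def process_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat (Linux only), e.g. 'R', 'S', 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces or ')',
    # so the state field is located relative to the *last* closing parenthesis.
    return data[data.rindex(")") + 2]


def is_zombie(pid: int) -> bool:
    try:
        return process_state(pid) == "Z"
    except FileNotFoundError:
        return False  # no /proc entry: the process is gone entirely (already reaped)


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the failing agent above, `is_zombie(383582)` would be expected to return `True` and `port_is_listening(63001)` to return `False`.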

Expected Behavior

  1. PAR supervisor should extract .apkg to a directory owned by the agent's Linux user
  2. Directory permissions should allow the agent process (running as that Linux user) to read the extracted files
  3. Agent should start successfully and listen on allocated ports

Suggested Fix

In the package extraction code (likely in supervisor or runtime initialization):

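# After extracting .apkg to /tmp/pixell_packages/{package_name}@{version}/
import os
import shutil


def chown_tree(path, user, group):
    """Give the agent's Linux user ownership of an extracted package tree.

    Requires root (or CAP_CHOWN). Symlinks are skipped so that a crafted
    .apkg cannot redirect chown onto files outside the extraction directory.
    """
    # os.walk yields every real directory (including `path` itself) as `root`
    # exactly once, and does not descend into symlinked directories.
    for root, _dirs, files in os.walk(path):
        shutil.chown(root, user=user, group=group)
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                shutil.chown(full, user=user, group=group)


# package_name, version and linux_user come from the surrounding extraction code.
# group=linux_user assumes a per-agent group with the same name exists.
extract_path = f"/tmp/pixell_packages/{package_name}@{version}"
chown_tree(extract_path, user=linux_user, group=linux_user)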
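If the intermittent failures stem from a race (see Additional Context), changing ownership after extraction still leaves a window in which the agent can see a directory it cannot read. One way to close that window is to extract into a private staging directory on the same filesystem, fix ownership there, and only then `os.rename` it into place, which is atomic on POSIX. This is a sketch under assumptions: `extract_atomically` is a hypothetical helper, and `extract_into` stands in for PAR's actual .apkg unpacking routine, whose name and signature are not known here.

```python
import os
import shutil
import tempfile
from typing import Callable, Union


def extract_atomically(
    final_path: str,
    extract_into: Callable[[str], None],
    user: Union[str, int],
    group: Union[str, int],
) -> str:
    """Unpack into a private staging dir, hand it to the agent user, then rename into place."""
    parent = os.path.dirname(final_path)
    os.makedirs(parent, exist_ok=True)
    # Same parent => same filesystem, so os.rename below is atomic.
    # mkdtemp creates the directory with mode 0o700 (like the working agent's dir).
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        extract_into(staging)  # hypothetical stand-in for PAR's .apkg unpacking
        for root, _dirs, files in os.walk(staging):
            shutil.chown(root, user=user, group=group)
            for name in files:
                path = os.path.join(root, name)
                if not os.path.islink(path):  # never chown through a symlink
                    shutil.chown(path, user=user, group=group)
        # Atomic publish: observers see either no directory or the finished one.
        # Raises OSError if final_path already exists as a non-empty directory.
        os.rename(staging, final_path)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_path
```

Because `os.rename` refuses to replace a non-empty directory, a stale directory left behind by an earlier, broken extraction would surface as an explicit error rather than being reused as-is.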

Impact

  • Severity: High - Blocks agent deployment entirely
  • Workaround: Redeploy and hope the permissions are correct next time (unreliable)
  • Affected: Deployments fail intermittently; whether the cause is a race condition or a deterministic permission issue is not yet confirmed

Additional Context

  • Working agents on the same EC2 instance have proper permissions
  • This appears to be either a race condition or an issue with how the supervisor creates the extraction directory
  • The supervisor should also be enhanced to detect zombie processes and clean them up
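On the zombie point: a process stays <defunct> until its parent collects its exit status, so, assuming the supervisor is the agent's parent, the supervisor itself has to reap it. Assuming the supervisor starts agents with `subprocess.Popen` (not confirmed by this issue), `Popen.poll()` performs a non-blocking `waitpid`, which both reaps an exited child and yields its exit code. The `refresh_statuses` helper and the status dictionary layout are hypothetical, loosely modeled on the /agents output shown earlier.

```python
import subprocess
from typing import Dict


def refresh_statuses(agents: Dict[str, subprocess.Popen]) -> Dict[str, dict]:
    """Poll every agent process; exited children are reaped as a side effect."""
    statuses = {}
    for agent_id, proc in agents.items():
        # poll() does a non-blocking waitpid(): if the child has exited it is
        # reaped here (no longer <defunct>) and its exit code is returned.
        code = proc.poll()
        if code is None:
            statuses[agent_id] = {"status": "running", "pid": proc.pid}
        else:
            # Negative codes mean the process was killed by that signal number.
            statuses[agent_id] = {"status": "exited", "pid": proc.pid, "returncode": code}
    return statuses
```

Calling this periodically (or from a SIGCHLD handler) would let /agents report the crash with its exit code instead of "running".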

Related Files

  • Package extraction logic in PAR
  • Supervisor deployment endpoint (POST /agents)
  • Agent bootstrap/initialization code

Environment Info:

  • EC2 Instance: i-09dcb7f387166efd0 (10.0.1.37)
  • Supervisor PID: 352196
  • Agent PID (zombie): 383582
  • Agent User: agent_8c82966883524dad_5pwbelmv
  • Log File: /var/lib/pixell/logs/agent_ed8784f3-b602-481c-8701-3b6406c8fd98.log
