Permission denied when extracting .apkg to /tmp/pixell_packages - agent crashes on startup #2
Open
Description
Summary
Agent deployment succeeds but the agent process crashes immediately with a permission error when trying to load the extracted package from /tmp/pixell_packages/.
Environment
- PAR Version: Running on EC2 instance i-09dcb7f387166efd0
- Deployment Mode: EC2 multi-agent (supervisor-based)
- Affected Agent: ed8784f3-b602-481c-8701-3b6406c8fd98
- Package: paf-core-agent@1.0.0
Symptoms
- Agent is deployed via supervisor API successfully
- Agent process starts but crashes within 1 second
- Process becomes a zombie (PID 383582 shows as <defunct>)
- No service listening on allocated ports (60001, 63001, 65001)
- ALB health checks fail with StatusCode.UNAVAILABLE
- Client connections get an unavailable error
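For anyone reproducing the "no service listening" symptom without netstat, here is a minimal sketch of a port probe. The helper names `is_listening` and `closed_ports` are hypothetical and not part of PAR. It assumes the agent surfaces accept TCP connections on the loopback address, which may not hold if they bind to a specific interface.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def closed_ports(ports: dict[str, int], host: str = "127.0.0.1") -> list[str]:
    """Return the names of surfaces whose port does not accept connections."""
    return [name for name, port in ports.items() if not is_listening(port, host)]
```

Calling `closed_ports({"rest": 63001, "a2a": 60001, "ui": 65001})` on the affected instance would be expected to list all three surfaces while the agent is in the crashed state.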
Root Cause
When PAR extracts .apkg files to /tmp/pixell_packages/{package_name}@{version}, the directory is created with incorrect permissions that prevent the agent's Linux user from accessing it.
Evidence from Agent Logs
{"event": "Starting three-surface runtime", "ports": {"rest": 63001, "a2a": 60001, "ui": 65001}}
{"event": "Loading agent package", "path": "/var/lib/pixell/packages/5de43fee45d0e06f.apkg"}
{"event": "Loading package"}
{"error": "[Errno 13] Permission denied: '/tmp/pixell_packages/paf-core-agent@1.0.0'"}
{"error": "Failed to load package: [Errno 13] Permission denied: '/tmp/pixell_packages/paf-core-agent@1.0.0'"}
{"event": "Shutting down three-surface runtime"}
{"event": "Runtime shutdown complete"}
Directory Permissions Investigation
Working agent directory:
drwx------. 8 agent_8c82966883524dad_4906eeb7 root 280 Oct 17 13:59 vivid-commenter@1.0.2
The failing agent's directory was likely created with wrong ownership (e.g., owned by root instead of the agent's Linux user).
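To illustrate why ownership matters with a mode like `drwx------`, here is a rough sketch of the owner/group/other check that decides whether a user can list and enter a directory. The function `can_read_dir` and the uid/gid values in the usage lines are hypothetical. It considers only the classic permission bits and ignores ACLs, SELinux (the trailing `.` in the listing above indicates an SELinux context is present), and capabilities.

```python
import stat


def can_read_dir(mode: int, owner_uid: int, owner_gid: int,
                 uid: int, gids: set[int]) -> bool:
    """Approximate the permission-bit check for listing and entering a directory.

    Exactly one class (owner, group, or other) applies to a given user;
    the classes are not combined.
    """
    if uid == 0:
        return True  # root bypasses the discretionary permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed


mode = 0o40700  # drwx------, as in the working directory listing
# Directory owned by the agent user (hypothetical uid 1001): accessible.
print(can_read_dir(mode, owner_uid=1001, owner_gid=0, uid=1001, gids={1001}))  # True
# Same mode but owned by root: the agent user gets Errno 13.
print(can_read_dir(mode, owner_uid=0, owner_gid=0, uid=1001, gids={1001}))  # False
```

With mode 0700, the only non-root user who can open the directory is its owner, so a directory left owned by root would produce exactly the `[Errno 13]` seen in the agent log.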
Process State
# Agent shows as running in supervisor
curl http://localhost:9000/agents
# {"agent_app_id": "ed8784f3-b602-481c-8701-3b6406c8fd98", "status": "running", "pid": 383582, ...}
# But the process is actually a zombie
ps aux | grep 383582
# agent_8+ 383582 0.0 0.0 0 0 ? Z 00:18 0:00 [python3.11] <defunct>
# No service listening on ports
netstat -tlnp | grep -E ':(60001|63001)'
# (no output - ports not listening)
Expected Behavior
- PAR supervisor should extract .apkg to a directory owned by the agent's Linux user
- Directory permissions should allow the agent process (running as that Linux user) to read the extracted files
- Agent should start successfully and listen on allocated ports
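The ownership expectation above could be verified after extraction, before the agent process is launched. Below is a minimal sketch of such a check. `paths_not_owned_by` is a hypothetical helper, not an existing PAR function, and it assumes the supervisor knows the numeric uid of the agent's Linux user.

```python
import os


def paths_not_owned_by(root: str, uid: int) -> list[str]:
    """Return every path under root (root included) whose owner is not uid."""
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects symlinks themselves rather than their targets.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

An empty result means the whole extracted tree belongs to the agent user. A non-empty result could be logged and treated as a deployment failure instead of starting an agent that will crash.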
Suggested Fix
In the package extraction code (likely in supervisor or runtime initialization):
# After extracting .apkg to /tmp/pixell_packages/{package_name}@{version}/
import shutil
import os
extract_path = f"/tmp/pixell_packages/{package_name}@{version}"
# Ensure directory is owned by the agent's Linux user
shutil.chown(extract_path, user=linux_user, group=linux_user)
# Or recursively for all extracted files
for root, dirs, files in os.walk(extract_path):
    shutil.chown(root, user=linux_user, group=linux_user)
    for d in dirs:
        shutil.chown(os.path.join(root, d), user=linux_user, group=linux_user)
    for f in files:
        shutil.chown(os.path.join(root, f), user=linux_user, group=linux_user)
Impact
- Severity: High - Blocks agent deployment entirely
- Workaround: None reliable. Redeploying sometimes yields correct permissions, but not consistently
- Affected: Deployments fail intermittently; whether the cause is a race condition or a permission-handling bug is not yet confirmed
Additional Context
- Working agents on the same EC2 instance have proper permissions
- This appears to be either a race condition or an issue with how the supervisor creates the extraction directory
- The supervisor should also be enhanced to detect zombie processes and clean them up
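As a starting point for the zombie detection mentioned above, here is a minimal Linux-only sketch. The function names are hypothetical, and it assumes the supervisor launches agents with `subprocess.Popen` and keeps the handle, which this report has not confirmed. Reading `/proc/<pid>/stat` detects the `Z` state, and polling the child handle collects its exit status so the kernel can remove the defunct entry.

```python
import subprocess
from typing import Optional


def proc_state(stat_line: str) -> str:
    """Extract the one-letter state from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so the state is the first field after the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def is_zombie(pid: int) -> bool:
    """Return True if /proc reports the process as a zombie (Linux only)."""
    try:
        with open(f"/proc/{pid}/stat") as fh:
            return proc_state(fh.read()) == "Z"
    except (FileNotFoundError, ProcessLookupError):
        return False  # no such process (or no /proc on this platform)


def reap_if_exited(child: subprocess.Popen) -> Optional[int]:
    """Collect the exit status of a finished child so it stops being a zombie.

    Returns the exit code, or None while the child is still running.
    """
    return child.poll()
```

If the supervisor called something like `reap_if_exited` periodically, it could flip the reported status from "running" to "crashed" and surface the exit code, rather than reporting a defunct PID as running.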
Related Files
- Package extraction logic in PAR
- Supervisor deployment endpoint (POST /agents)
- Agent bootstrap/initialization code
Environment Info:
- EC2 Instance: i-09dcb7f387166efd0 (10.0.1.37)
- Supervisor PID: 352196
- Agent PID (zombie): 383582
- Agent User: agent_8c82966883524dad_5pwbelmv
- Log File: /var/lib/pixell/logs/agent_ed8784f3-b602-481c-8701-3b6406c8fd98.log