Commit a148070

Copilot and mudler committed
Add Kubernetes security context requirements and troubleshooting docs
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
1 parent 6e1ecc9 commit a148070

2 files changed, +152 -0 lines changed

docs/content/getting-started/kubernetes.md

Lines changed: 76 additions & 0 deletions
@@ -29,3 +29,79 @@ helm show values go-skynet/local-ai > values.yaml
```
helm install local-ai go-skynet/local-ai -f values.yaml
```

## Security Context Requirements

LocalAI spawns child processes to run model backends (e.g., llama.cpp, diffusers, whisper). To stop these processes and free resources such as VRAM, LocalAI needs permission to send signals to them.

If you use a restrictive security context, make sure the `CAP_KILL` capability is available (written as `KILL` in Kubernetes manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: local-ai
spec:
  containers:
  - name: local-ai
    image: quay.io/go-skynet/local-ai:latest
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
        add:
        - KILL  # Required for LocalAI to stop backend processes
      seccompProfile:
        type: RuntimeDefault
      runAsNonRoot: true
      runAsUser: 1000
```

Without the `KILL` capability, LocalAI cannot terminate backend processes when models are stopped, leading to:
- VRAM and memory not being freed
- Orphaned backend processes holding GPU resources
- Error messages like `error while deleting process error=permission denied`

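Whether a process may signal another can be probed from a shell inside the container. The following is a minimal sketch, assuming a POSIX shell is available in the image; `can_signal` is a hypothetical helper name and not part of LocalAI. It relies on `kill -0`, which runs the kernel's existence and permission checks without delivering a signal.

```shell
# can_signal PID: hypothetical helper, not part of LocalAI.
# `kill -0` sends no signal; it only checks that the target process
# exists and that the caller is permitted to signal it.
can_signal() {
    if kill -0 "$1" 2>/dev/null; then
        echo "allowed"
    else
        echo "denied-or-missing"
    fi
}

# A shell is always permitted to signal itself.
can_signal "$$"   # prints: allowed
```

Passing the PID of a backend process instead of `$$` shows whether the calling user could stop it. The helper does not distinguish a missing process from a permission failure.
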
## Troubleshooting

### Issue: VRAM is not freed when stopping models

**Symptoms:**
- Models appear to stop but GPU memory remains allocated
- Logs show `(deleteProcess) error while deleting process error=permission denied`
- Backend processes remain running after model unload

**Common Causes:**
- All capabilities are dropped without adding back `CAP_KILL`
- Using user namespacing (`hostUsers: false`) with certain configurations
- Overly restrictive seccomp profiles that block signal-related syscalls
- Pod Security Policies (removed in Kubernetes 1.25) or Pod Security Standards blocking required capabilities

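For the last cause, the Pod Security Standards level enforced on a namespace is recorded in its `pod-security.kubernetes.io/enforce` label. The sketch below maps that level to whether a container may add `KILL`: the `restricted` level only permits adding `NET_BIND_SERVICE`, while `baseline` and `privileged` accept `KILL`. The helper name `kill_cap_allowed` is hypothetical, and the namespace in the comment is a placeholder.

```shell
# kill_cap_allowed LEVEL: hypothetical helper, not part of LocalAI or kubectl.
# LEVEL is the value of the namespace label pod-security.kubernetes.io/enforce.
kill_cap_allowed() {
    case "$1" in
        restricted)          echo "no" ;;       # only NET_BIND_SERVICE may be added
        baseline|privileged) echo "yes" ;;      # KILL is on the permitted list
        *)                   echo "unknown" ;;  # label missing or not a standard level
    esac
}

# The labels of a namespace can be listed with:
#   kubectl get namespace <namespace> --show-labels
kill_cap_allowed restricted   # prints: no
kill_cap_allowed baseline     # prints: yes
```

A result of `unknown` means the namespace carries no recognised enforce label, in which case the cluster-wide admission configuration decides.
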
**Solution:**

1. Add the `KILL` capability to your container's security context as shown in the example above.

2. If you're using a Helm chart, configure the security context in your `values.yaml`:

```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
    add:
    - KILL
  seccompProfile:
    type: RuntimeDefault
```

3. Verify the capability is present in the running pod:

```bash
kubectl exec -it <pod-name> -- grep CapEff /proc/1/status
```

`CapEff` is a hexadecimal bitmask. `CAP_KILL` is capability number 5, so the `0x20` bit must be set.

4. If LocalAI works in privileged mode but not with the configuration above, check your cluster's Pod Security Policies or Pod Security Standards. You may need to adjust cluster-level policies to allow the `KILL` capability.

5. Ensure your seccomp profile (if custom) allows the `kill` syscall. The `RuntimeDefault` profile typically allows it.
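
The check in step 3 prints a raw hexadecimal mask, which is easy to misread. As a minimal sketch, the hypothetical helper `has_cap_kill` below (not part of LocalAI) tests the `CAP_KILL` bit in a `CapEff` value. It assumes a shell whose arithmetic expansion accepts `0x` hexadecimal constants, as bash and dash do.

```shell
# has_cap_kill HEX: hypothetical helper, not part of LocalAI.
# HEX is the CapEff value from /proc/<pid>/status, e.g. 0000000000000020.
# CAP_KILL is capability number 5 on Linux, so its mask is 1 << 5 = 0x20.
has_cap_kill() {
    mask=$(( 0x$1 & 0x20 ))
    if [ "$mask" -ne 0 ]; then
        echo "present"
    else
        echo "absent"
    fi
}

has_cap_kill 0000000000000020   # prints: present
has_cap_kill 0000000000000000   # prints: absent
```

To use it against a pod, the second field of the `grep CapEff` output from step 3 can be passed as the argument.
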

docs/content/installation/kubernetes.md

Lines changed: 76 additions & 0 deletions
@@ -29,3 +29,79 @@ helm show values go-skynet/local-ai > values.yaml
The 76 added lines are identical to those added to `docs/content/getting-started/kubernetes.md` above.
