Comprehensive security guide for running Kimia in production Kubernetes environments.
- Security Architecture Overview
- User Namespace Isolation (Core Security)
- Additional Security Layers
- Pod Security Configuration
- Operational Security
- Verification and Monitoring
- Troubleshooting
- Compliance
- Summary
Kimia provides defense-in-depth security through multiple layers, with user namespace isolation as the core security mechanism.
- Rootless Pod Execution - Runs as UID 1000 (non-root)
- User Namespace Isolation ★ - Root inside → UID 1000 outside (critical boundary)
- Minimal Capabilities - Only SETUID & SETGID
- Seccomp Profile - Filters dangerous system calls
- Network Policies - Restricts egress traffic
Key principle: Even if one layer is breached, others provide protection.
User namespaces are Kimia's primary security mechanism, creating a hard isolation boundary between the container and the host.
The magic: Inside the container, build processes appear to run as root (UID 0), but on the Kubernetes node, they actually run as an unprivileged user (UID 1000).
Kimia maps 65,536 UIDs from the container namespace to subordinate UIDs on the host:
Container Namespace → Host Reality
──────────────────── ────────────
UID 0 (root) → UID 1000 (kimia user)
UID 1-65535 → UID 100000-165534
Step-by-step:
- Pod Start (T0): Kubernetes starts the pod as UID 1000
- Namespace Creation (T1): newuidmap/newgidmap establish mappings
- Build Starts (T2): Process sees itself as root, but host sees UID 1000
- Isolated Execution (T3): Complete security boundary in effect
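The mapping above is plain range arithmetic, so it can be checked by hand or with a small helper. The sketch below is illustrative only: `map_uid` and `KIMIA_MAP` are hypothetical names, not part of Kimia, and the map lines use the kernel's `uid_map` layout (`<inside-start> <host-start> <count>`) with the values quoted in this guide.

```shell
#!/usr/bin/env bash
# map_uid <uid-inside-namespace> <uid_map text>
# Prints the host UID that the given in-namespace UID maps to,
# or "unmapped" if no range covers it. Hypothetical helper.
map_uid() {
  local uid map inside host count
  uid=$1
  map=$2
  while read -r inside host count; do
    [ -n "$inside" ] || continue
    if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + count)) ]; then
      # Offset within the range is preserved on the host side
      echo $((host + uid - inside))
      return 0
    fi
  done <<EOF
$map
EOF
  echo "unmapped"
  return 1
}

# Mapping described in this guide: root -> 1000, the rest -> subordinate range
KIMIA_MAP='0 1000 1
1 100000 65535'

map_uid 0 "$KIMIA_MAP"       # prints 1000
map_uid 1 "$KIMIA_MAP"       # prints 100000
map_uid 65535 "$KIMIA_MAP"   # last UID covered by the second range
```

A UID outside both ranges has no host identity at all, which is why the helper reports it as unmapped rather than guessing.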
# Check if enabled
cat /proc/sys/user/max_user_namespaces
# Should return > 0 (e.g., 15000)
# Enable on each node
sudo sysctl -w user.max_user_namespaces=15000
# Persist across reboots
echo "user.max_user_namespaces=15000" | sudo tee -a /etc/sysctl.conf
# /etc/subuid on Kubernetes nodes
kimia:100000:65536
# /etc/subgid on Kubernetes nodes
kimia:100000:65536
# These binaries must have SETUID bit
ls -l /usr/bin/newuidmap
-rwsr-xr-x 1 root root 54096 /usr/bin/newuidmap
   ^
   SETUID bit (allows UID mapping)
User namespaces create a hard security boundary:
| Threat Scenario | Without User Namespaces | With Kimia (User Namespaces) |
|---|---|---|
| Container escape | 🔴 Root access to host | 🟢 UID 1000 only |
| Privilege escalation | 🔴 Can gain root | 🟢 Isolated to namespace |
| Host file access | 🔴 Read/write /etc/shadow | 🟢 Blocked by UID mapping |
| Process manipulation | 🔴 Kill any process | 🟢 Cannot affect host processes |
| Network attacks | 🔴 Bind privileged ports | 🟢 Cannot bind < 1024 |
| Kernel operations | 🔴 Load kernel modules | 🟢 Completely blocked |
Malicious Dockerfile:
FROM alpine
RUN apk add --no-cache bash curl
# Malicious step - attempt to escape
RUN whoami && \
cat /proc/1/root/etc/shadow && \
kill -9 1
What Happens:
Inside the container (thinks it's root):
$ whoami
root
$ id
uid=0(root) gid=0(root) groups=0(root)
On the host (reality):
$ ps aux | grep buildkitd
kimia 12345 ... /usr/bin/buildkitd
All attack attempts fail:
$ cat /proc/1/root/etc/shadow
Permission denied
$ kill -9 1
Operation not permitted
Result: User namespace isolation blocks all privilege escalation attempts.
Kimia runs as UID 1000 (non-root):
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
Benefit: Even if the container is compromised, attackers only have unprivileged user access.
Kimia requires only two Linux capabilities:
capabilities:
  drop: [ALL]
  add: [SETUID, SETGID] # Only for user namespace operations
Why these capabilities?
- SETUID - Required to create user namespace UID mappings
- SETGID - Required to create user namespace GID mappings
Benefit: Minimal attack surface (2 of ~40 capabilities).
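To see what "two capabilities" means at the kernel level, the effective set is exposed as a hex bitmask (the `CapEff` field in `/proc/<pid>/status`). The following is a minimal sketch of decoding such a mask. `decode_caps` is a hypothetical helper; only the two bits relevant here are named (CAP_SETGID is bit 6 and CAP_SETUID is bit 7 in `linux/capability.h`), and any other set bit is printed by number.

```shell
#!/usr/bin/env bash
# decode_caps <hex mask without 0x prefix>
# Prints the capabilities set in the mask, lowest bit first.
decode_caps() {
  local mask names bit
  mask=$((0x$1))
  names=""
  bit=0
  while [ "$bit" -lt 64 ]; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      case "$bit" in
        6) names="$names SETGID" ;;
        7) names="$names SETUID" ;;
        *) names="$names CAP_$bit" ;;   # not named in this sketch
      esac
    fi
    bit=$((bit + 1))
  done
  echo "${names# }"
}

decode_caps 00000000000000c0   # prints: SETGID SETUID
```

A mask of `c0` is exactly bits 6 and 7, so anything other than `SETGID SETUID` in the output would indicate that extra capabilities were granted.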
Kimia does not require privileged: true:
# ❌ NOT needed
securityContext:
  privileged: true
# ✅ Kimia works with this
securityContext:
  allowPrivilegeEscalation: true # Only for newuidmap/newgidmap
Note: allowPrivilegeEscalation: true is specifically for SETUID binaries, not root access.
Kimia requires AppArmor to be set to Unconfined:
securityContext:
  appArmorProfile:
    type: Unconfined # Required for user namespace operations
Why Unconfined is required:
- Default AppArmor profiles may block the unshare() syscall
- They can block operations needed for UID/GID mapping setup
- They can prevent SETUID binaries from functioning correctly
What AppArmor typically restricts:
- Certain mount operations
- Namespace creation syscalls
- File capability operations
- SETUID binary execution
Security considerations:
- User namespaces provide the primary security boundary
- Even with Unconfined AppArmor, the container:
  - Runs as UID 1000 (non-root)
  - Has only SETUID/SETGID capabilities
  - Is isolated by user namespace mapping
  - Cannot escalate to real root on host
Kimia requires Seccomp to be set to Unconfined:
securityContext:
  seccompProfile:
    type: Unconfined # Required for unshare() and namespace syscalls
Why Unconfined is required:
- RuntimeDefault seccomp profile blocks unshare(CLONE_NEWUSER)
- Blocks syscalls needed for user namespace creation
- Some clone() variations required for containers are filtered
Critical syscalls that must be allowed:
unshare(CLONE_NEWUSER) // User namespace creation
setuid(), setgid(), setgroups() // UID/GID mapping
clone(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID) // Container namespaces
open("/etc/subuid"), open("/etc/subgid") // Subordinate ID files
Security considerations:
- User namespace isolation is the primary security mechanism
- Even with Unconfined seccomp:
  - Process runs as unprivileged user (UID 1000)
  - User namespace prevents host access
  - Capabilities limited to SETUID/SETGID only
  - Cannot perform privileged operations on host
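Whether these settings actually took effect can be read back from the kernel's per-process status fields. The sketch below interprets the `Seccomp` and `NoNewPrivs` lines of `/proc/<pid>/status` (0, 1 and 2 are the kernel's disabled, strict and filter seccomp modes). `explain_status` is a hypothetical helper that reads status text on stdin, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# explain_status: reads /proc/<pid>/status text on stdin and explains
# the Seccomp and NoNewPrivs fields. Other fields are ignored.
explain_status() {
  local key value
  while IFS=':' read -r key value; do
    # Field values are tab-padded; strip all whitespace
    value=$(printf '%s' "$value" | tr -d '[:space:]')
    case "$key" in
      Seccomp)
        case "$value" in
          0) echo "seccomp: disabled (Unconfined)" ;;
          1) echo "seccomp: strict mode" ;;
          2) echo "seccomp: filter mode (a profile is applied)" ;;
          *) echo "seccomp: unexpected value '$value'" ;;
        esac
        ;;
      NoNewPrivs)
        case "$value" in
          0) echo "no_new_privs: off (SETUID binaries keep their effect)" ;;
          1) echo "no_new_privs: on (setuid bits are ignored on exec)" ;;
          *) echo "no_new_privs: unexpected value '$value'" ;;
        esac
        ;;
    esac
  done
}

# Sample input shaped like the fields of /proc/self/status
printf 'Name:\tcat\nSeccomp:\t0\nNoNewPrivs:\t0\n' | explain_status
```

Inside a running pod the same function could be fed from `cat /proc/self/status`; a filter-mode result there would suggest the Unconfined profile was not applied.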
No Docker or Podman daemon required:
- Reduces attack surface
- No daemon socket exposure
- No shared daemon state
- Isolated build processes
Kimia requires specific security context settings for user namespace support:
apiVersion: v1
kind: Pod
metadata:
  name: kimia-build
  labels:
    app: kimia
spec:
  # Pod-level security
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
  - name: kimia
    image: ghcr.io/rapidfort/kimia:latest
    args:
    - --context=.
    - --destination=myregistry.io/myapp:latest
    # Container-level security - ALL settings required
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      allowPrivilegeEscalation: true # Required for SETUID binaries
      capabilities:
        drop: [ALL]
        add: [SETUID, SETGID] # Required for user namespace mapping
      appArmorProfile:
        type: Unconfined # Required for user namespace operations
      seccompProfile:
        type: Unconfined # Required for unshare() syscall
      readOnlyRootFilesystem: false # Builds need writable filesystem
    resources:
      limits:
        memory: "8Gi"
        cpu: "4"
        ephemeral-storage: "20Gi"
      requests:
        memory: "2Gi"
        cpu: "1"
    volumeMounts:
    - name: docker-config
      mountPath: /home/kimia/.docker
      readOnly: true
  volumes:
  - name: docker-config
    secret:
      secretName: registry-credentials
| Restriction | Requirement | Kimia Status |
|---|---|---|
| runAsNonRoot | Must be true | ✅ Required (UID 1000) |
| allowPrivilegeEscalation | Must be false* | Requires true* |
| capabilities | Can only add SETUID/SETGID | ✅ Only SETUID & SETGID |
| appArmorProfile | Must be set | Requires Unconfined* |
| seccompProfile | Must be set | Requires Unconfined* |
| privileged | Must be false | ✅ Not required |
| hostNetwork | Must be false | ✅ Not required |
| hostPID | Must be false | ✅ Not required |
*allowPrivilegeEscalation: true, appArmorProfile: Unconfined, and seccompProfile: Unconfined are needed specifically for user namespace operations, which provide the primary security isolation.
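Because all three deviations have to be present together, a quick pre-flight check can catch a pod spec that silently fell back to defaults. The following is a sketch under stated assumptions: `check_kimia_context` is a hypothetical helper, and it takes the three values as plain strings so it can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# check_kimia_context <allowPrivilegeEscalation> <appArmorProfile.type> <seccompProfile.type>
# Prints one line per setting that does not match the values this guide
# requires, and returns non-zero if any mismatch was found.
check_kimia_context() {
  local rc
  rc=0
  [ "$1" = "true" ] || { echo "allowPrivilegeEscalation must be true (got '$1')"; rc=1; }
  [ "$2" = "Unconfined" ] || { echo "appArmorProfile.type must be Unconfined (got '$2')"; rc=1; }
  [ "$3" = "Unconfined" ] || { echo "seccompProfile.type must be Unconfined (got '$3')"; rc=1; }
  if [ "$rc" -eq 0 ]; then
    echo "security context matches Kimia's requirements"
  fi
  return "$rc"
}

check_kimia_context true Unconfined Unconfined
check_kimia_context false RuntimeDefault Unconfined || true
```

In practice the three arguments could come from `kubectl get pod ... -o jsonpath=...` queries against `.spec.containers[0].securityContext`; that wiring is left out here because it depends on the cluster and has not been tested as part of this sketch.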
Restrict network access using Kubernetes NetworkPolicies:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kimia-network-policy
  namespace: builds
spec:
  podSelector:
    matchLabels:
      app: kimia
  policyTypes:
  - Egress
  egress:
  # Allow DNS
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
  # Allow HTTPS to registries
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 443
  # Allow HTTP for package downloads
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 80

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kimia-strict-policy
  namespace: builds
spec:
  podSelector:
    matchLabels:
      app: kimia
  policyTypes:
  - Egress
  egress:
  # Allow DNS
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
  # Allow only specific registry
  - to:
    - podSelector: {}
      namespaceSelector:
        matchLabels:
          name: registry-namespace
    ports:
    - protocol: TCP
      port: 443
Always configure resource limits to prevent resource exhaustion attacks:
resources:
  requests:
    memory: "2Gi"
    cpu: "1"
  limits:
    memory: "8Gi"
    cpu: "4"
    ephemeral-storage: "10Gi" # Important for build artifacts!
| Build Size | Memory Request | Memory Limit | CPU Request | CPU Limit | Storage |
|---|---|---|---|---|---|
| Small (<500MB) | 1Gi | 4Gi | 500m | 2 | 5Gi |
| Medium (500MB-2GB) | 2Gi | 8Gi | 1 | 4 | 10Gi |
| Large (>2GB) | 4Gi | 16Gi | 2 | 8 | 20Gi |
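The sizing table translates directly into a lookup. The sketch below is one possible encoding: `kimia_resources` is a hypothetical helper, the input is the expected build size in MB, and treating 2GB as 2048 MB for the Medium/Large boundary is an assumption rather than something this guide specifies.

```shell
#!/usr/bin/env bash
# kimia_resources <build size in MB>
# Prints the requests and limits from the sizing table for that size.
kimia_resources() {
  local size_mb
  size_mb=$1
  if [ "$size_mb" -lt 500 ]; then
    # Small (<500MB)
    echo "requests: memory=1Gi cpu=500m | limits: memory=4Gi cpu=2 ephemeral-storage=5Gi"
  elif [ "$size_mb" -le 2048 ]; then
    # Medium (500MB-2GB)
    echo "requests: memory=2Gi cpu=1 | limits: memory=8Gi cpu=4 ephemeral-storage=10Gi"
  else
    # Large (>2GB)
    echo "requests: memory=4Gi cpu=2 | limits: memory=16Gi cpu=8 ephemeral-storage=20Gi"
  fi
}

kimia_resources 300
kimia_resources 1200
kimia_resources 4096
```

The output is plain text meant for a human or a templating step; it is not a drop-in `resources:` block.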
# Create from Docker config
kubectl create secret generic registry-credentials \
  --from-file=.dockerconfigjson=$HOME/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson
Mount secrets read-only:
volumeMounts:
- name: docker-config
  mountPath: /home/kimia/.docker
  readOnly: true # ✅ Prevent modification
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: registry-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: registry-credentials
    template:
      type: kubernetes.io/dockerconfigjson
      data:
        .dockerconfigjson: "{{ .registryAuth | toString }}"
  data:
  - secretKey: registryAuth
    remoteRef:
      key: container-registry
      property: dockerconfigjson
GKE Workload Identity:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kimia-builder
  annotations:
    iam.gke.io/gcp-service-account: kimia-builder@project.iam.gserviceaccount.com
EKS IRSA:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kimia-builder
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/KimiaBuilderRole
Enable audit logging for compliance:
apiVersion: batch/v1
kind: Job
metadata:
  name: kimia-build
  annotations:
    build.initiated-by: "john@company.com"
    build.source: "https://github.com/org/repo.git"
    build.commit: "abc123"
    build.timestamp: "2024-01-15T10:30:00Z"
spec:
  template:
    metadata:
      labels:
        app: kimia
        build-id: "build-12345"
        project: "myapp"
        environment: "production"
# Get pod name
KIMIA_POD=$(kubectl get pods -l app=kimia -o jsonpath='{.items[0].metadata.name}')
# Check namespace mappings
kubectl exec -it $KIMIA_POD -- cat /proc/self/uid_map
# 0 1000 1
# 1 100000 65535
# Explanation:
# Inside NS   Host UID   Count
# 0         → 1000       1       (root maps to UID 1000)
# 1         → 100000     65535   (UIDs 1-65535 map to 100000-165534)
# Check effective capabilities
kubectl exec -it $KIMIA_POD -- cat /proc/self/status | grep Cap
# CapEff: 00000000000000c0
#   0xc0 = SETGID (bit 6) + SETUID (bit 7); no other capabilities
# Check AppArmor status
kubectl exec -it $KIMIA_POD -- cat /proc/self/attr/current
# unconfined
# Check seccomp status
kubectl exec -it $KIMIA_POD -- grep Seccomp /proc/self/status
# Seccomp: 0 (0 = unconfined)
# Check for NoNewPrivs flag (should be 0 for allowPrivilegeEscalation)
kubectl exec -it $KIMIA_POD -- cat /proc/self/status | grep NoNewPrivs
# NoNewPrivs: 0
# Verify processes run as the non-root 'kimia' user
kubectl exec -it $KIMIA_POD -- ps aux
# USER PID ...
# kimia 1 ... <--- All processes as 'kimia' user
Symptom:
Error: cannot create user namespace: operation not permitted
Diagnosis:
# Check if user namespaces enabled on nodes
# (loop over node names; xargs -I would treat the whole space-separated list as one name)
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl debug "node/$node" -it --image=alpine -- \
    cat /proc/sys/user/max_user_namespaces
done
Solution:
# On each node
sudo sysctl -w user.max_user_namespaces=15000
sudo sysctl -p
Symptom:
Error: newuidmap: Permission denied
Diagnosis:
# Check if allowPrivilegeEscalation is set
kubectl get pod $KIMIA_POD -o jsonpath='{.spec.containers[0].securityContext.allowPrivilegeEscalation}'
# Should return: true
# Check NoNewPrivs flag
kubectl exec -it $KIMIA_POD -- grep NoNewPrivs /proc/self/status
# Should return: NoNewPrivs: 0
Solution:
securityContext:
  allowPrivilegeEscalation: true # Must be true!
Symptom:
Error: operation not permitted (AppArmor denial)
audit: type=1400 audit(...): apparmor="DENIED" operation="..."
Diagnosis:
# Check AppArmor profile
kubectl exec -it $KIMIA_POD -- cat /proc/self/attr/current
# Should return: unconfined
# Check for AppArmor denials in node logs
kubectl logs -n kube-system <node-logger-pod> | grep apparmor
Solution:
securityContext:
  appArmorProfile:
    type: Unconfined # Must be Unconfined
Symptom:
Error: unshare: Operation not permitted
Error: cannot create user namespace
Diagnosis:
# Check seccomp status
kubectl exec -it $KIMIA_POD -- grep Seccomp /proc/self/status
# Seccomp: 2 (2 = filtering enabled - WRONG)
# Should be: 0 (0 = unconfined)
Solution:
securityContext:
  seccompProfile:
    type: Unconfined # Must be Unconfined
Symptom:
Error: no subuid ranges allocated for user kimia
Solution:
# On Kubernetes nodes (via DaemonSet or node access)
echo "kimia:100000:65536" | sudo tee -a /etc/subuid
echo "kimia:100000:65536" | sudo tee -a /etc/subgid
If a Build Pod is Compromised:
- Immediate Actions:
  # Review the pod's logs first; they are no longer available once the pod is deleted
  # (add --previous if the container has restarted)
  kubectl logs <kimia-pod>
  # Delete the compromised pod
  kubectl delete pod <kimia-pod>
  # Check for other suspicious pods
  kubectl get pods -l app=kimia --all-namespaces
- Investigation:
  - Check what images were built/pushed
  - Review registry access logs
  - Verify no malicious images were created
  - Check for privilege escalation attempts
- Mitigation:
  - Rotate registry credentials
  - Scan all recently built images
  - Review and tighten NetworkPolicies
  - Update RBAC policies if needed
Kimia addresses NIST 800-190 container security recommendations:
- ✅ Runtime Defense: Rootless operation
- ✅ Image Security: Supports image scanning integration
- ✅ Registry Security: TLS enforcement
- ✅ Orchestrator Security: Kubernetes-native with Pod Security Standards
- ✅ Host Security: User namespace isolation
Kimia aligns with CIS Kubernetes security benchmarks:
- ✅ 5.2.1 - Minimize admission of privileged containers
- ✅ 5.2.5 - Minimize admission of containers with capabilities
- ✅ 5.2.6 - Minimize admission of root containers
- ✅ 5.7.3 - Apply Security Context to Pods and Containers
Kimia requires specific deviations from the Restricted Pod Security Standard:
| Setting | Standard Requirement | Kimia Requirement | Justification |
|---|---|---|---|
| allowPrivilegeEscalation | false | true | Required for SETUID binaries (newuidmap/newgidmap) |
| appArmorProfile | runtime/default | Unconfined | Required for user namespace creation syscalls |
| seccompProfile | RuntimeDefault | Unconfined | Required for unshare(CLONE_NEWUSER) syscall |
Important: While these settings deviate from Restricted standard, user namespace isolation provides equivalent or stronger security guarantees.
- User Namespace Isolation ★
  - Container processes appear as root (UID 0) inside
  - Actually run as UID 1000 (unprivileged) on host
  - Complete isolation from host system
- Minimal Privilege Model
  - Only SETUID and SETGID capabilities required
  - No privileged mode needed
  - No daemon socket exposure
- Defense in Depth
  - Rootless pod execution (layer 1)
  - User namespace mapping (layer 2)
  - Minimal capabilities (layer 3)
  - Network policies (layer 4)
For Kimia to work with user namespaces, ALL of the following must be configured:
✅ Container Security
- Run as non-root (UID 1000)
- allowPrivilegeEscalation: true - For SETUID binaries
- capabilities.add: [SETUID, SETGID] - For UID/GID mapping
- capabilities.drop: [ALL] - Minimize attack surface
- appArmorProfile.type: Unconfined - Allow namespace operations
- seccompProfile.type: Unconfined - Allow unshare() syscall
- Mount secrets as read-only
✅ Network Security
- Implement NetworkPolicies
- Restrict egress to known registries
- Use private registries when possible
- Enable TLS for all registry connections
✅ Resource Security
- Set resource requests and limits
- Configure ephemeral storage limits
- Use node selectors/taints for isolation
✅ Secrets & Credentials
- Never hardcode credentials
- Use Kubernetes secrets or external secret managers
- Mount credentials as read-only
- Rotate credentials regularly
- Use workload identity when available
✅ Build Security
- Pin base images by digest
- Scan images for vulnerabilities
- Use reproducible builds
- Sign images with Cosign
- Generate SBOMs
✅ Compliance & Audit
- Enable audit logging
- Tag builds with metadata
- Implement RBAC for build jobs
- Monitor build failures
- Regular security reviews
Kimia vs Privileged Containers:
| Aspect | Kimia | Privileged Container |
|---|---|---|
| Host access | Isolated (UID 1000) | Full root access |
| Capabilities | 2 (SETUID, SETGID) | All (~40 capabilities) |
| Container escape | 🟢 UID 1000 only | 🔴 Root on host |
| Kubernetes security | User namespace isolation | Privileged mode required |
| Risk level | 🟢 Low | 🔴 Critical |
Kimia vs Rootless Docker/Podman:
| Aspect | Kimia | Rootless Docker |
|---|---|---|
| Architecture | Kubernetes-native | Daemon-based |
| Isolation | User namespaces | User namespaces |
| Daemon socket | None | Exposed socket risk |
| Multi-tenancy | Native K8s isolation | Shared daemon state |
| Setting | Security Impact | Mitigation |
|---|---|---|
| allowPrivilegeEscalation: true | Allows SETUID binaries | Limited to namespace mapping only |
| appArmorProfile: Unconfined | Removes AppArmor restrictions | User namespace provides isolation |
| seccompProfile: Unconfined | No syscall filtering | Limited capabilities, non-root user |
| Combined | Reduced restrictions | Multiple defense layers remain active |
- User Namespace Isolation: Primary security boundary - root in container = UID 1000 on host
- Minimal Capabilities: Only SETUID and SETGID (2 of ~40 capabilities)
- Non-Root Execution: All processes run as unprivileged user
- No Privileged Mode: Far more secure than privileged: true
- Resource Limits: Prevents resource exhaustion attacks
- Network Policies: Can restrict network access
Kimia's security model is:
- Battle-tested: Based on proven Linux user namespace technology
- Kubernetes-native: Follows Pod Security Standards with justified deviations
- Compliance-friendly: Meets NIST 800-190 and CIS benchmarks
- Zero-trust compatible: Works with network policies and service mesh
The result: Secure container builds with true isolation, even in multi-tenant Kubernetes clusters.
- Installation Guide - Setup instructions
- Troubleshooting Guide - Common issues and solutions
- Pod Security Standards - Kubernetes security best practices
- Linux User Namespaces - Technical documentation