Sentinel crashes with nil pointer dereference when inspecting dead containers #30

@casistack

Sentinel Crash Report: Nil Pointer Dereference with Dead Containers

Bug Description

Sentinel crashes with a nil pointer dereference when encountering corrupted or dead Docker containers during the container inspection process in push.go.

Environment

  • Sentinel Version: v0.0.18
  • Docker Version: 29.0.2
  • OS: Linux (Arch-based)
  • Container State: Dead/Corrupted

Reproduction Steps

  1. Have a dead or corrupted container present in Docker
  2. Start Sentinel with push service enabled
  3. Wait for the next push cycle (default 60 seconds)
  4. Sentinel crashes when attempting to inspect the dead container

Stack Trace

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xb73cf4]

goroutine 10 [running]:
github.com/coollabsio/sentinel/pkg/push.(*Pusher).containerData(0xc000010cc0)
        /app/pkg/push/push.go:156 +0x974
github.com/coollabsio/sentinel/pkg/push.(*Pusher).GetPushData(0xc000010cc0)
        /app/pkg/push/push.go:58 +0xfc
github.com/coollabsio/sentinel/pkg/push.(*Pusher).Run(0xc000010cc0, {0xf92e60, 0xc00014cdc0})
        /app/pkg/push/push.go:51 +0x8f

Root Cause

The code at pkg/push/push.go:156 attempts to access inspectData.State.Health without first checking if inspectData.State is nil. Dead or corrupted containers can return a Docker inspect response with a nil State object.
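To illustrate the failure mode described above, here is a minimal sketch that reproduces the same class of panic. The types `Health`, `State`, and `InspectData`, and the helpers `unsafeHealth` and `panics`, are hypothetical stand-ins written for this example; they are not Sentinel's code or the Docker SDK's types, and only model a pointer-typed State that may be nil.

```go
package main

import "fmt"

// Health, State and InspectData are simplified, hypothetical stand-ins for
// the Docker inspect response; only the fields relevant to this bug are kept.
type Health struct{ Status string }

type State struct {
	Status string
	Health *Health
}

type InspectData struct {
	ID    string
	State *State
}

// unsafeHealth mirrors an unguarded access: it reads State.Health without
// first checking State, so it panics when State is nil.
func unsafeHealth(d InspectData) string {
	if d.State.Health != nil {
		return d.State.Health.Status
	}
	return "unknown"
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// State is left nil, as described for dead or corrupted containers.
	dead := InspectData{ID: "abc123"}
	fmt.Println(panics(func() { unsafeHealth(dead) }))
}
```

In a real service there is no recover around the call, so the same dereference terminates the whole process, which matches the stack trace above.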

Current Code (Problematic)

// Check if State exists and is not nil before accessing Health
if inspectData.State != nil && inspectData.State.Health != nil {
    healthStatus = inspectData.State.Health.Status
} else if inspectData.State == nil {
    log.Printf("Warning: Container %s has nil State (possibly corrupted/dead)", container.ID)
    healthStatus = "unknown"
}

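The guard shown above cannot itself dereference a nil State, so the panic at line 156 presumably comes from an access to State.Health that runs before or outside this check (for example, in the released v0.0.18 build). The snippet also never assigns healthStatus when State is non-nil but Health is nil.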

Impact

  • Complete Sentinel service crash
  • Loss of all container monitoring on affected server
  • No automatic recovery - requires manual intervention
  • Affects any environment with corrupted containers (common after system crashes, OOM kills, improper shutdowns)

Suggested Fix

Ensure the nil check logic is complete and properly handles all edge cases:

healthStatus := "unknown"
if inspectData.State != nil {
    if inspectData.State.Health != nil {
        healthStatus = inspectData.State.Health.Status
    }
} else {
    log.Printf("Warning: Container %s has nil State (possibly corrupted/dead)", container.ID)
}

This ensures we never attempt to access .Health on a nil State object.
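The same logic can be pulled into a small helper so it is testable in isolation. The following is a sketch under stated assumptions: `healthStatusOf` and the `Health`, `State`, and `InspectData` types are hypothetical names introduced for illustration and do not come from Sentinel or the Docker SDK.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical stand-ins for the Docker inspect response types.
type Health struct{ Status string }

type State struct{ Health *Health }

type InspectData struct {
	ID    string
	State *State
}

// healthStatusOf returns the container's health status, falling back to
// "unknown" when State or State.Health is nil. It never dereferences a
// nil pointer.
func healthStatusOf(d InspectData) string {
	if d.State == nil {
		log.Printf("Warning: Container %s has nil State (possibly corrupted/dead)", d.ID)
		return "unknown"
	}
	if d.State.Health == nil {
		// Container has a State but reports no health information.
		return "unknown"
	}
	return d.State.Health.Status
}

func main() {
	cases := []InspectData{
		{ID: "dead"},                            // nil State
		{ID: "no-healthcheck", State: &State{}}, // nil Health
		{ID: "ok", State: &State{Health: &Health{Status: "healthy"}}},
	}
	for _, c := range cases {
		fmt.Printf("%s: %s\n", c.ID, healthStatusOf(c))
	}
}
```

Using early returns keeps each nil case on its own branch, which makes it harder for a later edit to reintroduce an unguarded access.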

Workaround

Until fixed, administrators can:

  1. Identify dead containers: docker ps -a | grep Dead
  2. Remove dead container metadata: sudo rm -rf /var/lib/docker/containers/<container_id>
  3. Restart Docker daemon: sudo systemctl restart docker

Additional Context

Dead containers commonly occur from:

  • System crashes during container runtime
  • Docker daemon failures
  • OOM (Out of Memory) kills
  • Improper container shutdowns
  • Storage driver issues

This is a high-impact bug as it causes complete monitoring failure for the affected server, and dead containers are a common occurrence in production environments.
