Conversation


Copilot AI commented Dec 31, 2025

Implements Priority 1 from the roadmap: stabilizing the core container runtime with proper cgroup handling, state tracking, and essential CLI operations.

Changes

Cgroup support (cgroup.go)

  • Auto-detect cgroup v1 vs v2, handle both memory.limit_in_bytes (v1) and memory.max (v2)
  • Degrade gracefully when cgroups unavailable—containers run without limits, warnings logged
  • Expose cgroup capabilities via info command

Container lifecycle (container.go)

  • State model: created → running → exited|failed
  • Persist metadata to state.json (command, args, timestamps, exit code, PID)
  • Atomic state updates via UpdateContainerState()

CLI commands (main.go)

  • rm <id> - remove stopped containers, cleanup cgroups
  • logs <id> - read stdout/stderr from persistent log files
  • inspect <id> - dump full container metadata as JSON
  • ps - now shows actual states and creation time
  • info - displays cgroup version, controller availability

Logging

  • Use io.MultiWriter to send output to both console and log file simultaneously

Testing

  • Unit tests in container_test.go for state management
  • Integration tests in verify-new.sh with 12 test scenarios
  • CodeQL: 0 vulnerabilities

Example

$ sudo ./basic-docker run alpine /bin/echo "test"
Starting container container-1767175530
test

$ sudo ./basic-docker ps
CONTAINER ID         STATE     COMMAND        CREATED
container-1767175530 exited    /bin/echo      2025-12-31 10:05:30

$ sudo ./basic-docker inspect container-1767175530
{
  "state": "exited",
  "exit_code": 0,
  "started_at": "2025-12-31T10:05:30Z",
  ...
}

$ sudo ./basic-docker logs container-1767175530
test

$ sudo ./basic-docker rm container-1767175530
Container removed successfully
Original prompt

This section details the original issue to resolve

<issue_title>next</issue_title>
<issue_description>This is a solid, non‑trivial learning project that already goes well beyond “toy runtime”; it’s not something to abandon, but something to narrow, finish in layers, and then position as a reference implementation.[1]

Below is a concrete way to think about it.

What it is today

From the repo, you effectively have:[1]

  • A basic container runtime: basic-docker run, ps, minimal image/layer logic, filesystem isolation, namespaces, with some cgroup integration (currently permission‑sensitive).[1]
  • Monitoring abstraction across process/container/host isolation levels, including monitor host|process|container|all|gap|correlation and a clear table of gaps. [1]
  • Kubernetes integration via a ResourceCapsule CRD, operator, and kubernetes.go/crd_*.go plumbing, plus ADRs for networking, image download, and resource capsules.[1]

This is already a nice intersection of container internals, observability, and K8s custom resource design.[1]

What’s missing / incomplete

Reading the README, code list, and docs, the main gaps look like:[1]

  • Runtime robustness

    • Cgroup handling is brittle (permission issues, hardcoded memory cgroup path, limited feature flags).[1]
    • Networking stack only lightly tested; no clear story for port mapping, DNS, multi‑container networks beyond basic veth tests.[1]
    • Container lifecycle is minimal: no restart policies, logs, stop/kill, or state model beyond simple IDs and directories.[1]
  • Image and filesystem story

    • Image handling exists (image.go, tests, real Docker image download), but no clear CLI for pull, images, rmi, caching policy, or clear OCI boundary.[1]
    • Rootfs / layering is described in the architecture doc but not formalized as a “contract” (e.g., what exactly is a layer, where metadata lives, how GC works).[1]
  • Kubernetes / ResourceCapsule

    • CRD and operator exist, but the end‑to‑end story is not obvious: how does a user write a Capsule, attach to a Deployment, and what guarantees does the system provide.[1]
    • No versioned spec for “ResourceCapsule” (apiVersion/kind semantics, status, conditions, examples for scaling, limits, etc.).[1]
  • Monitoring narrative

    • Monitoring features are implemented and documented, but they’re not yet framed as a coherent “problem statement → design → implementation → examples → limitations”.[1]
    • No export story: e.g., Prometheus metrics, JSON output, or how to plug this into existing infra.[1]
  • External readiness

    • No releases, no clear “v0.1/v0.2” milestones.[1]
    • Tests are present but not organized into a crisp “what guarantees do these tests give you” section, and CI only runs go build.[1]

How to approach it (strategy)

Given your profile, this project should serve 3 purposes: concept mastery, a portfolio artifact, and a foundation for writing/talks. Explicitly pick that as the goal, not “replace Docker”.

Suggested approach:

  1. Define a sharp scope: “lean Docker engine with monitoring + K8s capsules”

    • Declare this in README: a teaching/runtime prototype focused on isolation, monitoring gaps, and K8s integration, not a production daemon.[1]
    • Add a high‑level design doc (or extend RESEARCH.md) with: goals, non‑goals, core constraints (single‑host, Linux only, root required).[1]
  2. Work in small, versioned milestones

    • v0.1: Core runtime (run/ps, minimal image, filesystem + process namespaces, verify.sh green).
    • v0.2: Monitoring system fully polished: consistent CLI, nice examples, basic JSON export, clearer docs.
    • v0.3: ResourceCapsule CRD/operator hardened with at least one end‑to‑end scenario documented.
    • Each version gets a GitHub release + short changelog so it looks like a living, coherent project.[1]
  3. Use ADRs aggressively

    • You already have adr-00x. For every major decision (image format, network model, monitoring semantics, CRD shape) capture the tradeoffs and link back in README.[1]
    • This directly supports future talks/blog posts.

Concrete next priorities

Priority 1: Stabilize core runtime

Goal: “If you follow README steps on a modern Linux host as root, basic flows always work.”

  • Fix cgroup access + permission errors:

    • Detect cgroup v1 vs v2; handle paths correctly; degrade gracefully when limits cannot be set and surface this in info.[1]
    • Add tests that simulate “no cgroup access” vs “full cgroup access” and assert feature flags.[1]
  • Clarify lifecycle:

    • Introduce a simple state model: created, running, exited, failed.
    • Ensure ps and on‑disk metadata reflect it (e.g., per‑container JSON state file in /tmp/basic-docker/containers/<id>/state.json).[1]
  • Round out CLI surface:

    • Implement rm (cleanup directories), logs (even if it just tails stdout file), and inspect (dump container JSON state).
    • Document all comman...


Copilot AI self-assigned this Dec 31, 2025
Copilot AI and others added 4 commits December 31, 2025 10:04
- Add cgroup.go with automatic v1/v2 detection
- Add container.go with state management (created, running, exited, failed)
- Implement rm, logs, and inspect CLI commands
- Update info command to show cgroup details
- Update ps command to show container states
- Store container metadata in state.json files

Co-authored-by: j143 <[email protected]>
- Add container_test.go with full test coverage for state management
- Create verify-new.sh with structured test framework
- Update README with container lifecycle, cgroup support, and new CLI commands
- Document graceful degradation and state persistence

Co-authored-by: j143 <[email protected]>
- Use io.MultiWriter to send output to both console and log file
- Add warning logs for cgroup degradation instead of silent failures
- Remove duplicate command/args extraction in run function
- Improve verify-new.sh with proper binary validation and sudo usage

Co-authored-by: j143 <[email protected]>
Complete documentation of all changes, improvements, and testing results

Co-authored-by: j143 <[email protected]>
Copilot AI changed the title [WIP] Narrow down learning project implementation Stabilize core runtime: cgroup v1/v2 detection, container lifecycle, and CLI commands Dec 31, 2025
Copilot AI requested a review from j143 December 31, 2025 10:16
@j143 j143 marked this pull request as ready for review December 31, 2025 11:22
@j143 j143 merged commit ccf6a33 into main Dec 31, 2025
2 checks passed