Description

This is a solid, non‑trivial learning project that already goes well beyond “toy runtime”; it’s not something to abandon, but something to narrow, finish in layers, and then position as a reference implementation.[1]

Below is a concrete way to think about it.

What it is today

From the repo, you effectively have:[1]

  • A basic container runtime: basic-docker run, ps, minimal image/layer logic, filesystem isolation, namespaces, with some cgroup integration (currently permission‑sensitive).[1]
  • A monitoring abstraction across process/container/host isolation levels, including monitor host|process|container|all|gap|correlation and a clear table of gaps.[1]
  • Kubernetes integration via a ResourceCapsule CRD, operator, and kubernetes.go/crd_*.go plumbing, plus ADRs for networking, image download, and resource capsules.[1]

This is already a nice intersection of container internals, observability, and K8s custom resource design.[1]

What’s missing / incomplete

Reading the README, code list, and docs, the main gaps look like:[1]

  • Runtime robustness

    • Cgroup handling is brittle (permission issues, hardcoded memory cgroup path, limited feature flags).[1]
    • The networking stack is only lightly tested; there is no clear story for port mapping, DNS, or multi‑container networks beyond basic veth tests.[1]
    • Container lifecycle is minimal: no restart policies, logs, stop/kill, or state model beyond simple IDs and directories.[1]
  • Image and filesystem story

    • Image handling exists (image.go, tests, real Docker image download), but there is no clear CLI for pull, images, or rmi, no caching policy, and no clear OCI boundary.[1]
    • Rootfs / layering is described in the architecture doc but not formalized as a “contract” (e.g., what exactly is a layer, where metadata lives, how GC works).[1]
  • Kubernetes / ResourceCapsule

    • CRD and operator exist, but the end‑to‑end story is not obvious: how a user writes a Capsule, attaches it to a Deployment, and what guarantees the system provides.[1]
    • No versioned spec for “ResourceCapsule” (apiVersion/kind semantics, status, conditions, examples for scaling, limits, etc.).[1]
  • Monitoring narrative

    • Monitoring features are implemented and documented, but they’re not yet framed as a coherent “problem statement → design → implementation → examples → limitations” narrative.[1]
    • No export story: e.g., Prometheus metrics, JSON output, or how to plug this into existing infra.[1]
  • External readiness

    • No releases, no clear “v0.1/v0.2” milestones.[1]
    • Tests are present but not organized into a crisp “what guarantees do tests give you” section, and CI only runs a bare go build.[1]

How to approach it (strategy)

Given your profile, this project should serve 3 purposes: concept mastery, a portfolio artifact, and a foundation for writing/talks. Explicitly pick that as the goal, not “replace Docker”.

Suggested approach:

  1. Define a sharp scope: “lean Docker engine with monitoring + K8s capsules”

    • Declare this in README: a teaching/runtime prototype focused on isolation, monitoring gaps, and K8s integration, not a production daemon.[1]
    • Add a high‑level design doc (or extend RESEARCH.md) with: goals, non‑goals, core constraints (single‑host, Linux only, root required).[1]
  2. Work in small, versioned milestones

    • v0.1: Core runtime (run/ps, minimal image, filesystem + process namespaces, verify.sh green).
    • v0.2: Monitoring system fully polished: consistent CLI, nice examples, basic JSON export, clearer docs.
    • v0.3: ResourceCapsule CRD/operator hardened with at least one end‑to‑end scenario documented.
    • Each version gets a GitHub release + short changelog so it looks like a living, coherent project.[1]
  3. Use ADRs aggressively

    • You already have adr-00x. For every major decision (image format, network model, monitoring semantics, CRD shape) capture the tradeoffs and link back in README.[1]
    • This directly supports future talks/blog posts.

Concrete next priorities

Priority 1: Stabilize core runtime

Goal: “If you follow README steps on a modern Linux host as root, basic flows always work.”

  • Fix cgroup access + permission errors:

    • Detect cgroup v1 vs v2, handle paths correctly, and degrade gracefully when limits cannot be set, surfacing this in info (a Go sketch of the detection follows this list).[1]
    • Add tests that simulate “no cgroup access” vs “full cgroup access” and assert feature flags.[1]
  • Clarify lifecycle:

    • Introduce a simple state model: created, running, exited, failed.
    • Ensure ps and on‑disk metadata reflect it (e.g., a per‑container JSON state file in /tmp/basic-docker/containers/<id>/state.json; also sketched after this list).[1]
  • Round out CLI surface:

    • Implement rm (clean up directories), logs (even if it just tails a stdout file), and inspect (dump the container JSON state).
    • Document all commands in README + --help.[1]
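
Two small Go sketches of the ideas above, on the assumption that none of these helpers exist yet in the repo; the names (detectCgroupVersion, canSetMemoryLimit) are illustrative, not current basic-docker code. First, cgroup v1/v2 detection plus a graceful-degradation check that run and info could share:

```go
// Package cgroups sketches version detection and graceful degradation.
package cgroups

import "os"

// CgroupVersion distinguishes the unified (v2) hierarchy from legacy v1.
type CgroupVersion int

const (
	CgroupUnknown CgroupVersion = iota
	CgroupV1
	CgroupV2
)

// detectCgroupVersion checks for the unified hierarchy marker:
// /sys/fs/cgroup/cgroup.controllers exists on cgroup v2 hosts, while v1
// hosts mount per-controller directories such as /sys/fs/cgroup/memory.
func detectCgroupVersion() CgroupVersion {
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		return CgroupV2
	}
	if _, err := os.Stat("/sys/fs/cgroup/memory"); err == nil {
		return CgroupV1
	}
	return CgroupUnknown
}

// canSetMemoryLimit reports whether writing a memory limit is likely to work,
// so run can degrade gracefully and info can surface the capability.
// A fuller check would create a child cgroup and write memory.max (v2) or
// memory.limit_in_bytes (v1), treating EPERM/EACCES as "unavailable".
func canSetMemoryLimit(v CgroupVersion) bool {
	return v != CgroupUnknown && os.Geteuid() == 0
}
```

Second, one possible shape for the per-container state file behind ps and inspect; the struct fields and layout are assumptions, not the repo's current format:

```go
// Package container sketches the on-disk state file for the lifecycle model.
package container

import (
	"encoding/json"
	"os"
	"path/filepath"
	"time"
)

// State is what ps and inspect would read back from disk.
type State struct {
	ID        string    `json:"id"`
	Status    string    `json:"status"` // created | running | exited | failed
	PID       int       `json:"pid,omitempty"`
	ExitCode  int       `json:"exitCode,omitempty"`
	CreatedAt time.Time `json:"createdAt"`
}

// SaveState writes containers/<id>/state.json under the runtime root.
func SaveState(root string, s State) error {
	dir := filepath.Join(root, "containers", s.ID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "state.json"), data, 0o644)
}
```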

Priority 2: Solidify monitoring system

Goal: “This repo is the clearest code & doc demonstration of Docker’s monitoring gap.”

  • Refine monitor output formats:

    • Add --format json|table and a stable schema for JSON.
    • Ensure correlation tables include PIDs, container IDs, cgroup paths, network interfaces, and hostname in a consistent schema.[1]
  • Document the “monitoring problem” properly:

    • Expand MONITORING.md: problem statement, taxonomy (process/container/host), correlation table explanation, and how your commands map to each layer.[1]
    • Add 2–3 real usage examples (e.g., “given PID X in Kubernetes Pod Y, how to map to host cgroup and network”).
  • Export story:

    • Minimal HTTP endpoint (or monitor export) that prints metrics suitable for a Prometheus scrape, even if basic (CPU, memory, and network per container); see the sketch after this list.
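
A minimal sketch of such an endpoint, written against the plain Prometheus text exposition format rather than the client library; collectStats, the metric names, and the port are placeholders for whatever the existing monitor code actually provides:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// containerStats is a stand-in for whatever the monitor/correlation code already collects.
type containerStats struct {
	ID          string
	CPUPercent  float64
	MemoryBytes uint64
}

// collectStats is hypothetical; in the repo it would call into the existing
// monitoring code instead of returning a fixed sample.
func collectStats() []containerStats {
	return []containerStats{{ID: "abc123", CPUPercent: 1.5, MemoryBytes: 12 << 20}}
}

// metricsHandler prints one gauge per container in Prometheus text format.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	for _, s := range collectStats() {
		fmt.Fprintf(w, "basic_docker_cpu_percent{container=%q} %f\n", s.ID, s.CPUPercent)
		fmt.Fprintf(w, "basic_docker_memory_bytes{container=%q} %d\n", s.ID, s.MemoryBytes)
	}
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

The same containerStats struct could back --format json, so the CLI table, the JSON output, and the metrics endpoint stay in sync.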

Priority 3: Make ResourceCapsule CRD actually compelling

Goal: “Show a small but opinionated enhancement to Kubernetes resource modeling using your custom runtime.”

  • Stabilize the CRD spec:

    • In crd_types.go, lock down fields you care about (e.g., CPU/memory ranges, caps, hints about IO/network QoS).[1]
    • Add status and conditions so kubectl get resourcecapsules is meaningful (the types are sketched below this list).
  • End‑to‑end example:

    • Create a k8s/examples/ folder with:
      • resourcecapsule.yaml
      • deployment-with-capsule.yaml
      • A short README walking from apply → observe operator logs → see effect in basic-docker info/monitor.[1]
  • GitOps angle:

    • Since you mention GitOps support, add a short doc showing how a capsule spec in Git translates to runtime configurations (even if via a simple reconciliation loop).[1]
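
A sketch of what locked-down types in crd_types.go could look like; the group/version, the field names (MinCPU, IOClass, Phase), and the use of metav1.Condition are assumptions about where the spec might land, not its current shape:

```go
// Package v1alpha1 sketches a pinned-down ResourceCapsule API surface.
package v1alpha1

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ResourceCapsuleSpec pins down the resource envelope a capsule promises.
type ResourceCapsuleSpec struct {
	// MinCPU/MaxCPU bound the CPU range the operator may apply.
	MinCPU resource.Quantity `json:"minCPU"`
	MaxCPU resource.Quantity `json:"maxCPU"`
	// MinMemory/MaxMemory bound the memory range.
	MinMemory resource.Quantity `json:"minMemory"`
	MaxMemory resource.Quantity `json:"maxMemory"`
	// IOClass is an optional QoS hint, e.g. "best-effort" or "guaranteed".
	IOClass string `json:"ioClass,omitempty"`
}

// ResourceCapsuleStatus makes `kubectl get resourcecapsules` meaningful.
type ResourceCapsuleStatus struct {
	// Phase is a coarse summary: Pending, Applied, or Failed.
	Phase string `json:"phase,omitempty"`
	// Conditions follow the standard metav1.Condition conventions
	// (type, status, reason, message, lastTransitionTime).
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// ResourceCapsule is the top-level custom resource.
type ResourceCapsule struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ResourceCapsuleSpec   `json:"spec,omitempty"`
	Status ResourceCapsuleStatus `json:"status,omitempty"`
}
```

Once status and conditions exist, the k8s/examples/ walkthrough can show them changing as the operator reconciles a Deployment that references the capsule.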

Priority 4: Testing & CI polish

Goal: “Anyone can run tests and trust they reflect what’s documented.”

  • Improve verify.sh:

    • Split it into smaller steps with set -e, clear headings, and explicit checks (grep for expected strings).[1]
    • Optionally add a mode that runs only non‑root checks, plus a root‑required mode.
  • CI:

    • Extend the GitHub Actions workflow to run go test ./... and a non‑root subset of verify.sh on push/PR (an example non‑root test is sketched after this list).[1]
    • Add a badge to README (“build passing”).
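
To make the non‑root subset concrete, here is a sketch of a test that CI could run unprivileged; it assumes the detectCgroupVersion and canSetMemoryLimit helpers sketched under Priority 1, which do not yet exist in the repo:

```go
package cgroups

import (
	"os"
	"testing"
)

// TestMemoryLimitDegradesWithoutRoot pins the graceful-degradation contract:
// without root, memory limits must be reported as unavailable instead of
// letting run fail later with a permission error.
func TestMemoryLimitDegradesWithoutRoot(t *testing.T) {
	if os.Geteuid() == 0 {
		t.Skip("running as root; this test covers the unprivileged path")
	}
	if canSetMemoryLimit(detectCgroupVersion()) {
		t.Fatal("expected memory limits to be unavailable without root")
	}
}
```

Running go test ./... in the workflow then exercises exactly the degradation behaviour the README promises, without needing a privileged runner.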

Whether to abandon or not

Given what is already there, abandoning would throw away: a working mini‑runtime, a nice monitoring abstraction, and a K8s CRD/operator scaffold that is hard to reconstruct later.[1]

A better framing: “graduate” this project into a well‑scoped, finished research artifact with 2–3 tagged releases, strong docs, and at least one blog post / talk derived from it. That is much more valuable for you than chasing a fully‑featured engine.

If you want, the next step can be: paste MONITORING.md and RESEARCH.md here, plus a short “target audience” note (e.g., “senior infra engineer interviews / tech talks”), and a concrete v0.2 roadmap can be drafted with exact issues to open in GitHub.
