This is a solid, non‑trivial learning project that already goes well beyond “toy runtime”; it’s not something to abandon, but something to narrow, finish in layers, and then position as a reference implementation.[1]
Below is a concrete way to think about it.
What it is today
From the repo, you effectively have:[1]
- A basic container runtime: `basic-docker run`, `ps`, minimal image/layer logic, filesystem isolation, namespaces, with some cgroup integration (currently permission‑sensitive).[1]
- Monitoring abstraction across process/container/host isolation levels, including `monitor host|process|container|all|gap|correlation` and a clear table of gaps.[1]
- Kubernetes integration via a `ResourceCapsule` CRD, operator, and `kubernetes.go`/`crd_*.go` plumbing, plus ADRs for networking, image download, and resource capsules.[1]
This is already a nice intersection of container internals, observability, and K8s custom resource design.[1]
What’s missing / incomplete
Reading the README, code list, and docs, the main gaps look like:[1]
- Runtime robustness
  - Cgroup handling is brittle (permission issues, hardcoded `memory` cgroup path, limited feature flags).[1]
  - Networking stack only lightly tested; no clear story for port mapping, DNS, multi‑container networks beyond basic veth tests.[1]
  - Container lifecycle is minimal: no restart policies, logs, stop/kill, or state model beyond simple IDs and directories.[1]
- Image and filesystem story
  - Image handling exists (`image.go`, tests, real Docker image download), but no clear CLI for `pull`, `images`, `rmi`, caching policy, or clear OCI boundary.[1]
  - Rootfs / layering is described in the architecture doc but not formalized as a "contract" (e.g., what exactly is a layer, where metadata lives, how GC works).[1]
- Kubernetes / ResourceCapsule
  - The CRD and operator exist, but the end‑to‑end story is not obvious: how does a user write a Capsule, attach it to a Deployment, and what guarantees does the system provide?[1]
  - No versioned spec for "ResourceCapsule" (apiVersion/kind semantics, status, conditions, examples for scaling, limits, etc.).[1]
- Monitoring narrative
  - Monitoring features are implemented and documented, but they're not yet framed as a coherent "problem statement → design → implementation → examples → limitations".[1]
  - No export story: e.g., Prometheus metrics, JSON output, or how to plug this into existing infra.[1]
- External readiness
  - No releases, no clear "v0.1/v0.2" milestones.[1]
  - Tests are present but not organized into a crisp "what guarantees do tests give you" section, and CI only runs `go build .`.[1]
How to approach it (strategy)
Given your profile, this project should serve 3 purposes: concept mastery, a portfolio artifact, and a foundation for writing/talks. Explicitly pick that as the goal, not “replace Docker”.
Suggested approach:
- Define a sharp scope: "lean Docker engine with monitoring + K8s capsules"
  - Declare this in the README: a teaching/runtime prototype focused on isolation, monitoring gaps, and K8s integration, not a production daemon.[1]
  - Add a high‑level design doc (or extend `RESEARCH.md`) with goals, non‑goals, and core constraints (single‑host, Linux only, root required).[1]
- Work in small, versioned milestones
  - v0.1: Core runtime (`run`/`ps`, minimal image, filesystem + process namespaces, `verify.sh` green).
  - v0.2: Monitoring system fully polished: consistent CLI, nice examples, basic JSON export, clearer docs.
  - v0.3: ResourceCapsule CRD/operator hardened with at least one end‑to‑end scenario documented.
  - Each version gets a GitHub release + short changelog so it looks like a living, coherent project.[1]
- Use ADRs aggressively
  - You already have `adr-00x`. For every major decision (image format, network model, monitoring semantics, CRD shape), capture the tradeoffs and link back in the README.[1]
  - This directly supports future talks/blog posts.
Concrete next priorities
Priority 1: Stabilize core runtime
Goal: “If you follow README steps on a modern Linux host as root, basic flows always work.”
- Fix cgroup access + permission errors:
  - Detect cgroup v1 vs v2; handle paths correctly; degrade gracefully when limits cannot be set and surface this in `info` (see the cgroup detection sketch after this list).[1]
  - Add tests that simulate "no cgroup access" vs "full cgroup access" and assert feature flags.[1]
- Clarify lifecycle:
  - Introduce a simple state model: `created`, `running`, `exited`, `failed`.
  - Ensure `ps` and on‑disk metadata reflect it (e.g., a per‑container JSON state file in `/tmp/basic-docker/containers/<id>/state.json`; see the state‑file sketch after this list).[1]
- Round out CLI surface:
  - Implement `rm` (cleanup directories), `logs` (even if it just tails a stdout file), and `inspect` (dump container JSON state).
  - Document all commands in the README + `--help`.[1]
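A minimal sketch of the cgroup detection mentioned above, assuming the unified hierarchy is mounted at `/sys/fs/cgroup`; the names `DetectCgroupMode`, `applyMemoryLimit`, and the `basic-docker` cgroup subdirectory are hypothetical, not the repo's existing API:

```go
package main

import (
	"fmt"
	"os"
)

// CgroupMode describes what level of cgroup support was detected at startup.
type CgroupMode int

const (
	CgroupNone CgroupMode = iota // no usable cgroup access: run without limits
	CgroupV1
	CgroupV2
)

// DetectCgroupMode checks for the unified hierarchy first (cgroup v2 exposes
// cgroup.controllers at the root), then falls back to the v1 memory controller.
func DetectCgroupMode() CgroupMode {
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		return CgroupV2
	}
	if _, err := os.Stat("/sys/fs/cgroup/memory"); err == nil {
		return CgroupV1
	}
	return CgroupNone
}

// applyMemoryLimit returns an error instead of aborting, so the caller can
// record "memory limits unavailable" and surface that in `info`.
func applyMemoryLimit(mode CgroupMode, containerID string, limitBytes int64) error {
	var path string
	switch mode {
	case CgroupV2:
		path = "/sys/fs/cgroup/basic-docker/" + containerID + "/memory.max"
	case CgroupV1:
		path = "/sys/fs/cgroup/memory/basic-docker/" + containerID + "/memory.limit_in_bytes"
	default:
		return fmt.Errorf("cgroups unavailable, running %s without memory limits", containerID)
	}
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", limitBytes)), 0o644)
}

func main() {
	fmt.Println("detected cgroup mode:", DetectCgroupMode())
}
```

The point is that a missing controller becomes a recorded, reportable condition rather than a hard failure.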
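And a sketch of the per‑container state file, assuming the `/tmp/basic-docker` root from the bullet above; the `ContainerRecord` fields are illustrative, not an existing schema:

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
	"time"
)

type ContainerState string

const (
	StateCreated ContainerState = "created"
	StateRunning ContainerState = "running"
	StateExited  ContainerState = "exited"
	StateFailed  ContainerState = "failed"
)

// ContainerRecord is the on-disk shape of containers/<id>/state.json.
type ContainerRecord struct {
	ID        string         `json:"id"`
	Image     string         `json:"image"`
	PID       int            `json:"pid,omitempty"`
	State     ContainerState `json:"state"`
	ExitCode  int            `json:"exitCode,omitempty"`
	CreatedAt time.Time      `json:"createdAt"`
}

// saveState writes the record via a temp file + rename, which is atomic
// enough for a prototype.
func saveState(root string, rec ContainerRecord) error {
	dir := filepath.Join(root, "containers", rec.ID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	data, err := json.MarshalIndent(rec, "", "  ")
	if err != nil {
		return err
	}
	tmp := filepath.Join(dir, ".state.json.tmp")
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "state.json"))
}

func main() {
	rec := ContainerRecord{ID: "abc123", Image: "alpine", State: StateCreated, CreatedAt: time.Now()}
	if err := saveState("/tmp/basic-docker", rec); err != nil {
		panic(err)
	}
}
```

With this in place, `inspect` can simply read and print the file, and `ps` can scan `containers/*/state.json` to report state.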
Priority 2: Solidify monitoring system
Goal: “This repo is the clearest code & doc demonstration of Docker’s monitoring gap.”
- Refine `monitor` output formats:
  - Add `--format json|table` and a stable schema for JSON (a candidate schema is sketched after this list).
  - Ensure correlation tables include PIDs, container IDs, cgroup paths, network interfaces, and hostname in a consistent schema.[1]
- Document the "monitoring problem" properly:
  - Expand `MONITORING.md`: problem statement, taxonomy (process/container/host), correlation table explanation, and how your commands map to each layer.[1]
  - Add 2–3 real usage examples (e.g., "given PID X in Kubernetes Pod Y, how to map to host cgroup and network").
- Export story:
  - Minimal HTTP endpoint (or `monitor export`) that prints metrics suitable for a Prometheus scrape, even if basic (CPU, memory, network per container); see the handler sketch after this list.
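A candidate `--format json` schema, expressed as Go types so it stays stable across commands; every field name here is a proposal rather than the current `monitor` output:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MonitorReport is the top-level object emitted by `monitor ... --format json`.
type MonitorReport struct {
	Hostname   string            `json:"hostname"`
	Level      string            `json:"level"` // "process" | "container" | "host"
	Containers []ContainerSample `json:"containers"`
}

// ContainerSample carries the correlation fields the table view already shows.
type ContainerSample struct {
	ContainerID string   `json:"containerId"`
	PIDs        []int    `json:"pids"`
	CgroupPath  string   `json:"cgroupPath"`
	Interfaces  []string `json:"interfaces"`
	CPUPercent  float64  `json:"cpuPercent"`
	MemoryBytes uint64   `json:"memoryBytes"`
}

// renderJSON marshals the typed report rather than ad-hoc maps, so downstream
// tooling can rely on stable field names.
func renderJSON(r MonitorReport) ([]byte, error) {
	return json.MarshalIndent(r, "", "  ")
}

func main() {
	out, err := renderJSON(MonitorReport{Hostname: "demo-host", Level: "container"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```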
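And a minimal export sketch using only the standard library and the Prometheus text exposition format; the metric names, port, and `collectSamples` stub are assumptions for illustration:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// containerSample is a trimmed-down stand-in for whatever the monitor code
// already collects per container.
type containerSample struct {
	ContainerID string
	CPUPercent  float64
	MemoryBytes uint64
}

// collectSamples would call into the existing monitor logic; stubbed here.
func collectSamples() []containerSample {
	return []containerSample{{ContainerID: "abc123", CPUPercent: 1.5, MemoryBytes: 12 << 20}}
}

// metricsHandler writes one gauge line per container in the Prometheus text
// exposition format, so a scrape job can point straight at this endpoint.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	for _, s := range collectSamples() {
		fmt.Fprintf(w, "basic_docker_cpu_percent{container=%q} %g\n", s.ContainerID, s.CPUPercent)
		fmt.Fprintf(w, "basic_docker_memory_bytes{container=%q} %d\n", s.ContainerID, s.MemoryBytes)
	}
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	log.Fatal(http.ListenAndServe(":9323", nil)) // port chosen arbitrarily
}
```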
Priority 3: Make ResourceCapsule CRD actually compelling
Goal: “Show a small but opinionated enhancement to Kubernetes resource modeling using your custom runtime.”
- Stabilize the CRD spec:
  - In `crd_types.go`, lock down the fields you care about (e.g., CPU/memory ranges, caps, hints about IO/network QoS).[1]
  - Add `status` and conditions so `kubectl get resourcecapsules` is meaningful; a possible shape is sketched after this list.
- End‑to‑end example:
  - Create a `k8s/examples/` folder with:
    - `resourcecapsule.yaml` (a possible version is sketched after this list)
    - `deployment-with-capsule.yaml`
    - A short README walking from apply → observe operator logs → see the effect in `basic-docker info`/`monitor`.[1]
- GitOps angle:
  - Since you mention GitOps support, add a short doc showing how a capsule spec in Git translates to runtime configuration (even if via a simple reconciliation loop).[1]
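A possible kubebuilder‑style shape for the CRD types; the group/version and every field are proposals and may not match the existing `crd_types.go`:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ResourceRange expresses a min/max pair using Kubernetes quantity strings
// (e.g. "100m", "64Mi").
type ResourceRange struct {
	Min string `json:"min"`
	Max string `json:"max"`
}

// ResourceCapsuleSpec is the user-facing contract: ranges instead of single
// requests/limits, plus optional QoS hints the runtime may act on.
type ResourceCapsuleSpec struct {
	CPURange     ResourceRange `json:"cpuRange"`
	MemoryRange  ResourceRange `json:"memoryRange"`
	IOClass      string        `json:"ioClass,omitempty"`      // e.g. "best-effort", "latency"
	NetworkClass string        `json:"networkClass,omitempty"` // e.g. "bulk", "interactive"
}

// ResourceCapsuleStatus makes `kubectl get resourcecapsules` meaningful.
type ResourceCapsuleStatus struct {
	Phase      string             `json:"phase,omitempty"` // "Pending" | "Bound" | "Failed"
	BoundPods  []string           `json:"boundPods,omitempty"`
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// ResourceCapsule is the top-level object served by the CRD.
type ResourceCapsule struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ResourceCapsuleSpec   `json:"spec,omitempty"`
	Status ResourceCapsuleStatus `json:"status,omitempty"`
}
```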
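A matching `k8s/examples/resourcecapsule.yaml` sketch; the apiVersion group and field names mirror the type sketch above and are equally hypothetical:

```yaml
# k8s/examples/resourcecapsule.yaml (sketch)
apiVersion: capsules.basic-docker.dev/v1alpha1   # group/version is hypothetical
kind: ResourceCapsule
metadata:
  name: web-capsule
spec:
  cpuRange:
    min: 100m
    max: "1"
  memoryRange:
    min: 64Mi
    max: 256Mi
  ioClass: best-effort
  networkClass: interactive
```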
Priority 4: Testing & CI polish
Goal: “Anyone can run tests and trust they reflect what’s documented.”
- Improve `verify.sh`:
  - Split it into smaller steps with `set -e`, clear headings, and explicit checks (`grep` for expected strings); see the skeleton after this list.[1]
  - Optionally add a mode that runs only non‑root checks, plus a root‑required mode.
- CI:
  - Extend the GitHub Actions workflow to run `go test ./...` and a non‑root subset of `verify.sh` on push/PR (a minimal workflow is sketched after this list).[1]
  - Add a badge to the README ("build passing").
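A skeleton for the split `verify.sh`; the step names, CLI invocations, and expected `grep` strings are placeholders to adapt to the real output:

```bash
#!/usr/bin/env bash
# verify.sh skeleton: step names, commands, and expected strings are
# placeholders, not the current script's contents.
set -euo pipefail

MODE="${1:-all}"   # "all" (root required) or "non-root"

step() { printf '\n==> %s\n' "$1"; }

step "build"
go build .

step "ps prints a header"
./basic-docker ps | grep -q "CONTAINER"

if [[ "$MODE" == "all" ]]; then
  step "run a container (root required)"
  ./basic-docker run alpine /bin/true
fi

echo "verify: OK"
```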
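And a minimal GitHub Actions workflow along those lines; the Go version and the `non-root` argument to `verify.sh` are assumptions tied to the mode proposed above:

```yaml
# .github/workflows/ci.yml (sketch)
name: ci
on:
  push:
  pull_request:
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.22"
      - run: go build .
      - run: go test ./...
      - run: ./verify.sh non-root
```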
Whether to abandon or not
Given what is already there, abandoning would throw away: a working mini‑runtime, a nice monitoring abstraction, and a K8s CRD/operator scaffold that is hard to reconstruct later.[1]
A better framing: “graduate” this project into a well‑scoped, finished research artifact with 2–3 tagged releases, strong docs, and at least one blog post / talk derived from it. That is much more valuable for you than chasing a fully‑featured engine.
If you want, the next step can be to paste MONITORING.md and RESEARCH.md here along with a short "target audience" (e.g., "senior infra engineer interviews / tech talks"); from that, a concrete v0.2 roadmap can be written with exact issues you can open in GitHub.