Commit f90c0b4
docs(core): align documentation with encapsulated monorepo structure (#257)
Parent: 1fa0eea

18 files changed: +128 additions, -148 deletions

AGENTS.md

Lines changed: 18 additions & 38 deletions
@@ -8,25 +8,20 @@ Agents must distinguish between the two primary orchestration tiers to avoid "ci
 
 ### 🌌 Hybrid Orchestration Layers
 
-- **Host Tier (Systemd)**: Reserved for hardware-level telemetry, security gates, and GitOps reconciliation. Reliability here is critical for cluster recovery. Core logic is extracted into `pkg/` libraries to ensure reusability and consistency across different execution triggers (CLI, API, and future AI tools).
+- **Host Tier (Systemd)**: Reserved for hardware-level telemetry, security gates, and GitOps reconciliation. Reliability here is critical for cluster recovery. Core logic is strictly encapsulated in `internal/` to ensure reusability and enforce project boundaries.
 - **Cluster Tier (K3s)**: Handles scalable data services (Postgres, Loki, Prometheus, Grafana, Tempo, MinIO). Orchestrated via **OpenTofu (IaC)** in `tofu/`.
 
-### 📦 Distribution Pattern
+### 🏗️ Directory Map (Consolidated Monorepo)
 
-To maintain a clean repository and ensure operational stability, all compiled binaries must be output to the root `dist/` directory. Systemd unit files and automated scripts should reference artifacts from this location.
-
-### 🏗️ Directory Map
-
-- **`pkg/`**: Shared Go modules (DB, brain, env, logger, metrics, secrets, telemetry). Maintain stable interfaces.
-- **`services/`**: Standalone binaries and operational entry points.
-- **`services/proxy/`**: The "Central Nervous System." API gateway and GitOps webhook listener.
-- **`services/system-metrics/`**: Host hardware telemetry collector.
-- **`services/second-brain/`**: Knowledge ingestion pipeline.
-- **`dist/`**: Production artifacts. Centralized directory for all compiled binaries.
+- **`cmd/`**: Minimal entry points for services. Focuses on configuration and orchestration.
+- **`cmd/web/`**: Static site generator entry point.
+- **`cmd/proxy/`**: API Gateway and GitOps webhook listener entry point.
+- **`cmd/collectors/`**: Host telemetry collection daemon entry point.
+- **`cmd/ingestion/`**: Daily data sync and second brain integration entry point.
+- **`internal/`**: Private implementation layer. Enforces Go's internal package visibility rules.
 - **`k3s/`**: Kubernetes manifests and Helm values for the data platform.
 - **`makefiles/`**: Modular logic for the root automation layer.
 - **`systemd/`**: Host-tier unit files for production service management.
-- **`web/`**: Static site generator for public-facing portfolio.
 - **`scripts/`**: Operational utilities (traffic gen, ADR creation, Tailscale gate).
 - **`docs/`**: Institutional memory. ADRs, Architecture, Incidents, and Notes.

@@ -40,28 +35,22 @@ The project uses a unified automation layer. **Always prefer `make` commands** a
 | :--- | :--- | :--- |
 | **Governance** | `make adr` | Creates a new Architecture Decision Record. |
 | **IaC** | `tofu plan` / `apply` | Manages K3s data services and infrastructure state. |
-| **Quality** | `make lint` | Lints markdown and configuration files (`lint-configs`). |
+| **Quality** | `make lint` | Lints markdown and configuration files. |
 | **Go Dev** | `make go-test` | Runs full test suite across the monorepo. |
 | **Security** | `make go-vuln-scan` | Executes `govulncheck` for dependency auditing. |
-| **K3s Ops** | `make kube-lint` | Validates K8s manifests for security violations. |
-| **Host Ops** | `make reload-services` | Safely reloads host-tier systemd units. |
+| **K3s Ops** | `make build-collectors` | Builds and imports the collectors Docker image into K3s. |
+| **Host Ops** | `make proxy-build` | Builds proxy server to `bin/` and restarts the service. |
 
 ## 3. Engineering Standards
 
 ### 🐹 Go (Backend)
 
-- **Library-First**: Move core domain logic to `pkg/` before implementing the service entry point. Services should be thin wrappers around library capabilities.
-- **Environment Loading**: Always use `pkg/env` for standardized `.env` discovery. Do not use `godotenv` directly in services.
-- **Dependency Management**: Delegate driver registration (e.g., `lib/pq`) to `pkg/db` to avoid redundant blank imports in services.
-- **Failure Modes**: Never swallow errors. Use explicit wrapping: `fmt.Errorf("context: %w", err)`.
-- **Observability**: Every service must emit JSON-formatted logs to `stdout` using `pkg/logger`.
-- **Telemetry**: All instrumentation must be handled through the centralized `pkg/telemetry` library.
-- **Testing**: Table-driven tests are the standard. Run `make go-cov` to verify coverage. Maintain a minimum of 80% coverage for `pkg/` libraries.
-
-### 🎨 HTML/CSS (Frontend)
-
-- **Zero Frameworks**: Use native HTML5 and CSS3 only.
-- **Styling**: Leverage CSS variables in `:root` for dark-theme consistency.
+- **Thin Main**: Entry points in `cmd/` must be minimal. Move all core domain logic to `internal/`.
+- **Internal-First**: Shared libraries reside in `internal/` to prevent external logic leakage.
+- **Environment Loading**: Always use `internal/env` for standardized `.env` discovery.
+- **Observability**: Every service must emit structured JSON logs using `internal/telemetry`.
+- **Telemetry**: All instrumentation must be handled through the centralized `internal/telemetry` library.
+- **Testing**: Table-driven tests are the standard. Run `make go-cov` to verify coverage.
 
 ### 📝 Institutional Memory (Documentation)
 
@@ -71,15 +60,6 @@ The project uses a unified automation layer. **Always prefer `make` commands** a
 
 ## 4. Operational Excellence & Safety
 
-- **Secrets**: NEVER commit secrets. Use `.env` for local dev and OpenBao for production secrets.
+- **Secrets**: NEVER commit secrets. Use `.env` for local dev and OpenBao for production.
 - **GitOps**: Host-tier changes are applied via `gitops_sync.sh` (triggered by Proxy webhooks).
-- **Observability**: Any new service must be integrated into the telemetry pipeline (Logs to Loki, Metrics to Postgres/Prometheus).
 - **Security**: All Kubernetes manifests must pass `kube-lint`. All Go code must pass `go-vuln-scan`.
-
-## 5. Failure Mode Analysis (FMA)
-
-Before proposing a change, agents should ask:
-
-1. "Does this create a circular dependency between the host and the cluster?"
-2. "How will this be debugged in production if the network is down?"
-3. "Is this change recorded in an ADR to preserve the 'Why'?"

README.md

Lines changed: 7 additions & 10 deletions
@@ -12,11 +12,12 @@ Built using Go and orchestrated on Kubernetes (K3s), the platform unifies system
 
 This project highlights significant accomplishments in building a modern observability and platform engineering solution:
 
-* **Full OpenTelemetry (LMT) Implementation:** Achieved end-to-end observability with a unified OTel Collector, Tempo (Traces), Prometheus (Metrics), Loki (Logs), and Go SDK for instrumentation. Includes Service Graphs, synthetic transaction monitoring, and comprehensive host-level telemetry.
+* **Unified Go Monorepo:** Consolidated fragmented modules into a single root module, eliminating 17 `replace` directives and standardizing dependency management across all services.
+* **Encapsulated Architecture:** Transitioned to an `internal/` and `cmd/` layout, enforcing Go's package visibility rules and adopting the "Thin Main" pattern for better testability and system integrity.
+* **Full OpenTelemetry (LMT) Implementation:** Achieved end-to-end observability with a unified OTel Collector, Tempo (Traces), Prometheus (Metrics), Loki (Logs), and Go SDK for instrumentation.
 * **GitOps Reconciliation Engine:** Implemented a secure, templated GitOps reconciliation engine for automated state enforcement via webhooks, scaled to support multi-tenant synchronization.
 * **Kubernetes Migration & Cloud-Native Operations:** All core observability stack components (Loki, Grafana, Tempo, Prometheus, Postgres) are running natively in Kubernetes with persistent storage.
-* **Library-First Architecture:** Structural transition into `pkg/` and `services/` layout, decoupling core business logic into transport-agnostic modules for improved reusability and testability.
-* **Centralized Secrets Management:** Transitioned to OpenBao for secure secrets storage and retrieval, replacing insecure static configurations.
+* **Centralized Secrets Management:** Integrated OpenBao for secure, dynamic credential retrieval across all services, replacing insecure static configurations.
 * **Hybrid Cloud Architecture (Store-and-Forward Bridge):** Designed and implemented a secure bridge for ingesting external telemetry without exposing local ports, ensuring reliable data flow from diverse sources.
 * **Reproducible Local Development:** Ensures consistent and reproducible developer environments via `shell.nix` and `docker-compose`.
 * **Formalized Decision-Making & Incident Response:** Established an Architectural Decision Record (ADR) process and an Incident Response/RCA framework for structured decision-making and operational excellence.
@@ -72,7 +73,7 @@ flowchart TB
 Tailscale[Tailscale]
 end
 
-GoApps["Go Services (Proxy, Reading Sync, Second Brain)"]
+GoApps["Go Services (Proxy, Ingestion)"]
 Collectors["Collectors (Host Metrics & Tailscale)"]
 end
 
@@ -128,7 +129,7 @@ This guide will help you set up and run the `observability-hub` locally using **
 
 Ensure you have the following installed on your system:
 
-* [Go](https://go.dev/doc/install) (version 1.25 or newer)
+* [Go](https://go.dev/doc/install)
 * [K3s](https://k3s.io/) (Lightweight Kubernetes)
 * [Helm](https://helm.sh/)
 * `make` (GNU Make)
@@ -174,13 +175,10 @@ Build and initialize the automation and telemetry collectors on the host:
 ```bash
 # Build Go binaries
 make proxy-build
-make reading-build
+make ingestion-build
 
 # Install and start Systemd services (requires sudo)
 make install-services
-
-# Run Second Brain sync manually
-make brain-sync
 ```
 
 ### 3. Verification
@@ -189,7 +187,6 @@ Once the stack is running, you can verify the end-to-end telemetry flow:
 
 * **Cluster Health:** Access Grafana at `http://localhost:30000` (NodePort).
 * **Service Logs:** Check logs for host components via Grafana Loki.
-* **Knowledge Sync:** Manually trigger a Second Brain ingestion with `make brain-sync`.
 
 ### 4. Managing the Cluster
 
docs/architecture/README.md

Lines changed: 3 additions & 2 deletions
@@ -26,6 +26,7 @@ Deep dives into the logic and implementation of specific system components.
 
 - **[Collectors Service](./services/collectors.md)**: Host-level telemetry collector for metrics and Tailscale status.
 - **[Proxy Service](./services/proxy.md)**: The API Gateway and GitOps listener.
-- **[Reading Sync](./services/reading-sync.md)**: The automated MongoDB to Postgres ETL pipeline.
-- **[Second Brain](./services/second-brain.md)**: Knowledge ingestion from GitHub into PostgreSQL.
+- **[Ingestion Service](./services/ingestion.md)**: Unified data orchestration engine (Reading Analytics + Second Brain).
 - **[Tailscale Gate](./services/tailscale-gate.md)**: Logic for the automated funnel gatekeeper.
+
docs/architecture/core-concepts/automation.md

Lines changed: 2 additions & 3 deletions
@@ -16,15 +16,14 @@ The system consists of several main service families, each with a `.service` uni
 | :--- | :--- | :--- | :--- |
 | **`tailscale-gate`** | `simple` | Continuous | **Security**: Monitors Proxy health and toggles Tailscale Funnel access. |
 | **`proxy`** | `simple` | Continuous | **API Gateway**: Core listener for data pipelines and GitOps webhooks. |
-| **`gitops-sync`** | `oneshot` | **Webhook** | **Reconciliation**: Triggered by Proxy to pull latest code and apply changes. |
-| **`reading-sync`** | `oneshot` | Twice Daily (00:00, 12:00) | **Data Pipeline Trigger**: Calls Proxy API to sync MongoDB data to Postgres. |
+| **`ingestion`** | `oneshot` | Daily (00:00) | **Data Ingestion**: Unified engine for Reading Analytics and Second Brain sync. |
 
 ## Operational Excellence
 
 Our systemd configurations employ several production-grade patterns:
 
 - **Security Gating**: The `tailscale-gate` service implements a loop that ensures the public entry point (Funnel) is automatically closed if the underlying `proxy` service stops, preventing "dead" endpoints from being exposed.
-- **Persistence (`Persistent=true`)**: Used in `reading-sync`. If the host is powered off during the scheduled time, systemd will trigger the service immediately upon the next boot.
+- **Persistence (`Persistent=true`)**: Used in `ingestion`. If the host is powered off during the scheduled time, systemd will trigger the service immediately upon the next boot.
 - **Unified Observability**: All units emit logs, metrics, and traces, which are captured, enriched, and forwarded by the host-level OpenTelemetry Collector.
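For reference, the `Persistent=true` behaviour described above belongs in the `[Timer]` section of a systemd timer unit. The following is a minimal sketch, not the repository's actual unit file; the real files in `systemd/` may set additional options:

```ini
# Illustrative timer unit; the real file in systemd/ may differ.
[Unit]
Description=Daily ingestion run

[Timer]
OnCalendar=*-*-* 00:00:00
# If the host was off at 00:00, run the missed job on next boot.
Persistent=true

[Install]
WantedBy=timers.target
```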
 
 ## Architectural Patterns

docs/architecture/core-concepts/observability.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ The platform aggregates infrastructure metrics through Prometheus scraping, appl
 Distributed tracing is powered by OpenTelemetry for correlation and performance profiling across high-throughput pipelines.
 
 - **Collection Pipeline**:
-  - **Instrumentation**: Services use the **OpenTelemetry SDK** to generate spans in OTLP format. We follow a **Pure Wrapper** philosophy where shared libraries (`pkg/db`) provide standardized infrastructure spans (e.g., `db.postgres.record_metric`), while services own the root spans (`job.*` or `handler.*`).
+  - **Instrumentation**: Services use the **OpenTelemetry SDK** to generate spans in OTLP format. We follow a **Pure Wrapper** philosophy where shared libraries (`internal/db`) provide standardized infrastructure spans (e.g., `db.postgres.record_metric`), while services own the root spans (`job.*` or `handler.*`).
   - **Ingestion**: Spans are sent to the **OpenTelemetry Collector** via gRPC (NodePort `30317`) or HTTP (NodePort `30318`), which batches and exports them to **Grafana Tempo**.
   - **Processing**: Tempo analyzes raw spans to generate derived **Service Graphs** and **Span Metrics**, which are pushed to Prometheus via `remote_write` for operational correlation.
 - **Persistence**:

docs/architecture/infrastructure/deployment.md

Lines changed: 1 addition & 3 deletions
@@ -25,16 +25,14 @@ Managed via **OpenTofu (IaC)** in `tofu/`.
 | Component | Role | Details |
 | :--- | :--- | :--- |
 | **Proxy Service** | API Gateway | Handles webhooks, GitOps triggers, and Data Pipelines. |
-| **Metrics Collector** | Telemetry Agent | Collects host hardware statistics (CPU, RAM, Disk). |
-| **Second Brain** | Knowledge Ingest | Ingests atomic journal entries from GitHub Issues. |
+| **Ingestion Service** | Data Orchestrator | Unified engine for syncing Reading Analytics and Second Brain knowledge. |
 
 ### 🛠️ Automation & Security (Native Script)
 
 | Component | Role | Details |
 | :--- | :--- | :--- |
 | **OpenBao** | Secret Store | Centralized, encrypted management for sensitive config. |
 | **Tailscale Gate** | Security Agent | Manages public funnel access based on service health. |
-| **Reading Sync** | Data Pipeline | Timer-triggered task to sync cloud data to local storage. |
 
 ## Data Flow: Unified Observability
 
docs/architecture/services/ingestion.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Ingestion Service Architecture
+
+The Ingestion Service (`cmd/ingestion/`) is a unified data orchestration engine responsible for synchronizing external data sources into the platform's local analytical store (PostgreSQL). It operates as a periodic task runner managed by a Systemd timer.
+
+## Component Details
+
+### Task Overview
+
+The service follows a **Task-Oriented Design**, where specific data synchronization logic is encapsulated into independent, testable tasks managed by a centralized engine.
+
+| Task | Source | Purpose |
+| :--- | :--- | :--- |
+| `reading` | MongoDB Atlas | **Reading Analytics**: Syncs article metadata and engagement metrics from cloud to local store. |
+| `brain` | GitHub Issues | **Second Brain**: Ingests journal entries, performs atomization, and calculates token counts. |
+
+### Logic Details
+
+#### Reading Analytics Task (`reading`)
+
+Synchronizes cloud-based MongoDB data with the local PostgreSQL environment.
+
+1. **Fetch**: Retrieves unprocessed documents from MongoDB Atlas in configurable batches.
+2. **Transform**: Maps MongoDB BSON/JSON metadata to the structured PostgreSQL `reading_analytics` schema.
+3. **Persist**: Executes UPSERT operations in PostgreSQL to ensure data consistency.
+4. **Acknowledge**: Marks documents as "processed" in MongoDB to prevent duplicate ingestion.
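Steps 2 and 4 of the pipeline above (transform, then acknowledge) can be sketched as a pure function, which keeps the task testable without a live MongoDB connection. The `doc` and `record` types and their field names are hypothetical; the real schema lives in the repository:

```go
// Illustrative sketch of the transform/acknowledge steps. Types and field
// names are hypothetical stand-ins for the service's real data model.
package main

import "fmt"

// doc mimics a MongoDB document; record is the PostgreSQL row shape.
type doc struct {
	ID        string
	Title     string
	Processed bool
}

type record struct {
	ArticleID string
	Title     string
}

// transform maps unprocessed documents to rows and returns the IDs that
// should be acknowledged, mirroring steps 2 and 4 above.
func transform(docs []doc) ([]record, []string) {
	var rows []record
	var acked []string
	for _, d := range docs {
		if d.Processed {
			continue // already ingested; skip to keep the sync idempotent
		}
		rows = append(rows, record{ArticleID: d.ID, Title: d.Title})
		acked = append(acked, d.ID)
	}
	return rows, acked
}

func main() {
	rows, acked := transform([]doc{
		{ID: "a1", Title: "Go Concurrency"},
		{ID: "a2", Title: "K3s Basics", Processed: true},
	})
	fmt.Println(len(rows), acked) // only a1 is new
}
```

Separating this pure step from the MongoDB fetch and PostgreSQL UPSERT is what makes the task independently testable, as the design above intends.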
+
+#### Second Brain Task (`brain`)
+
+Transforms GitHub-based journaling entries into atomic, searchable database records.
+
+1. **Check**: Queries PostgreSQL for the most recent entry date to determine the sync delta.
+2. **Ingest**: Fetches new journal entries from the configured GitHub repository via the GitHub API.
+3. **Atomize**: Decomposes long-form markdown logs into granular "thought atoms," including metadata like tags and categories.
+4. **Quantify**: Calculates token counts for each atom to support future LLM-based analytical workloads.
+
+## Distributed Tracing
+
+The Ingestion Service is instrumented with the **OpenTelemetry SDK** to provide visibility into data pipeline performance and task status.
+
+### Configuration
+
+The service initializes a global TracerProvider during startup, controlled by environment variables:
+
+| Variable | Description |
+| :--- | :--- |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | The gRPC endpoint of the OpenTelemetry Collector (e.g., `localhost:30317`). |
+| `OTEL_SERVICE_NAME` | The service identifier used in traces (defaults to `ingestion`). |
+
+### Trace Coverage
+
+Spans are created for the entire job lifecycle:
+
+- **Job Lifecycle**: Root span `job.ingestion` tracks the overall synchronization run.
+- **Task Execution**: Child spans (`task.reading`, `task.brain`) provide granular visibility into individual task performance.
+- **API/DB Operations**: Sub-spans for GitHub API requests, MongoDB fetches, and PostgreSQL transactions.
+
+Traces are exported to the central **OpenTelemetry Collector** via gRPC and stored in **Grafana Tempo**.
+
+### Instrumentation Strategy
+
+1. **Task Engine Wrapper**: The service uses a centralized `RunTask` engine that automatically wraps every registered task in a named OpenTelemetry span, capturing task-specific attributes and success/failure status.
+2. **Manual Spans**: High-latency operations (external API calls and complex database syncs) are manually instrumented to provide precise timing and error context for pipeline optimization.

docs/architecture/services/proxy.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Proxy Service Architecture
 
-The Proxy Service (`services/proxy/`) is a custom Go application that acts as the API gateway, Data Pipeline engine, and **GitOps automation trigger** for the platform. It runs as a native host process managed by Systemd.
+The Proxy Service (`cmd/proxy/`) is a custom Go application that acts as the API gateway, Data Pipeline engine, and **GitOps automation trigger** for the platform. It runs as a native host process managed by Systemd.
 
 ## Component Details
 