Commit dca31c6

feat: Update TFO-Agent capabilities features prometheus, node-exporter, kube-metrics, eBPF
1 parent a021a86 commit dca31c6

78 files changed, with 9905 additions and 90 deletions

.golangci.yml

Lines changed: 20 additions & 12 deletions
@@ -1,26 +1,34 @@
-# GolangCI-Lint Configuration (v2)
+# GolangCI-Lint Configuration
 # TelemetryFlow Agent - Community Enterprise Observability Platform (CEOP)

-version: "2"
-
 run:
   timeout: 5m
   tests: true

 linters:
-  default: none
+  disable-all: true
   enable:
     - staticcheck
     - govet
     - errcheck
     - ineffassign
     - unused
-  exclusions:
-    paths:
-      - vendor

-  settings:
-    staticcheck:
-      checks:
-        - "all"
-        - "-SA1019"
+issues:
+  exclude-dirs:
+    - vendor
+  exclude-files:
+    - internal/collector/ebpf/types.go
+    - internal/collector/ebpf/helpers.go
+    - internal/collector/ebpf/hubble.go
+  exclude-rules:
+    - path: internal/collector/ebpf/config.go
+      linters:
+        - unused
+      text: "shouldIncludeProcess"
+
+linters-settings:
+  staticcheck:
+    checks:
+      - "all"
+      - "-SA1019"

CHANGELOG.md

Lines changed: 47 additions & 9 deletions
@@ -7,7 +7,7 @@

 <h3>TelemetryFlow Agent (OTEL Agent)</h3>

-[![Version](https://img.shields.io/badge/Version-1.1.3-orange.svg)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/Version-1.1.4-orange.svg)](CHANGELOG.md)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Go Version](https://img.shields.io/badge/Go-1.24+-00ADD8?logo=go)](https://golang.org/)
 [![OTEL SDK](https://img.shields.io/badge/OpenTelemetry_SDK-1.39.0-blueviolet)](https://opentelemetry.io/)
@@ -24,6 +24,43 @@ All notable changes to TelemetryFlow Agent will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.1/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.1.4] - 2026-02-11
+
+### Added
+
+- **eBPF Collector**: Full kernel-level metrics collector using `cilium/ebpf` library
+  - 6 BPF C programs: syscalls, network, file I/O, scheduler, memory, TCP state transitions
+  - 7 sub-collectors individually togglable via config flags
+  - 28 metrics across syscall tracing, TCP/UDP monitoring, VFS I/O, scheduler analysis, memory page faults, and TCP state lifecycle
+  - Process filtering with regex include/exclude patterns
+  - Platform-safe: returns empty metrics on non-Linux (build-tagged stubs)
+  - `bpf2go` code generation directives for CI compilation
+  - Syscall name mapping (60+ Linux amd64 syscalls) and TCP state name mapping (12 states)
+- **Cilium Hubble Integration**: gRPC client for Cilium Hubble Relay
+  - L3/L4 network flow metrics
+  - L7 protocol visibility (HTTP, DNS)
+  - Network policy verdict and drop metrics
+  - Mutual TLS support for production Cilium clusters
+  - 6 Hubble metrics: flows, drops, policy_verdicts, http_requests, dns_queries, l7_errors
+- **eBPF Configuration**: Full config support under `collectors.ebpf`
+  - YAML config with sub-collector toggles, process filters, buffer sizes, BTF/pin paths
+  - Environment variables: `TELEMETRYFLOW_EBPF_ENABLED`, `TELEMETRYFLOW_EBPF_BTF_PATH`, `TELEMETRYFLOW_EBPF_PIN_PATH`
+  - Cilium sub-config with Hubble address, TLS, and collection toggles
+- **eBPF Unit Tests**: 34 tests covering collector lifecycle, config validation, metric structures
+  - `tests/unit/domain/ebpf/` with 4 test files
+  - Config validation (sample_rate, buffer sizes, process filters, Cilium config)
+  - Metric structure verification for all 28 metric types
+  - Platform-aware tests (Linux vs non-Linux)
+- **eBPF Documentation**: 6 documents in `docs/integrations/eBPF/`
+  - Architecture with mermaid diagrams showing kernel/userspace data flow
+  - Full YAML configuration reference with tuning guide
+  - Complete metric catalog with PromQL examples
+  - BPF C program design: map strategy, tracepoint details, CO-RE support
+  - Cilium Hubble integration guide
+  - Operations guide: requirements, deployment, troubleshooting, security
+- **Makefile Targets**: Added `test-ebpf`, `generate-ebpf`, `build-ebpf` targets
+- **New Dependency**: `github.com/cilium/ebpf v0.20.0` for BPF program loading and map interaction
+
 ## [1.1.3] - 2026-02-04

 ### Added
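
Note on "Process filtering with regex include/exclude patterns" in the 1.1.4 entry: the changelog does not spell out the matching semantics, so the following is only a minimal Go sketch of one plausible reading. The `ProcessFilter` type, `NewProcessFilter`, and `Allow` are hypothetical names, not the agent's API, and the rules (exclude wins over include, an empty include list admits everything) are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// ProcessFilter is a hypothetical include/exclude filter on process names.
// It is not taken from the TFO-Agent source.
type ProcessFilter struct {
	include []*regexp.Regexp
	exclude []*regexp.Regexp
}

// compileAll compiles every pattern and reports the first invalid one.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// NewProcessFilter builds a filter from include and exclude pattern lists.
func NewProcessFilter(include, exclude []string) (*ProcessFilter, error) {
	inc, err := compileAll(include)
	if err != nil {
		return nil, err
	}
	exc, err := compileAll(exclude)
	if err != nil {
		return nil, err
	}
	return &ProcessFilter{include: inc, exclude: exc}, nil
}

// Allow reports whether a process name passes the filter.
// Assumed rules: any exclude match rejects; an empty include list admits all.
func (f *ProcessFilter) Allow(comm string) bool {
	for _, re := range f.exclude {
		if re.MatchString(comm) {
			return false
		}
	}
	if len(f.include) == 0 {
		return true
	}
	for _, re := range f.include {
		if re.MatchString(comm) {
			return true
		}
	}
	return false
}

func main() {
	f, err := NewProcessFilter([]string{`^nginx`, `^postgres$`}, []string{`-debug$`})
	if err != nil {
		panic(err)
	}
	for _, comm := range []string{"nginx", "nginx-debug", "postgres", "bash"} {
		fmt.Printf("%-12s allowed=%v\n", comm, f.Allow(comm))
	}
}
```

Under these assumed rules, `nginx` and `postgres` pass, while `nginx-debug` (excluded) and `bash` (not included) are dropped. Compiling the patterns once at construction keeps regex errors at config-validation time rather than on the collection path.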
@@ -277,14 +314,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## Version History

-| Version | Date | OTEL SDK | Description |
-| ------- | ---------- | -------- | ------------------------------------------------------------------------------------------------------------------ |
-| 1.1.3 | 2026-02-04 | v1.39.0 | Network retransmit metrics, container name/image detection, page faults, IOPS, system calls |
-| 1.1.2 | 2026-01-03 | v1.39.0 | OSS observability (SigNoz, Coroot, HyperDX, OpenObserve, Netdata), APM (Dynatrace, Instana, ManageEngine) |
-| 1.1.1 | 2024-12-29 | v1.39.0 | Enterprise integrations (GCP, Azure, Alibaba, Proxmox, VMware, Nutanix, Cisco, SNMP, MQTT, eBPF) |
-| 1.1.0 | 2024-12-27 | v1.39.0 | OTEL SDK standardization, aligned with TFO-Go-SDK & TFO-Collector |
-| 1.0.1 | 2024-12-17 | - | Docker workflow, SBOM, multi-platform support |
-| 1.0.0 | 2024-12-17 | - | Initial release |
+| Version | Date | OTEL SDK | Description |
+| ------- | ---------- | -------- | --------------------------------------------------------------------------------------------------------- |
+| 1.1.4 | 2026-02-11 | v1.39.0 | eBPF collector (28 metrics), Cilium Hubble integration, 6 BPF programs, kernel-level observability |
+| 1.1.3 | 2026-02-04 | v1.39.0 | Network retransmit metrics, container name/image detection, page faults, IOPS, system calls |
+| 1.1.2 | 2026-01-03 | v1.39.0 | OSS observability (SigNoz, Coroot, HyperDX, OpenObserve, Netdata), APM (Dynatrace, Instana, ManageEngine) |
+| 1.1.1 | 2024-12-29 | v1.39.0 | Enterprise integrations (GCP, Azure, Alibaba, Proxmox, VMware, Nutanix, Cisco, SNMP, MQTT, eBPF) |
+| 1.1.0 | 2024-12-27 | v1.39.0 | OTEL SDK standardization, aligned with TFO-Go-SDK & TFO-Collector |
+| 1.0.1 | 2024-12-17 | - | Docker workflow, SBOM, multi-platform support |
+| 1.0.0 | 2024-12-17 | - | Initial release |

 ## Upgrade Guide

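
Note on the "TCP state name mapping (12 states)" item in the 1.1.4 changelog entry: the mapping itself is not shown in this diff. The Go sketch below assumes it follows the numeric values of the Linux kernel's TCP state enum (`TCP_ESTABLISHED = 1` through `TCP_NEW_SYN_RECV = 12`). The `tcpStateNames` table and `TCPStateName` helper are hypothetical names, and the fallback string for unknown values is an illustrative choice.

```go
package main

import "fmt"

// tcpStateNames maps Linux kernel TCP state numbers to readable names.
// The numbering follows the kernel's TCP state enum; the table name is
// hypothetical and not taken from the TFO-Agent source.
var tcpStateNames = map[uint8]string{
	1:  "ESTABLISHED",
	2:  "SYN_SENT",
	3:  "SYN_RECV",
	4:  "FIN_WAIT1",
	5:  "FIN_WAIT2",
	6:  "TIME_WAIT",
	7:  "CLOSE",
	8:  "CLOSE_WAIT",
	9:  "LAST_ACK",
	10: "LISTEN",
	11: "CLOSING",
	12: "NEW_SYN_RECV",
}

// TCPStateName returns the name for a state number, or a placeholder
// that preserves the raw value when the number is not in the table.
func TCPStateName(state uint8) string {
	if name, ok := tcpStateNames[state]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN(%d)", state)
}

func main() {
	// A transition reported with old_state=2 and new_state=1.
	fmt.Printf("%s -> %s\n", TCPStateName(2), TCPStateName(1))
}
```

Keeping the raw number in the fallback means a state added by a newer kernel still shows up as a distinguishable label value instead of collapsing into a single "unknown" bucket.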
Makefile

Lines changed: 66 additions & 0 deletions
@@ -69,6 +69,8 @@ NC := \033[0m
 .PHONY: all build build-all build-linux build-darwin build-windows clean \
 	test test-unit test-integration test-e2e test-all test-coverage test-script test-short bench \
 	test-run test-list test-verbose test-race \
+	test-kubernetes test-prometheus test-nodeexporter test-ebpf \
+	generate-ebpf build-ebpf \
 	run run-debug dev dev-watch \
 	deps deps-update deps-verify tidy verify \
 	lint lint-fix fmt fmt-check vet staticcheck check \
@@ -78,6 +80,7 @@ NC := \033[0m
 	security govulncheck coverage-merge coverage-report \
 	test-unit-ci test-integration-ci test-e2e-ci \
 	docker docker-build docker-push docker-run \
+	deploy-k8s undeploy-k8s \
 	release-check docs godoc \
 	version help info integrations

@@ -122,6 +125,11 @@ help:
 	@echo " make test-verbose - Run tests with verbose output"
 	@echo " make test-race - Run tests with race detection"
 	@echo " make bench - Run benchmarks"
+	@echo " make test-ebpf - Run eBPF collector tests"
+	@echo ""
+	@echo "$(YELLOW)eBPF:$(NC)"
+	@echo " make generate-ebpf - Generate eBPF bytecode (requires clang)"
+	@echo " make build-ebpf - Build Linux binary with eBPF support"
 	@echo ""
 	@echo "$(YELLOW)Code Quality:$(NC)"
 	@echo " make lint - Run linters"
@@ -613,6 +621,64 @@ release-check:
 # Documentation Targets
 # =============================================================================

+# =============================================================================
+# Kubernetes & Prometheus Targets
+# =============================================================================
+
+## Run Kubernetes collector tests
+test-kubernetes:
+	@echo "$(GREEN)Running Kubernetes collector tests...$(NC)"
+	@$(GOTEST) -v -timeout 5m -coverprofile=coverage-kubernetes.out ./tests/unit/domain/kubernetes/...
+
+## Run Prometheus exporter tests
+test-prometheus:
+	@echo "$(GREEN)Running Prometheus exporter tests...$(NC)"
+	@$(GOTEST) -v -timeout 5m -coverprofile=coverage-prometheus.out ./tests/unit/infrastructure/exporter/...
+
+## Run Node Exporter collector tests
+test-nodeexporter:
+	@echo "$(GREEN)Running Node Exporter collector tests...$(NC)"
+	@$(GOTEST) -v -timeout 5m -coverprofile=coverage-nodeexporter.out ./tests/unit/domain/nodeexporter/...
+
+## Run eBPF collector tests
+test-ebpf:
+	@echo "$(GREEN)Running eBPF collector tests...$(NC)"
+	@$(GOTEST) -v -timeout 5m -coverprofile=coverage-ebpf.out ./tests/unit/domain/ebpf/...
+
+## Generate eBPF bytecode from BPF C sources (requires clang + Linux headers)
+generate-ebpf:
+	@echo "$(GREEN)Generating eBPF bytecode via bpf2go...$(NC)"
+	@which clang > /dev/null || (echo "$(RED)clang not found. Install with: apt install clang llvm$(NC)" && exit 1)
+	@$(GOCMD) generate ./internal/collector/ebpf/...
+	@echo "$(GREEN)eBPF generation complete$(NC)"
+
+## Build for Linux with eBPF support
+build-ebpf:
+	@echo "$(GREEN)Building $(BINARY_NAME) for Linux (eBPF-enabled)...$(NC)"
+	@mkdir -p $(BUILD_DIR)
+	@GOOS=linux GOARCH=amd64 $(GOBUILD) -ldflags "$(LDFLAGS)" -o $(BUILD_DIR)/$(BINARY_NAME)-linux-amd64-ebpf ./cmd/tfo-agent
+	@echo "$(GREEN)eBPF-enabled Linux build complete$(NC)"
+
+## Deploy TFO-Agent to Kubernetes
+deploy-k8s:
+	@echo "$(GREEN)Deploying TFO-Agent to Kubernetes...$(NC)"
+	@kubectl apply -f deploy/kubernetes/rbac.yaml
+	@kubectl apply -f deploy/kubernetes/configmap.yaml
+	@kubectl apply -f deploy/kubernetes/daemonset.yaml
+	@echo "$(GREEN)TFO-Agent deployed. Check status with: kubectl -n telemetryflow get pods$(NC)"
+
+## Remove TFO-Agent from Kubernetes
+undeploy-k8s:
+	@echo "$(YELLOW)Removing TFO-Agent from Kubernetes...$(NC)"
+	@kubectl delete -f deploy/kubernetes/daemonset.yaml --ignore-not-found
+	@kubectl delete -f deploy/kubernetes/configmap.yaml --ignore-not-found
+	@kubectl delete -f deploy/kubernetes/rbac.yaml --ignore-not-found
+	@echo "$(GREEN)TFO-Agent removed$(NC)"
+
+# =============================================================================
+# Documentation Targets
+# =============================================================================
+
 ## Show documentation locations
 docs:
 	@echo "$(GREEN)Documentation locations:$(NC)"

README.md

Lines changed: 48 additions & 20 deletions
@@ -7,7 +7,7 @@

 <h3>TelemetryFlow Agent (OTEL Agent)</h3>

-[![Version](https://img.shields.io/badge/Version-1.1.3-orange.svg)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/Version-1.1.4-orange.svg)](CHANGELOG.md)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Go Version](https://img.shields.io/badge/Go-1.24+-00ADD8?logo=go)](https://golang.org/)
 [![OTEL SDK](https://img.shields.io/badge/OpenTelemetry_SDK-1.39.0-blueviolet)](https://opentelemetry.io/)
@@ -32,7 +32,7 @@ TFO-Agent is fully aligned with the TelemetryFlow ecosystem, sharing the same Op

 ```mermaid
 graph LR
-    subgraph "TelemetryFlow Ecosystem v1.1.3"
+    subgraph "TelemetryFlow Ecosystem v1.1.4"
         subgraph "Instrumentation"
             SDK[TFO-Go-SDK<br/>OTEL SDK v1.39.0]
         end
@@ -59,7 +59,7 @@ graph LR

 | Component | Version | OTEL Base | Description |
 | ----------------- | ------- | ------------------ | --------------------------- |
-| **TFO-Agent** | v1.1.3 | SDK v1.39.0 | Telemetry collection agent |
+| **TFO-Agent** | v1.1.4 | SDK v1.39.0 | Telemetry collection agent |
 | **TFO-Go-SDK** | v1.1.3 | SDK v1.39.0 | Go instrumentation SDK |
 | **TFO-Collector** | v1.1.3 | Collector v0.142.0 | Central telemetry collector |

@@ -97,6 +97,8 @@ graph LR

 ## Quick Start

+> **🚀 New to TFO-Agent?** Check the [Quick Start Guide](docs/QUICK-START.md) for step-by-step setup with Docker, Kubernetes, or binary installation.
+
 ### From Source

 ```bash
@@ -137,11 +139,11 @@ docker-compose down
 ```bash
 # Build image
 docker build \
-  --build-arg VERSION=1.1.3 \
+  --build-arg VERSION=1.1.4 \
   --build-arg GIT_COMMIT=$(git rev-parse --short HEAD) \
   --build-arg GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD) \
   --build-arg BUILD_TIME=$(date -u '+%Y-%m-%dT%H:%M:%SZ') \
-  -t telemetryflow/telemetryflow-agent:1.1.3 .
+  -t telemetryflow/telemetryflow-agent:1.1.4 .

 # Run container
 docker run -d --name tfo-agent \
@@ -151,7 +153,7 @@ docker run -d --name tfo-agent \
   -p 13133:13133 \
   -v /path/to/config.yaml:/etc/tfo-agent/tfo-agent.yaml:ro \
   -v /var/lib/tfo-agent:/var/lib/tfo-agent \
-  telemetryflow/telemetryflow-agent:1.1.3
+  telemetryflow/telemetryflow-agent:1.1.4
 ```

 ### OTEL Collector Ports
@@ -190,10 +192,13 @@ POST http://localhost:4318/v1/logs

 ## Configuration

+> **📋 Complete Configuration:** See [`configs/tfo-agent.default.yaml`](configs/tfo-agent.default.yaml) for a full configuration example showing Node Exporter, Kubernetes, and eBPF collectors integrated with TFO Platform.
+> **🔗 Integration Guide:** See [TFO Platform Integration Guide](docs/TFO-PLATFORM-INTEGRATION.md) for architecture diagrams, data flow, and production deployment examples.
+
 Create configuration file at `/etc/tfo-agent/tfo-agent.yaml`:

 ```yaml
-# TelemetryFlow Platform Configuration (v1.1.3+)
+# TelemetryFlow Platform Configuration (v1.1.4+)
 telemetryflow:
   api_key_id: "${TELEMETRYFLOW_API_KEY_ID}"
   api_key_secret: "${TELEMETRYFLOW_API_KEY_SECRET}"
@@ -282,7 +287,10 @@ tfo-agent/
 │   ├── agent/               # Core agent lifecycle
 │   ├── buffer/              # Disk-backed retry buffer
 │   ├── collector/           # Metric collectors
-│   │   └── system/          # System metrics collector
+│   │   ├── system/          # System metrics collector
+│   │   ├── kubernetes/      # Kubernetes metrics collector
+│   │   ├── nodeexporter/    # Node Exporter metrics collector
+│   │   └── ebpf/            # eBPF kernel-level metrics collector
 │   ├── config/              # Configuration management
 │   ├── exporter/            # OTLP data exporters
 │   └── version/             # Version and banner info
@@ -343,6 +351,20 @@ p.Start()
 | `system.network.bytes_sent` | counter | Total bytes sent |
 | `system.network.bytes_recv` | counter | Total bytes received |

+### eBPF Metrics (Linux-only)
+
+The eBPF collector provides 28 kernel-level metrics across 7 categories:
+
+- **Syscall**: `ebpf.syscall.{count,latency_ns,errors}` with `pid`, `comm`, `syscall` labels
+- **Network**: `ebpf.tcp.{connections,bytes_sent,bytes_recv,rtt_ns,retransmits}`, `ebpf.udp.{packets_sent,packets_recv}`
+- **File I/O**: `ebpf.fileio.{operations,bytes,latency_ns}` with `operation` label
+- **Scheduler**: `ebpf.sched.{context_switches,runq_latency_ns,oncpu_ns,migrations}`
+- **Memory**: `ebpf.memory.{page_faults,major_faults,minor_faults}`
+- **TCP State**: `ebpf.tcp.state_transitions` with `old_state`, `new_state` labels
+- **Hubble**: `hubble.{flows,drops,policy_verdicts,http_requests,dns_queries,l7_errors}`
+
+See [eBPF Metrics Documentation](docs/integrations/eBPF/METRICS.md) for complete catalog.
+
 ## Development

 ### Prerequisites
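
Note on the brace shorthand in the added "eBPF Metrics" README section: each `prefix.{a,b,c}` pattern stands for several fully qualified metric names. The Go sketch below expands the patterns exactly as they are written in that section. The `expand` and `allNames` helpers are illustrative names, not part of the agent.

```go
package main

import (
	"fmt"
	"strings"
)

// expand turns "prefix.{a,b}" into ["prefix.a", "prefix.b"].
// A pattern without braces is returned unchanged as a single name.
func expand(pattern string) []string {
	open := strings.Index(pattern, "{")
	end := strings.LastIndex(pattern, "}")
	if open < 0 || end < open {
		return []string{pattern}
	}
	prefix, suffix := pattern[:open], pattern[end+1:]
	var names []string
	for _, part := range strings.Split(pattern[open+1:end], ",") {
		names = append(names, prefix+part+suffix)
	}
	return names
}

// readmePatterns copies the metric patterns listed in the README section.
var readmePatterns = []string{
	"ebpf.syscall.{count,latency_ns,errors}",
	"ebpf.tcp.{connections,bytes_sent,bytes_recv,rtt_ns,retransmits}",
	"ebpf.udp.{packets_sent,packets_recv}",
	"ebpf.fileio.{operations,bytes,latency_ns}",
	"ebpf.sched.{context_switches,runq_latency_ns,oncpu_ns,migrations}",
	"ebpf.memory.{page_faults,major_faults,minor_faults}",
	"ebpf.tcp.state_transitions",
	"hubble.{flows,drops,policy_verdicts,http_requests,dns_queries,l7_errors}",
}

// allNames expands every pattern into its fully qualified metric names.
func allNames() []string {
	var names []string
	for _, p := range readmePatterns {
		names = append(names, expand(p)...)
	}
	return names
}

func main() {
	names := allNames()
	for _, n := range names {
		fmt.Println(n)
	}
	fmt.Printf("%d names\n", len(names))
}
```

The patterns spelled out in this section expand to 27 names (21 under `ebpf.*` and 6 under `hubble.*`), so the stated total of 28 presumably relies on the complete catalog in `METRICS.md` rather than on this summary alone.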
@@ -419,6 +441,9 @@ make ci-test # Run with race detection (CI mode)
 | ------------------------- | --------------------------- | ---------- |
 | `application` | CLI commands, configuration | 3 |
 | `domain/agent` | Agent lifecycle management | 2 |
+| `domain/ebpf` | eBPF collector | 4 |
+| `domain/kubernetes` | Kubernetes collector | 1 |
+| `domain/nodeexporter` | Node Exporter collector | 1 |
 | `domain/plugin` | Plugin registry | 1 |
 | `domain/telemetry` | Telemetry collection | 2 |
 | `infrastructure/api` | API client | 1 |
@@ -583,18 +608,21 @@ See [Integration Documentation](docs/integrations/README.md) for detailed config

 ## Documentation

-| Document | Description |
-| -------------------------------------------- | ----------------------------------------- |
-| [README](docs/README.md) | Documentation overview |
-| [ARCHITECTURE](docs/ARCHITECTURE.md) | System architecture with Mermaid diagrams |
-| [INSTALLATION](docs/INSTALLATION.md) | Installation guide for all platforms |
-| [CONFIGURATION](docs/CONFIGURATION.md) | Configuration options and examples |
-| [COMMANDS](docs/COMMANDS.md) | CLI commands reference |
-| [DEVELOPMENT](docs/DEVELOPMENT.md) | Development guide and coding standards |
-| [TROUBLESHOOTING](docs/TROUBLESHOOTING.md) | Troubleshooting guide and common issues |
-| [GITHUB-WORKFLOWS](docs/GITHUB-WORKFLOWS.md) | CI/CD workflows documentation |
-| [INTEGRATIONS](docs/integrations/README.md) | 3rd party integration guides |
-| [CHANGELOG](CHANGELOG.md) | Version history and changes |
+| Document | Description |
+| --------------------------------------------------- | -------------------------------------------------------------- |
+| [README](docs/README.md) | Documentation overview |
+| [ARCHITECTURE](docs/ARCHITECTURE.md) | System architecture with Mermaid diagrams |
+| [INSTALLATION](docs/INSTALLATION.md) | Installation guide for all platforms |
+| [CONFIGURATION](docs/CONFIGURATION.md) | Configuration options and examples |
+| [COMMANDS](docs/COMMANDS.md) | CLI commands reference |
+| [DEVELOPMENT](docs/DEVELOPMENT.md) | Development guide and coding standards |
+| [TROUBLESHOOTING](docs/TROUBLESHOOTING.md) | Troubleshooting guide and common issues |
+| [GITHUB-WORKFLOWS](docs/GITHUB-WORKFLOWS.md) | CI/CD workflows documentation |
+| [INTEGRATIONS](docs/integrations/README.md) | 3rd party integration guides |
+| [eBPF](docs/integrations/eBPF/README.md) | eBPF kernel-level observability (28 metrics) |
+| [QUICK-START](docs/QUICK-START.md) | Quick start guide (Docker/K8s/Binary) |
+| [TFO-INTEGRATION](docs/TFO-PLATFORM-INTEGRATION.md) | TFO Platform integration guide (architecture, metrics catalog) |
+| [CHANGELOG](CHANGELOG.md) | Version history and changes |

 ## License
