Commit 49fa9e5

add control plane telemetry via csv
1 parent b6f8ae4 commit 49fa9e5

4 files changed: +362 -26 lines changed

edgeless_telemetry/README.md

Lines changed: 54 additions & 26 deletions
@@ -1,37 +1,65 @@
 # Edgeless Telemetry
 
-This is an initial take on telemetry (to be extended / replaced as needed). Uses
-OpenMetrics through this library: https://github.com/prometheus/client_rust.
+Telemetry support for EDGELESS nodes and control plane components.
 
-Dockerfiles for running a prometheus instance and Grafana istance are provided
-with a basic dashboard for a one-node Edgeless s=ystem. Currently used by the
-edgeless_node to provide metrics such as:
-- function_count - number of function instances that are present
-- execution_times_count - number of times a function was executed
-etc.
-- execution_times_bucket -
+## Components
 
-This has been tested on a local setup with Docker for Mac and edgeless_inabox
-running locally.
+### Data Plane Telemetry (Nodes)
 
-## How to run Prometheus / Grafana as docker containers?
-Make sure you have Docker Compose installed. Navigate to `components` and run:
+Event-based metrics collection for function execution and resource usage. Both targets track the same events but export them differently:
 
-```bash
-docker-compose up --build -d
+**PerformanceTarget** - Collects raw samples from function execution:
+- Stores execution times, transfer times, and log entries per function
+- Periodically sent from nodes to orchestrator via node registration refresh
+- Orchestrator writes to Redis and optionally to CSV file
+- Used for post-processing and analysis of function-level metrics
+
+**PrometheusTarget** - Aggregates metrics for real-time monitoring using e.g. dashboards and alerts:
+- Exposes HTTP endpoint (`metrics_url`) for Prometheus scraping
+- Tracks node-level and function-level metrics in histograms
+- Includes: `function_count`, `execution_times`, `transfer_times`
+- Used for live dashboards (Grafana) and alerting
+
+Both targets receive the same telemetry events from function runtimes:
+- Function lifecycle: instantiate, init, exit
+- Function execution: invocation completed times, transfer times
+- Function logs: correlated to function instances
+
+Uses OpenMetrics through https://github.com/prometheus/client_rust.
+
+### Control Plane Tracer
+
+Lightweight span-based tracer for orchestrator and controller operations. Designed for control plane, not data plane.
+
+Exports traces to CSV format for analyzing orchestrator decision-making and workflow lifecycle.
+
+**API**:
+- `ControlPlaneTracer::new(output_path)` - create tracer (stdout or file)
+- `tracer.start_span(name)` - create root span with correlation ID
+- `span.child(name)` - create child span with parent reference
+- `span.log(level, message)` - log correlated to span
+- Automatic span end on drop (RAII)
+
+**CSV Output Format**:
+```
+timestamp_sec,timestamp_ns,event_type,correlation_id,parent_id,name,level,message
 ```
 
-Then just open `localhost:3000` in your browser and open the `Edgeless default
-dashboard` on the left.
+**Why not OpenTelemetry?**
+
+We implement a minimal subset of OpenTelemetry's tracing functionality tailored for orchestrator
+observability as we need a simple, low-overhead solution without external dependencies.
+
+For production orchestrator observability with distributed tracing and existing infrastructure integration, use OpenTelemetry instead.
+
+## Running Prometheus / Grafana
 
-## How to add new metric types?
-TODO
+> This dashboard has not been updated for the latest telemetry metrics. Use it as a reference only.
 
-## How to instrument your code?
-Instrumentation is currently added to edgeless_node, check it out to learn more.
+Navigate to `components` and run:
+
+```bash
+docker-compose up --build -d
+```
 
-## Next steps:
-- [x] Grafana dashboard for a cluster of one node
-- [ ] Add function_class as a label for metrics
-- [ ] expand Instructions on how to add metrics
-- [x] default anonymous user
+Open `localhost:3000` and access the Edgeless dashboard.
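
The README's CSV output format lists eight columns, so a trace file can in principle be post-processed with the standard library alone. The following is a minimal sketch of such a reader, not code from this commit: `TraceRow` and `parse_trace_row` are hypothetical names, both timestamps are assumed to be unsigned integers, and only the final `message` column is assumed to contain commas (the tracer's actual quoting or escaping rules are not shown in the diff).

```rust
/// One row of the tracer's CSV output, with fields named after the README's columns.
/// Hypothetical type for illustration; it is not part of edgeless_telemetry.
#[derive(Debug, PartialEq)]
pub struct TraceRow {
    pub timestamp_sec: u64,
    pub timestamp_ns: u64,
    pub event_type: String,
    pub correlation_id: String,
    pub parent_id: String,
    pub name: String,
    pub level: String,
    pub message: String,
}

/// Parses one CSV line into a `TraceRow`.
///
/// Assumption: only the last column (`message`) may contain commas, so the
/// line is split into at most eight fields. Returns `None` for the header
/// line and for rows with too few fields or non-numeric timestamps.
pub fn parse_trace_row(line: &str) -> Option<TraceRow> {
    let fields: Vec<&str> = line.trim_end().splitn(8, ',').collect();
    if fields.len() != 8 {
        return None;
    }
    Some(TraceRow {
        // The header line fails here because "timestamp_sec" is not a number.
        timestamp_sec: fields[0].parse().ok()?,
        timestamp_ns: fields[1].parse().ok()?,
        event_type: fields[2].to_string(),
        correlation_id: fields[3].to_string(),
        parent_id: fields[4].to_string(),
        name: fields[5].to_string(),
        level: fields[6].to_string(),
        message: fields[7].to_string(),
    })
}
```

Splitting with `splitn(8, ',')` keeps any commas in the message intact, but it would mis-parse a file whose earlier columns are quoted or contain commas; a full CSV parser would be the safer choice in that case.
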
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+// Example usage of the control plane tracer for orchestrator metrics
+
+use edgeless_telemetry::control_plane_tracer::ControlPlaneTracer;
+
+fn main() {
+    // create tracer that writes to stdout
+    let tracer = ControlPlaneTracer::new(String::new()).unwrap();
+
+    // or write to a file
+    // let tracer = ControlPlaneTracer::new("orchestrator_traces.csv".to_string()).unwrap();
+
+    // start a span for a workflow deployment
+    let deployment_span = tracer.start_span("deploy_workflow");
+    deployment_span.log("info", "starting workflow deployment");
+
+    {
+        // child span for resource allocation
+        let allocation_span = deployment_span.child("resource_allocation");
+        allocation_span.log("debug", "selecting nodes");
+        allocation_span.log("info", "allocated 3 nodes");
+    }
+
+    {
+        // child span for function instantiation
+        let instantiation_span = deployment_span.child("function_instantiation");
+        instantiation_span.log("debug", "creating function instances");
+        instantiation_span.log("info", "instantiated 5 functions");
+    }
+
+    deployment_span.log("info", "workflow deployment completed");
+
+    // global log without correlation
+    tracer.log("warn", "system load high");
+}
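
The block scopes in the example rely on the "automatic span end on drop (RAII)" behaviour listed in the README. How that can work is sketched below under stated assumptions: `SketchSpan`, `EventLog`, `demo_events`, and the `span_start` / `span_end` event names are all hypothetical, and an in-memory vector stands in for the CSV writer. This illustrates the general `Drop` pattern and is not the tracer's actual implementation.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// In-memory event sink standing in for the tracer's CSV writer (hypothetical).
type EventLog = Rc<RefCell<Vec<String>>>;

/// Hypothetical span: records a start event on creation and an end event on drop.
pub struct SketchSpan {
    name: String,
    log: EventLog,
}

impl SketchSpan {
    /// Records the start event and returns the new span.
    fn start(name: &str, parent: &str, log: EventLog) -> Self {
        log.borrow_mut().push(format!("span_start,{},{}", name, parent));
        SketchSpan { name: name.to_string(), log }
    }

    /// Creates a root span, which has an empty parent reference.
    pub fn root(name: &str, log: EventLog) -> Self {
        Self::start(name, "", log)
    }

    /// Creates a child span that refers to this span as its parent.
    pub fn child(&self, name: &str) -> Self {
        Self::start(name, &self.name, Rc::clone(&self.log))
    }
}

impl Drop for SketchSpan {
    /// Runs when the span goes out of scope, so no explicit end call is needed.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("span_end,{}", self.name));
    }
}

/// Mirrors the nesting used in the example file and returns the recorded events.
pub fn demo_events() -> Vec<String> {
    let log: EventLog = Rc::new(RefCell::new(Vec::new()));
    {
        let root = SketchSpan::root("deploy_workflow", Rc::clone(&log));
        {
            let _child = root.child("resource_allocation");
        } // `_child` is dropped here, before `root`
    } // `root` is dropped here
    let events = log.borrow().clone();
    events
}
```

Because inner scopes close first, a child span's end event is always recorded before its parent's, which is why the example wraps each child span in its own block.
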
