|
1 | 1 | # Edgeless Telemetry |
2 | 2 |
|
3 | | -This is an initial take on telemetry (to be extended / replaced as needed). Uses |
4 | | -OpenMetrics through this library: https://github.com/prometheus/client_rust. |
| 3 | +Telemetry support for EDGELESS nodes and control plane components. |
5 | 4 |
|
6 | | -Dockerfiles for running a prometheus instance and Grafana istance are provided |
7 | | -with a basic dashboard for a one-node Edgeless s=ystem. Currently used by the |
8 | | -edgeless_node to provide metrics such as: |
9 | | -- function_count - number of function instances that are present |
10 | | -- execution_times_count - number of times a function was executed |
11 | | -etc. |
12 | | -- execution_times_bucket - |
| 5 | +## Components |
13 | 6 |
|
14 | | -This has been tested on a local setup with Docker for Mac and edgeless_inabox |
15 | | -running locally. |
| 7 | +### Data Plane Telemetry (Nodes) |
16 | 8 |
|
17 | | -## How to run Prometheus / Grafana as docker containers? |
18 | | -Make sure you have Docker Compose installed. Navigate to `components` and run: |
| 9 | +Event-based metrics collection for function execution and resource usage. Both targets track the same events but export them differently: |
19 | 10 |
|
20 | | -```bash |
21 | | -docker-compose up --build -d |
| 11 | +**PerformanceTarget** - Collects raw samples from function execution: |
| 12 | +- Stores execution times, transfer times, and log entries per function |
| 13 | +- Periodically sent from nodes to orchestrator via node registration refresh |
| 14 | +- Orchestrator writes to Redis and optionally to CSV file |
| 15 | +- Used for post-processing and analysis of function-level metrics |
| 16 | + |
| 17 | +**PrometheusTarget** - Aggregates metrics for real-time monitoring, e.g. with dashboards and alerts:
| 18 | +- Exposes HTTP endpoint (`metrics_url`) for Prometheus scraping |
| 19 | +- Tracks node-level and function-level metrics in histograms |
| 20 | +- Includes: `function_count`, `execution_times`, `transfer_times` |
| 21 | +- Used for live dashboards (Grafana) and alerting |
| 22 | + |
| 23 | +Both targets receive the same telemetry events from function runtimes: |
| 24 | +- Function lifecycle: instantiate, init, exit |
| 25 | +- Function execution: invocation completion times, transfer times
| 26 | +- Function logs: correlated to function instances |
| 27 | + |
| 28 | +Uses OpenMetrics through https://github.com/prometheus/client_rust. |
| 29 | + |
| 30 | +### Control Plane Tracer |
| 31 | + |
| 32 | +Lightweight span-based tracer for orchestrator and controller operations. Designed for the control plane, not the data plane.
| 33 | + |
| 34 | +Exports traces to CSV format for analyzing orchestrator decision-making and workflow lifecycle. |
| 35 | + |
| 36 | +**API**: |
| 37 | +- `ControlPlaneTracer::new(output_path)` - create tracer (stdout or file) |
| 38 | +- `tracer.start_span(name)` - create root span with correlation ID |
| 39 | +- `span.child(name)` - create child span with parent reference |
| 40 | +- `span.log(level, message)` - log correlated to span |
| 41 | +- Automatic span end on drop (RAII) |
| 42 | + |
| 43 | +**CSV Output Format**: |
| 44 | +``` |
| 45 | +timestamp_sec,timestamp_ns,event_type,correlation_id,parent_id,name,level,message |
22 | 46 | ``` |
23 | 47 |
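The span lifecycle described above (root span, child span, correlated log, automatic end on drop) can be sketched with the standard library alone. This is not the actual `ControlPlaneTracer` implementation: the names `SketchTracer` and `SketchSpan`, the event type strings `span_start`, `log` and `span_end`, the use of a per-span counter as `correlation_id`, and the in-memory row buffer (instead of stdout or a file) are all assumptions made for illustration. Fields are not CSV-escaped, so names and messages must not contain commas.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical tracer that buffers CSV rows in memory.
#[derive(Clone, Default)]
pub struct SketchTracer {
    rows: Arc<Mutex<Vec<String>>>,
    next_id: Arc<AtomicU64>,
}

/// Hypothetical span; emits a `span_end` row when dropped.
pub struct SketchSpan {
    tracer: SketchTracer,
    id: u64,
    parent_id: Option<u64>,
    name: String,
}

impl SketchTracer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Create a root span (no parent).
    pub fn start_span(&self, name: &str) -> SketchSpan {
        self.open(name, None)
    }

    fn open(&self, name: &str, parent_id: Option<u64>) -> SketchSpan {
        // IDs start at 1; assumed to double as the correlation ID.
        let id = self.next_id.fetch_add(1, Ordering::Relaxed) + 1;
        let span = SketchSpan {
            tracer: self.clone(),
            id,
            parent_id,
            name: name.to_string(),
        };
        span.emit("span_start", "", "");
        span
    }

    /// Snapshot of the rows emitted so far.
    pub fn rows(&self) -> Vec<String> {
        self.rows.lock().unwrap().clone()
    }
}

impl SketchSpan {
    /// Create a child span that references this span as its parent.
    pub fn child(&self, name: &str) -> SketchSpan {
        self.tracer.open(name, Some(self.id))
    }

    /// Emit a log row correlated to this span.
    pub fn log(&self, level: &str, message: &str) {
        self.emit("log", level, message);
    }

    fn emit(&self, event_type: &str, level: &str, message: &str) {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap_or_default();
        let parent = self.parent_id.map(|p| p.to_string()).unwrap_or_default();
        // Column order follows the CSV header shown above.
        let row = format!(
            "{},{},{},{},{},{},{},{}",
            now.as_secs(),
            now.subsec_nanos(),
            event_type,
            self.id,
            parent,
            self.name,
            level,
            message
        );
        self.tracer.rows.lock().unwrap().push(row);
    }
}

impl Drop for SketchSpan {
    // RAII: the span ends when it goes out of scope.
    fn drop(&mut self) {
        self.emit("span_end", "", "");
    }
}
```

Because Rust drops local variables in reverse declaration order, a child span declared after its parent ends first, so `span_end` rows naturally nest inside the parent's start and end rows.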
|
24 | | -Then just open `localhost:3000` in your browser and open the `Edgeless default |
25 | | -dashboard` on the left. |
| 48 | +**Why not OpenTelemetry?** |
| 49 | + |
| 50 | +We implement a minimal subset of OpenTelemetry's tracing functionality, tailored for orchestrator
| 51 | +observability, because we need a simple, low-overhead solution without external dependencies.
| 52 | + |
| 53 | +For production orchestrator observability with distributed tracing and existing infrastructure integration, use OpenTelemetry instead. |
| 54 | + |
| 55 | +## Running Prometheus / Grafana |
26 | 56 |
|
27 | | -## How to add new metric types? |
28 | | -TODO |
| 57 | +> This dashboard has not been updated for the latest telemetry metrics. Use it as a reference only. |
29 | 58 |
|
30 | | -## How to instrument your code? |
31 | | -Instrumentation is currently added to edgeless_node, check it out to learn more. |
| 59 | +Navigate to `components` and run: |
| 60 | + |
| 61 | +```bash |
| 62 | +docker-compose up --build -d |
| 63 | +``` |
32 | 64 |
|
33 | | -## Next steps: |
34 | | -- [x] Grafana dashboard for a cluster of one node |
35 | | -- [ ] Add function_class as a label for metrics |
36 | | -- [ ] expand Instructions on how to add metrics |
37 | | -- [x] default anonymous user |
| 65 | +Open `localhost:3000` and access the Edgeless dashboard. |