Commit 309828d

Adding Tempo for distributed tracing (linera-io#4556)
Distributed tracing is a useful way to debug many kinds of issues, latency problems in particular. We want it in general, and probably enabled by default in production as well.

This commit implements distributed tracing using Grafana Tempo. Because Tempo is a Grafana product, it integrates well with Grafana, and the visualizations seem decent.

I deployed a network with this code and the `linera-infra` portion of this change, and everything works as expected. I can see the latency breakdowns (this example is a really high latency outlier): ![Screenshot 2025-09-16 at 13.48.59.png](https://app.graphite.dev/user-attachments/assets/98f49272-d04a-4b7e-aa83-c04f90ec7347.png)

I also chose this example because it shows that requests may be waiting in the chain worker channel's queue for a while 🤔, which is worth investigating; I'll do that next.

- Nothing to do / These changes follow the usual release cycle.
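To make the "waiting in the channel's queue" observation concrete, here is a minimal, std-only sketch of how a latency breakdown separates queue wait from handling time. It is not the chain worker code: `Request`, `Timing`, and `measure` are hypothetical names, and it uses `std::sync::mpsc` as a stand-in for whatever channel the chain worker actually uses.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// A hypothetical request that remembers when it was put on the queue.
struct Request {
    payload: u32,
    enqueued_at: Instant,
}

/// How one request's latency splits between queueing and handling.
struct Timing {
    queue_wait: Duration,
    handling: Duration,
}

/// Enqueues one request for a worker that stays busy for `busy_for` before
/// reading its channel, then reports where the time went.
fn measure(busy_for: Duration) -> Timing {
    let (request_tx, request_rx) = mpsc::channel::<Request>();
    let (timing_tx, timing_rx) = mpsc::channel::<Timing>();

    // Enqueue first: the channel is unbounded, so `send` returns immediately.
    request_tx
        .send(Request { payload: 21, enqueued_at: Instant::now() })
        .expect("receiver is still alive");

    let worker = thread::spawn(move || {
        // Simulate a worker that is still occupied with earlier work.
        thread::sleep(busy_for);
        let request = request_rx.recv().expect("a request was enqueued");
        let dequeued_at = Instant::now();
        let queue_wait = dequeued_at.duration_since(request.enqueued_at);
        // Stand-in for the real work done on the request.
        let _doubled = request.payload.wrapping_mul(2);
        let handling = dequeued_at.elapsed();
        timing_tx
            .send(Timing { queue_wait, handling })
            .expect("caller is waiting for the timing");
    });

    let timing = timing_rx.recv().expect("worker sends one timing");
    worker.join().expect("worker did not panic");
    timing
}
```

Calling `measure(Duration::from_millis(20))` yields a `queue_wait` of at least 20 ms while `handling` stays tiny. A trace view makes the same pattern visible when the span covering the queue dominates the span covering the work.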
1 parent bac34d9 commit 309828d

8 files changed: +311 -24 lines changed

Cargo.lock

Lines changed: 150 additions & 9 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 9 additions & 0 deletions
@@ -157,6 +157,14 @@ num-format = "0.4.4"
 num-traits = "0.2.18"
 octocrab = "0.42.1"
 oneshot = "0.1.6"
+opentelemetry = { version = "0.30.0", features = ["trace"] }
+opentelemetry-http = "0.30.0"
+opentelemetry-otlp = { version = "0.30.0", features = [
+    "grpc-tonic",
+    "trace",
+    "tls-roots",
+] }
+opentelemetry_sdk = { version = "0.30.0", features = ["trace", "rt-tokio"] }
 papaya = "0.1.5"
 pathdiff = "0.2.1"
 port-selector = "0.1.6"
@@ -250,6 +258,7 @@ tonic-web-wasm-client = "0.6.0"
 tower = "0.4.13"
 tower-http = "0.6.6"
 tracing = { version = "0.1.40", features = ["release_max_level_debug"] }
+tracing-opentelemetry = "0.31.0"
 tracing-subscriber = { version = "0.3.18", default-features = false, features = [
     "env-filter",
 ] }

docker/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ ARG binaries=
 ARG copy=${binaries:+_copy}
 ARG build_flag=--release
 ARG build_folder=release
-ARG build_features=scylladb,metrics,memory-profiling
+ARG build_features=scylladb,metrics,memory-profiling,tempo
 ARG rustflags="-C force-frame-pointers=yes"
 
 FROM rust:1.74-slim-bookworm AS builder

kubernetes/linera-validator/helmfile.yaml

Lines changed: 7 additions & 0 deletions
@@ -154,3 +154,10 @@ releases:
     set:
       - name: crds.enabled
         value: "true"
+  - name: tempo
+    version: 1.23.3
+    namespace: tempo
+    chart: grafana/tempo
+    timeout: 900
+    values:
+      - {{ env "LINERA_HELMFILE_VALUES_LINERA_CORE" | default "values-local.yaml.gotmpl" }}

linera-base/Cargo.toml

Lines changed: 10 additions & 0 deletions
@@ -18,6 +18,12 @@ workspace = true
 metrics = ["prometheus"]
 reqwest = ["dep:reqwest"]
 revm = []
+tempo = [
+    "opentelemetry",
+    "opentelemetry-otlp",
+    "opentelemetry_sdk",
+    "tracing-opentelemetry",
+]
 test = ["test-strategy", "proptest"]
 web = [
     "getrandom/js",
@@ -80,6 +86,10 @@ tracing-web = { optional = true, workspace = true }
 
 [target.'cfg(not(target_arch = "wasm32"))'.dependencies]
 chrono.workspace = true
+opentelemetry = { workspace = true, optional = true }
+opentelemetry-otlp = { workspace = true, optional = true }
+opentelemetry_sdk = { workspace = true, optional = true }
+tracing-opentelemetry = { workspace = true, optional = true }
 rand = { workspace = true, features = ["getrandom", "std", "std_rng"] }
 tokio = { workspace = true, features = [
     "process",
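
The `tempo` feature above makes the four OpenTelemetry crates optional, so code that uses them has to be gated on the feature. The Rust side of this commit is not shown in this section, so the following is only a std-only sketch of the gating pattern, not the actual implementation. `otlp_endpoint`, `describe_tracing`, and the endpoint URL are hypothetical names chosen for illustration.

```rust
/// With the `tempo` feature enabled, a configured endpoint is honoured.
/// (Hypothetical helper; in a real build this is where the OTLP exporter
/// from the optional dependencies would be set up.)
#[cfg(feature = "tempo")]
fn otlp_endpoint(configured: Option<&str>) -> Option<String> {
    configured.map(str::to_owned)
}

/// Without the feature, the optional crates are not compiled in, so any
/// configured endpoint is ignored.
#[cfg(not(feature = "tempo"))]
fn otlp_endpoint(_configured: Option<&str>) -> Option<String> {
    None
}

/// Describes where spans will go for this build and configuration.
fn describe_tracing(configured: Option<&str>) -> String {
    match otlp_endpoint(configured) {
        Some(endpoint) => format!("exporting spans to {endpoint}"),
        None => "local tracing only".to_owned(),
    }
}
```

Built without the feature, `describe_tracing(Some("http://tempo:4317"))` reports local tracing only. Building with `--features tempo`, as the Dockerfile's `build_features` now does, selects the first definition instead. The URL is an assumed example using the standard OTLP gRPC port.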
