ndr-ds (Contributor) commented Sep 12, 2025

Motivation

Distributed tracing is a great way to debug many kinds of issues, latency problems in particular. We definitely want it in general, and probably want it enabled by default in production as well.

Proposal

Implement distributed tracing using Grafana Tempo. As a Grafana product, Tempo integrates well with Grafana, which is great for us. The visualizations also seem decent.

Test Plan

Deployed a network with this code and the linera-infra portion of this change. Everything works as expected, and I can see the latency breakdowns. The example below is a really high latency outlier:

![Screenshot 2025-09-16 at 13.48.59.png](https://app.graphite.dev/user-attachments/assets/98f49272-d04a-4b7e-aa83-c04f90ec7347.png)

I chose this example because it also shows that we might be waiting in the chain worker channel's queue for a while 🤔. That seems worth investigating, which I'll do next.
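To illustrate the kind of breakdown being described, here is a minimal sketch (standard library only) of how time spent waiting in a worker's channel queue can be separated from time spent processing. It is not the code in this PR: `Queued`, `Timing`, and `run_worker` are hypothetical names, and a plain `std::sync::mpsc` channel stands in for the chain worker channel.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical request wrapper: records when the request was enqueued.
struct Queued<T> {
    payload: T,
    enqueued_at: Instant,
}

/// Hypothetical per-request timing breakdown.
#[derive(Debug)]
struct Timing {
    queue_wait: Duration,
    processing: Duration,
}

/// Sends `count` requests to a single worker that spends `work` on each one,
/// and returns the timing breakdown for every request.
fn run_worker(count: usize, work: Duration) -> Vec<Timing> {
    let (tx, rx) = mpsc::channel::<Queued<usize>>();

    let worker = thread::spawn(move || {
        let mut timings = Vec::new();
        for request in rx {
            let dequeued_at = Instant::now();
            // Time the request sat in the channel before the worker picked it up.
            let queue_wait = dequeued_at.duration_since(request.enqueued_at);
            // Stand-in for real work on the payload.
            let _ = request.payload;
            thread::sleep(work);
            timings.push(Timing {
                queue_wait,
                processing: dequeued_at.elapsed(),
            });
        }
        timings
    });

    for payload in 0..count {
        tx.send(Queued { payload, enqueued_at: Instant::now() })
            .expect("worker should still be running");
    }
    drop(tx); // Close the channel so the worker loop ends.

    worker.join().expect("worker thread panicked")
}

fn main() {
    for (i, t) in run_worker(3, Duration::from_millis(20)).iter().enumerate() {
        println!("request {i}: waited {:?}, processed in {:?}", t.queue_wait, t.processing);
    }
}
```

With a single worker, later requests accumulate queue wait while earlier ones are being processed. A trace would show the same pattern as a gap between the span that sends the request and the span that handles it.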

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

ndr-ds changed the base branch from 09-10-add_more_instrumentation to graphite-base/4556 on September 12, 2025 16:26
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch from 395665d to 1224256 on September 12, 2025 16:26
ndr-ds changed the base branch from graphite-base/4556 to 09-10-use_release_max_level_trace_instead on September 12, 2025 16:26
ndr-ds mentioned this pull request on Sep 12, 2025
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch 9 times, most recently from 7131de0 to d6d672c, on September 12, 2025 22:57
ndr-ds changed the base branch from 09-10-use_release_max_level_trace_instead to graphite-base/4556 on September 15, 2025 13:38
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch from d6d672c to dd4dd3f on September 15, 2025 13:39
ndr-ds changed the base branch from graphite-base/4556 to 09-10-use_release_max_level_trace_instead on September 15, 2025 13:39
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch from dd4dd3f to 43c36fd on September 15, 2025 13:49
ndr-ds force-pushed the 09-10-use_release_max_level_trace_instead branch from bf70753 to a7b6f9b on September 16, 2025 17:58
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch from ed95829 to d621078 on September 16, 2025 17:58
ndr-ds force-pushed the 09-10-use_release_max_level_trace_instead branch from a7b6f9b to dfb6602 on September 17, 2025 14:29
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch 2 times, most recently from 17b970e to 39ba3b2, on September 17, 2025 16:10
ndr-ds force-pushed the 09-10-use_release_max_level_trace_instead branch from dfb6602 to dd8730c on September 17, 2025 16:10
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch 2 times, most recently from 9dcc0e2 to f62b7d2, on September 17, 2025 20:26
ndr-ds requested review from Twey, afck, deuszx, eldios and ma2bd on September 18, 2025 01:48
ndr-ds marked this pull request as ready for review on September 18, 2025 01:50
deuszx (Contributor) left a comment:

LGTM but let's wait for more input.

ndr-ds changed the base branch from 09-10-use_release_max_level_trace_instead to graphite-base/4556 on September 22, 2025 13:50
ndr-ds force-pushed the 09-09-adding_tempo_for_open_tracing branch from f62b7d2 to cfa2f37 on September 22, 2025 13:50
ndr-ds changed the base branch from graphite-base/4556 to main on September 22, 2025 13:50
ndr-ds added this pull request to the merge queue on Sep 22, 2025
Merged via the queue into main with commit 36275a3 on Sep 22, 2025
34 of 59 checks passed
ndr-ds deleted the 09-09-adding_tempo_for_open_tracing branch on September 22, 2025 15:07
github-merge-queue bot pushed a commit that referenced this pull request Sep 22, 2025
## Motivation

Now that we have distributed tracing (after
#4556), we need more
instrumentation so we have data about more functions in the breakdowns.

## Proposal

Instrument more functions with `telemetry_only` so that we don't get
spammed in our logs, but the spans still get sent to Tempo.
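As a rough illustration of the idea (not the actual mechanism, which is not shown in this conversation), the sketch below routes every finished span to a trace exporter but keeps spans flagged as telemetry-only out of the logs. The `Span`, `Routed`, and `route` names are hypothetical and use only the standard library; in the real code the flag is `telemetry_only` and the exporter target is Tempo.

```rust
/// Hypothetical finished span; this is not a type from any tracing library.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    name: &'static str,
    duration_micros: u64,
    telemetry_only: bool,
}

/// Hypothetical routing result: what reached the logs and what reached the exporter.
#[derive(Debug, Default)]
struct Routed {
    logged: Vec<Span>,
    exported: Vec<Span>,
}

/// Sends every span to the exporter, and only non-telemetry-only spans to the logs.
fn route(spans: &[Span]) -> Routed {
    let mut routed = Routed::default();
    for span in spans {
        // Every span is exported, so it shows up in the latency breakdowns.
        routed.exported.push(span.clone());
        // Telemetry-only spans are kept out of the logs to avoid noise.
        if !span.telemetry_only {
            routed.logged.push(span.clone());
        }
    }
    routed
}

fn main() {
    let spans = [
        Span { name: "handle_request", duration_micros: 1_200, telemetry_only: false },
        Span { name: "inner_step", duration_micros: 300, telemetry_only: true },
    ];
    let routed = route(&spans);
    println!("logged: {}, exported: {}", routed.logged.len(), routed.exported.len());
}
```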

## Test Plan

Tested this with #4556 and
saw the spans show up properly in the breakdowns.

## Release Plan

- Nothing to do / These changes follow the usual release cycle.
ndr-ds added a commit that referenced this pull request Sep 22, 2025
ndr-ds added a commit that referenced this pull request Sep 22, 2025