[DPE-9444] Switch to ops tracing (16/edge)#1264
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project check has failed because the head coverage (68.96%) is below the target coverage (70.00%). You can increase the head coverage or adjust the target coverage.
@@            Coverage Diff            @@
##           16/edge    #1264    +/-   ##
===========================================
- Coverage    68.97%   68.96%   -0.01%
===========================================
  Files           16       16
  Lines         3816     3809       -7
  Branches       575      574       -1
===========================================
- Hits          2632     2627       -5
+ Misses         982      980       -2
  Partials       202      202
@trace_charm(
    tracing_endpoint="tracing_endpoint",
    extra_types=(
        GrafanaDashboardProvider,
        LogProxyConsumer,
        MetricsEndpointProvider,
        Patroni,
        PostgreSQL,
        PostgreSQLAsyncReplication,
        PostgreSQLBackups,
        PostgreSQLLDAP,
        PostgreSQLProvider,
        TLS,
        RollingOpsManager,
    ),
)
There's no shared auto-instrumentation. Traces would show up as juju hooks.
do individual ops events (e.g. if there's more than one ops event in a juju hook) show up?
if not, does the tracing add any value? since we can get timings from the debug-log
I am planning to talk with the Ops team at the next office hours (to restore profiling capabilities without re-implementing them in the charm).
Yes, it is a functional regression, but ops[tracing] is currently focused on distributed tracing and not on charm profiling. Anyway, we have to migrate, as the charm lib has been deprecated and is affected by a CVE.
> to restore profiling capabilities
I'm not sure how useful they were initially (never used in production, afaik, because of the overhead), and they came at the cost of harder-to-understand tracebacks. I think detailed profiling could be interesting, but using an approach similar to parca, i.e. sampling instead of tracing, and not modifying the workload
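To make the sampling-versus-tracing distinction concrete, here is a minimal sketch of a sampling profiler using only the Python standard library. It is an illustration, not how parca works: parca samples from outside the process and leaves the workload unmodified, whereas this sketch runs a sampler thread inside the process. The names `sample_stacks`, `profile`, and `busy` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval, stop_event, counts):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            # Count only the innermost Python function; no code is wrapped.
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def profile(func, interval=0.005):
    """Run func() while a background thread samples the calling thread."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval, stop_event, counts),
        daemon=True,
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts


def busy(duration=0.2):
    """Hypothetical workload: spin for roughly `duration` seconds."""
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass
```

Calling `profile(busy)` returns a counter of function names seen at each sample. Because the profiled functions are not decorated or wrapped, their tracebacks stay unchanged, which is the property the comment above is after.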
what I was wondering is what the existing ops tracing currently adds over the debug log
> Ops creates the root span and a separate span when calling each observer
https://documentation.ubuntu.com/ops/latest/explanation/tracing/
it sounds like multiple ops events in a juju event are traced separately? not sure if I am understanding correctly
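As a rough illustration of the quoted sentence (a root span, plus a separate span for each observer call), here is a toy model in plain Python. It does not use ops or OpenTelemetry and does not reproduce their span names; `ToyTracer`, `dispatch`, and the naming scheme are assumptions made for this sketch.

```python
import contextlib
import time


class ToyTracer:
    """Hypothetical stand-in for a tracer: records (depth, name, seconds)."""

    def __init__(self):
        self.spans = []
        self._depth = 0

    @contextlib.contextmanager
    def span(self, name):
        depth = self._depth
        self._depth += 1
        index = len(self.spans)
        self.spans.append(None)  # reserve the slot so spans keep start order
        start = time.perf_counter()
        try:
            yield
        finally:
            self._depth -= 1
            self.spans[index] = (depth, name, time.perf_counter() - start)


def dispatch(tracer, hook_name, observers):
    """Open one root span for the hook, then one child span per observer."""
    with tracer.span(hook_name):
        for event_name, handler in observers:
            with tracer.span(f"{event_name}: {handler.__name__}"):
                handler()
```

Under this model, a hook that emits two ops events yields one root span with two children, so each observer gets its own timing instead of a single duration for the whole hook.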
> do individual ops events (e.g. if there's more than one ops event in a juju hook) show up?
> if not, does the tracing add any value? since we can get timings from the debug-log
It's a mixed bag, IMHO. We get more details from ops itself, but less from the charm code:

According to https://documentation.ubuntu.com/ops/latest/explanation/tracing/#division-of-responsibilities only some of the charm code is expected to be instrumented.
oh it's nice the hook commands are instrumented!
 [[package]]
 name = "protobuf"
-version = "4.25.8"
+version = "6.33.5"
Updated protobuf.
pyproject.toml
@@ -32,8 +32,6 @@ poetry-core = "*"
This comment can be removed, right? If so, I believe it can also be removed in canonical/pgbouncer-k8s-operator#687 and #1279.
Switch to ops tracing and remove the old charm libs.
Checklist