[observability] most basic OpenTelemetry integration into MCK (#93)
# Summary
This pull request introduces OpenTelemetry tracing support to the
MongoDB Kubernetes Operator and its related components. Key changes
include the integration of OpenTelemetry libraries, the addition of
tracing configuration, and updates to ensure trace propagation across
the application. These changes enhance observability and debugging
capabilities.
In our CI suite, this produces traces of the following shape:
```
                      trace_id: abc123
                   ┌────────────────────┐
                   │     Evergreen      │
                   │   span_id: ROOT    │
                   │  parent_id: none   │
                   └─────────┬──────────┘
                             │
       ┌─────────────────────┼──────────────────────────┐
       ▼                     ▼                          ▼
┌──────────────┐      ┌────────────────┐      ┌────────────────────┐
│   E2E Test   │      │    Operator    │      │      (Other…)      │
│ span_id: A1  │      │  span_id: B1   │      │                    │
│ parent: ROOT │      │  parent: ROOT  │      │                    │
└──────┬───────┘      └──────┬─────────┘      └────────────────────┘
       │                     │
       ▼                     ▼
┌──────────────┐   ┌────────────────────┐
│ E2E Function │   │   Reconcile Loop   │
│ span_id: A2  │   │    span_id: B2     │
│  parent: A1  │   │     parent: B1     │
└──────────────┘   └────────────────────┘
```
### OpenTelemetry Integration:
* **Tracing in `main.go`:**
- Added OpenTelemetry setup in the `main` function, including trace and
span ID extraction from environment variables and the creation of a root
span for the operator. The tracing context is propagated across
controllers, and shutdown is handled gracefully.
* **Telemetry in `pkg/telemetry/client.go`:** (worth knowing in case a
change ever sends events to production Atlas by mistake)
- Added a span to the `SendEventWithRetry` function to capture telemetry
events and include the Atlas base URL as a span attribute.
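
To make the environment-variable extraction in `main.go` concrete, here is a minimal sketch of how a child process could validate `OTEL_TRACE_ID` and `OTEL_PARENT_ID` before joining the CI trace. It is not the code from this PR: it uses only the standard library, and `ParentContext`, `parentFromEnv`, and `validateID` are hypothetical names. It also assumes the variables carry hex-encoded IDs in the W3C Trace Context sizes (16-byte trace ID, 8-byte span ID) and that the parent is sampled.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"strings"
)

// ParentContext (hypothetical) holds what a process needs to join an existing trace.
type ParentContext struct {
	TraceID string // 32 lowercase hex characters (16 bytes)
	SpanID  string // 16 lowercase hex characters (8 bytes)
}

// validateID checks that id is hex, decodes to wantBytes bytes, and is not all zeros.
func validateID(id string, wantBytes int) error {
	raw, err := hex.DecodeString(id)
	if err != nil {
		return fmt.Errorf("not valid hex: %w", err)
	}
	if len(raw) != wantBytes {
		return fmt.Errorf("want %d bytes, got %d", wantBytes, len(raw))
	}
	for _, b := range raw {
		if b != 0 {
			return nil
		}
	}
	return errors.New("all-zero ID is invalid")
}

// parentFromEnv reads OTEL_TRACE_ID and OTEL_PARENT_ID through getenv.
// The bool is false when neither variable is set (start a fresh trace).
func parentFromEnv(getenv func(string) string) (ParentContext, bool, error) {
	traceID := strings.ToLower(getenv("OTEL_TRACE_ID"))
	spanID := strings.ToLower(getenv("OTEL_PARENT_ID"))
	if traceID == "" && spanID == "" {
		return ParentContext{}, false, nil
	}
	if err := validateID(traceID, 16); err != nil {
		return ParentContext{}, false, fmt.Errorf("OTEL_TRACE_ID: %w", err)
	}
	if err := validateID(spanID, 8); err != nil {
		return ParentContext{}, false, fmt.Errorf("OTEL_PARENT_ID: %w", err)
	}
	return ParentContext{TraceID: traceID, SpanID: spanID}, true, nil
}

// Traceparent renders the W3C traceparent header value.
// Assumption: the parent is sampled, hence the trailing "01" flags.
func (p ParentContext) Traceparent() string {
	return fmt.Sprintf("00-%s-%s-01", p.TraceID, p.SpanID)
}

func main() {
	parent, ok, err := parentFromEnv(os.Getenv)
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, "ignoring invalid trace context:", err)
	case !ok:
		fmt.Println("no parent trace configured; starting a new trace")
	default:
		fmt.Println("joining trace:", parent.Traceparent())
	}
}
```

With the OpenTelemetry Go API, the validated IDs would presumably be converted with `trace.TraceIDFromHex` and `trace.SpanIDFromHex`, then attached to the context as a remote span context so that the operator's root span becomes a child of the Evergreen span.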
### Helm Chart Updates:
* **Operator configuration:**
- Added OpenTelemetry-specific environment variables (`OTEL_TRACE_ID`,
`OTEL_PARENT_ID`, `OTEL_EXPORTER_OTLP_ENDPOINT`) to the operator's
deployment template.
(`helm_chart/templates/operator.yaml`, R83-R90)
- Introduced OpenTelemetry configuration options (`enabled`, `traceID`,
`parentID`, `collectorEndpoint`) in the Helm chart's `values.yaml`.
### Dependency Updates:
* **Go module dependencies:**
- Added OpenTelemetry-related libraries (`otel`, `otel/sdk`,
`otel/trace`, etc.) to `go.mod`.
## Proof of Work
- e.g.
[patch](https://spruce.mongodb.com/task/mongodb_kubernetes_e2e_mdb_kind_ubi_cloudqa_e2e_replica_set_pv_patch_943128faa1f738781f2b0e7442c8d63077c9ecd5_682dce94ed55bd000781c215_25_05_21_13_01_10/logs?execution=0&sortBy=STATUS&sortDir=ASC)
- Generated traces in our CI:
[Link](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/evergreen-agent/trace/uNv82G92XFD?fields[]=s_name&fields[]=s_serviceName&span=418471a7ba33179f)

## Checklist
- [ ] Have you linked a Jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your Jira ticket required DOCSP changes?
- [ ] Have you checked for release_note changes?