Merged

44 commits
afd3604
add blog post draft for stability oteps
austinlparker Oct 23, 2025
477b714
fixall
austinlparker Oct 23, 2025
1399c8c
update java agent claim
austinlparker Oct 23, 2025
e79910c
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 24, 2025
f7765a7
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 24, 2025
951307c
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 24, 2025
a6aa277
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 27, 2025
f0c0b25
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 27, 2025
50a3116
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 27, 2025
25af399
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 27, 2025
9d91f14
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 27, 2025
c51f3c7
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Oct 27, 2025
6dd36b0
feedback
austinlparker Oct 27, 2025
0c168f2
mega-feedback pass
austinlparker Nov 1, 2025
c1d0aac
nit
austinlparker Nov 3, 2025
d4a0968
Apply suggestion from @svrnm
austinlparker Nov 3, 2025
4ef11ce
feedback
austinlparker Nov 3, 2025
690d45c
feedback
austinlparker Nov 3, 2025
959017e
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 3, 2025
1f66ca2
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 3, 2025
f9b0c2e
feedback
austinlparker Nov 4, 2025
32c2807
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 4, 2025
7203b1a
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 4, 2025
fefb107
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 4, 2025
4bfa151
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 4, 2025
bf74d4a
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 4, 2025
710d246
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 5, 2025
bac80fd
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 6, 2025
84f2fd6
address feedback
austinlparker Nov 6, 2025
fdaf70d
tweak language
austinlparker Nov 6, 2025
8c3b66a
address trask's inline PR comments from nov 6
austinlparker Nov 6, 2025
71ba313
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 6, 2025
2fe8b49
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 6, 2025
a226c26
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 6, 2025
22eb98e
chore: update refcache after rebase
austinlparker Nov 7, 2025
c27dbb6
fix: resolve markdown linting errors
austinlparker Nov 7, 2025
000bf2f
chore: add words to cspell dictionary
austinlparker Nov 7, 2025
5be7064
fix: use page-local cspell dictionary for blog post
austinlparker Nov 7, 2025
4b05fb2
Revert changes to hugo.yaml
chalin Nov 7, 2025
b930dc7
Merge branch 'main' into ap/addStabProjAnnounceBlog
austinlparker Nov 7, 2025
37a4796
tweak to intro
austinlparker Nov 7, 2025
c3d72de
fix lints
austinlparker Nov 7, 2025
2b07a0c
Update content/en/blog/2025/stability-proposal-announcement.md
austinlparker Nov 8, 2025
5c38ae6
Apply suggestions from code review
austinlparker Nov 8, 2025
306 changes: 306 additions & 0 deletions content/en/blog/2025/stability-proposal-announcement.md
@@ -0,0 +1,306 @@
---
title: Evolving OpenTelemetry's Stabilization and Release Practices
linkTitle: Stability Proposal Announcement
date: 2025-11-07
author: OpenTelemetry Governance Committee
sig: Governance Committee
cSpell:ignore: deprioritize incentivized rollouts
---

## Summary

OpenTelemetry is, by any metric, one of the largest and most exciting projects
in the cloud native space. Over the past five years, this community has come
together to build one of the most essential observability projects in history.
We're not resting on our laurels, though. The project consistently seeks out,
and listens to, feedback from a wide array of stakeholders. What we're hearing
from you is that in order to move to the next level, we need to adjust our
priorities and focus on stability, reliability, and organization of project
releases and artifacts like documentation and examples.

Over the past year, we've run user interviews and surveys, and held open
discussions across a range of venues. These conversations have demonstrated
that the complexity and lack of stability in OpenTelemetry create impediments
to production deployments.

This blog post lays out the objectives and goals that the Governance Committee
believes are crucial to addressing this feedback. We're starting with this post
in order to have these discussions in public.

### Our Goals

- Ensure that all OpenTelemetry distributions are 'stable by default' and
provide standardized mechanisms for users to opt in to experimental or
unstable features.
- Have a single, clear, and consistent set of criteria for stability that
includes documentation, performance testing, benchmarks, etc.
- Make it easier for instrumentation libraries to stabilize and encourage
federation of semantic conventions.
- Introduce 'epoch releases' that are easier for end-user organizations to
consume.

**We'd appreciate your feedback!**

From maintainers and contributors, we'd appreciate your feedback on this
proposal in general and on specifics, such as implementation timelines, the
requirements for moving between stability levels, and how to handle telemetry output
migrations.

From end-users, we'd appreciate your feedback on how you'd prefer to adopt
releases of OpenTelemetry, and how you currently do so. As we evaluate different
versioning and release strategies, it would be helpful to understand how you're
currently rolling out changes -- especially in polyglot environments. We also
would appreciate your feedback on documentation and performance benchmarking for
components such as instrumentation libraries, the Collector, etc.

From integrators, vendors, and the wider ecosystem, we would appreciate feedback
and constructive proposals on instrumentation and semantic convention metadata
and discovery. For integrators that are building on top of, or alongside,
OpenTelemetry, we would love to know how we can make it easier for you and your
users to consume OpenTelemetry, as well as how we can make it easier for you to
publish and maintain your own instrumentation.

Later sections of this post include other specific asks that we'd appreciate your
feedback on. Please remember that the specific ways we accomplish these goals
are not set in stone -- that's why we want your feedback on the proposals! If
you think there's a better way to get there, please use the
discussion to let us know.

[Join the discussion!](https://github.com/open-telemetry/community/discussions/3098).

## Why are we doing this?

OpenTelemetry has grown into a massive, complex ecosystem. We support four
different telemetry signals (tracing, metrics, logs, and profiles) across more
than a dozen programming languages. Each language has its own runtime
requirements and execution environments. The
[specification compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md)
shows just how much we're trying to accomplish – and it's overwhelming.

This complexity creates real barriers to adoption. Organizations ready to deploy
OpenTelemetry in production encounter unexpected roadblocks: configuration that
breaks between minor versions, performance regressions that only appear at
scale, and the challenge of coordinating rollouts across hundreds or thousands
of services. Many teams end up delaying or scaling back their OpenTelemetry
deployments as a result.

For maintainers, this complexity makes their job harder than it needs to be.
There's a lack of clear milestones and guidance about what's 'most important' at
any given time. Stability efforts involve a lot of churn, and maintainers often
receive conflicting guidance about where to focus their time.

Addressing these concerns should be a high priority for the project, both for
the health of our maintainers and contributors and to allow us to continue to
grow and scale as we mature, especially as we become more deeply integrated
into the cloud native ecosystem.

The Governance Committee believes that these changes need community involvement
and discussion to be a success, so we’re taking this opportunity to announce our
intention and open a
[GitHub discussion](https://github.com/open-telemetry/community/discussions/3098)
in order to get feedback from users, maintainers, and contributors. We do not
anticipate that these changes will be completed overnight, and want to assure
everyone that we will continue to prioritize our existing commitments to users
and maintainers even as we consider necessary changes for the overall wellbeing
and maturity of the project.

## 1. Stable By Default

Stability guarantees have been a long-held principle in OpenTelemetry, with an
exceedingly high bar for what qualifies as stable. There is a tension between
this principle and user needs that we'd like to discuss.

### Background

OpenTelemetry is a specification for how cloud native software -- libraries,
frameworks, infrastructure abstractions, executable code, etc. -- produces and
communicates telemetry data about its operation. This specification is designed
to be comprehensive and low-level. Many of the elements of the
specification are hard-won knowledge from the combined decades of experience its
authors have with building, operating, or designing telemetry systems at planet
scale.

A specification with no implementation is not a useful thing for end users,
though. Developers and operators approach telemetry through a variety of lenses.
Some organizations have high standards for observability, with entire teams
dedicated to building internal monitoring and instrumentation frameworks. Other
organizations view observability and monitoring as a second- or third-order
priority -- something that needs to happen, but not something that's
incentivized. OpenTelemetry, as a specification, needs to serve all of these
users and their use cases.

To make OpenTelemetry useful, we need to provide an 'on-ramp' from existing
tools, methods, and strategies, which means providing implementations of not
just the specification, but _applications_ of it as well. In practice, this
means distributing libraries that add OpenTelemetry instrumentation to existing
HTTP servers and clients, or Collector receivers that scrape metrics from MySQL
and translate them into OTLP.
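
As a concrete (and purely illustrative) example, the kind of 'application' we
mean is a Collector configuration that scrapes MySQL and forwards the results
over OTLP. The sketch below assumes the contrib `mysql` receiver and the core
`otlp` exporter; exact field names can vary between Collector versions.

```yaml
# Illustrative Collector pipeline: scrape MySQL metrics, export them via OTLP.
receivers:
  mysql:
    endpoint: localhost:3306        # MySQL instance to scrape
    username: otel
    password: ${env:MYSQL_PASSWORD} # resolved from the environment at startup
    collection_interval: 30s

exporters:
  otlp:
    endpoint: my-backend.example.com:4317

service:
  pipelines:
    metrics:
      receivers: [mysql]
      exporters: [otlp]
```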

Most of the value our community derives from OpenTelemetry comes directly from
instrumentation libraries and Collector components – not the core SDKs. While we
organize these as `contrib` repositories to distinguish them from core
components, end users don't see or care about this distinction. They just want
instrumentation that works.

For maintainers and project leadership, our stability goals and the nature of
`contrib` present a significant challenge. Users want stable, well-tested, and
performant releases that _also_ perform the same function as commercial
instrumentation agents.

### Goals and Objectives

At a high level, these are our three goals in this area:

1. All components across all repositories (including semantic conventions)
should communicate their stability in a consistent way, through metadata
that can be discovered and parsed programmatically (see the illustrative
sketch after this list). The exact format should be defined through an OTEP
and incorporated into the specification.
2. Stability requirements should be expanded to include more requirements around
documentation and where it's hosted, example code, performance benchmarks
(where applicable), implementation cookbooks, and other artifacts as
necessary.
3. Stable distributions of OpenTelemetry should only enable stable components by
default. Users should be able to select a desired minimum stability level
with a documented and consistent configuration option.
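
To make the first and third points more tangible, here is a purely hypothetical
sketch of what per-component stability metadata and a minimum-stability
configuration option could look like. None of these fields or values exist
today; the actual format will be defined through the OTEP process.

```yaml
# Hypothetical component metadata -- illustrative only, not a real format.
component: receiver/mysql
stability:
  metrics: stable        # signal-level stability, machine-readable
  logs: development
documentation: https://opentelemetry.io/docs/  # required for 'stable'
benchmarks: ./benchmarks/results.json          # required where applicable
```

```yaml
# Hypothetical distribution configuration -- a single, documented knob that
# sets the minimum stability level of components enabled by default.
minimum_stability: stable  # lower this to opt in to experimental components
```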

We appreciate that these would be big changes for maintainers, especially those
who have shipped v1+ of their libraries. We would deeply appreciate your
feedback on these objectives in the
[discussion](https://github.com/open-telemetry/community/discussions/3098).

## 2. Instrumentation Stability and Semantic Conventions

In order to achieve our stability goals, we'll need to address semantic
convention stability and processes as well.

### Semantic Convention Challenges

Semantic conventions evolve slowly and deliberately because they must work
across diverse telemetry systems. While OpenTelemetry is designed for
interconnected signals flowing together, users deploy many different storage and
analysis engines to consume this data. Each backend has its own constraints and
capabilities. Maintainers must balance competing concerns – keeping cardinality
manageable, ensuring attributes are useful but not overly specific, and making
conventions that work well regardless of where the data ends up.

The downside of this is that progress on semantic conventions can be slow, and
this slowness impacts all consumers of the conventions. Many instrumentation
libraries are currently stuck on pre-release versions because they depend on
experimental semantic conventions. Outside contributors are caught between
emitting unspecified telemetry and engaging in a process that requires a long
commitment. Finally, our instrumentation is internally inconsistent across the
project: some libraries are mapped to conventions, while others exist
independently of them.

### Instrumentation and Convention Goals

Our goals in this area are designed to achieve three outcomes:

1. Instrumentation stability should be decoupled from semantic convention
stability. We have a lot of stable instrumentation that is safe to run in
production, but has data that may change in the future. Users have told us
that conflating these two levels of stability is confusing and limits their
options.
2. Semantic conventions should be more federated; OpenTelemetry should not be
the final word on what conventions exist, and instead should focus on
creating core conventions that can be extended and built upon.
3. Semantic convention development and iteration should not block distribution
maintainers.

To this end, we have a few recommendations we'd like to codify into the
specification. First, our position on instrumentation libraries in
OpenTelemetry is that they exist as concrete implementations of the semantic
conventions. This gives us a clear target for 'first party' instrumentation
libraries that we wish to support in distributions. In addition, maintainers
should prioritize instrumentations that align with existing conventions and
deprioritize others.

Second, we'd like to make it easier for maintainers to ship stable
instrumentations. If an instrumentation's API surface is stable, then we believe
that semantic convention stability should not block the stabilization of that
instrumentation library. This means that we'll need to be thoughtful in
providing migration pathways for telemetry as operators upgrade to new major
versions of instrumentation libraries.

Finally, we'd like to make it easier for third parties to publish their own
semantic conventions by formalizing and stabilizing the necessary parts of the
conventions themselves, so that other organizations can ship conventions for
their libraries, frameworks, tech stacks, etc.

In order to accomplish this, we're looking for feedback from maintainers and
end-users in several areas -- especially the maturity and lifecycle of semantic
conventions, as well as what's missing in terms of federating them. We are
flexible on the specific proposals here, but not on the outcomes.
Remember, a core goal of the project is to encourage other libraries, tools, and
frameworks to
[natively adopt OpenTelemetry](https://www.youtube.com/watch?v=l8xiNOCIdLY) --
semantic conventions are a big part of that.

## 3. Confident and Stable Releases

### The Challenge

OpenTelemetry isn't just a single binary deployed into a Kubernetes cluster.
Subtle differences in everything from configuration to telemetry output between
different versions of instrumentation libraries, Collector receivers, and SDKs
can cause a real headache for adopters. In addition, the rapid release cadence
of many components causes real difficulty for end users, especially around the
Collector. Enterprise deployments and upgrades are slow, deliberate things --
teams simply do not have the bandwidth to validate and roll out changes at the
cadence we ship them.

### Release Goals and Strategy

Ultimately, our goal here is to make it easier for large organizations to deploy
OpenTelemetry. Please keep in mind that in many organizations, 'deployment' and
'upgrades' are non-trivial tasks that involve many teams and stakeholders across
different business units or areas of responsibility, including security.

Our current proposal is the creation of a Release SIG that would be responsible
for creating a schedule of 'epoch' releases for OpenTelemetry. Each epoch
release would essentially be a manifest pointing to a tested, documented, and
stable set of components that meet project stability requirements.
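
As an illustration only, such a manifest might be little more than a pinned,
verified set of component versions. The names, versions, and fields below are
invented for this sketch; the actual shape would be defined by the proposed
Release SIG.

```yaml
# Hypothetical epoch release manifest -- illustrative only.
epoch: "2026-06"
components:
  opentelemetry-collector: 0.x.y            # placeholder versions, pinned and
  opentelemetry-java-instrumentation: 2.x.y # tested together as a set
  opentelemetry-js: 2.x.y
semantic_conventions: 1.x.y
verification:
  - cross-component interoperability tests
  - documented upgrade path from the previous epoch
```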

This is not a trivial undertaking, to be clear. The stability efforts described
above will inform many of the requirements these epoch releases must meet. To our
maintainers and contributors, this effort is not intended to change how
individual components, SDKs, or APIs are versioned or released. Rather, we want
to provide tested, stable release combinations that work well together for end
users who need that stability.

From end-users, we would appreciate feedback on how you are currently managing
upgrades, what you'd like to see in this area, and your current challenges in
deploying and upgrading both SDKs and Collectors.

## Looking Forward

These changes are a reflection of the impact and importance of OpenTelemetry to
the cloud native software community.
[OpenTelemetry has been the second highest velocity project in the CNCF over the past few years](https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf),
and
[nearly 50% of surveyed cloud native end user companies have adopted the project](https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf).
These changes set up the next chapter of our success: becoming truly
ubiquitous.

Our mission as a project is not changing, but our priorities are.

1. Stability and usability for all developers and users.
2. Clear packaging, installation, and usage paths.
3. Predictability and consistency.

For contributors and maintainers, what does this mean? We'll fast-track
proposals that align with these priorities. If there's feature work or
instrumentation that doesn't align with them, that's fine -- we'd ask that you
work on it outside the project and discover where our existing integration
points and patterns fall short. That's good feedback, and it will help us
improve the specification for everyone.

For maintainers, contributors, and integrators -- we would appreciate your
feedback in
[this GitHub Discussion](https://github.com/open-telemetry/community/discussions/3098)
on the topics and proposals raised here. You can also send feedback on this
proposal to [[email protected]](mailto:[email protected]) or on
the CNCF Slack in the #opentelemetry channel. We also look forward to meeting
the cloud native community in person at KubeCon next week -- please join us
there with comments!
24 changes: 24 additions & 0 deletions static/refcache.json
@@ -6487,6 +6487,10 @@
"StatusCode": 206,
"LastSeen": "2025-11-06T13:18:04.403726264Z"
},
"https://github.com/open-telemetry/community/discussions/3098": {
"StatusCode": 206,
"LastSeen": "2025-10-27T12:41:04.136861-04:00"
},
"https://github.com/open-telemetry/community/issues": {
"StatusCode": 206,
"LastSeen": "2025-11-06T13:18:07.573050123Z"
@@ -6555,6 +6559,10 @@
"StatusCode": 206,
"LastSeen": "2025-11-05T17:54:23.020851678Z"
},
"https://github.com/open-telemetry/community/issues/3086": {
"StatusCode": 206,
"LastSeen": "2025-10-23T12:46:15.816184-04:00"
},
"https://github.com/open-telemetry/community/issues/3119": {
"StatusCode": 206,
"LastSeen": "2025-11-07T09:45:57.34650427Z"
@@ -20143,6 +20151,14 @@
"StatusCode": 200,
"LastSeen": "2025-11-06T13:12:47.713467099Z"
},
"https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf": {
"StatusCode": 200,
"LastSeen": "2025-10-23T12:46:22.058029-04:00"
},
"https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf": {
"StatusCode": 200,
"LastSeen": "2025-10-23T12:46:24.016534-04:00"
},
"https://www.cockroachlabs.com/": {
"StatusCode": 200,
"LastSeen": "2025-11-02T09:41:43.333917291Z"
@@ -20335,6 +20351,10 @@
"StatusCode": 200,
"LastSeen": "2025-11-05T17:54:01.831754937Z"
},
"https://www.google.com/url?q=https://www.youtube.com/watch?v%3Dl8xiNOCIdLY&sa=D&source=docs&ust=1761158059548569&usg=AOvVaw3rCMkjmo1CMucSQtkHjDI3": {
"StatusCode": 200,
"LastSeen": "2025-10-23T12:46:17.523765-04:00"
},
"https://www.graalvm.org/latest/reference-manual/native-image/": {
"StatusCode": 206,
"LastSeen": "2025-11-05T17:53:51.225544197Z"
@@ -21747,6 +21767,10 @@
"StatusCode": 200,
"LastSeen": "2025-11-07T09:46:03.359087525Z"
},
"https://www.youtube.com/watch?v=l8xiNOCIdLY": {
"StatusCode": 200,
"LastSeen": "2025-10-27T12:13:47.481412-04:00"
},
"https://www.youtube.com/watch?v=t550FzDi054": {
"StatusCode": 200,
"LastSeen": "2025-11-06T13:12:44.05853989Z"