Commit e5d8ad0

austinlparker, julianocosta89, dmathieu, mx-psi, evan-bradley
authored
add blog post announcing proposed changes/oteps for stability work (#8208)
Co-authored-by: Juliano Costa <[email protected]>
Co-authored-by: Damien Mathieu <[email protected]>
Co-authored-by: Pablo Baeyens <[email protected]>
Co-authored-by: Evan Bradley <[email protected]>
Co-authored-by: Severin Neumann <[email protected]>
Co-authored-by: Trask Stalnaker <[email protected]>
Co-authored-by: Ted Young <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Co-authored-by: Patrice Chalin <[email protected]>
Co-authored-by: Tiffany Hrabusa <[email protected]>
Co-authored-by: Patrice Chalin <[email protected]>
1 parent ab78663 commit e5d8ad0

2 files changed

+330
-0
lines changed

Lines changed: 306 additions & 0 deletions
@@ -0,0 +1,306 @@
1+
---
2+
title: Evolving OpenTelemetry's Stabilization and Release Practices
3+
linkTitle: Stability Proposal Announcement
4+
date: 2025-11-07
5+
author: OpenTelemetry Governance Committee
6+
sig: Governance Committee
7+
cSpell:ignore: deprioritize incentivized rollouts
8+
---
9+
10+
## Summary
11+
12+
OpenTelemetry is, by any metric, one of the largest and most exciting projects
13+
in the cloud native space. Over the past five years, this community has come
14+
together to build one of the most essential observability projects in history.
15+
We're not resting on our laurels, though. The project consistently seeks out,
16+
and listens to, feedback from a wide array of stakeholders. What we're hearing
17+
from you is that in order to move to the next level, we need to adjust our
18+
priorities and focus on stability, reliability, and organization of project
19+
releases and artifacts like documentation and examples.
20+
21+
Over the past year, we've run a variety of user interviews and surveys, and had
22+
open discussions across a range of venues. These discussions have demonstrated
23+
that the complexity and lack of stability in OpenTelemetry create impediments
24+
to production deployments.
25+
26+
This blog post lays out the objectives and goals that the Governance Committee
27+
believes are crucial to addressing this feedback. We're starting with this post
28+
in order to have these discussions in public.
29+
30+
### Our Goals
31+
32+
- Ensure that all OpenTelemetry distributions are 'stable by default' and
33+
provide standardized mechanisms for users to opt in to experimental or
34+
unstable features.
35+
- Have a single, clear, and consistent set of criteria for stability that
36+
includes documentation, performance testing, benchmarks, etc.
37+
- Make it easier for instrumentation libraries to stabilize and encourage
38+
federation of semantic conventions.
39+
- Introduce 'epoch releases' that are easier for end-user organizations to
40+
consume.
41+
42+
**We'd appreciate your feedback!**
43+
44+
From maintainers and contributors, we'd appreciate your feedback on this
45+
proposal in general and on specifics, such as implementation timelines, the
46+
requirements for moving stability levels, and how to handle telemetry output
47+
migrations.
48+
49+
From end-users, we'd appreciate your feedback on how you'd prefer to adopt
50+
releases of OpenTelemetry, and how you currently do so. As we evaluate different
51+
versioning and release strategies, it would be helpful to understand how you're
52+
currently rolling out changes -- especially in polyglot environments. We also
53+
would appreciate your feedback on documentation and performance benchmarking for
54+
components such as instrumentation libraries, the Collector, etc.
55+
56+
From integrators, vendors, and the wider ecosystem, we would appreciate feedback
57+
and constructive proposals on instrumentation and semantic convention metadata
58+
and discovery. For integrators that are building on top of, or alongside,
59+
OpenTelemetry we would love to know how we can make it easier for you and your
60+
users to consume OpenTelemetry, as well as how we can make it easier for you to
61+
publish and maintain your own instrumentation.
62+
63+
Further sections of this blog have other specific asks that we'd appreciate your
64+
feedback on. Please remember that the specific ways we accomplish these goals
65+
are not set in stone -- that's why we want your feedback on the proposals! If
66+
you think there's a better way to accomplish these goals, please use the
67+
discussion to let us know.
68+
69+
[Join the discussion!](https://github.com/open-telemetry/community/discussions/3098)
70+
71+
## Why are we doing this?
72+
73+
OpenTelemetry has grown into a massive, complex ecosystem. We support four
74+
different telemetry signals (tracing, metrics, logs, and profiles) across more
75+
than a dozen programming languages. Each language has its own runtime
76+
requirements and execution environments. The
77+
[specification compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md)
78+
shows just how much we're trying to accomplish – and it's overwhelming.
79+
80+
This complexity creates real barriers to adoption. Organizations ready to deploy
81+
OpenTelemetry in production encounter unexpected roadblocks: configuration that
82+
breaks between minor versions, performance regressions that only appear at
83+
scale, and the challenge of coordinating rollouts across hundreds or thousands
84+
of services. Many teams end up delaying or scaling back their OpenTelemetry
85+
deployments as a result.
86+
87+
For maintainers, this complexity makes their job harder than it needs to be.
88+
There's a lack of clear milestones and guidance about what's 'most important' at
89+
any given time. Stability efforts involve a lot of churn and there's often
90+
conflicting guidance about where you should focus your time.
91+
92+
Addressing these concerns should be a high priority for the project, both for
93+
the health of our maintainers and contributors and for our ability to continue
94+
to grow and scale as we mature, especially as we become more deeply integrated
95+
into the cloud native ecosystem.
96+
97+
The Governance Committee believes that these changes need community involvement
98+
and discussion to be a success, so we're taking this opportunity to announce our
99+
intention and open a
100+
[GitHub discussion](https://github.com/open-telemetry/community/discussions/3098)
101+
in order to get feedback from users, maintainers, and contributors. We do not
102+
anticipate that these changes will be completed overnight, and want to assure
103+
everyone that we will continue to prioritize our existing commitments to users
104+
and maintainers even as we consider necessary changes for the overall wellbeing
105+
and maturity of the project.
106+
107+
## 1. Stable By Default
108+
109+
Stability guarantees have been a long-held principle in OpenTelemetry, with
110+
exceedingly high bars. There is a tension between this and user needs that we'd
111+
like to discuss.
112+
113+
### Background
114+
115+
OpenTelemetry is a specification for how cloud native software -- libraries,
116+
frameworks, infrastructure abstractions, executable code, etc. -- produces and
117+
communicates telemetry data about its operation. This specification is designed
118+
to be exhaustive, comprehensive, and low-level. Many of the elements of the
119+
specification are hard-won knowledge from the combined decades of experience its
120+
authors have with building, operating, or designing telemetry systems at planet
121+
scale.
122+
123+
A specification with no implementation is not a useful thing for end users,
124+
though. Developers and operators approach telemetry through a variety of lenses.
125+
Some organizations have high standards for observability, with entire teams
126+
dedicated to building internal monitoring and instrumentation frameworks. Other
127+
organizations view observability and monitoring as a second- or third-order
128+
priority -- something that needs to happen, but not something that's
129+
incentivized. OpenTelemetry, as a specification, needs to serve all of these
130+
users and their use cases.
131+
132+
To make OpenTelemetry useful, we need to provide an 'on-ramp' from existing
133+
methods and modes, existing tools and strategies, which means we need to provide
134+
implementations of not just the specification, but _applications_ of it as well.
135+
In practice, this means we need to distribute libraries to add OpenTelemetry
136+
instrumentation to existing HTTP servers and clients, or Collector receivers to
137+
scrape metrics from MySQL and translate them into OTLP.
138+
139+
Most of the value our community derives from OpenTelemetry comes directly from
140+
instrumentation libraries and Collector components – not the core SDKs. While we
141+
organize these as `contrib` repositories to distinguish them from core
142+
components, end users don't see or care about this distinction. They just want
143+
instrumentation that works.
144+
145+
For maintainers and project leadership, our stability goals and the nature of
146+
`contrib` present a significant challenge. Users want stable, well-tested, and
147+
performant releases -- that _also_ perform the same function as commercial
148+
instrumentation agents.
149+
150+
### Goals and Objectives
151+
152+
At a high level, we have three objectives in this area:
153+
154+
1. All components across all repositories (including semantic conventions)
155+
should adhere to a consistent way of communicating stability, through a
156+
metadata file or similar information that can be discovered and parsed in a
157+
programmatic way. The exact format should be defined through an OTEP and
158+
incorporated into the specification.
159+
2. Stability requirements should be expanded to cover
160+
documentation and where it's hosted, example code, performance benchmarks
161+
(where applicable), implementation cookbooks, and other artifacts as
162+
necessary.
163+
3. Stable distributions of OpenTelemetry should only enable stable components by
164+
default. Users should be able to select a desired minimum stability level
165+
with a documented and consistent configuration option.
166+
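To make objectives 1 and 3 more concrete, here is a minimal Python sketch of how machine-readable stability metadata could drive a 'stable by default' selection. Everything in it is an assumption for illustration: the level names, the `Component` record, the `select_components` helper, and the component names are hypothetical, and the real metadata format would be defined through an OTEP.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stability(IntEnum):
    """Hypothetical stability levels, ordered from least to most stable."""
    DEVELOPMENT = 0
    ALPHA = 1
    BETA = 2
    STABLE = 3


@dataclass(frozen=True)
class Component:
    """One entry of a hypothetical, machine-readable stability metadata file."""
    name: str
    stability: Stability


def select_components(components, minimum=Stability.STABLE):
    """Return the names of components at or above the minimum stability level.

    The default mirrors 'stable by default': nothing experimental is enabled
    unless the user explicitly lowers the minimum.
    """
    return [c.name for c in components if c.stability >= minimum]


# Illustrative component names only; these are not real packages.
catalog = [
    Component("http-server-instrumentation", Stability.STABLE),
    Component("mysql-receiver", Stability.BETA),
    Component("profiling-exporter", Stability.DEVELOPMENT),
]

default_set = select_components(catalog)
opted_in_set = select_components(catalog, minimum=Stability.BETA)
```

With the default minimum, only the stable component is selected; lowering the minimum to `BETA` is the explicit opt-in that also enables the beta component.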
167+
We appreciate that these would be big changes for maintainers, especially those
168+
who have shipped v1+ of their libraries. We would deeply appreciate your
169+
feedback on these objectives in the
170+
[discussion](https://github.com/open-telemetry/community/discussions/3098).
171+
172+
## 2. Instrumentation Stability and Semantic Conventions
173+
174+
In order to achieve our stability goals, we'll need to address semantic
175+
convention stability and processes as well.
176+
177+
### Semantic Convention Challenges
178+
179+
Semantic conventions evolve slowly and deliberately because they must work
180+
across diverse telemetry systems. While OpenTelemetry is designed for
181+
interconnected signals flowing together, users deploy many different storage and
182+
analysis engines to consume this data. Each backend has its own constraints and
183+
capabilities. Maintainers must balance competing concerns – keeping cardinality
184+
manageable, ensuring attributes are useful but not overly specific, and making
185+
conventions that work well regardless of where the data ends up.
186+
187+
The downside of this is that progress on semantic conventions can be slow, and
188+
this slowness impacts all consumers of the conventions. Many instrumentation
189+
libraries are currently stuck on pre-release versions because they depend on
190+
experimental semantic conventions. Outside contributors are stuck between
191+
emitting unspecified telemetry or trying to engage in the process, which
192+
requires a long commitment. Finally, we're internally inconsistent in
193+
instrumentation across the project; some libraries are mapped to conventions,
194+
others exist independently of them.
195+
196+
### Instrumentation and Convention Goals
197+
198+
Our goals here are designed to achieve three outcomes.
199+
200+
1. Instrumentation stability should be decoupled from semantic convention
201+
stability. We have a lot of stable instrumentation that is safe to run in
202+
production, but has data that may change in the future. Users have told us
203+
that conflating these two levels of stability is confusing and limits their
204+
options.
205+
2. Semantic conventions should be more federated; OpenTelemetry should not be
206+
the final word on what conventions exist, and instead should focus on
207+
creating core conventions that can be extended and built upon.
208+
3. Semantic convention development and iteration should not be a blocker on
209+
distribution maintainers.
210+
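One way to picture the first outcome is to track the two kinds of stability as separate fields rather than a single flag. The sketch below is only an illustration under that assumption; the `InstrumentationStatus` record, its field names, and the `describe` helper are hypothetical and not part of any OpenTelemetry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentationStatus:
    """Hypothetical record that tracks two kinds of stability separately."""
    name: str
    api_stable: bool        # is the library's API surface stable?
    telemetry_stable: bool  # are the semantic conventions it emits stable?


def describe(status: InstrumentationStatus) -> str:
    """Summarize what a user can rely on for this instrumentation."""
    if status.api_stable and status.telemetry_stable:
        return f"{status.name}: stable API, stable telemetry"
    if status.api_stable:
        return f"{status.name}: stable API, telemetry output may change"
    return f"{status.name}: not yet stable"


# Illustrative name only; safe to run in production, but its data may change.
summary = describe(InstrumentationStatus("example-http-client", True, False))
```

Keeping the fields independent lets a library be declared safe to run in production while still signaling that its emitted data may change.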
211+
To this end, we have a few recommendations we'd like to codify into the
212+
specification. First, our position around instrumentation libraries in
213+
OpenTelemetry is that they exist as concrete implementations of the semantic
214+
conventions. This gives us a concrete target for 'first party' instrumentation
215+
libraries that we wish to support in distributions. In addition, maintainers
216+
should prioritize instrumentations that align to existing conventions and
217+
deprioritize others.
218+
219+
Second, we'd like to make it easier for maintainers to ship stable
220+
instrumentations. If an instrumentation's API surface is stable, then we believe
221+
that semantic convention stability should not block the stabilization of that
222+
instrumentation library. This means that we'll need to be thoughtful in
223+
providing migration pathways for telemetry as operators upgrade to new major
224+
versions of instrumentation libraries.
225+
226+
Finally, we'd like to make it easier for third parties to publish their own
227+
semantic conventions by formalizing and stabilizing necessary parts of the
228+
semantic conventions in order for other organizations to ship conventions for
229+
their libraries, frameworks, tech stacks, etc.
230+
231+
In order to accomplish this, we're looking for feedback on several areas from
232+
maintainers and end-users -- especially around the maturity/lifecycle of
233+
semantic conventions, as well as what's missing in terms of federating semantic
234+
conventions. We are flexible on the proposals here, but not on the outcomes.
235+
Remember, a core goal of the project is to encourage other libraries, tools, and
236+
frameworks to
237+
[natively adopt OpenTelemetry](https://www.youtube.com/watch?v=l8xiNOCIdLY) --
238+
semantic conventions are a big part of that.
239+
240+
## 3. Confident and Stable Releases
241+
242+
### The Challenge
243+
244+
OpenTelemetry isn't just a single binary deployed into a Kubernetes cluster.
245+
Subtle differences in everything from configuration to telemetry output between
246+
different versions of instrumentation libraries, Collector receivers, and SDKs
247+
can cause a real headache for adopters. In addition, the rapid release cadence
248+
of many components causes real difficulty for end users, especially around the
249+
Collector. Enterprise deployments and upgrades are slow, deliberate things --
250+
teams simply do not have the bandwidth to validate and roll out changes at the
251+
cadence we ship.
252+
253+
### Release Goals and Strategy
254+
255+
Ultimately, our goal here is to make it easier for large organizations to deploy
256+
OpenTelemetry. Please keep in mind that in many organizations, 'deployment' and
257+
'upgrades' are non-trivial tasks that involve many teams and stakeholders across
258+
different business units or areas of responsibility, including security.
259+
260+
Our current proposal is the creation of a Release SIG that will be responsible
261+
for creating a schedule of 'epoch' releases for OpenTelemetry. These epoch
262+
versions would essentially be a manifest pointing to a tested, documented, and
263+
stable set of components that meet project stability requirements.
264+
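As a rough illustration of the 'manifest' idea, an epoch could be modeled as a named list of pinned component versions, each checked against the project's stability requirements. This is a sketch under stated assumptions, not a proposed format: the `PinnedComponent` fields, the `epoch_blockers` helper, the epoch label, and all component names and versions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedComponent:
    """A component version pinned by a hypothetical epoch manifest."""
    name: str
    version: str
    stable: bool
    documented: bool
    tested: bool


def epoch_blockers(components: list[PinnedComponent]) -> list[str]:
    """Return the names of components that fail any epoch requirement."""
    return [
        c.name
        for c in components
        if not (c.stable and c.documented and c.tested)
    ]


# Hypothetical epoch label and contents, for illustration only.
epoch = "example-epoch-1"
manifest = [
    PinnedComponent("example-sdk", "1.2.0", stable=True, documented=True, tested=True),
    PinnedComponent("example-receiver", "0.9.0", stable=False, documented=True, tested=True),
]

blockers = epoch_blockers(manifest)
```

Here the epoch could not ship until `example-receiver` is either stabilized or dropped from the manifest, which leaves the versioning of individual components untouched.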
265+
This is not a trivial undertaking, to be clear. The stability work above will inform
266+
many of the requirements these epoch releases must follow, after all. To our
267+
maintainers and contributors, this effort is not intended to change how
268+
individual components, SDKs, or APIs are versioned or released. Rather, we want
269+
to provide tested, stable release combinations that work well together for end
270+
users who need that stability.
271+
272+
For end-users, we would appreciate feedback on how you are currently managing
273+
upgrades, what you'd like to see in this area, and your current challenges in
274+
deployment and upgrade of both SDKs and Collectors.
275+
276+
## Looking Forward
277+
278+
These changes are a reflection of the impact and importance of OpenTelemetry to
279+
the cloud native software community.
280+
[OpenTelemetry has been the second highest velocity project in the CNCF over the past few years](https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf),
281+
and
282+
[nearly 50% of surveyed cloud native end user companies have adopted the project](https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf).
283+
These changes set up the next chapter of our success: becoming truly
284+
ubiquitous.
285+
286+
Our mission as a project is not changing, but our priorities are.
287+
288+
1. Stability and usability for all developers and users.
289+
2. Clear packaging, installation, and usage paths.
290+
3. Predictability and consistency.
291+
292+
For contributors and maintainers, what does this mean? We'll fast-track
293+
proposals that align with these priorities. If there's feature work or
294+
instrumentation that doesn't align to this, that's fine -- we'd ask that you
295+
work on it outside the project and discover where our existing integration
296+
points and patterns don't work. That's good feedback, and will help us improve
297+
the specification for everyone.
298+
299+
For maintainers, contributors, and integrators -- we would appreciate your
300+
feedback in
301+
[this GitHub Discussion](https://github.com/open-telemetry/community/discussions/3098)
302+
on the topics and proposals raised here. You can also send feedback on this
303+
proposal to [[email protected]](mailto:[email protected]) or on
304+
the CNCF Slack in the #opentelemetry channel. We also look forward to meeting
305+
the cloud native community in person at KubeCon next week -- please join us
306+
there with comments!

static/refcache.json

Lines changed: 24 additions & 0 deletions
@@ -6487,6 +6487,10 @@
64876487
"StatusCode": 206,
64886488
"LastSeen": "2025-11-06T13:18:04.403726264Z"
64896489
},
6490+
"https://github.com/open-telemetry/community/discussions/3098": {
6491+
"StatusCode": 206,
6492+
"LastSeen": "2025-10-27T12:41:04.136861-04:00"
6493+
},
64906494
"https://github.com/open-telemetry/community/issues": {
64916495
"StatusCode": 206,
64926496
"LastSeen": "2025-11-06T13:18:07.573050123Z"
@@ -6555,6 +6559,10 @@
65556559
"StatusCode": 206,
65566560
"LastSeen": "2025-11-05T17:54:23.020851678Z"
65576561
},
6562+
"https://github.com/open-telemetry/community/issues/3086": {
6563+
"StatusCode": 206,
6564+
"LastSeen": "2025-10-23T12:46:15.816184-04:00"
6565+
},
65586566
"https://github.com/open-telemetry/community/issues/3119": {
65596567
"StatusCode": 206,
65606568
"LastSeen": "2025-11-07T09:45:57.34650427Z"
@@ -20143,6 +20151,14 @@
2014320151
"StatusCode": 200,
2014420152
"LastSeen": "2025-11-06T13:12:47.713467099Z"
2014520153
},
20154+
"https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf": {
20155+
"StatusCode": 200,
20156+
"LastSeen": "2025-10-23T12:46:22.058029-04:00"
20157+
},
20158+
"https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf": {
20159+
"StatusCode": 200,
20160+
"LastSeen": "2025-10-23T12:46:24.016534-04:00"
20161+
},
2014620162
"https://www.cockroachlabs.com/": {
2014720163
"StatusCode": 200,
2014820164
"LastSeen": "2025-11-02T09:41:43.333917291Z"
@@ -20335,6 +20351,10 @@
2033520351
"StatusCode": 200,
2033620352
"LastSeen": "2025-11-05T17:54:01.831754937Z"
2033720353
},
20354+
"https://www.google.com/url?q=https://www.youtube.com/watch?v%3Dl8xiNOCIdLY&sa=D&source=docs&ust=1761158059548569&usg=AOvVaw3rCMkjmo1CMucSQtkHjDI3": {
20355+
"StatusCode": 200,
20356+
"LastSeen": "2025-10-23T12:46:17.523765-04:00"
20357+
},
2033820358
"https://www.graalvm.org/latest/reference-manual/native-image/": {
2033920359
"StatusCode": 206,
2034020360
"LastSeen": "2025-11-05T17:53:51.225544197Z"
@@ -21747,6 +21767,10 @@
2174721767
"StatusCode": 200,
2174821768
"LastSeen": "2025-11-07T09:46:03.359087525Z"
2174921769
},
21770+
"https://www.youtube.com/watch?v=l8xiNOCIdLY": {
21771+
"StatusCode": 200,
21772+
"LastSeen": "2025-10-27T12:13:47.481412-04:00"
21773+
},
2175021774
"https://www.youtube.com/watch?v=t550FzDi054": {
2175121775
"StatusCode": 200,
2175221776
"LastSeen": "2025-11-06T13:12:44.05853989Z"
