| 1 | +--- |
| 2 | +title: Evolving OpenTelemetry's Stabilization and Release Practices |
| 3 | +linkTitle: Stability Proposal Announcement |
| 4 | +date: 2025-11-07 |
| 5 | +author: OpenTelemetry Governance Committee |
| 6 | +sig: Governance Committee |
| 7 | +cSpell:ignore: deprioritize incentivized rollouts |
| 8 | +--- |
| 9 | + |
| 10 | +## Summary |
| 11 | + |
| 12 | +OpenTelemetry is, by any metric, one of the largest and most exciting projects |
| 13 | +in the cloud native space. Over the past five years, this community has come |
| 14 | +together to build one of the most essential observability projects in history. |
| 15 | +We're not resting on our laurels, though. The project consistently seeks out, |
| 16 | +and listens to, feedback from a wide array of stakeholders. What we're hearing |
| 17 | +from you is that in order to move to the next level, we need to adjust our |
| 18 | +priorities and focus on stability, reliability, and organization of project |
| 19 | +releases and artifacts like documentation and examples. |
| 20 | + |
| 21 | +Over the past year, we've run a variety of user interviews and surveys, and |
| 22 | +held open discussions across a range of venues. These discussions have |
| 23 | +demonstrated that the complexity and lack of stability in OpenTelemetry create |
| 24 | +impediments to production deployments. |
| 25 | + |
| 26 | +This blog post lays out the objectives and goals that the Governance Committee |
| 27 | +believes are crucial to addressing this feedback. We're starting with this post |
| 28 | +in order to have these discussions in public. |
| 29 | + |
| 30 | +### Our Goals |
| 31 | + |
| 32 | +- Ensure that all OpenTelemetry distributions are 'stable by default' and |
| 33 | + provide standardized mechanisms for users to opt-in to experimental or |
| 34 | + unstable features. |
| 35 | +- Have a single, clear, and consistent set of criteria for stability that |
| 36 | + includes documentation, performance testing, benchmarks, etc. |
| 37 | +- Make it easier for instrumentation libraries to stabilize and encourage |
| 38 | + federation of semantic conventions. |
| 39 | +- Introduce 'epoch releases' that are easier for end-user organizations to |
| 40 | + consume. |
| 41 | + |
| 42 | +**We'd appreciate your feedback!** |
| 43 | + |
| 44 | +From maintainers and contributors, we'd appreciate your feedback on this |
| 45 | +proposal in general and on specifics, such as implementation timelines, the |
| 46 | +requirements for moving stability levels, and how to handle telemetry output |
| 47 | +migrations. |
| 48 | + |
| 49 | +From end-users, we'd appreciate your feedback on how you'd prefer to adopt |
| 50 | +releases of OpenTelemetry, and how you currently do so. As we evaluate different |
| 51 | +versioning and release strategies, it would be helpful to understand how you're |
| 52 | +currently rolling out changes -- especially in polyglot environments. We also |
| 53 | +would appreciate your feedback on documentation and performance benchmarking for |
| 54 | +components such as instrumentation libraries, the Collector, etc. |
| 55 | + |
| 56 | +From integrators, vendors, and the wider ecosystem, we would appreciate feedback |
| 57 | +and constructive proposals on instrumentation and semantic convention metadata |
| 58 | +and discovery. For integrators that are building on top of, or alongside, |
| 59 | +OpenTelemetry we would love to know how we can make it easier for you and your |
| 60 | +users to consume OpenTelemetry, as well as how we can make it easier for you to |
| 61 | +publish and maintain your own instrumentation. |
| 62 | + |
| 63 | +Further sections of this blog have other specific asks that we'd appreciate your |
| 64 | +feedback on. Please remember that the specific ways we accomplish these goals |
| 65 | +are not set in stone -- that's why we want your feedback on the proposals! If |
| 66 | +you think there's a better way to accomplish these goals, please use the |
| 67 | +discussion to let us know. |
| 68 | + |
| 69 | +[Join the discussion!](https://github.com/open-telemetry/community/discussions/3098) |
| 70 | + |
| 71 | +## Why are we doing this? |
| 72 | + |
| 73 | +OpenTelemetry has grown into a massive, complex ecosystem. We support four |
| 74 | +different telemetry signals (tracing, metrics, logs, and profiles) across more |
| 75 | +than a dozen programming languages. Each language has its own runtime |
| 76 | +requirements and execution environments. The |
| 77 | +[specification compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md) |
| 78 | +shows just how much we're trying to accomplish – and it's overwhelming. |
| 79 | + |
| 80 | +This complexity creates real barriers to adoption. Organizations ready to deploy |
| 81 | +OpenTelemetry in production encounter unexpected roadblocks: configuration that |
| 82 | +breaks between minor versions, performance regressions that only appear at |
| 83 | +scale, and the challenge of coordinating rollouts across hundreds or thousands |
| 84 | +of services. Many teams end up delaying or scaling back their OpenTelemetry |
| 85 | +deployments as a result. |
| 86 | + |
| 87 | +For maintainers, this complexity makes their job harder than it needs to be. |
| 88 | +There's a lack of clear milestones and guidance about what's 'most important' at |
| 89 | +any given time. Stability efforts involve a lot of churn and there's often |
| 90 | +conflicting guidance about where you should focus your time. |
| 91 | + |
| 92 | +Addressing these concerns should be a high priority for the project, both for |
| 93 | +the health of our maintainers and contributors and to allow us to continue to |
| 94 | +grow and scale as we mature, especially as we become more deeply integrated |
| 95 | +into the cloud native ecosystem. |
| 96 | + |
| 97 | +The Governance Committee believes that these changes need community involvement |
| 98 | +and discussion to be a success, so we’re taking this opportunity to announce our |
| 99 | +intention and open a |
| 100 | +[GitHub discussion](https://github.com/open-telemetry/community/discussions/3098) |
| 101 | +in order to get feedback from users, maintainers, and contributors. We do not |
| 102 | +anticipate that these changes will be completed overnight, and want to assure |
| 103 | +everyone that we will continue to prioritize our existing commitments to users |
| 104 | +and maintainers even as we consider necessary changes for the overall wellbeing |
| 105 | +and maturity of the project. |
| 106 | + |
| 107 | +## 1. Stable By Default |
| 108 | + |
| 109 | +Stability guarantees have been a long-held principle in OpenTelemetry, with |
| 110 | +exceedingly high bars. There is a tension between this and user needs that we'd |
| 111 | +like to discuss. |
| 112 | + |
| 113 | +### Background |
| 114 | + |
| 115 | +OpenTelemetry is a specification for how cloud native software -- libraries, |
| 116 | +frameworks, infrastructure abstractions, executable code, etc. -- produces and |
| 117 | +communicates telemetry data about its operation. This specification is designed |
| 118 | +to be exhaustive, comprehensive, and low-level. Many of the elements of the |
| 119 | +specification are hard-won knowledge from the combined decades of experience its |
| 120 | +authors have with building, operating, or designing telemetry systems at planet |
| 121 | +scale. |
| 122 | + |
| 123 | +A specification with no implementation is not a useful thing for end users, |
| 124 | +though. Developers and operators approach telemetry through a variety of lenses. |
| 125 | +Some organizations have high standards for observability, with entire teams |
| 126 | +dedicated to building internal monitoring and instrumentation frameworks. Other |
| 127 | +organizations view observability and monitoring as a second- or third-order |
| 128 | +priority -- something that needs to happen, but not something that's |
| 129 | +incentivized. OpenTelemetry, as a specification, needs to serve all of these |
| 130 | +users and their use cases. |
| 131 | + |
| 132 | +To make OpenTelemetry useful, we need to provide an 'on-ramp' from existing |
| 133 | +methods and modes, existing tools and strategies, which means we need to provide |
| 134 | +implementations of not just the specification, but _applications_ of it as well. |
| 135 | +In practice, this means we need to distribute libraries to add OpenTelemetry |
| 136 | +instrumentation to existing HTTP servers and clients, or Collector receivers to |
| 137 | +scrape metrics from MySQL and translate them into OTLP. |
| 138 | + |
| 139 | +Most of the value our community derives from OpenTelemetry comes directly from |
| 140 | +instrumentation libraries and Collector components – not the core SDKs. While we |
| 141 | +organize these as `contrib` repositories to distinguish them from core |
| 142 | +components, end users don't see or care about this distinction. They just want |
| 143 | +instrumentation that works. |
| 144 | + |
| 145 | +For maintainers and project leadership, our stability goals and the nature of |
| 146 | +`contrib` present a significant challenge. Users want stable, well-tested, and |
| 147 | +performant releases -- that _also_ perform the same function as commercial |
| 148 | +instrumentation agents. |
| 149 | + |
| 150 | +### Goals and Objectives |
| 151 | + |
| 152 | +At a high level, we have three objectives in this area: |
| 153 | + |
| 154 | +1. All components across all repositories (including semantic conventions) |
| 155 | + should communicate stability in a consistent way, through metadata (for |
| 156 | + example, a metadata file) that can be discovered and parsed |
| 157 | + programmatically. The exact format should be defined through an OTEP and |
| 158 | + incorporated into the specification. |
| 159 | +2. Stability requirements should be expanded to include more requirements around |
| 160 | + documentation and where it's hosted, example code, performance benchmarks |
| 161 | + (where applicable), implementation cookbooks, and other artifacts as |
| 162 | + necessary. |
| 163 | +3. Stable distributions of OpenTelemetry should only enable stable components by |
| 164 | + default. Users should be able to select a desired minimum stability level |
| 165 | + with a documented and consistent configuration option. |
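
As a rough illustration of objectives 1 and 3, the sketch below filters a component catalog by a minimum stability level. Everything in it is an assumption made for illustration: the metadata format is still to be defined through an OTEP, so the `Stability` ordering, the `ComponentMetadata` record, the component names, and the `enabled_components` helper are hypothetical and do not describe an existing OpenTelemetry API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stability(IntEnum):
    """Hypothetical ordering of stability levels, lowest to highest."""
    DEVELOPMENT = 0
    ALPHA = 1
    BETA = 2
    STABLE = 3


@dataclass(frozen=True)
class ComponentMetadata:
    """Hypothetical per-component metadata record (not a real format)."""
    name: str
    stability: Stability


def enabled_components(components, minimum=Stability.STABLE):
    """Return the names of components at or above the minimum stability.

    The default mirrors the 'stable by default' objective: callers must
    explicitly lower the minimum to opt in to less stable components.
    """
    return [c.name for c in components if c.stability >= minimum]


# Made-up catalog entries, used only to exercise the filter.
catalog = [
    ComponentMetadata("example-http-instrumentation", Stability.STABLE),
    ComponentMetadata("example-mysql-receiver", Stability.BETA),
    ComponentMetadata("example-profiles-exporter", Stability.DEVELOPMENT),
]

default_set = enabled_components(catalog)
opt_in_set = enabled_components(catalog, minimum=Stability.BETA)
```

With these made-up entries, `default_set` contains only the stable component, while lowering the minimum to `Stability.BETA` also enables the beta receiver. A real distribution would presumably read the minimum level from its documented configuration option rather than from a function argument.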
| 166 | + |
| 167 | +We appreciate that these would be a big change for maintainers, especially those |
| 168 | +who have shipped v1+ of their libraries. We would deeply appreciate your |
| 169 | +feedback on these objectives in the |
| 170 | +[discussion](https://github.com/open-telemetry/community/discussions/3098). |
| 171 | + |
| 172 | +## 2. Instrumentation Stability and Semantic Conventions |
| 173 | + |
| 174 | +In order to achieve our stability goals, we'll need to address semantic |
| 175 | +convention stability and processes as well. |
| 176 | + |
| 177 | +### Semantic Convention Challenges |
| 178 | + |
| 179 | +Semantic conventions evolve slowly and deliberately because they must work |
| 180 | +across diverse telemetry systems. While OpenTelemetry is designed for |
| 181 | +interconnected signals flowing together, users deploy many different storage and |
| 182 | +analysis engines to consume this data. Each backend has its own constraints and |
| 183 | +capabilities. Maintainers must balance competing concerns – keeping cardinality |
| 184 | +manageable, ensuring attributes are useful but not overly specific, and making |
| 185 | +conventions that work well regardless of where the data ends up. |
| 186 | + |
| 187 | +The downside of this is that progress on semantic conventions can be slow, and |
| 188 | +this slowness impacts all consumers of the conventions. Many instrumentation |
| 189 | +libraries are currently stuck on pre-release versions because they depend on |
| 190 | +experimental semantic conventions. Outside contributors are stuck between |
| 191 | +emitting unspecified telemetry or trying to engage in the process, which |
| 192 | +requires a long commitment. Finally, we're internally inconsistent in |
| 193 | +instrumentation across the project; some libraries are mapped to conventions, |
| 194 | +while others exist independently of them. |
| 195 | + |
| 196 | +### Instrumentation and Convention Goals |
| 197 | + |
| 198 | +Our goals here are designed to achieve three outcomes. |
| 199 | + |
| 200 | +1. Instrumentation stability should be decoupled from semantic convention |
| 201 | + stability. We have a lot of stable instrumentation that is safe to run in |
| 202 | + production, but has data that may change in the future. Users have told us |
| 203 | + that conflating these two levels of stability is confusing and limits their |
| 204 | + options. |
| 205 | +2. Semantic conventions should be more federated; OpenTelemetry should not be |
| 206 | + the final word on what conventions exist, and instead should focus on |
| 207 | + creating core conventions that can be extended and built upon. |
| 208 | +3. Semantic convention development and iteration should not be a blocker on |
| 209 | + distribution maintainers. |
| 210 | + |
| 211 | +To this end, we have a few recommendations we'd like to codify into the |
| 212 | +specification. First, our position around instrumentation libraries in |
| 213 | +OpenTelemetry is that they exist as concrete implementations of the semantic |
| 214 | +conventions. This gives us a concrete target for 'first party' instrumentation |
| 215 | +libraries that we wish to support in distributions. In addition, maintainers |
| 216 | +should prioritize instrumentations that align to existing conventions and |
| 217 | +deprioritize others. |
| 218 | + |
| 219 | +Second, we'd like to make it easier for maintainers to ship stable |
| 220 | +instrumentations. If an instrumentation's API surface is stable, then we believe |
| 221 | +that semantic convention stability should not block the stabilization of that |
| 222 | +instrumentation library. This means that we'll need to be thoughtful in |
| 223 | +providing migration pathways for telemetry as operators upgrade to new major |
| 224 | +versions of instrumentation libraries. |
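
One possible shape for such a migration pathway is a rename table applied to attributes emitted by an older instrumentation version. The sketch below is illustrative only: the `migrate_attributes` helper and the `RENAMES` table are hypothetical names, and the two renames shown are the HTTP attribute renames from the semantic conventions, used here purely as an example of the kind of change operators would need to absorb.

```python
def migrate_attributes(attributes, renames):
    """Return a copy of attributes with old keys renamed.

    Keys that do not appear in the rename table pass through unchanged,
    so the function is safe to apply to already-migrated telemetry.
    """
    return {renames.get(key, key): value for key, value in attributes.items()}


# Illustrative rename table: old attribute name -> new attribute name.
RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
}

old_attributes = {
    "http.method": "GET",
    "http.status_code": 200,
    "server.address": "example.com",
}

new_attributes = migrate_attributes(old_attributes, RENAMES)
```

Applying the table twice yields the same result as applying it once, which matters when old and new instrumentation versions run side by side during a rollout.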
| 225 | + |
| 226 | +Finally, we'd like to make it easier for third parties to publish their own |
| 227 | +semantic conventions. To do so, we'd formalize and stabilize the necessary parts |
| 228 | +of the semantic conventions so that other organizations can ship conventions for |
| 229 | +their libraries, frameworks, tech stacks, etc. |
| 230 | + |
| 231 | +In order to accomplish this, we're looking for feedback on several areas from |
| 232 | +maintainers and end-users -- especially around the maturity/lifecycle of |
| 233 | +semantic conventions, as well as what's missing in terms of federating semantic |
| 234 | +conventions. We are flexible on the proposals here, but not on the outcomes. |
| 235 | +Remember, a core goal of the project is to encourage other libraries, tools, and |
| 236 | +frameworks to |
| 237 | +[natively adopt OpenTelemetry](https://www.youtube.com/watch?v=l8xiNOCIdLY) -- |
| 238 | +semantic conventions are a big part of that. |
| 239 | + |
| 240 | +## 3. Confident and Stable Releases |
| 241 | + |
| 242 | +### The Challenge |
| 243 | + |
| 244 | +OpenTelemetry isn't just a single binary deployed into a Kubernetes cluster. |
| 245 | +Subtle differences in everything from configuration to telemetry output between |
| 246 | +different versions of instrumentation libraries, Collector receivers, and SDKs |
| 247 | +can cause a real headache for adopters. In addition, the rapid release cadence |
| 248 | +of many components causes real difficulty for end users, especially around the |
| 249 | +Collector. Enterprise deployments and upgrades are slow, deliberate things -- |
| 250 | +teams simply do not have the bandwidth to validate and roll out changes at the |
| 251 | +cadence we ship. |
| 252 | + |
| 253 | +### Release Goals and Strategy |
| 254 | + |
| 255 | +Ultimately, our goal here is to make it easier for large organizations to deploy |
| 256 | +OpenTelemetry. Please keep in mind that in many organizations, 'deployment' and |
| 257 | +'upgrades' are non-trivial tasks that involve many teams and stakeholders across |
| 258 | +different business units or areas of responsibility, including security. |
| 259 | + |
| 260 | +Our current proposal is the creation of a Release SIG that will be responsible |
| 261 | +for creating a schedule of 'epoch' releases for OpenTelemetry. These epoch |
| 262 | +versions would essentially be a manifest pointing to a tested, documented, and |
| 263 | +stable set of components that meet project stability requirements. |
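
To make the "manifest" idea concrete, here is a minimal sketch of how an end user might compare what they have deployed against an epoch. It is an assumption-laden illustration, not a proposed format: the `EpochManifest` type, the `find_drift` helper, the epoch label, and every component name and version number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochManifest:
    """Hypothetical epoch manifest: a label plus pinned component versions."""
    epoch: str
    pins: dict  # component name -> version tested together in this epoch


def find_drift(manifest, deployed):
    """Return {component: (deployed_version, pinned_version)} for mismatches.

    A component that is pinned in the manifest but missing from the
    deployment is reported with a deployed version of None.
    """
    drift = {}
    for component, pinned in manifest.pins.items():
        actual = deployed.get(component)
        if actual != pinned:
            drift[component] = (actual, pinned)
    return drift


# Made-up epoch and versions, used only to exercise the comparison.
manifest = EpochManifest(
    epoch="example-epoch-1",
    pins={"example-sdk": "1.4.0", "example-collector": "0.9.0"},
)
deployed = {"example-sdk": "1.4.0", "example-collector": "0.8.2"}

drift = find_drift(manifest, deployed)
```

Here only the Collector deviates from the epoch, so `drift` holds a single entry. Because the manifest only points at existing releases, a check like this would not require any change to how individual components are versioned.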
| 264 | + |
| 265 | +This is not a trivial undertaking, to be clear. The stability efforts described |
| 266 | +above will define many of the requirements these epoch releases must follow. To |
| 267 | +our maintainers and contributors: this effort is not intended to change how |
| 268 | +individual components, SDKs, or APIs are versioned or released. Rather, we want |
| 269 | +to provide tested, stable release combinations that work well together for end |
| 270 | +users who need that stability. |
| 271 | + |
| 272 | +For end-users, we would appreciate feedback on how you are currently managing |
| 273 | +upgrades, what you'd like to see in this area, and your current challenges in |
| 274 | +deployment and upgrade of both SDKs and Collectors. |
| 275 | + |
| 276 | +## Looking Forward |
| 277 | + |
| 278 | +These changes are a reflection of the impact and importance of OpenTelemetry to |
| 279 | +the cloud native software community. |
| 280 | +[OpenTelemetry has been the second highest velocity project in the CNCF over the past few years](https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf), |
| 281 | +and |
| 282 | +[nearly 50% of surveyed cloud native end user companies have adopted the project](https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf). |
| 283 | +These changes set up the next chapter of our success and help OpenTelemetry |
| 284 | +become truly ubiquitous. |
| 285 | + |
| 286 | +Our mission as a project is not changing, but our priorities are. |
| 287 | + |
| 288 | +1. Stability and usability for all developers and users. |
| 289 | +2. Clear packaging, installation, and usage paths. |
| 290 | +3. Predictability and consistency. |
| 291 | + |
| 292 | +For contributors and maintainers, what does this mean? We'll fast-track |
| 293 | +proposals that align with these priorities. If there's feature work or |
| 294 | +instrumentation that doesn't align to this, that's fine -- we'd ask that you |
| 295 | +work on it outside the project and discover where our existing integration |
| 296 | +points and patterns don't work. That's good feedback, and will help us improve |
| 297 | +the specification for everyone. |
| 298 | + |
| 299 | +For maintainers, contributors, and integrators -- we would appreciate your |
| 300 | +feedback in |
| 301 | +[this GitHub Discussion](https://github.com/open-telemetry/community/discussions/3098) |
| 302 | +on the topics and proposals raised here. You can also send feedback on this |
| 303 | + |
| 304 | +the CNCF Slack in the #opentelemetry channel. We also look forward to meeting |
| 305 | +the cloud native community in person at KubeCon next week -- please join us |
| 306 | +there with comments! |