diff --git a/content/en/blog/2025/stability-proposal-announcement.md b/content/en/blog/2025/stability-proposal-announcement.md new file mode 100644 index 000000000000..ee242e35c947 --- /dev/null +++ b/content/en/blog/2025/stability-proposal-announcement.md @@ -0,0 +1,306 @@ +--- +title: Evolving OpenTelemetry's Stabilization and Release Practices +linkTitle: Stability Proposal Announcement +date: 2025-11-07 +author: OpenTelemetry Governance Committee +sig: Governance Committee +cSpell:ignore: deprioritize incentivized rollouts +--- + +## Summary + +OpenTelemetry is, by any metric, one of the largest and most exciting projects +in the cloud native space. Over the past five years, this community has come +together to build one of the most essential observability projects in history. +We're not resting on our laurels, though. The project consistently seeks out, +and listens to, feedback from a wide array of stakeholders. What we're hearing +from you is that in order to move to the next level, we need to adjust our +priorities and focus on stability, reliability, and organization of project +releases and artifacts like documentation and examples. + +Over the past year, we've run a variety of user interviews, surveys, and had +open discussions across a range of venues. These discussions have demonstrated +that the complexity and lack of stability in OpenTelemetry creates impediments +to production deployments. + +This blog post lays out the objectives and goals that the Governance Committee +believes are crucial to addressing this feedback. We're starting with this post +in order to have these discussions in public. + +### Our Goals + +- Ensure that all OpenTelemetry distributions are 'stable by default' and + provide standardized mechanisms for users to opt-in to experimental or + unstable features. +- Have a single, clear, and consistent set of criteria for stability that + includes documentation, performance testing, benchmarks, etc. +- Make it easier for instrumentation libraries to stabilize and encourage + federation of semantic conventions. +- Introduce 'epoch releases' that are easier for end-user organizations to + consume. + +**We'd appreciate your feedback!** + +From maintainers and contributors, we'd appreciate your feedback on this +proposal in general and on specifics, such as implementation timelines, the +requirements for moving stability levels, and how to handle telemetry output +migrations. + +From end-users, we'd appreciate your feedback on how you'd prefer to adopt +releases of OpenTelemetry, and how you currently do so. As we evaluate different +versioning and release strategies, it would be helpful to understand how you're +currently rolling out changes -- especially in polyglot environments. We also +would appreciate your feedback on documentation and performance benchmarking for +components such as instrumentation libraries, the Collector, etc. + +From integrators, vendors, and the wider ecosystem, we would appreciate feedback +and constructive proposals on instrumentation and semantic convention metadata +and discovery. For integrators that are building on top of, or alongside, +OpenTelemetry we would love to know how we can make it easier for you and your +users to consume OpenTelemetry, as well as how we can make it easier for you to +publish and maintain your own instrumentation. + +Further sections of this blog have other specific asks that we'd appreciate your +feedback on. Please remember that the specific ways we accomplish these goals +are not set in stone -- that's why we want your feedback on the proposals! If +you think there's a better way to accomplish these goals, please use the +discussion to let us know. + +[Join the discussion!](https://github.com/open-telemetry/community/discussions/3098). + +## Why are we doing this? + +OpenTelemetry has grown into a massive, complex ecosystem. We support four +different telemetry signals (tracing, metrics, logs, and profiles) across more +than a dozen programming languages. Each language has its own runtime +requirements and execution environments. The +[specification compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md) +shows just how much we're trying to accomplish – and it's overwhelming. + +This complexity creates real barriers to adoption. Organizations ready to deploy +OpenTelemetry in production encounter unexpected roadblocks: configuration that +breaks between minor versions, performance regressions that only appear at +scale, and the challenge of coordinating rollouts across hundreds or thousands +of services. Many teams end up delaying or scaling back their OpenTelemetry +deployments as a result. + +For maintainers, this complexity makes their job harder than it needs to be. +There's a lack of clear milestones and guidance about what's 'most important' at +any given time. Stability efforts involve a lot of churn and there's often +conflicting guidance about where you should focus your time. + +Addressing these concerns should be a high priority for the project, both for +the health of our maintainers and contributors, but also allowing us to continue +to grow and scale as we mature, especially as we become more deeply integrated +into the cloud native ecosystem. + +The Governance Committee believes that these changes need community involvement +and discussion to be a success, so we’re taking this opportunity to announce our +intention and open a +[GitHub discussion](https://github.com/open-telemetry/community/discussions/3098) +in order to get feedback from users, maintainers, and contributors. We do not +anticipate that these changes will be completed overnight, and want to assure +everyone that we will continue to prioritize our existing commitments to users +and maintainers even as we consider necessary changes for the overall wellbeing +and maturity of the project. + +## 1. Stable By Default + +Stability guarantees have been a long-held principle in OpenTelemetry, with +exceedingly high bars. There is a tension between this and user needs that we'd +like to discuss. + +### Background + +OpenTelemetry is a specification for how cloud native software -- libraries, +frameworks, infrastructure abstractions, executable code, etc. -- produces and +communicates telemetry data about its operation. This specification is designed +to be exhaustive, comprehensive, and low-level. Many of the elements of the +specification are hard-won knowledge from the combined decades of experience its +authors have with building, operating, or designing telemetry systems at planet +scale. + +A specification with no implementation is not a useful thing for end users, +though. Developers and operators approach telemetry through a variety of lenses; +Some organizations have high standards for observability, with entire teams +dedicated to building internal monitoring and instrumentation frameworks. Other +organizations view observability and monitoring as a second or third order +priority -- something that needs to happen, but not something that's +incentivized. OpenTelemetry, as a specification, needs to serve all of these +users and their use cases. + +To make OpenTelemetry useful, we need to provide an 'on-ramp' from existing +methods and modes, existing tools and strategies, which means we need to provide +implementations of not just the specification, but _applications_ of it as well. +In practice, this means we need to distribute libraries to add OpenTelemetry +instrumentation to existing HTTP servers and clients, or Collector receivers to +scrape metrics from MySQL and translate them into OTLP. + +Most of the value our community derives from OpenTelemetry comes directly from +instrumentation libraries and Collector components – not the core SDKs. While we +organize these as `contrib` repositories to distinguish them from core +components, end users don't see or care about this distinction. They just want +instrumentation that works. + +For maintainers and project leadership, our stability goals and the nature of +`contrib` present a significant challenge. Users want stable, well-tested, and +performant releases -- that _also_ perform the same function as commercial +instrumentation agents. + +### Goals and Objectives + +At a high level, these are the three points in this area: + +1. All components across all repositories (including semantic conventions) + should adhere to a consistent way of communicating stability, through a + metadata file/information, that can be discovered and parsed in a + programmatic way. The exact format should be defined through an OTEP and + incorporated into the specification. +2. Stability requirements should be expanded to include more requirements around + documentation and where it's hosted, example code, performance benchmarks + (where applicable), implementation cookbooks, and other artifacts as + necessary. +3. Stable distributions of OpenTelemetry should only enable stable components by + default. Users should be able to select a desired minimum stability level + with a documented and consistent configuration option. + +We appreciate that these would be a big change for maintainers, especially those +who have shipped v1+ of their libraries. We would deeply appreciate your +feedback on these objectives in the +[discussion](https://github.com/open-telemetry/community/discussions/3098). + +## 2. Instrumentation Stability and Semantic Conventions + +In order to achieve our stability goals, we'll need to address semantic +convention stability and processes as well. + +### Semantic Convention Challenges + +Semantic conventions evolve slowly and deliberately because they must work +across diverse telemetry systems. While OpenTelemetry is designed for +interconnected signals flowing together, users deploy many different storage and +analysis engines to consume this data. Each backend has its own constraints and +capabilities. Maintainers must balance competing concerns – keeping cardinality +manageable, ensuring attributes are useful but not overly specific, and making +conventions that work well regardless of where the data ends up. + +The downside of this is that progress on semantic conventions can be slow, and +this slowness impacts all consumers of the conventions. Many instrumentation +libraries are currently stuck on pre-release versions because they depend on +experimental semantic conventions. Outside contributors are stuck between +emitting unspecified telemetry or trying to engage in the process, which +requires a long commitment. Finally, we're internally inconsistent in +instrumentation across the project; some libraries are mapped to conventions, +others exist independently of it. + +### Instrumentation and Convention Goals + +Our goals here are designed to achieve three outcomes. + +1. Instrumentation stability should be decoupled from semantic convention + stability. We have a lot of stable instrumentation that is safe to run in + production, but has data that may change in the future. Users have told us + that conflating these two levels of stability is confusing and limits their + options. +2. Semantic conventions should be more federated; OpenTelemetry should not be + the final word on what conventions exist, and instead should focus on + creating core conventions that can be extended and built upon. +3. Semantic convention development and iteration should not be a blocker on + distribution maintainers. + +To this end, we have a few recommendations we'd like to codify into the +specification. First, our position around instrumentation libraries in +OpenTelemetry is that they exist as concrete implementations of the semantic +conventions. This gives us a concrete target for 'first party' instrumentation +libraries that we wish to support in distributions. In addition, maintainers +should prioritize instrumentations that align to existing conventions and +deprioritize others. + +Second, we'd like to make it easier for maintainers to ship stable +instrumentations. If an instrumentation's API surface is stable, then we believe +that semantic convention stability should not block the stabilization of that +instrumentation library. This means that we'll need to be thoughtful in +providing migration pathways for telemetry as operators upgrade to new major +versions of instrumentation libraries. + +Finally, we'd like to make it easier for third-parties to publish their own +semantic conventions by formalizing and stabilizing necessary parts of the +semantic conventions in order for other organizations to ship conventions for +their libraries, frameworks, tech stacks, etc. + +In order to accomplish this, we're looking for feedback on several areas from +maintainers and end-users -- especially around the maturity/lifecycle of +semantic conventions, as well as what's missing in terms of federating semantic +conventions. We are more flexible on proposals here, but our outcomes aren't. +Remember, a core goal of the project is to encourage other libraries, tools, and +frameworks to +[natively adopt OpenTelemetry](https://www.youtube.com/watch?v=l8xiNOCIdLY) -- +semantic conventions are a big part of that. + +## 3. Confident and Stable Releases + +### The Challenge + +OpenTelemetry isn't just a single binary deployed into a Kubernetes cluster. +Subtle differences in everything from configuration to telemetry output between +different versions of instrumentation libraries, Collector receivers, and SDKs +can cause a real headache for adopters. In addition, the rapid release cadence +of many components causes real difficulty for end users, especially around the +Collector. Enterprise deployments and upgrades are slow, deliberate things -- +teams simply do not have the bandwidth to validate and roll out changes at the +cadence we ship. + +### Release Goals and Strategy + +Ultimately, our goal here is to make it easier for large organizations to deploy +OpenTelemetry. Please keep in mind that in many organizations, 'deployment' and +'upgrades' are non-trivial tasks that involve many teams and stakeholders across +different business units or areas of responsibility including security. + +Our current proposal is the creation of a Release SIG that will be responsible +for creating a schedule of 'epoch' releases for OpenTelemetry. These epoch +versions would essentially be a manifest pointing to a tested, documented, and +stable set of components that meet project stability requirements. + +This is not a trivial undertaking, to be clear. Those efforts will communicate +many of the requirements these epoch releases must follow, after all. To our +maintainers and contributors, this effort is not intended to change how +individual components, SDKs, or APIs are versioned or released. Rather, we want +to provide tested, stable release combinations that work well together for end +users who need that stability. + +For end-users, we would appreciate feedback on how you are currently managing +upgrades, what you'd like to see in this area, and your current challenges in +deployment and upgrade of both SDKs and Collectors. + +## Looking Forward + +These changes are a reflection of the impact and importance of OpenTelemetry to +the cloud native software community. +[OpenTelemetry has been the second highest velocity project in the CNCF over the past few years](https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf), +and +[nearly 50% of surveyed cloud native end user companies have adopted the project](https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf). +These changes are setting up the next chapter of our success, and becoming truly +ubiquitous. + +Our mission as a project is not changing, but our priorities are. + +1. Stability and usability for all developers and users. +2. Clear packaging, installation, and usage paths. +3. Predictability and consistency. + +For contributors and maintainers, what does this mean? We'll fast-track +proposals that align with these priorities. If there's feature work or +instrumentation that doesn't align to this, that's fine -- we'd ask that you +work on it outside the project and discover where our existing integration +points and patterns don't work. That's good feedback, and will help us improve +the specification for everyone. + +For maintainers, contributors, and integrators -- we would appreciate your +feedback in +[this GitHub Discussion](https://github.com/open-telemetry/community/discussions/3098) +on the topics and proposals raised here. You can also send feedback on this +proposal to [feedback@opentelemetry.io](mailto:feedback@opentelemetry.io) or on +the CNCF Slack in the #opentelemetry channel. We also look forward to meeting +the cloud native community in person at KubeCon next week -- please join us +there with comments! diff --git a/static/refcache.json b/static/refcache.json index 42aeb8ebf365..0788184995b2 100644 --- a/static/refcache.json +++ b/static/refcache.json @@ -6487,6 +6487,10 @@ "StatusCode": 206, "LastSeen": "2025-11-06T13:18:04.403726264Z" }, + "https://github.com/open-telemetry/community/discussions/3098": { + "StatusCode": 206, + "LastSeen": "2025-10-27T12:41:04.136861-04:00" + }, "https://github.com/open-telemetry/community/issues": { "StatusCode": 206, "LastSeen": "2025-11-06T13:18:07.573050123Z" @@ -6555,6 +6559,10 @@ "StatusCode": 206, "LastSeen": "2025-11-05T17:54:23.020851678Z" }, + "https://github.com/open-telemetry/community/issues/3086": { + "StatusCode": 206, + "LastSeen": "2025-10-23T12:46:15.816184-04:00" + }, "https://github.com/open-telemetry/community/issues/3119": { "StatusCode": 206, "LastSeen": "2025-11-07T09:45:57.34650427Z" @@ -20143,6 +20151,14 @@ "StatusCode": 200, "LastSeen": "2025-11-06T13:12:47.713467099Z" }, + "https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf": { + "StatusCode": 200, + "LastSeen": "2025-10-23T12:46:22.058029-04:00" + }, + "https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf": { + "StatusCode": 200, + "LastSeen": "2025-10-23T12:46:24.016534-04:00" + }, "https://www.cockroachlabs.com/": { "StatusCode": 200, "LastSeen": "2025-11-02T09:41:43.333917291Z" @@ -20335,6 +20351,10 @@ "StatusCode": 200, "LastSeen": "2025-11-05T17:54:01.831754937Z" }, + "https://www.google.com/url?q=https://www.youtube.com/watch?v%3Dl8xiNOCIdLY&sa=D&source=docs&ust=1761158059548569&usg=AOvVaw3rCMkjmo1CMucSQtkHjDI3": { + "StatusCode": 200, + "LastSeen": "2025-10-23T12:46:17.523765-04:00" + }, "https://www.graalvm.org/latest/reference-manual/native-image/": { "StatusCode": 206, "LastSeen": "2025-11-05T17:53:51.225544197Z" @@ -21747,6 +21767,10 @@ "StatusCode": 200, "LastSeen": "2025-11-07T09:46:03.359087525Z" }, + "https://www.youtube.com/watch?v=l8xiNOCIdLY": { + "StatusCode": 200, + "LastSeen": "2025-10-27T12:13:47.481412-04:00" + }, "https://www.youtube.com/watch?v=t550FzDi054": { "StatusCode": 200, "LastSeen": "2025-11-06T13:12:44.05853989Z"