
feat(target-allocator): Add support for scrape-classes#4216

Merged
jaronoff97 merged 3 commits into open-telemetry:main from ChristianCiach:allocator-scrapeclasses
Dec 18, 2025

Conversation

@ChristianCiach (Contributor) commented Jul 24, 2025:

Description:

Adds support for ScrapeClasses as supported by the Prometheus Operator. Users can use these to add global configurations to multiple (or even all) PodMonitors and ServiceMonitors.

I need this feature to get rid of the default labels (pod, container, namespace, ...) that the Prometheus-Operator automatically adds to all PodMonitors and ServiceMonitors as described in https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/running-exporters.md#podmonitors. I consider these labels problematic and redundant, because they are already present as resource-attributes using proper Otel Semconv names. But I cannot safely drop these labels at the collector, because at this stage I cannot distinguish actual metric labels from the labels added by the Prometheus Operator. So I want to use a default ScrapeClass to globally drop these problematic labels before the metrics are scraped.
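For illustration, a default scrape class along these lines could drop those operator-added labels. This is a sketch only: the field names follow the prometheus-operator ScrapeClass API, but the label list and the exact placement of `scrapeClasses` in the target allocator configuration are assumptions, not taken from this PR.

```yaml
# Hypothetical sketch: a default scrape class that drops the target labels
# the Prometheus Operator adds to every PodMonitor/ServiceMonitor.
# The label list here is an assumption for illustration.
scrapeClasses:
  - name: drop-default-labels
    default: true            # applied to all monitors that don't pick another class
    relabelings:
      - action: labeldrop
        regex: pod|container|namespace
```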

In #3600 (comment) @swiatekm raised a concern:

> One point worth mentioning is that unlike in prometheus-operator, in a target allocator + otel collector setup, service discovery and scraping happen in different applications. Right now we solve the issue around credentials by encrypting traffic between the target allocator and prometheus receiver, and simply exposing them via target allocator endpoints. Not sure if that makes any difference for scrape classes specifically.

Looking at the code, I believe this concern does not apply. The configuration of the scrape-classes is simply merged with the configurations found in PodMonitors and ServiceMonitors. The TargetAllocator retrieves the merged configuration from the Prometheus Operator code, so it cannot even distinguish whether a configuration originates from a PodMonitor or a ScrapeClass.

Link to tracking Issue(s): #3600

Testing:

I added a test case that adds a simple scrape-class to the PrometheusCR configuration and lets a PodMonitor reference it. The configuration of the scrape-class is correctly added to the resulting Prometheus scrape config.

Documentation:

Mentioned in README and re-generated API docs.

@ChristianCiach ChristianCiach requested a review from a team as a code owner July 24, 2025 18:44
@ChristianCiach (Contributor, Author):

@nicolastakashi I just see that you offered to work on this in #3600 (comment). This was a while ago though and the changes are pretty small, so I hope you are okay with me taking the plunge on this.

@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 3 times, most recently from e2652bc to ae7b1bd Compare July 24, 2025 19:13
@nicolastakashi (Contributor):

> @nicolastakashi I just see that you offered to work on this in #3600 (comment). This was a while ago though and the changes are pretty small, so I hope you are okay with me taking the plunge on this.

Thank you very much for working on that @ChristianCiach 🙏🏽

@nicolastakashi (Contributor) left a comment:

LGTM

@ChristianCiach (Contributor, Author) commented Jul 24, 2025:

The OpenTelemetryCollector CRD doesn't yet offer a way to configure these ScrapeClasses via .spec.targetAllocator.prometheusCR.scrapeClasses. I use the TargetAllocator standalone, so I don't really care about the rest of the Operator and its CRDs. If possible, I would like this PR to be merged without touching the rest of the operator. If you want me to, I would gladly try to extend the operator (and the CRDs) in a follow-up PR.

@ChristianCiach (Contributor, Author) commented Jul 24, 2025:

Scratch the comment above. I've pushed a commit to extend the OpenTelemetryCollector CRD, added a test case, extended the README and re-generated the API docs.

Please don't be alarmed that I've touched the surrounding tests. Adding my test was a miserable experience, because some of the existing test cases changed some common variables without any regard for the following tests. I've cleaned this up a bit, so the test cases in this function are more independent. I've taken great care to ensure that the tests still test what they're designed to test.

@ChristianCiach (Contributor, Author) commented Jul 25, 2025:

Looks like I didn't properly generate the CRD yamls (hence the failed pipelines). Sorry, I know next to nothing about building Operators and CRDs. I will look into it.

Edit: Should be all good now. https://github.com/open-telemetry/opentelemetry-operator/blob/main/CONTRIBUTING.md#local-development-cheat-sheet is a life-saver for people new to Go.

@github-actions bot commented Jul 25, 2025:

E2E Test Results

33 files ±0 · 221 suites ±0 · 3h 47m 19s ⏱️ (−3m 52s)
85 tests ±0: 85 ✅, 0 💤, 0 ❌
225 runs ±0: 225 ✅, 0 💤, 0 ❌

Results for commit cabb803. ± Comparison against base commit 087f27e.


@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 3 times, most recently from 8f9c35a to 86414e8 Compare July 25, 2025 14:24
@ChristianCiach ChristianCiach marked this pull request as draft July 25, 2025 14:26
@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch from 86414e8 to 03037d7 Compare July 25, 2025 14:30
@ChristianCiach ChristianCiach marked this pull request as ready for review July 25, 2025 14:56
@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 4 times, most recently from b3a115b to bee85d1 Compare July 28, 2025 12:55
@frzifus frzifus added the discuss-at-sig This issue or PR should be discussed at the next SIG meeting label Aug 4, 2025
@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 3 times, most recently from 32f2850 to cabb803 Compare August 5, 2025 17:32
@swiatekm (Contributor):

Sorry for not reviewing this earlier @ChristianCiach. Your changes look good to me, in general. Something that only became clear to me once I saw this PR, though, is that scrape classes add a large definition to our CRDs, and also make us directly dependent on the prometheus-operator API package. This would be much easier to accept if a Scrape Class was an independent CR like ServiceMonitor, that could exist in the cluster without impacting our definitions directly. I think this is something we'll have to discuss during a SIG meeting and figure out if we're willing to accept the added maintenance burden.

I apologize for letting you implement this before realizing that this might be a problem.

@ChristianCiach (Contributor, Author) commented Aug 12, 2025:

@swiatekm No worries! I noticed the same thing when I added the import, but I couldn't think of any alternative.

Thank you for taking this to the SIG meeting. I look forward to the decision.

If we decide to not add scrape-classes like this, I would still like to see any kind of global relabeling rules in the future, for the reasons outlined in the PR description. There is currently no other way to remove the default labels added by Pod/ServiceMonitors.

@ChristianCiach (Contributor, Author) commented Sep 8, 2025:

I am back from vacation and I wonder whether there is any news regarding this PR.

As far as I understand it, the only point of contention is whether to expose part of the Prometheus-Operator-API inside the OpentelemetryCollector CRD, making the CRD a lot larger.

The more I think about this, the more I think that this is the right thing to do. If the size of the CRD is the main concern, I could probably expose the scrapeClasses attribute as type ~~[]map[string]any~~ *runtime.RawExtension and then convert it back to []*monitoringv1.ScrapeClass internally. The main downside of this would be the lack of validation in editors and on admission.

@ChristianCiach (Contributor, Author):

> The main downside of this would be the lack of validation in editors and on admission.

But this is also the case when configuring your Prometheus rules in spec.receivers.prometheus.config. There, too, you need to make sure that your prometheus receiver configuration is compatible with the Prometheus version the receiver is importing. In other words, exposing the scrape-classes as runtime.RawExtension is analogous to exposing the raw Prometheus configuration as the config attribute of the prometheus receiver. I would be fine with that.

@swiatekm (Contributor):

@ChristianCiach apologies for the late response. We've had a lot of long vacations and other life events among the maintainers as well recently, so we're not as prompt in responding as we would've liked.

For reference, I ran a quick check on the size increase of the TargetAllocator CRD. We go from ~140KB to ~150KB, with a practical limit of around 250KB (the maximum size of an annotation value in K8s). I think that's acceptable, but we'll need to properly evaluate both this change and the dependency it creates for us.

@nicolastakashi do you know what kind of stability guarantees we can expect for the struct we're importing in this PR? Us using it this way would also introduce more friction if prometheus-operator ever wanted to make breaking changes to it, too.

@simonpasquier:
👋 prometheus-operator maintainer here!

Regarding our change policy, we stick to the Kubernetes API conventions as described in https://prometheus-operator.dev/docs/community/contributing/#changes-to-the-apis

For stable API versions (e.g. v1), we don’t allow breaking backward or forward compatibility.

Regarding the CRD size, going above 250KB is fine as long as users know how to bypass the potential issue with annotations: https://prometheus-operator.dev/docs/platform/troubleshooting/#customresourcedefinition--is-invalid-metadataannotations-too-long-issue

@ChristianCiach (Contributor, Author):

I am back from a prolonged absence and I still would like to see this merged eventually. Please let me know if there is still anything to discuss.

After having thought about this for a while, I kinda prefer my own suggestion from before to change the scrapeClasses CRD attribute to *runtime.RawExtension. I will try this locally sometime this or next week.

@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 2 times, most recently from 091b4bd to 3d90d7d Compare December 4, 2025 11:29
@ChristianCiach (Contributor, Author):

I've experimented with using RawExtension for the scrapeClasses field, but it doesn't feel right.

scrapeClasses is an array, but runtime.RawExtension can only hold an object. I could change the type to []runtime.RawExtension, but this needs awkward unwrapping when unmarshalling the CR.
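The "awkward unwrapping" step could be sketched roughly as follows. This is an illustrative, self-contained example, not code from this PR: the `RawExtension` and `ScrapeClass` types below are simplified stand-ins for `k8s.io/apimachinery`'s `runtime.RawExtension` and the prometheus-operator type, and `unwrapScrapeClasses` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RawExtension is a minimal stand-in for runtime.RawExtension:
// it just carries the raw JSON bytes of an arbitrary object.
type RawExtension struct {
	Raw []byte
}

// ScrapeClass is a simplified stand-in for the prometheus-operator type.
type ScrapeClass struct {
	Name    string `json:"name"`
	Default bool   `json:"default,omitempty"`
}

// unwrapScrapeClasses converts the untyped []RawExtension back into a
// typed slice -- the extra conversion step a []runtime.RawExtension
// CRD field would force on us.
func unwrapScrapeClasses(raw []RawExtension) ([]ScrapeClass, error) {
	out := make([]ScrapeClass, 0, len(raw))
	for _, r := range raw {
		var sc ScrapeClass
		if err := json.Unmarshal(r.Raw, &sc); err != nil {
			return nil, fmt.Errorf("invalid scrape class: %w", err)
		}
		out = append(out, sc)
	}
	return out, nil
}

func main() {
	raw := []RawExtension{
		{Raw: []byte(`{"name":"istio-mtls","default":true}`)},
	}
	classes, err := unwrapScrapeClasses(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(classes[0].Name, classes[0].Default) // prints: istio-mtls true
}
```

Note that a malformed entry only surfaces as an error here, at unmarshalling time, rather than being rejected on admission.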

I could use a simple *string:

```yaml
spec:
  targetAllocator:
    prometheusCR:
      enabled: true
      # scrapeClasses is a multiline yaml string!
      scrapeClasses: |
        - name: istio-mtls
          default: true
          tlsConfig:
            caFile: "/etc/istio-certs/root-cert.pem"
            certFile: "/etc/istio-certs/cert-chain.pem"
            keyFile: "/etc/istio-certs/key.pem"
            insecureSkipVerify: true
```

But this doesn't feel very operator'y.

Importing the ScrapeClass type of the Prometheus CRD into our own CRD is at least honest, because this is the type that can actually be used. Should the imported ScrapeClass type ever change, our own CRD should change as well, to signal a broken configuration that would otherwise only fail at runtime.

So, if the size of the CRD is not of major concern, I think this PR is good to go.

@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 2 times, most recently from f04e614 to 8be16dc Compare December 4, 2025 14:42
@swiatekm (Contributor) commented Dec 4, 2025:

@ChristianCiach how about using []v1beta1.AnyConfig, the same as we do for scrape configs embedded in the target allocator?

The size of the CRD is a concern, and I, for one, would be against adding this to the OpenTelemetryCollector CRD, which is too big as-is. For TargetAllocator it's less of an issue. This may be fine, given that we prefer that users with more sophisticated needs use the TargetAllocator CRD regardless.

There's also the concern about breaking changes in prometheus-operator's struct definition. I suppose it's included in Prometheus, which is stable. This is something we can probably live with, but I'd like opinions from more maintainers and approvers here. @open-telemetry/operator-approvers

@jaronoff97 (Contributor):

Agreed with Mikolaj's opinion. I think it makes sense to keep this in the TA alone. If a user wants to take advantage of this, they probably know what they're doing and would benefit from the standalone TA CRD anyway. This also limits the blast radius of the OTel CR in case Prometheus were to push breaking changes for whatever reason.

@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 2 times, most recently from 9151c3c to 357b06c Compare December 5, 2025 15:31
@ChristianCiach (Contributor, Author) commented Dec 5, 2025:

@swiatekm Thanks, I didn't know about v1beta1.AnyConfig. If I had known, I would've used that to begin with :)

I've just changed the CRD to use this type. Feel free to review!
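For reference, the resulting manifest might look roughly like this. This is a sketch based on the discussion above, not an example from the PR: the apiVersion and exact field placement are assumptions, and the scrape class shown reuses the istio-mtls example from earlier in this thread.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: TargetAllocator
metadata:
  name: example
spec:
  prometheusCR:
    enabled: true
    # Stored as unstructured AnyConfig values, matching the
    # prometheus-operator ScrapeClass shape.
    scrapeClasses:
      - name: istio-mtls
        default: true
        tlsConfig:
          caFile: /etc/istio-certs/root-cert.pem
          certFile: /etc/istio-certs/cert-chain.pem
          keyFile: /etc/istio-certs/key.pem
          insecureSkipVerify: true
```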

@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch from 357b06c to 1c29619 Compare December 10, 2025 08:57
@swiatekm (Contributor) left a comment:

The changes look good to me now. One thing this PR is missing is an e2e test showing that the scrape class is actually applied to the scrape configs.

@ChristianCiach ChristianCiach marked this pull request as draft December 18, 2025 13:07
@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 3 times, most recently from dd06ce5 to 7fb86d7 Compare December 18, 2025 14:12
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
@ChristianCiach ChristianCiach force-pushed the allocator-scrapeclasses branch 2 times, most recently from c998fff to 1d98549 Compare December 18, 2025 14:38
@ChristianCiach ChristianCiach marked this pull request as ready for review December 18, 2025 15:04
@swiatekm swiatekm requested a review from frzifus December 18, 2025 16:11
@jaronoff97 (Contributor):

Thank you very much for your contribution 🙇 I really appreciate the back and forth here.

@jaronoff97 jaronoff97 merged commit d8953ec into open-telemetry:main Dec 18, 2025
63 of 66 checks passed

Labels

discuss-at-sig This issue or PR should be discussed at the next SIG meeting


Development

Successfully merging this pull request may close these issues.

Support Prometheus Operator ScrapeClass.

6 participants