feat(target-allocator): Add support for scrape-classes #4216
jaronoff97 merged 3 commits into open-telemetry:main
Conversation
@nicolastakashi I just saw that you offered to work on this in #3600 (comment). This was a while ago though, and the changes are pretty small, so I hope you are okay with me taking the plunge on this.
Thank you very much for working on that @ChristianCiach 🙏🏽
The …
Scratch the comment above. I've pushed a commit to extend the …

Please don't be alarmed that I've touched the surrounding tests. Adding my test was a miserable experience, because some of the existing test cases changed common variables without any regard for the following tests. I've cleaned this up a bit, so the test cases in this function are more independent. I've taken great care to ensure that the tests still test what they're designed to test.
Looks like I didn't properly generate the CRD YAMLs (hence the failed pipelines). Sorry, I know next to nothing about building operators and CRDs. I will look into it.

Edit: Should be all good now. https://github.com/open-telemetry/opentelemetry-operator/blob/main/CONTRIBUTING.md#local-development-cheat-sheet is a life-saver for people new to Go.
Sorry for not reviewing this earlier @ChristianCiach. Your changes look good to me, in general.

Something that only became clear to me once I saw this PR, though, is that scrape classes add a large definition to our CRDs, and also make us directly dependent on the prometheus-operator API package. This would be much easier to accept if a ScrapeClass were an independent CR like ServiceMonitor, one that could exist in the cluster without impacting our definitions directly. I think this is something we'll have to discuss during a SIG meeting, to figure out whether we're willing to accept the added maintenance burden. I apologize for letting you implement this before realizing that this might be a problem.
@swiatekm No worries! I noticed the same thing when I added the import, but I couldn't think of an alternative. Thank you for taking this to the SIG meeting; I look forward to the decision.

If we decide not to add scrape-classes like this, I would still like to see some kind of global relabeling rules in the future, for the reasons outlined in the PR description. There is currently no other way to remove the default labels added by Pod/ServiceMonitors.
I am back from vacation and I wonder if there has been any news regarding this PR. As far as I understand it, the only point of contention is whether to expose part of the prometheus-operator API inside the …

The more I think about this, the more I think that this is the right thing to do. If the size of the CRD is the main concern, I could probably expose the …

But this is also the case when configuring your prometheus rules in …
@ChristianCiach apologies for the late response. We've had a lot of long vacations and other life events among the maintainers as well recently, so we're not as prompt in responding as we would have liked.

For reference, I ran a quick check on the size increase of the TargetAllocator CRD. We go from ~140KB to ~150KB, with a practical limit of around 250KB (the maximum size of an annotation value in K8s). I think that's acceptable, but we'll need to properly evaluate both this change and the dependency it creates for us.

@nicolastakashi do you know what kind of stability guarantees we can expect for the struct we're importing in this PR? Using it this way would also introduce more friction if prometheus-operator ever wanted to make breaking changes to it.
👋 prometheus-operator maintainer here! Regarding our change policy, we stick to the Kubernetes API conventions as described in https://prometheus-operator.dev/docs/community/contributing/#changes-to-the-apis

Regarding the CRD size, going above 250KB is fine as long as users know how to bypass the potential issue wrt annotations: https://prometheus-operator.dev/docs/platform/troubleshooting/#customresourcedefinition--is-invalid-metadataannotations-too-long-issue
I am back from a prolonged absence and I would still like to see this merged eventually. Please let me know if there is still anything to discuss.

After having thought about this for a while, I kinda prefer my own suggestion from before to change the …
I've experimented with using …

I could use a simple …

```yaml
spec:
  targetAllocator:
    prometheusCR:
      enabled: true
      # scrapeClasses is a multiline yaml string!
      scrapeClasses: |
        - name: istio-mtls
          default: true
          tlsConfig:
            caFile: "/etc/istio-certs/root-cert.pem"
            certFile: "/etc/istio-certs/cert-chain.pem"
            keyFile: "/etc/istio-certs/key.pem"
            insecureSkipVerify: true
```

But this doesn't feel very operator'y. Importing the …

So, if the size of the CRD is not of major concern, I think this PR is good to go.
@ChristianCiach how about using …

The size of the CRD is a concern, and I, for one, would be against adding this to the OpenTelemetryCollector CRD, which is too big as-is. For TargetAllocator it's less of an issue. This may be fine, given that we prefer that users with more sophisticated needs use the TargetAllocator CRD regardless.

There's also the concern about breaking changes in prometheus-operator's struct definition. I suppose it's included in Prometheus, which is stable. This is something we can probably live with, but I'd like opinions from more maintainers and approvers here. @open-telemetry/operator-approvers
Agreed with Mikolaj's opinion, I think it makes sense to keep this in the TA alone. If a user wants to take advantage of this, they probably know what they're doing and would benefit from the standalone TA CRD anyway. This also limits the blast radius of the otel CR in case prometheus were to push breaking changes for whatever reason.
@swiatekm Thanks, I didn't know about …

I've just changed the CRD to use this type. Feel free to review!
swiatekm left a comment
The changes look good to me now. One thing this PR is missing is an e2e test showing that the scrape class is actually applied to the scrape configs.
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
Thank you very much for your contribution 🙇 I really appreciate the back and forth here.
Description:
Adds support for ScrapeClasses as supported by the Prometheus Operator. Users can use these to apply global configuration to multiple (or even all) PodMonitors and ServiceMonitors.
I need this feature to get rid of the default labels (pod, container, namespace, ...) that the Prometheus Operator automatically adds to all PodMonitors and ServiceMonitors, as described in https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/running-exporters.md#podmonitors. I consider these labels problematic and redundant, because they are already present as resource attributes using proper OTel semantic-convention names. But I cannot safely drop these labels at the collector, because at that stage I cannot distinguish actual metric labels from the labels added by the Prometheus Operator. So I want to use a default ScrapeClass to globally drop these problematic labels before the metrics are scraped.
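To illustrate that use case, a default scrape class could carry metric relabelings that drop the operator-added labels. This is a hedged sketch, not taken from this PR: the field names follow the prometheus-operator ScrapeClass API, but the exact shape of the TargetAllocator CRD field, the class name, and the regex of labels to drop are assumptions.

```yaml
# Hypothetical sketch: a default scrape class whose metric relabelings
# drop the target labels that prometheus-operator adds by default.
# Field placement under targetAllocator.prometheusCR is assumed.
spec:
  targetAllocator:
    prometheusCR:
      enabled: true
      scrapeClasses:
        - name: drop-default-labels
          default: true  # applies to every Pod/ServiceMonitor without an explicit scrapeClass
          metricRelabelings:
            - action: labeldrop
              regex: "pod|container|namespace|endpoint|service"  # assumed label set; adjust to taste
```

Because the class is marked `default: true`, the relabelings would apply without touching any individual monitor.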
In #3600 (comment) @swiatekm raised a concern:
Looking at the code, I believe this concern does not apply. The configuration of the scrape classes is simply merged with the configuration found in PodMonitors and ServiceMonitors. The TargetAllocator retrieves the merged configuration from the Prometheus Operator code, so the Target Allocator cannot even distinguish whether a given setting originates from a PodMonitor or a ScrapeClass.
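A minimal sketch of the merge semantics described here, using simplified stand-ins for the prometheus-operator types. The real merging happens inside prometheus-operator's config-generation code; the type and function names below are hypothetical and only illustrate the "monitor settings win, the class fills the gaps" behavior.

```go
package main

import "fmt"

// TLSConfig is a simplified stand-in for the prometheus-operator TLS settings
// (the real type lives in the monitoring/v1 API package).
type TLSConfig struct {
	CAFile   string
	CertFile string
	KeyFile  string
}

// ScrapeClass mirrors the rough shape of a scrape class: a name, a default
// flag, and settings that get merged into monitors referencing it.
type ScrapeClass struct {
	Name    string
	Default bool
	TLS     *TLSConfig
}

// mergeTLS sketches the merge semantics: a monitor's own TLS settings take
// precedence, and the scrape class only fills in fields the monitor left unset.
func mergeTLS(monitor *TLSConfig, class *TLSConfig) *TLSConfig {
	if class == nil {
		return monitor
	}
	if monitor == nil {
		copied := *class
		return &copied
	}
	merged := *monitor
	if merged.CAFile == "" {
		merged.CAFile = class.CAFile
	}
	if merged.CertFile == "" {
		merged.CertFile = class.CertFile
	}
	if merged.KeyFile == "" {
		merged.KeyFile = class.KeyFile
	}
	return &merged
}

func main() {
	istio := ScrapeClass{Name: "istio-mtls", Default: true, TLS: &TLSConfig{
		CAFile:   "/etc/istio-certs/root-cert.pem",
		CertFile: "/etc/istio-certs/cert-chain.pem",
		KeyFile:  "/etc/istio-certs/key.pem",
	}}
	// A ServiceMonitor with no TLS config of its own inherits everything
	// from the default scrape class.
	merged := mergeTLS(nil, istio.TLS)
	fmt.Println(merged.CAFile)
}
```

After the merge, a consumer of the generated scrape config sees a single flattened configuration, which matches the observation above that the origin of a setting is no longer distinguishable.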
Link to tracking Issue(s): #3600
Testing:
I added a test case that adds a simple scrape class to the PrometheusCR configuration and lets a PodMonitor reference it. The configuration of the scrape class is correctly added to the resulting Prometheus scrape config.
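For context, a PodMonitor references a scrape class through the prometheus-operator `scrapeClass` field. A minimal example of such a reference follows; the resource names and labels are illustrative, not the actual fixtures from this PR's test:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app  # illustrative name
spec:
  scrapeClass: istio-mtls  # must match a scrape class defined for the Target Allocator
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
    - port: metrics
```

The scrape class settings are then merged into the scrape config that the Target Allocator generates for this monitor's targets.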
Documentation:
Mentioned in README and re-generated API docs.