
Conversation

wking
Member

@wking wking commented Oct 8, 2025

This is addressing the same HyperShift-scraping issue as #1240. While 1240 is trying to find a long-term path, it requires HyperShift-repo changes to wire up, and those haven't been written yet. This pull request buys time by wiring the existing --hypershift option to code that disables the authentication requirement in that environment. Standalone clusters will continue to require prometheus-k8s ServiceAccount tokens.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Oct 8, 2025
@openshift-ci-robot
Contributor

@wking: This pull request references Jira Issue OCPBUGS-62861, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Oct 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2025
@wking wking force-pushed the disable-metrics-auth-on-hypershift branch from f35f20a to e16ed47 on October 8, 2025 at 21:58
In 313f8fb (CVO protects /metrics with authorization, 2025-07-22, openshift#1215) and
833a491 (CVO protects /metrics with authorization, 2025-07-22, openshift#1215), the
/metrics endpoint began requiring client auth.  The only
authentication system was Bearer tokens, and the only authorization
system was validating that the token belonged to
system:serviceaccount:openshift-monitoring:prometheus-k8s.

That worked well for standalone clusters, where the ServiceMonitor
scraper is the Prometheus from the openshift-monitoring namespace.
But it broke scraping on HyperShift [1], where the ServiceMonitor does
not request any client authorization [2].  Getting ServiceAccount
tokens (and keeping them fresh [3]) from the hosted cluster into a
Prometheus scraper running on the management cluster is hard.

This commit buys time to sort out a HyperShift metrics authentication
strategy by wiring the existing --hypershift option to code that
disables the authentication requirement in that environment.
Standalone clusters will continue to require prometheus-k8s
ServiceAccount tokens.
@wking wking force-pushed the disable-metrics-auth-on-hypershift branch from e16ed47 to a526efe on October 8, 2025 at 22:17
@wking
Member Author

wking commented Oct 8, 2025

Looking for a signal we can use for verification: #1215 ran e2e-hypershift. It passed (although openshift/hypershift#6965 is in flight to make the tests fail on this kind of issue in the future), but digging into the gathered artifacts turned up:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1215/pull-ci-openshift-cluster-version-operator-main-e2e-hypershift/1952739873462947840/artifacts/e2e-hypershift/dump-management-cluster/artifacts/artifacts.tar | tar -xOz logs/artifacts/output/hostedcluster-d44932313dd1be2d3560-mgmt/namespaces/openshift-monitoring/pods/prometheus-k8s-0/prometheus/prometheus/logs/current.log | grep cluster-version-operator
2025-08-05T15:53:48.316469617Z time=2025-08-05T15:53:48.316Z level=ERROR source=manager.go:176 msg="error reloading target set" component="scrape manager" err="invalid config id:serviceMonitor/e2e-clusters-ghd95-node-pool-6dl4k/cluster-version-operator/0"
2025-08-05T15:53:48.316543150Z time=2025-08-05T15:53:48.316Z level=ERROR source=manager.go:176 msg="error reloading target set" component="scrape manager" err="invalid config id:serviceMonitor/e2e-clusters-bmg8g-proxy-jplkn/cluster-version-operator/0"
2025-08-05T15:53:48.316617911Z time=2025-08-05T15:53:48.316Z level=ERROR source=manager.go:176 msg="error reloading target set" component="scrape manager" err="invalid config id:serviceMonitor/e2e-clusters-qnv7p-create-cluster-sxsvl/cluster-version-operator/0"

Not all that clear to me what it thought was invalid about the config. Maybe that's the scraping 401 that we're trying to address? Maybe not?

@wking
Member Author

wking commented Oct 8, 2025

David's got a more straightforward take on this approach in #1242, so let's pivot to that.

@wking
Member Author

wking commented Oct 9, 2025

Looks like error reloading target set...invalid config id: means a scrape pool without a scrape config. I can try to cross-ref the Prometheus errors against the e2e run completing and tearing down the namespace. From the test-case's destroy.log:

{"level":"info","ts":1754409030.0325892,"msg":"Deleting hosted cluster","namespace":"e2e-clusters-qnv7p","name":"create-cluster-sxsvl"}

Converting from Unix to UTC:

$ date --utc --iso=s --date '@1754409030'
2025-08-05T15:50:30+00:00

Which indeed predates the 2025-08-05T15:53:48 error log. So hooray, I understand what those logs are about. But I'm back to not knowing how to verify whether this fix is working or not.

Contributor

openshift-ci bot commented Oct 9, 2025

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/e2e-aws-ovn-techpreview | a526efe | link | true | /test e2e-aws-ovn-techpreview
ci/prow/okd-scos-e2e-aws-ovn | a526efe | link | false | /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wking
Member Author

wking commented Oct 9, 2025

Closing in favor of the #1242 approach, now picked forward onto main in #1243.

@DavidHurta
Contributor

Closing in favor of the #1242 approach, now picked forward onto main in #1243.

The PR is not closed. Closing. If I misunderstood, please reopen.

/close

@openshift-ci openshift-ci bot closed this Oct 9, 2025
Contributor

openshift-ci bot commented Oct 9, 2025

@DavidHurta: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
