-
Notifications
You must be signed in to change notification settings - Fork 73
WIP: OCPBUGS-59768: add kube-rbac-proxy sidecars to olm and catalog operators #1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@anik120: This pull request references Jira Issue OCPBUGS-59768, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest |
Looks like the cluster failed to launch in the previous failure because of infra issues /retest |
/retest |
1 similar comment
/retest |
/jira refresh |
@perdasilva: This pull request references Jira Issue OCPBUGS-59768, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: anik120, perdasilva The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @Xia-Zhao-rh , could you help verify it? Thanks! |
/retest-required |
cluster launch failed
|
23298d8
to
8ada54f
Compare
@perdasilva @Xia-Zhao-rh apologies, looks like the manifests ordering wasn't correct the first time around. It looks correct now, tests are passing. |
/test unit-registry |
/retest |
Hi, @anik120
And catalog-operator-metrics/olm-operator-metrics endpoints still expose all the information without any authorization
|
/hold on comment above |
8ada54f
to
ab50f29
Compare
New changes are detected. LGTM label has been removed. |
23458cd
to
517e260
Compare
95c7915
to
d1dedf5
Compare
@anik120: This pull request references Jira Issue OCPBUGS-59768, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
d1dedf5
to
51abe03
Compare
Secure metrics endpoint access of both controllers using kube-rbac-proxy
51abe03
to
dabdd4f
Compare
@anik120: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
func (sc *serverConfig) getAddress(tlsEnabled bool) string { | ||
if tlsEnabled { | ||
return ":8443" | ||
return "127.0.0.1:8443" | ||
} | ||
return ":8080" | ||
return "127.0.0.1:8080" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the churn on this one @perdasilva @Xia-Zhao-rh, but I've finally managed to get some time with it and come up with the solution.
We essentially need to make sure the metrics end points are available only on localhost (and not any and every service endpoint), and then safeguard those endpoints using kube-rbac-proxy (the same way that package-server-manager does it, only it was easier to set that up there because it was built using kubebuilder (and therefore has the strandard --metrics
endpoint setup).
Which means we need to make this change upstream, and then pull it in downstream. With these changes, I was able to confirm the endpoints are secured from unauthorized access:
$ oc get pods -n openshift-operator-lifecycle-manager
NAME READY STATUS RESTARTS AGE
catalog-operator-5d8bbf7ccd-nx2mc 2/2 Running 0 6s
olm-operator-fc4fb8f95-vxlnd 2/2 Running 0 7s
package-server-manager-74cfbb9d4-6b896 2/2 Running 3 (99m ago) 112m
$ oc get svc -n openshift-operator-lifecycle-manager
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
catalog-operator-metrics ClusterIP 172.30.19.132 <none> 8443/TCP 108m
olm-operator-metrics ClusterIP 172.30.57.17 <none> 8443/TCP 108m
Port Mapping Details:
ports:
- name: https-metrics
port: 8443 # External service port
protocol: TCP
targetPort: 9443 # kube-rbac-proxy container port
☝🏽 Services correctly route external port 8443 to kube-rbac-proxy port 9443
$ oc exec -n openshift-operator-lifecycle-manager catalog-operator-5d8bbf7ccd-nx2mc -c catalog-operator -- curl -k
https://catalog-operator-metrics.openshift-operator-lifecycle-manager.svc.cluster.local:8443/metrics
Unauthorized
$ oc exec -n openshift-operator-lifecycle-manager olm-operator-fc4fb8f95-vxlnd -c olm-operator -- curl -k
https://olm-operator-metrics.openshift-operator-lifecycle-manager.svc.cluster.local:8443/metrics
Unauthorized
RBAC properly configured to allow prometheus-k8s service account access:
$ oc logs -n openshift-operator-lifecycle-manager catalog-operator-5d8bbf7ccd-nx2mc -c kube-rbac-proxy --tail=3
I0914 16:37:40.257036 1 kube-rbac-proxy.go:347] Reading certificate files
I0914 16:37:40.257404 1 kube-rbac-proxy.go:395] Starting TCP socket on 0.0.0.0:9443
I0914 16:37:40.257658 1 kube-rbac-proxy.go:402] Listening securely on 0.0.0.0:9443
$ oc get rolebinding operator-lifecycle-manager-metrics -n openshift-operator-lifecycle-manager -o yaml
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: openshift-monitoring
ServiceMonitor correctly configured for authenticated HTTPS scraping:
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
interval: 30s
port: https-metrics
scheme: https
tlsConfig:
caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
serverName: catalog-operator-metrics.openshift-operator-lifecycle-manager.svc
I'll make a PR upstream (that'll need to set up kube-rbac-proxy for upstream too), and then once that's downstream-ed this PR will be ready to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
Integration with OCP monitoring shouldn't rely on service account's bearer token and should use mutual TLS instead (see https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus for details).
As always I'm speaking for the ideal implementation and I understand that it may require several iterations to reach the final state :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonpasquier looks like there previously was an attempt to implement mutual TLS authentication, but it had to be reverted.
This PR did not introduce the ServiceMonitor's reliance on bearer tokens. If we need to revisit that discussion we can do so in a new PR.
Bind metrics server endpoints to 127.0.0.1 instead of all interfaces to improve security by preventing direct external access to metrics endpoints. This change supports downstream kube-rbac-proxy integration (see openshift/operator-framework-olm#1061 for more info) by ensuring metrics are only accessible locally within the pod. Signed-off-by: Anik Bhattacharjee <[email protected]>
Fyi looking into another implementation here #1111 |
@anik120: This pull request references Jira Issue OCPBUGS-59768. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Secure metrics endpoint access of both controllers using kube-rbac-proxy