This file provides guidance to AI agents when working with code in this repository.

This is the Cluster Monitoring Operator (CMO), the operator that manages the Prometheus-based monitoring stack in OpenShift. CMO is deployed by the Cluster Version Operator (CVO).

## Architecture Overview

### Jsonnet Manifest Generation
CMO generates Kubernetes manifests using Jsonnet (`jsonnet/`):
- Source: `jsonnet/components/*.libsonnet` and vendored upstream jsonnet
- Output: `assets/` directory with generated YAML manifests
- The operator reads manifests from `assets/` at runtime via `pkg/manifests/`

**Critical**: When modifying manifests, make the changes in the jsonnet source files, then regenerate with `make generate`. Direct edits to `assets/*.yaml` will be overwritten.
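
To illustrate the rule above, a pre-commit style check can flag changes to `assets/` that arrive without any change under `jsonnet/`. This is a hypothetical sketch: `check_asset_edits` is not part of the repository, and it only applies a path heuristic to a list of changed files (for example, the output of `git diff --name-only`). `make check-assets` remains the authoritative check.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): flag changes under assets/
# that have no accompanying change under jsonnet/.
# Reads one changed path per line on stdin; returns 1 if the heuristic trips.
check_asset_edits() {
  local path assets_changed=0 jsonnet_changed=0
  while IFS= read -r path; do
    case "$path" in
      assets/*) assets_changed=1 ;;
      jsonnet/*) jsonnet_changed=1 ;;
    esac
  done
  if (( assets_changed && !jsonnet_changed )); then
    echo "assets/ changed without a jsonnet/ change; edit jsonnet and run 'make generate'" >&2
    return 1
  fi
  return 0
}
```

Usage: `git diff --name-only HEAD | check_asset_edits`.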

### Configuration API
Two ConfigMaps control monitoring behavior:
- `cluster-monitoring-config` (namespace `openshift-monitoring`): platform monitoring
- `user-workload-monitoring-config` (namespace `openshift-user-workload-monitoring`): user workload monitoring

The configuration types and their validation rules are defined in `pkg/manifests/types.go`.
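
For orientation, the sketch below prints a minimal `cluster-monitoring-config` ConfigMap. It assumes the configuration is stored as YAML under the `config.yaml` data key and uses `enableUserWorkload: true` (the setting that turns on user workload monitoring) as its only field. The function name is illustrative. Check `pkg/manifests/types.go` for the full set of fields.

```bash
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): print a minimal cluster-monitoring-config
# ConfigMap that enables user workload monitoring. The operator's configuration is
# assumed to live under the "config.yaml" data key.
cluster_monitoring_config() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
}
```

Usage: `cluster_monitoring_config | oc apply -f -`. Applying it replaces any existing `config.yaml` content in that ConfigMap.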

## Development Commands

### Local Development
```bash
export KUBECONFIG=/path/to/kubeconfig # Requires an OpenShift cluster
make run-local # Build and run locally as the CMO service account
make run-local SWITCH_TO_CMO=false # Run as the current user (e.g., kube:admin)
```

### Jsonnet Workflow
```bash
# Modify jsonnet source files in jsonnet/
make generate # Regenerate manifests, docs, and metadata
make docs # Regenerate documentation only (api.md, resources.md)
make check-assets # Verify assets are up to date
```

**Rapid iteration**: For quick testing, you can modify the YAML files in `assets/` directly and run the operator with `hack/local-cmo.sh` (no rebuild needed). Then port the changes back to jsonnet. See `Documentation/development.md` for the detailed workflow.

**Two-release annotation/label removal**: To remove a label or annotation from a resource managed by the `CreateOrUpdateXXX` functions, use two releases. The merge only adds or updates keys, so deleting the key from jsonnet in one step would leave it on existing resources.
1. First release: Append a `-` suffix to the annotation/label key in jsonnet (CMO then deletes the key from the live resource via library-go)
2. Second release: Remove the suffixed key from the jsonnet source
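
As we understand it, library-go's `resourcemerge.MergeMap` helper honors the `-` suffix. A required key ending in `-` deletes the matching key from the existing object, and every other required key is added or overwritten.

The function below is a hypothetical shell model of that rule for illustration only. It is not library-go code. It takes existing and required labels as `key=value` lines and needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical model (NOT library-go code) of the merge rule: a required key
# ending in "-" deletes the unsuffixed key from the existing map; any other
# required key is added or overwritten. Prints the merged labels sorted.
# Usage: merge_labels "<existing key=value lines>" "<required key=value lines>"
merge_labels() {
  local -A merged=()
  local line key base
  # Load the existing labels.
  while IFS= read -r line; do
    [[ -n "$line" ]] || continue
    merged["${line%%=*}"]="${line#*=}"
  done <<<"$1"
  # Apply the required labels.
  while IFS= read -r line; do
    [[ -n "$line" ]] || continue
    key="${line%%=*}"
    if [[ "$key" == *- ]]; then
      base="${key%-}"
      unset "merged[$base]"   # removal marker: drop the unsuffixed key if present
    else
      merged["$key"]="${line#*=}"
    fi
  done <<<"$2"
  for key in "${!merged[@]}"; do
    printf '%s=%s\n' "$key" "${merged[$key]}"
  done | sort
}
```

For example, merging the existing labels `app=cmo` and `legacy=true` with the required labels `legacy-=` and `team=monitoring` leaves `app=cmo` and `team=monitoring`.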

### Testing
```bash
make test # All tests (requires an OpenShift cluster and KUBECONFIG)
make test-unit # Unit tests only
make test-e2e # E2E tests (requires an OpenShift cluster)
make test-ginkgo # Ginkgo tests (ported from openshift-tests-private)
go test -v ./pkg/... -run TestName # Specific unit test
go test -v -timeout=120m -run TestName ./test/e2e/ --kubeconfig "$KUBECONFIG" # Specific e2e test
```
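
`go test -run` takes an unanchored regular expression. As a result, `-run TestName` also runs any test whose name merely contains `TestName`, which can be expensive for e2e tests. The sketch below shows a hypothetical wrapper that anchors the pattern. `run_e2e_test` and its `DRY_RUN` switch are illustrative and are not Makefile targets.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run exactly one top-level e2e test (plus its subtests)
# by anchoring the -run regular expression. Set DRY_RUN=1 to print the command only.
run_e2e_test() {
  local name="$1"
  if [[ -z "$name" ]]; then
    echo "usage: run_e2e_test TestName" >&2
    return 2
  fi
  if [[ -z "${KUBECONFIG:-}" ]]; then
    echo "KUBECONFIG must be set to run e2e tests" >&2
    return 1
  fi
  # "^Name$" stops -run from also matching tests such as NameV2.
  local cmd=(go test -v -timeout=120m -run "^${name}\$" ./test/e2e/ --kubeconfig "$KUBECONFIG")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage:
- `run_e2e_test TestName` runs the test.
- `DRY_RUN=1 run_e2e_test TestName` only prints the command.

Subtests of the matched test still run, because `^TestName$` only constrains the top-level name.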

**openshift-tests-extension**: CMO integrates with the OpenShift conformance test framework via the `tests-ext` binary. Run `make tests-ext-update` after modifying Ginkgo tests to update the test metadata.

### Verification
```bash
make verify # Run all checks
make format # Format code (go fmt, jsonnet fmt, shellcheck)
make golangci-lint # Lint Go code
make check-rules # Validate Prometheus rules with promtool
```
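
`ci/prow/images` fails whenever `make verify` would fail, so running the checks locally before pushing avoids a CI round trip. The helper below is a hypothetical sketch. It runs `format`, `generate`, and `verify` in that order and stops at the first failure. The `MAKE` override exists only so the helper can be dry-run; it is not a repository convention.

```bash
#!/usr/bin/env bash
# Hypothetical pre-push helper: run the local equivalents of the CI checks in
# order and stop at the first failing target. Set MAKE=echo for a dry run.
pre_push_checks() {
  local make_cmd="${MAKE:-make}" target
  for target in format generate verify; do
    if ! "$make_cmd" "$target"; then
      echo "make $target failed; fix it before pushing" >&2
      return 1
    fi
  done
}
```

Usage: `pre_push_checks && git push`.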

## OpenShift Conventions

### Pull Requests
- **Title format**: `OCPBUGS-12345: descriptive title` (bugs) or `MON-1234: descriptive title` (features)
  - Example: `MON-4435: Add RBAC permission for endpointslice resource in UWM prometheus-operator`
  - Example: `OCPBUGS-61088: create networkpolicy settings for in-cluster monitoring`
- **Commit format**: `<subsystem>: <what changed>`
  - Example: `jsonnet: update prometheus version`
  - Example: `e2e: add e2e test to verify endpointslice discovery in uwm`
- All PRs require a Jira ticket reference
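
Simple patterns can check the title and commit conventions above. These validators are hypothetical sketches of the stated formats, not the rules any CI job actually enforces.

```bash
#!/usr/bin/env bash
# Hypothetical validators for the conventions above (not CI's actual rules).

# PR title: "OCPBUGS-<number>: <title>" or "MON-<number>: <title>".
valid_pr_title() {
  [[ "$1" =~ ^(OCPBUGS|MON)-[0-9]+:\ .+ ]]
}

# Commit subject: "<subsystem>: <what changed>", where the subsystem has no spaces.
valid_commit_subject() {
  [[ "$1" =~ ^[A-Za-z0-9._/-]+:\ .+ ]]
}
```

Usage: `valid_pr_title "$title" || echo "title must start with OCPBUGS-<n>: or MON-<n>:"`.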

### Jira Integration
- **Automatic linking**: PRs are automatically linked to Jira when the issue key is in the PR title
- **Lifecycle automation**: [jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin) updates the Jira status based on PR events
- **Jira commands** (comment on the PR):
  - `/jira refresh` - Manually sync the PR with the Jira issue
  - `/jira cc @username` - CC someone on the Jira issue
  - `/jira backport <branch>` - Create a backport PR for the target branch (e.g., `/jira backport release-4.17`)
  - `/jira assign <user>` - Assign the Jira issue to the specified user
  - `/jira unassign` - Remove the current assignee from the Jira issue
  - `/jira comment <comment>` - Add a comment to the Jira issue
  - `/jira close` - Close the Jira issue
  - `/jira reopen` - Reopen the Jira issue
- **Creating tickets**: Use the OCPBUGS project for bugs and the MON project for features
- **Required fields**: Component (Monitoring), Target Version, Priority
- **Status workflow**: To Do → In Progress → Code Review → Done

### Prow CI
- **Triggering tests**: Tests run automatically when a PR is created or updated
- **Useful commands** (comment on the PR):
  - `/retest` - Retry all failed tests
  - `/test <job-name>` - Run a specific job (e.g., `/test e2e-aws`)
  - `/test-with <job-name>` - Run a specific job with additional tests
  - `/retitle <new-title>` - Change the PR title
  - `/assign @username` - Assign a reviewer
  - `/cc @username` - Request a review without assignment
  - `/hold` - Prevent auto-merge; use `/hold cancel` to remove the hold
  - `/lgtm` - Add the `lgtm` label to signal review approval (maintainers only)
  - `/approve` - Approve for merge (maintainers only)
  - `/cherry-pick <branch>` - Cherry-pick to another branch after merge
- **Important jobs**:
  - `ci/prow/images` - Builds container images
  - `ci/prow/e2e-*` - E2E test variants
  - `ci/prow/verify` - Runs `make verify`
  - `ci/prow/unit` - Unit tests
- **Viewing results**: Click "Details" next to a job to see its Prow logs
- **Common failures**:
  - `ci/prow/images` fails if `make verify` would fail (run it locally first)
  - E2E timeouts may be transient (retry with `/retest`)
- **More commands**: See [prow.ci.openshift.org/command-help](https://prow.ci.openshift.org/command-help)

### Feature Development
- **FeatureGate integration**: CMO integrates with OpenShift FeatureGates to control feature availability
  - Example: The `MetricsCollectionProfiles` feature gate controls the collection profile functionality
  - Checked in `pkg/operator/operator.go`: `featureGates.Enabled(features.FeatureGateMetricsCollectionProfiles)`
  - Passed to the config as the `CollectionProfilesFeatureGateEnabled` flag in `pkg/manifests/config.go`
- **TechPreview → GA lifecycle**:
  - TechPreview: Feature gated, requires explicit enablement
  - GA: Feature gate removed, enabled by default
- **Adding new features**:
  1. Add a FeatureGate check in `pkg/operator/operator.go`
  2. Pass the enabled state through the config
  3. Conditionally create resources based on the gate state (e.g., the `serviceMonitors()` helper)
  4. Update `pkg/manifests/types.go` if new config fields are needed
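
To check whether a gate such as `MetricsCollectionProfiles` is enabled on a development cluster, you can read the cluster-scoped `FeatureGate` resource named `cluster`. This sketch assumes the `config.openshift.io/v1` status layout, where `.status.featureGates` holds per-version `enabled` lists of gate names. `feature_gate_enabled` and its `OC` override (which lets it run against a stub) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named gate appears in any "enabled" list of
# the cluster FeatureGate status. OC can point at a stub command for offline testing.
feature_gate_enabled() {
  local gate="$1"
  "${OC:-oc}" get featuregate cluster \
    -o jsonpath='{range .status.featureGates[*].enabled[*]}{.name}{"\n"}{end}' \
    | grep -qxF -- "$gate"
}
```

Usage: `feature_gate_enabled MetricsCollectionProfiles && echo enabled`.

To try TechPreview features, you can switch a cluster's feature set with `oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'`. Only do this on a disposable cluster, because enabling `TechPreviewNoUpgrade` cannot be undone and prevents minor version updates.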

## Updating Jsonnet Dependencies

Example: updating the kube-prometheus bundle:

```bash
cd jsonnet
# Edit jsonnetfile.json to update the version of the desired component
jb update
# Stage only the intended changes: the version bump in jsonnetfile.json and the
# version/sum changes for the target bundle in jsonnetfile.lock.json
git add -p jsonnetfile.json jsonnetfile.lock.json
# Discard the remaining unstaged changes
git restore jsonnetfile.json jsonnetfile.lock.json
# Reinstall vendored dependencies from the updated lockfile
rm -rf vendor && jb install
cd ..
make generate
```

See `Documentation/development.md` for the detailed workflow.

## Common Pitfalls

1. **Forgetting `make generate`**: Modifying jsonnet without regenerating assets causes CI failures
2. **Missing KUBECONFIG**: E2E tests fail silently if KUBECONFIG isn't set, even if `~/.kube/config` exists
3. **Asset sync issues**: Run `make clean` before `make generate` if vendored jsonnet behaves unexpectedly
4. **Wrong cluster type**: Tests require OpenShift, not vanilla Kubernetes
5. **Stuck local CMO**: When running CMO locally for development, make sure the identity it runs as has the required permissions (`make run-local` runs as the CMO service account; `SWITCH_TO_CMO=false` runs as your current user). Otherwise the operator may get stuck in the reconcile loop because it cannot list or modify resources.
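
A pre-flight check like the one below turns the missing-KUBECONFIG pitfall into an explicit error instead of a silent failure. It is a hypothetical sketch: `require_kubeconfig` is not part of the repository, and it assumes `KUBECONFIG` holds a single path rather than a colon-separated list.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail fast when KUBECONFIG is unset or does not
# point at a readable file. Assumes a single path, not a colon-separated list.
require_kubeconfig() {
  if [[ -z "${KUBECONFIG:-}" ]]; then
    echo "KUBECONFIG is not set; e2e tests do not fall back to ~/.kube/config" >&2
    return 1
  fi
  if [[ ! -r "$KUBECONFIG" ]]; then
    echo "KUBECONFIG=$KUBECONFIG is not a readable file" >&2
    return 1
  fi
}
```

Usage: `require_kubeconfig && make test-e2e`.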

## Documentation

- `CONTRIBUTING.md` - Contribution guidelines and workflow details
- `Documentation/development.md` - Detailed development workflows
- [OpenShift Monitoring Docs](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/monitoring/) - User-facing monitoring documentation

## Important Files

- `Makefile` - All build and test targets
- `VERSION` - Operator version string
- `manifests/` - CVO deployment manifests
- `hack/build-jsonnet.sh` - Jsonnet to YAML conversion logic