Commit bc902f6

MON-4442: Add AGENTS.md to CMO
Signed-off-by: Daniel Mellado <[email protected]>
1 parent 911649d commit bc902f6

1 file changed

+206
-0
lines changed


AGENTS.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
1+
# AI Agent Guidance for Cluster Monitoring Operator
2+
3+
This file provides guidance to AI agents when working with code in this repository.
4+
5+
This is the Cluster Monitoring Operator (CMO) - the operator that manages the Prometheus-based monitoring stack in
6+
OpenShift. CMO is deployed by the Cluster Version Operator (CVO).
7+
8+
## Architecture Overview
9+
10+
### Jsonnet Manifest Generation
11+
12+
CMO generates Kubernetes manifests using Jsonnet (`jsonnet/`):
13+
14+
- Source: `jsonnet/components/*.libsonnet` and vendored upstream jsonnet
15+
- Output: `assets/` directory with generated YAML manifests
16+
- The operator reads manifests from `assets/` at runtime via `pkg/manifests/`
17+
18+
**Critical**: When modifying manifests, changes must be made in jsonnet source files, then regenerated with
19+
`make generate`. Direct edits to `assets/*.yaml` will be overwritten. **DO NOT** edit them directly.
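
A minimal sketch of how the rule above could be checked before committing. The function name `check_generated_assets` is hypothetical and not part of the CMO repository; it only applies a heuristic (generated YAML changed with no change under `jsonnet/`) and does not replace `make check-assets`.

```bash
# Hypothetical helper, not part of the repository.
# Reads changed file paths on stdin (one per line) and fails when YAML under
# assets/ changed without any accompanying change under jsonnet/.
check_generated_assets() {
  cga_assets=0
  cga_jsonnet=0
  while IFS= read -r cga_path; do
    case "$cga_path" in
      assets/*.yaml) cga_assets=1 ;;
      jsonnet/*) cga_jsonnet=1 ;;
    esac
  done
  if [ "$cga_assets" -eq 1 ] && [ "$cga_jsonnet" -eq 0 ]; then
    echo "assets/ changed without a jsonnet/ change; edit jsonnet and run 'make generate'" >&2
    return 1
  fi
  return 0
}
```

One possible use is `git diff --name-only HEAD | check_generated_assets`, run from the repository root.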
20+
21+
### Configuration API
22+
23+
Two ConfigMaps control monitoring behavior:
24+
25+
- `cluster-monitoring-config` (openshift-monitoring) - Platform monitoring
26+
- `user-workload-monitoring-config` (openshift-user-workload-monitoring) - User workload monitoring
27+
28+
Types are defined in `pkg/manifests/types.go`, together with their validation rules.
29+
30+
- **DO NOT** invent new config keys
31+
32+
## Development Commands
33+
34+
### Local Development
35+
36+
**Prerequisites**: You need access to an OpenShift cluster. You can provision one using
37+
[cluster-bot](https://github.com/openshift/ci-chat-bot) via Slack (Red Hat internal). In Slack, message `@cluster-bot`
38+
with `launch 4.17` (or desired version) to get a temporary cluster with credentials.
39+
40+
```bash
41+
export KUBECONFIG=/path/to/kubeconfig # Requires OpenShift cluster
42+
make run-local # Build and run locally as CMO service account
43+
make run-local SWITCH_TO_CMO=false # Run as current user (e.g., kube:admin)
44+
```
45+
46+
### Jsonnet Workflow
47+
48+
```bash
49+
# Modify jsonnet source files in jsonnet/
50+
make generate # Regenerate manifests, docs, and metadata
51+
make docs # Regenerate documentation only (api.md, resources.md)
52+
make check-assets # Verify assets are up to date
53+
```
54+
55+
**Rapid iteration**: For quick local testing only, you can temporarily modify YAML files in `assets/` directly, run the operator with
56+
`hack/local-cmo.sh` (no rebuild needed), then port changes back to jsonnet. See `Documentation/development.md` for
57+
detailed workflow.
58+
59+
**Two-release annotation/label removal**: To remove a label/annotation from a resource managed by `CreateOrUpdateXXX`
60+
functions:
61+
62+
1. First release: Add suffix `"-"` to the annotation/label (CMO deletes it via library-go)
63+
2. Second release: Remove from jsonnet source
64+
65+
### Testing
66+
67+
```bash
68+
make test # All tests (requires OpenShift cluster with KUBECONFIG)
69+
make test-unit # Unit tests only
70+
make test-e2e # E2E tests (requires OpenShift cluster)
71+
make test-ginkgo # Ginkgo tests (ported from openshift-tests-private)
72+
73+
# Specific tests
74+
go test -v ./pkg/... -run TestName # Specific unit test
75+
go test -v -timeout=120m -run TestName ./test/e2e/ --kubeconfig $KUBECONFIG # Specific e2e test
76+
```
77+
78+
**openshift-tests-extension**: CMO integrates with the OpenShift conformance test framework via the `tests-ext` binary.
79+
Run `make tests-ext-update` after modifying Ginkgo tests to update metadata.
80+
81+
### Verification
82+
83+
```bash
84+
make verify # Run all checks
85+
make format # Format code (go fmt, jsonnet fmt, shellcheck)
86+
make golangci-lint # Lint Go code
87+
make check-rules # Validate Prometheus rules with promtool
88+
```
89+
90+
## OpenShift Conventions
91+
92+
### Pull Requests
93+
94+
- **Title format**: `OCPBUGS-12345: descriptive title` (bugs) or `MON-1234: descriptive title` (features)
95+
- Example: `MON-4435: Add RBAC permission for endpointslice resource in UWM prometheus-operator`
96+
- Example: `OCPBUGS-61088: create networkpolicy settings for in-cluster monitoring`
97+
- **Commit format**: `<subsystem>: <what changed>`
98+
- Example: `jsonnet: update prometheus version`
99+
- Example: `e2e: add e2e test to verify endpointslice discovery in uwm`
100+
- All PRs require a JIRA ticket reference
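
The title and commit formats above can be checked mechanically. The following is a sketch only: the function names are hypothetical, and the patterns assume that `OCPBUGS` and `MON` are the only accepted Jira projects and that a subsystem is a single token without spaces, which this file does not state explicitly.

```bash
# Hypothetical helpers; the patterns mirror only the formats listed above.

# Succeeds when the title starts with an OCPBUGS-<n> or MON-<n> key,
# followed by ": " and a description.
valid_pr_title() {
  printf '%s\n' "$1" | grep -Eq '^(OCPBUGS|MON)-[0-9]+: .+'
}

# Succeeds when the subject looks like "<subsystem>: <what changed>".
valid_commit_subject() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_./-]+: .+'
}
```

For example, `valid_pr_title "MON-4435: Add RBAC permission for endpointslice resource in UWM prometheus-operator"` succeeds, while a title without a Jira key is rejected.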
101+
102+
### Jira Integration
103+
104+
- **Automatic linking**: PRs are automatically linked to JIRA when the key is in the PR title
105+
- **Lifecycle automation**: [jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin) updates
106+
JIRA status based on PR events
107+
- **Jira commands** (comment on PR):
108+
- `/jira refresh` - Manually sync PR with JIRA issue
109+
- `/jira cc @username` - CC someone on the JIRA issue
110+
- `/jira backport <branch>` - Create backport PR to target branch (e.g., `/jira backport release-4.17`)
111+
- `/jira assign <user>` - Assign the JIRA issue to specified user
112+
- `/jira unassign` - Remove current assignee from JIRA issue
113+
- `/jira comment <comment>` - Add comment to the JIRA issue
114+
- `/jira close` - Close the JIRA issue
115+
- `/jira reopen` - Reopen the JIRA issue
116+
- **Creating tickets**: Use OCPBUGS project for bugs, MON project for features
117+
- **Required fields**: Component (Monitoring), Target Version, Priority
118+
- **Status workflow**: To Do → In Progress → Code Review → Done
119+
120+
### Prow CI
121+
122+
- **Triggering tests**: Tests run automatically on PR creation/update
123+
- **Useful commands** (comment on PR):
124+
- `/retest` - Retry all failed tests
125+
- `/test <job-name>` - Run specific job (e.g., `/test e2e-aws`)
126+
- `/test-with <job-name>` - Run specific job with additional tests
127+
- `/retitle <new-title>` - Change PR title
128+
- `/assign @username` - Assign reviewer
129+
- `/cc @username` - Request review without assignment
130+
- `/hold` - Prevent auto-merge, `/hold cancel` to remove
131+
- `/lgtm` - Mark the PR as reviewed ("looks good to me", maintainers only)
132+
- `/approve` - Approve for merge (maintainers only)
133+
- `/cherry-pick <branch>` - Cherry-pick to another branch after merge
134+
- **Important jobs**:
135+
- `ci/prow/images` - Builds container images
136+
- `ci/prow/e2e-*` - E2E test variants
137+
- `ci/prow/verify` - Runs `make verify`
138+
- `ci/prow/unit` - Unit tests
139+
- **Viewing results**: Click "Details" next to a job to see its Prow logs
140+
- **Common failures**:
141+
- `ci/prow/images` fails if `make verify` would fail (run locally first)
142+
- E2E timeouts may be transient (retry with `/retest`)
143+
- **More commands**: See [prow.ci.openshift.org/command-help](https://prow.ci.openshift.org/command-help)
144+
145+
### Feature Development
146+
147+
- **FeatureGate integration**: CMO integrates with OpenShift FeatureGates for controlling feature availability
148+
- Example: `MetricsCollectionProfiles` feature gate controls collection profile functionality
149+
- Check in `pkg/operator/operator.go`: `featureGates.Enabled(features.FeatureGateMetricsCollectionProfiles)`
150+
- Pass to config: `CollectionProfilesFeatureGateEnabled` flag in `pkg/manifests/config.go`
151+
- **TechPreview → GA lifecycle**:
152+
- TechPreview: Feature gated, requires explicit enablement
153+
- GA: Feature gate removed, enabled by default
154+
- **Adding new features**:
155+
1. Add FeatureGate check in `pkg/operator/operator.go`
156+
2. Pass enabled state through config
157+
3. Conditionally create resources based on gate state (e.g., `serviceMonitors()` helper)
158+
4. Update `pkg/manifests/types.go` if new config fields needed
159+
160+
## Updating Jsonnet Dependencies
161+
162+
Example: updating the kube-prometheus bundle:
163+
164+
```bash
165+
cd jsonnet
166+
167+
# Edit jsonnetfile.json, update version for desired component
168+
jb update
169+
170+
# Stage only the version/sum changes for target bundle in jsonnetfile.lock.json
171+
git add -p jsonnetfile.lock.json
172+
173+
# Revert unwanted changes
174+
git restore jsonnetfile.json jsonnetfile.lock.json
175+
176+
# Reinstall with updated lockfile
177+
rm -rf vendor && jb install
178+
179+
cd ..
180+
make generate
181+
```
182+
183+
See `Documentation/development.md` for detailed workflow.
184+
185+
## Common Pitfalls
186+
187+
1. **Forgetting `make generate`**: Modifying jsonnet without regenerating assets causes CI failures
188+
2. **Missing KUBECONFIG**: E2E tests fail silently if KUBECONFIG isn't set, even if `~/.kube/config` exists
189+
3. **Asset sync issues**: Run `make clean` before `make generate` if vendored jsonnet behaves unexpectedly
190+
4. **Wrong cluster type**: Tests require OpenShift, not vanilla Kubernetes
191+
5. **Stale local CMO**: Make sure you have the right permissions when running locally for development. Otherwise the operator
192+
may get stuck in the reconcile loop because it cannot list or modify resources.
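
Pitfall 2 can be guarded against with a small preflight check. This is a sketch under assumptions: `require_kubeconfig` is a hypothetical name, and the check treats `KUBECONFIG` as a single file path rather than a colon-separated list.

```bash
# Hypothetical preflight helper, not part of the repository.
# Fails with a message when KUBECONFIG is unset, empty, or not a readable file.
require_kubeconfig() {
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set; export it before running e2e tests" >&2
    return 1
  fi
  if [ ! -r "$KUBECONFIG" ]; then
    echo "KUBECONFIG points to an unreadable file: $KUBECONFIG" >&2
    return 1
  fi
}
```

It could be chained in front of a test target, for example `require_kubeconfig && make test-e2e`.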
193+
194+
## Documentation
195+
196+
- `CONTRIBUTING.md` - Contribution guidelines and workflow details
197+
- `Documentation/development.md` - Detailed development workflows
198+
- [OpenShift Monitoring Docs](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/monitoring/)
199+
\- User-facing monitoring documentation
200+
201+
## Important Files
202+
203+
- `Makefile` - All build and test targets
204+
- `VERSION` - Operator version string
205+
- `manifests/` - Deployment manifests
206+
- `OWNERS` and `OWNERS_ALIASES` - Code ownership and admin definitions
