
Commit 69a4e66

MON-4442: Add AGENTS.md to CMO
Signed-off-by: Daniel Mellado <[email protected]>
1 parent faa9562 commit 69a4e66

AGENTS.md: 161 additions, 0 deletions

This file provides guidance to AI agents when working with code in this repository.

This is the Cluster Monitoring Operator (CMO) - the operator that manages the Prometheus-based monitoring stack in OpenShift. CMO is deployed by the Cluster Version Operator (CVO).

## Architecture Overview

### Jsonnet Manifest Generation
CMO generates Kubernetes manifests using Jsonnet (`jsonnet/`):
- Source: `jsonnet/components/*.libsonnet` and vendored upstream jsonnet
- Output: `assets/` directory with generated YAML manifests
- The operator reads manifests from `assets/` at runtime via `pkg/manifests/`

**Critical**: When modifying manifests, changes must be made in jsonnet source files, then regenerated with `make generate`. Direct edits to `assets/*.yaml` will be overwritten.

### Configuration API
Two ConfigMaps control monitoring behavior:
- `cluster-monitoring-config` (openshift-monitoring) - Platform monitoring
- `user-workload-monitoring-config` (openshift-user-workload-monitoring) - User workload monitoring

Their Go types, including validation rules, are defined in `pkg/manifests/types.go`.
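Each ConfigMap carries its settings as YAML under the `config.yaml` key. The sketch below decodes such a payload into a trimmed, hypothetical subset of the configuration types; the real definitions in `pkg/manifests/types.go` are much larger. To stay dependency-free it uses `encoding/json` on a JSON document, whereas the real payload is YAML. Rejecting unknown keys with `DisallowUnknownFields` is shown as one way to surface typos early; it is an assumption, not a description of CMO's actual parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Hypothetical, heavily trimmed stand-ins for the types in pkg/manifests/types.go.
type ClusterMonitoringConfiguration struct {
	PrometheusK8s *PrometheusK8sConfig `json:"prometheusK8s,omitempty"`
}

type PrometheusK8sConfig struct {
	Retention string `json:"retention,omitempty"`
}

// parseConfig decodes the payload strictly: an unknown key at any nesting level
// (for example the typo "retension") is an error instead of being ignored.
func parseConfig(raw string) (*ClusterMonitoringConfiguration, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	var cfg ClusterMonitoringConfiguration
	if err := dec.Decode(&cfg); err != nil {
		return nil, fmt.Errorf("invalid monitoring config: %w", err)
	}
	return &cfg, nil
}

func main() {
	cfg, err := parseConfig(`{"prometheusK8s": {"retention": "15d"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.PrometheusK8s.Retention) // 15d

	_, err = parseConfig(`{"prometheusK8s": {"retension": "15d"}}`)
	fmt.Println(err != nil) // true
}
```

Pointer fields such as `PrometheusK8s` make it possible to tell "section omitted" (nil) apart from "section present with defaults".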

## Development Commands

### Local Development
```bash
export KUBECONFIG=/path/to/kubeconfig  # Requires an OpenShift cluster
make run-local                         # Build and run locally as the CMO service account
make run-local SWITCH_TO_CMO=false     # Run as the current user (e.g., kube:admin)
```

### Jsonnet Workflow
```bash
# Modify jsonnet source files in jsonnet/, then:
make generate      # Regenerate manifests, docs, and metadata
make docs          # Regenerate documentation only (api.md, resources.md)
make check-assets  # Verify assets are up to date
```

**Two-release annotation/label removal**: To remove a label or annotation from a resource managed by the `CreateOrUpdateXXX` functions:
1. First release: Append a `-` suffix to the annotation/label key in the jsonnet source (CMO then deletes it from the live object via library-go)
2. Second release: Remove the suffixed entry from the jsonnet source
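The reason for the first release can be seen in a simplified model of the merge: keys in the desired object are added or updated on the live object, but keys that are merely absent are left alone, so dropping a label from jsonnet would not remove it from existing clusters. The sketch below models the trailing-`-` convention described above; `mergeLabels` and the `example.io/legacy` label are hypothetical, and this is not library-go's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeLabels is a simplified model of the merge step: every key in required is
// set on existing, except keys ending in "-", which delete the unsuffixed key.
// Keys missing from required are never touched. It reports whether existing changed.
func mergeLabels(existing, required map[string]string) bool {
	modified := false
	for k, v := range required {
		if strings.HasSuffix(k, "-") {
			key := strings.TrimSuffix(k, "-")
			if _, ok := existing[key]; ok {
				delete(existing, key)
				modified = true
			}
			continue
		}
		if cur, ok := existing[k]; !ok || cur != v {
			existing[k] = v
			modified = true
		}
	}
	return modified
}

func main() {
	// Live object still carries the label scheduled for removal.
	live := map[string]string{"app": "prometheus", "example.io/legacy": "true"}
	// First-release desired state from jsonnet: the hypothetical label is marked with "-".
	desired := map[string]string{"app": "prometheus", "example.io/legacy-": ""}

	changed := mergeLabels(live, desired)
	keys := make([]string, 0, len(live))
	for k := range live {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(changed, keys) // true [app]
}
```

Once every supported upgrade path has passed through the first release, the suffixed entry is a no-op and can be dropped in the second release.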

### Testing
```bash
make test          # All tests (requires an OpenShift cluster with KUBECONFIG set)
make test-unit     # Unit tests only
make test-e2e      # E2E tests (requires an OpenShift cluster)
make test-ginkgo   # Ginkgo tests (ported from openshift-tests-private)
go test -v ./pkg/... -run TestName   # Specific unit test
go test -v -timeout=120m -run TestName ./test/e2e/ --kubeconfig $KUBECONFIG   # Specific e2e test
```
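Flags placed after the package path in the last command are passed to the test binary. The sketch below shows one way such a binary could resolve its kubeconfig so that a missing setting fails loudly instead of silently; `resolveKubeconfig` is hypothetical and is not CMO's e2e framework code. It deliberately ignores `~/.kube/config`, matching the behavior described under Common Pitfalls.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// resolveKubeconfig picks the kubeconfig for an e2e run: an explicit --kubeconfig
// flag wins, then the KUBECONFIG environment variable. There is no fallback to
// ~/.kube/config, so an unset value is reported as an error.
func resolveKubeconfig(flagValue string, lookupEnv func(string) (string, bool)) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	if v, ok := lookupEnv("KUBECONFIG"); ok && v != "" {
		return v, nil
	}
	return "", errors.New("no kubeconfig: pass --kubeconfig or set KUBECONFIG")
}

func main() {
	flags := flag.NewFlagSet("e2e", flag.ContinueOnError)
	kubeconfig := flags.String("kubeconfig", "", "path to the cluster kubeconfig")
	// Example arguments; Go's flag package accepts both -kubeconfig and --kubeconfig.
	if err := flags.Parse([]string{"--kubeconfig", "/tmp/kubeconfig"}); err != nil {
		panic(err)
	}
	path, err := resolveKubeconfig(*kubeconfig, os.LookupEnv)
	fmt.Println(path, err) // /tmp/kubeconfig <nil>
}
```

Injecting `lookupEnv` instead of calling `os.Getenv` directly keeps the function testable without mutating the process environment.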

**openshift-tests-extension**: CMO integrates with the OpenShift conformance test framework via the `tests-ext` binary. After modifying Ginkgo tests, run `make tests-ext-update` to update its metadata.

### Verification
```bash
make verify         # Run all checks
make format         # Format code (go fmt, jsonnet fmt, shellcheck)
make golangci-lint  # Lint Go code
make check-rules    # Validate Prometheus rules with promtool
```

## OpenShift Conventions

### Pull Requests
- **Title format**: `OCPBUGS-12345: descriptive title` (bugs) or `MON-1234: descriptive title` (features)
  - Example: `MON-4435: Add RBAC permission for endpointslice resource in UWM prometheus-operator`
  - Example: `OCPBUGS-61088: create networkpolicy settings for in-cluster monitoring`
- **Commit format**: `<subsystem>: <what changed>`
  - Example: `jsonnet: update prometheus version`
  - Example: `e2e: add e2e test to verify endpointslice discovery in uwm`
- All PRs require a JIRA ticket reference
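The title and commit conventions above are regular enough to check mechanically. The patterns below are a sketch derived only from the stated formats and examples; they are not a check that OpenShift CI runs, and the allowed subsystem characters are an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// PR title: <OCPBUGS|MON>-<number>: <description>
	prTitleRe = regexp.MustCompile(`^(OCPBUGS|MON)-[0-9]+: \S.*$`)
	// Commit subject: <subsystem>: <what changed>, with a lowercase subsystem
	// such as "jsonnet" or "e2e" (the character set is an assumption).
	commitRe = regexp.MustCompile(`^[a-z0-9][a-z0-9/_.-]*: \S.*$`)
)

func main() {
	fmt.Println(prTitleRe.MatchString("MON-4435: Add RBAC permission for endpointslice resource in UWM prometheus-operator")) // true
	fmt.Println(prTitleRe.MatchString("Add RBAC permission"))                                                              // false
	fmt.Println(commitRe.MatchString("jsonnet: update prometheus version"))                                                // true
	fmt.Println(commitRe.MatchString("Update prometheus version"))                                                         // false
}
```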

### Jira Integration
- **Automatic linking**: PRs are automatically linked to JIRA when the key is in the PR title
- **Lifecycle automation**: [jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin) updates JIRA status based on PR events
- **Jira commands** (comment on PR):
  - `/jira refresh` - Manually sync the PR with its JIRA issue
  - `/jira cc @username` - CC someone on the JIRA issue
  - `/jira backport <branch>` - Create a backport PR to the target branch (e.g., `/jira backport release-4.17`)
  - `/jira assign <user>` - Assign the JIRA issue to the specified user
  - `/jira unassign` - Remove the current assignee from the JIRA issue
  - `/jira comment <comment>` - Add a comment to the JIRA issue
  - `/jira close` - Close the JIRA issue
  - `/jira reopen` - Reopen the JIRA issue
- **Creating tickets**: Use the OCPBUGS project for bugs and the MON project for features
- **Required fields**: Component (Monitoring), Target Version, Priority
- **Status workflow**: To Do → In Progress → Code Review → Done

### Prow CI
- **Triggering tests**: Tests run automatically when a PR is created or updated
- **Useful commands** (comment on PR):
  - `/retest` - Rerun all failed jobs
  - `/test <job-name>` - Run a specific job (e.g., `/test e2e-aws`)
  - `/test-with <job-name>` - Run a specific job with additional tests
  - `/retitle <new-title>` - Change the PR title
  - `/assign @username` - Assign reviewer
  - `/cc @username` - Request review without assignment
  - `/hold` - Prevent auto-merge; `/hold cancel` removes the hold
  - `/lgtm` - Add the `lgtm` label (reviewers)
  - `/approve` - Add the `approved` label required for merge (approvers listed in `OWNERS`)
  - `/cherry-pick <branch>` - Cherry-pick to another branch after merge
- **Important jobs**:
  - `ci/prow/images` - Builds container images
  - `ci/prow/e2e-*` - E2E test variants
  - `ci/prow/verify` - Runs `make verify`
  - `ci/prow/unit` - Unit tests
- **Viewing results**: Click "Details" next to a job to see its Prow logs
- **Common failures**:
  - `ci/prow/images` fails if `make verify` would fail (run it locally first)
  - E2E timeouts may be transient (retry with `/retest`)
- **More commands**: See [prow.ci.openshift.org/command-help](https://prow.ci.openshift.org/command-help)
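How `/lgtm`, `/approve`, and `/hold` combine can be summarized as label checks performed by Prow's merge automation (Tide). The sketch below is a simplified model assuming the common setup where `lgtm` and `approved` are required and `do-not-merge/hold` blocks merging; real repositories configure further required and forbidden labels (for example Jira-related ones), so `PR` and `mergeable` are hypothetical and not Tide's actual configuration.

```go
package main

import "fmt"

// PR is a hypothetical, minimal view of what the merge automation checks.
type PR struct {
	Labels             map[string]bool
	RequiredJobsPassed bool
}

// mergeable models the common rules: labels from /lgtm and /approve are
// present, the label from /hold is absent, and required jobs are green.
func mergeable(pr PR) bool {
	return pr.Labels["lgtm"] &&
		pr.Labels["approved"] &&
		!pr.Labels["do-not-merge/hold"] &&
		pr.RequiredJobsPassed
}

func main() {
	pr := PR{Labels: map[string]bool{"lgtm": true, "approved": true}, RequiredJobsPassed: true}
	fmt.Println(mergeable(pr)) // true

	pr.Labels["do-not-merge/hold"] = true // effect of /hold
	fmt.Println(mergeable(pr)) // false

	delete(pr.Labels, "do-not-merge/hold") // effect of /hold cancel
	fmt.Println(mergeable(pr)) // true
}
```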

### Feature Development
- **FeatureGate integration**: CMO integrates with OpenShift FeatureGates to control feature availability
  - Example: the `MetricsCollectionProfiles` feature gate controls collection profile functionality
  - Check in `pkg/operator/operator.go`: `featureGates.Enabled(features.FeatureGateMetricsCollectionProfiles)`
  - Pass to config: `CollectionProfilesFeatureGateEnabled` flag in `pkg/manifests/config.go`
- **TechPreview → GA lifecycle**:
  - TechPreview: Feature gated, requires explicit enablement
  - GA: Feature gate removed, enabled by default
- **Adding new features**:
  1. Add a FeatureGate check in `pkg/operator/operator.go`
  2. Pass the enabled state through config
  3. Conditionally create resources based on gate state (e.g., `serviceMonitors()` helper)
  4. Update `pkg/manifests/types.go` if new config fields are needed
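Steps 1 to 3 amount to reading the gate once, carrying its state as a plain config boolean, and branching on that boolean when building resources. The sketch below assumes a string-keyed `FeatureGates` interface for brevity (the real accessor takes a typed gate name), and the ServiceMonitor names are illustrative, so treat every identifier here except `CollectionProfilesFeatureGateEnabled` and `serviceMonitors` as hypothetical.

```go
package main

import "fmt"

// FeatureGates is a hypothetical, string-keyed stand-in for the gate accessor.
type FeatureGates interface {
	Enabled(name string) bool
}

// staticGates is a fixed set of enabled gates, convenient for tests.
type staticGates map[string]bool

func (g staticGates) Enabled(name string) bool { return g[name] }

// Config is a hypothetical slice of the operator config; the gate state is
// stored as a plain bool, mirroring CollectionProfilesFeatureGateEnabled.
type Config struct {
	CollectionProfilesFeatureGateEnabled bool
}

// configFromGates reads the gate once (step 1) and passes it on as config (step 2).
func configFromGates(fg FeatureGates) Config {
	return Config{CollectionProfilesFeatureGateEnabled: fg.Enabled("MetricsCollectionProfiles")}
}

// serviceMonitors lists the ServiceMonitors to create (step 3); the
// profile-specific one exists only when the gate is enabled.
func serviceMonitors(cfg Config) []string {
	monitors := []string{"example-component"}
	if cfg.CollectionProfilesFeatureGateEnabled {
		monitors = append(monitors, "example-component-minimal")
	}
	return monitors
}

func main() {
	cfg := configFromGates(staticGates{"MetricsCollectionProfiles": true})
	fmt.Println(serviceMonitors(cfg))      // [example-component example-component-minimal]
	fmt.Println(serviceMonitors(Config{})) // [example-component]
}
```

Keeping the gate lookup out of the resource builders means that, at GA, removing the gate only touches the config wiring rather than every helper.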

## Updating Jsonnet Dependencies

Example: updating the kube-prometheus bundle:

```bash
cd jsonnet
# Edit jsonnetfile.json to update the version of the desired component, then
# stage it so that the restore step below keeps the edit
git add jsonnetfile.json
jb update
# Stage only the version/sum changes for the target bundle in jsonnetfile.lock.json
git add -p jsonnetfile.lock.json
# Discard the remaining unstaged changes made by jb update
git restore jsonnetfile.json jsonnetfile.lock.json
# Reinstall vendored dependencies from the updated lockfile
rm -rf vendor && jb install
cd ..
make generate
```

See `Documentation/development.md` for the detailed workflow.

## Common Pitfalls

1. **Forgetting `make generate`**: Modifying jsonnet without regenerating assets causes CI failures
2. **Missing KUBECONFIG**: E2E tests fail silently if KUBECONFIG isn't set, even if `~/.kube/config` exists
3. **Asset sync issues**: Run `make clean` before `make generate` if vendored jsonnet behaves unexpectedly
4. **Wrong cluster type**: Tests require OpenShift, not vanilla Kubernetes
5. **Stale local CMO**: When running CMO locally for development, make sure the identity it runs as has the required permissions (by default, `make run-local` uses the CMO service account); otherwise the operator can get stuck in its reconcile loop because it cannot list or modify resources.
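Before a long local run, it can help to compare the permissions the identity holds with those the operator needs. In practice you would ask the API server (for example with `oc auth can-i <verb> <resource> --as=<identity>`); the pure-Go sketch below only models the comparison. `permission` and `missingPermissions` are hypothetical, and real RBAC also involves API groups, namespaces, and resource names, which are ignored here.

```go
package main

import "fmt"

// permission is a hypothetical (verb, resource) pair; "*" acts as a wildcard.
type permission struct {
	Verb     string
	Resource string
}

// allows reports whether a granted permission covers the wanted one.
func allows(granted, want permission) bool {
	return (granted.Verb == "*" || granted.Verb == want.Verb) &&
		(granted.Resource == "*" || granted.Resource == want.Resource)
}

// missingPermissions returns, in order, the required permissions not covered by
// any granted one. A non-empty result is the situation described in pitfall 5.
func missingPermissions(granted, required []permission) []permission {
	var missing []permission
	for _, want := range required {
		covered := false
		for _, g := range granted {
			if allows(g, want) {
				covered = true
				break
			}
		}
		if !covered {
			missing = append(missing, want)
		}
	}
	return missing
}

func main() {
	granted := []permission{{"get", "configmaps"}, {"list", "configmaps"}, {"*", "secrets"}}
	required := []permission{{"list", "configmaps"}, {"update", "secrets"}, {"list", "servicemonitors"}}
	fmt.Println(missingPermissions(granted, required)) // [{list servicemonitors}]
}
```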

## Important Files

- `Makefile` - All build and test targets
- `VERSION` - Operator version string
- `manifests/` - CVO deployment manifests (not OLM)
- `hack/build-jsonnet.sh` - Jsonnet to YAML conversion logic
