Commit e860681

MON-4442: Add AGENTS.md to CMO
Signed-off-by: Daniel Mellado <[email protected]>
1 parent faa9562 commit e860681

AGENTS.md

Lines changed: 169 additions & 0 deletions

This file provides guidance to AI agents when working with code in this repository.

This is the Cluster Monitoring Operator (CMO), the operator that manages the Prometheus-based monitoring stack in OpenShift. CMO is deployed by the Cluster Version Operator (CVO).

## Architecture Overview

### Jsonnet Manifest Generation
CMO generates Kubernetes manifests using Jsonnet (`jsonnet/`):
- Source: `jsonnet/components/*.libsonnet` and vendored upstream jsonnet
- Output: `assets/` directory with generated YAML manifests
- The operator reads manifests from `assets/` at runtime via `pkg/manifests/`

**Critical**: When modifying manifests, make the changes in the jsonnet source files, then regenerate with `make generate`. Direct edits to `assets/*.yaml` are overwritten the next time the manifests are regenerated.
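
A minimal sketch of a local guard for the rule above. `check_asset_edits` is a hypothetical helper (not part of the CMO repository) that reads changed paths and fails when files under `assets/` changed but nothing under `jsonnet/` did, which usually signals a hand edit that `make generate` would discard.

```bash
# Hypothetical helper, not part of the CMO repository.
# Reads changed file paths (one per line) on stdin and fails when files
# under assets/ changed but nothing under jsonnet/ did.
check_asset_edits() {
  local path
  local assets_changed=0 jsonnet_changed=0
  while IFS= read -r path; do
    case "$path" in
      assets/*) assets_changed=1 ;;
      jsonnet/*) jsonnet_changed=1 ;;
    esac
  done
  if (( assets_changed == 1 && jsonnet_changed == 0 )); then
    echo "assets/ changed without a jsonnet/ change: edit jsonnet and run 'make generate'" >&2
    return 1
  fi
  return 0
}
```

For example, run `git diff --cached --name-only | check_asset_edits` before committing. It is only a heuristic; `make check-assets` is still the check to rely on.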

### Configuration API
Two ConfigMaps control monitoring behavior:
- `cluster-monitoring-config` (namespace `openshift-monitoring`): platform monitoring
- `user-workload-monitoring-config` (namespace `openshift-user-workload-monitoring`): user workload monitoring

The configuration types and their validation rules are defined in `pkg/manifests/types.go`.
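
As a sketch, assuming the usual layout in which the platform configuration is stored under the `config.yaml` data key, the hypothetical helper below prints a minimal `cluster-monitoring-config` that sets `enableUserWorkload`, the field that turns on user workload monitoring.

```bash
# Hypothetical helper: prints a minimal cluster-monitoring-config ConfigMap.
# $1: value for enableUserWorkload (defaults to false).
render_cluster_monitoring_config() {
  local enable_uwm="${1:-false}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: ${enable_uwm}
EOF
}
```

For example, `render_cluster_monitoring_config true | oc apply -f -` on a test cluster. Applying it replaces any existing `config.yaml` content, so merge it with the current ConfigMap on shared clusters.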

## Development Commands

### Local Development
```bash
export KUBECONFIG=/path/to/kubeconfig # Requires an OpenShift cluster
make run-local # Build and run locally as the CMO service account
make run-local SWITCH_TO_CMO=false # Run as the current user (e.g., kube:admin)
```
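
Several targets need a valid `KUBECONFIG` (see Common Pitfalls), so a small pre-flight check can fail fast before `make run-local`. `require_kubeconfig` is a hypothetical helper, and it assumes `KUBECONFIG` names a single file rather than a colon-separated list.

```bash
# Hypothetical pre-flight check, not a Makefile target.
# Assumes KUBECONFIG names a single file, not a colon-separated list.
require_kubeconfig() {
  if [[ -z "${KUBECONFIG:-}" ]]; then
    echo "KUBECONFIG is not set" >&2
    return 1
  fi
  if [[ ! -f "$KUBECONFIG" || ! -r "$KUBECONFIG" ]]; then
    echo "KUBECONFIG=$KUBECONFIG is not a readable file" >&2
    return 1
  fi
  return 0
}
```

Usage: `require_kubeconfig && make run-local`.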

### Jsonnet Workflow
```bash
# Modify jsonnet source files in jsonnet/, then:
make generate # Regenerate manifests, docs, and metadata
make docs # Regenerate documentation only (api.md, resources.md)
make check-assets # Verify that assets are up to date
```

**Rapid iteration**: For quick testing, you can modify YAML files in `assets/` directly and run the operator with `hack/local-cmo.sh` (no rebuild needed), then port the changes back to jsonnet. See `Documentation/development.md` for the detailed workflow.
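
**Two-release annotation/label removal**: To remove a label or annotation from a resource managed by the `CreateOrUpdateXXX` functions, spread the change over two releases. Deleting the key from jsonnet alone is not enough, because the update only adds or changes the keys CMO requires and leaves other keys on the live object in place.
1. First release: Append the suffix `-` to the annotation/label key (CMO then deletes it via library-go)
2. Second release: Remove the key from the jsonnet source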
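
To illustrate the key-removal rule described above, here is a bash model. It is a sketch of the semantics, not library-go's implementation: labels are `key=value` lines, a required entry of the form `key-` deletes `key`, other required entries are added or updated, and keys missing from the required set are left alone.

```bash
# Illustration only: a bash model of the removal rule, not library-go code.
#   $1: labels currently on the live object, one "key=value" per line
#   $2: labels CMO requires; an entry "key-" means "delete key"
# Keys absent from $2 are left untouched. Prints the result sorted.
merge_labels() {
  local -A merged=()
  local line key target
  while IFS= read -r line; do
    [[ -n "$line" ]] || continue
    merged["${line%%=*}"]="${line#*=}"
  done <<<"$1"
  while IFS= read -r line; do
    [[ -n "$line" ]] || continue
    key="${line%%=*}"
    if [[ "$key" == *- ]]; then
      target="${key%-}"
      unset -v "merged[$target]"
    else
      merged["$key"]="${line#*=}"
    fi
  done <<<"$2"
  for key in "${!merged[@]}"; do
    printf '%s=%s\n' "$key" "${merged[$key]}"
  done | sort
}
```

In this model, `merge_labels $'app=cmo\nlegacy=true' 'app=cmo'` keeps `legacy=true`, while passing `$'app=cmo\nlegacy-'` as the required set drops it.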

### Testing
```bash
make test # All tests (requires an OpenShift cluster and KUBECONFIG)
make test-unit # Unit tests only
make test-e2e # E2E tests (requires an OpenShift cluster)
make test-ginkgo # Ginkgo tests (ported from openshift-tests-private)
go test -v ./pkg/... -run TestName # Specific unit test
go test -v -timeout=120m -run TestName ./test/e2e/ --kubeconfig "$KUBECONFIG" # Specific e2e test
```
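
`go test -run` interprets its argument as a regular expression, so `-run TestName` also runs any test whose name contains `TestName`. A minimal sketch of a hypothetical wrapper (not a Makefile target) that anchors the pattern:

```bash
# Hypothetical helper: run exactly one unit test by name.
# Anchoring with ^...$ stops TestFoo from also matching TestFooBar.
# $1: test name; $2: package pattern (defaults to ./pkg/...).
# Set DRY_RUN=1 to print the command instead of executing it.
run_one_test() {
  local name="$1"
  local pkg="${2:-./pkg/...}"
  local -a cmd=(go test -v -run "^${name}\$" "$pkg")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For e2e tests, the same anchoring applies to the `-run` value in the e2e command above.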

**openshift-tests-extension**: CMO integrates with the OpenShift conformance test framework via the `tests-ext` binary. Run `make tests-ext-update` after modifying Ginkgo tests to update the test metadata.

### Verification
```bash
make verify # Run all checks
make format # Format code (go fmt, jsonnet fmt, shellcheck)
make golangci-lint # Lint Go code
make check-rules # Validate Prometheus rules with promtool
```

## OpenShift Conventions

### Pull Requests
- **Title format**: `OCPBUGS-12345: descriptive title` (bugs) or `MON-1234: descriptive title` (features)
  - Example: `MON-4435: Add RBAC permission for endpointslice resource in UWM prometheus-operator`
  - Example: `OCPBUGS-61088: create networkpolicy settings for in-cluster monitoring`
- **Commit format**: `<subsystem>: <what changed>`
  - Example: `jsonnet: update prometheus version`
  - Example: `e2e: add e2e test to verify endpointslice discovery in uwm`
- All PRs require a JIRA ticket reference
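
The title and commit conventions can be checked locally before pushing. These hypothetical helpers encode only what the list above states (an `OCPBUGS-` or `MON-` key for titles, `<subsystem>: <what changed>` for commits); neither Prow nor the Jira plugin runs them.

```bash
# Hypothetical local checks; not run by Prow or the Jira plugin.
valid_pr_title() {
  # e.g. "MON-4435: Add RBAC permission ..."
  local re='^(OCPBUGS|MON)-[0-9]+: .+'
  [[ "$1" =~ $re ]]
}

valid_commit_subject() {
  # <subsystem>: <what changed>, e.g. "jsonnet: update prometheus version"
  local re='^[A-Za-z0-9._/-]+: .+'
  [[ "$1" =~ $re ]]
}
```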

### Jira Integration
- **Automatic linking**: PRs are automatically linked to JIRA when the key is in the PR title
- **Lifecycle automation**: [jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin) updates JIRA status based on PR events
- **Jira commands** (comment on PR):
  - `/jira refresh` - Manually sync PR with JIRA issue
  - `/jira cc @username` - CC someone on the JIRA issue
  - `/jira backport <branch>` - Create backport PR to target branch (e.g., `/jira backport release-4.17`)
  - `/jira assign <user>` - Assign the JIRA issue to specified user
  - `/jira unassign` - Remove current assignee from JIRA issue
  - `/jira comment <comment>` - Add comment to the JIRA issue
  - `/jira close` - Close the JIRA issue
  - `/jira reopen` - Reopen the JIRA issue
- **Creating tickets**: Use OCPBUGS project for bugs, MON project for features
- **Required fields**: Component (Monitoring), Target Version, Priority
- **Status workflow**: To Do → In Progress → Code Review → Done

### Prow CI
- **Triggering tests**: Tests run automatically on PR creation/update
- **Useful commands** (comment on PR):
  - `/retest` - Retry all failed tests
  - `/test <job-name>` - Run specific job (e.g., `/test e2e-aws`)
  - `/test-with <job-name>` - Run specific job with additional tests
  - `/retitle <new-title>` - Change PR title
  - `/assign @username` - Assign reviewer
  - `/cc @username` - Request review without assignment
  - `/hold` - Prevent auto-merge; `/hold cancel` removes the hold
  - `/lgtm` - Add the `lgtm` label to the PR
  - `/approve` - Add the `approved` label required for merge (approvers listed in `OWNERS` only)
  - `/cherry-pick <branch>` - Cherry-pick to another branch after merge
- **Important jobs**:
  - `ci/prow/images` - Builds container images
  - `ci/prow/e2e-*` - E2E test variants
  - `ci/prow/verify` - Runs `make verify`
  - `ci/prow/unit` - Unit tests
- **Viewing results**: Click "Details" next to a job to see its Prow logs
- **Common failures**:
  - `ci/prow/images` fails if `make verify` would fail (run it locally first)
  - E2E timeouts may be transient (retry with `/retest`)
- **More commands**: See [prow.ci.openshift.org/command-help](https://prow.ci.openshift.org/command-help)

### Feature Development
- **FeatureGate integration**: CMO integrates with OpenShift FeatureGates to control feature availability
  - Example: the `MetricsCollectionProfiles` feature gate controls collection profile functionality
  - Check in `pkg/operator/operator.go`: `featureGates.Enabled(features.FeatureGateMetricsCollectionProfiles)`
  - Pass to config: `CollectionProfilesFeatureGateEnabled` flag in `pkg/manifests/config.go`
- **TechPreview → GA lifecycle**:
  - TechPreview: Feature gated, requires explicit enablement
  - GA: Feature gate removed, enabled by default
- **Adding new features**:
  1. Add a FeatureGate check in `pkg/operator/operator.go`
  2. Pass the enabled state through the config
  3. Conditionally create resources based on the gate state (e.g., `serviceMonitors()` helper)
  4. Update `pkg/manifests/types.go` if new config fields are needed

## Updating Jsonnet Dependencies

Example: Updating the kube-prometheus bundle:

```bash
cd jsonnet
# Edit jsonnetfile.json, update version for desired component
jb update
# Stage only the version/sum changes for the target bundle in jsonnetfile.lock.json
git add -p jsonnetfile.lock.json
# Discard the remaining unstaged changes (the hunks staged above are kept)
git restore jsonnetfile.json jsonnetfile.lock.json
# Reinstall with the updated lockfile
rm -rf vendor && jb install
cd ..
make generate
```

See `Documentation/development.md` for the detailed workflow.

## Common Pitfalls

1. **Forgetting `make generate`**: Modifying jsonnet without regenerating assets causes CI failures
2. **Missing KUBECONFIG**: E2E tests fail silently if KUBECONFIG isn't set, even if `~/.kube/config` exists
3. **Asset sync issues**: Run `make clean` before `make generate` if vendored jsonnet behaves unexpectedly
4. **Wrong cluster type**: Tests require OpenShift, not vanilla Kubernetes
5. **Missing permissions for a local CMO**: When running CMO locally for development, make sure it runs with the required permissions; otherwise the operator can get stuck in the reconcile loop because it cannot list or modify resources.
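
One way to catch pitfall 5 early is to ask the API server about permissions before starting CMO. `oc auth can-i` with `--as` is a standard `oc` command; the `check_cmo_permissions` helper and its list of checks are hypothetical and cover only a few representative operations, not CMO's full RBAC.

```bash
# Hypothetical helper; the checks are a representative sample, not CMO's RBAC.
# $1: identity to impersonate (omit to check the current user).
# OC: client binary to use (defaults to oc; can be a stub for testing).
check_cmo_permissions() {
  local identity="${1:-}"
  local oc="${OC:-oc}"
  local -a as_args=()
  local -a checks=(
    "get configmaps"
    "update deployments.apps"
    "list prometheuses.monitoring.coreos.com"
  )
  local check failed=0
  if [[ -n "$identity" ]]; then
    as_args=(--as="$identity")
  fi
  for check in "${checks[@]}"; do
    # $check is deliberately unquoted so it splits into VERB RESOURCE.
    # shellcheck disable=SC2086
    if ! "$oc" auth can-i $check -n openshift-monitoring "${as_args[@]}" >/dev/null 2>&1; then
      echo "cannot ${check} in openshift-monitoring${identity:+ as $identity}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, assuming the CMO service account is `cluster-monitoring-operator` in `openshift-monitoring`: `check_cmo_permissions system:serviceaccount:openshift-monitoring:cluster-monitoring-operator`. Omit the argument to check the current user, which matches `make run-local SWITCH_TO_CMO=false`. Impersonating with `--as` itself requires impersonation rights.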

## Documentation

- `CONTRIBUTING.md` - Contribution guidelines and workflow details
- `Documentation/development.md` - Detailed development workflows
- [OpenShift Monitoring Docs](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/monitoring/) - User-facing monitoring documentation

## Important Files

- `Makefile` - All build and test targets
- `VERSION` - Operator version string
- `manifests/` - CVO deployment manifests
- `hack/build-jsonnet.sh` - Jsonnet to YAML conversion logic
