Commit 25c886c

docs: improve must-gather guide
This commit improves the must-gather readme

Signed-off-by: Vibhu Prashar <[email protected]>
1 parent 4f52a99 commit 25c886c

1 file changed: +176 -15 lines changed

must-gather/README.md

Lines changed: 176 additions & 15 deletions
@@ -1,31 +1,192 @@
1-
# Power Monitoring Must-Gather
1+
# Kepler Operator Must-Gather
22

3-
The Power Monitoring `must-gather` tool is designed to collect
4-
information about power monitoring components within an OpenShift cluster.
5-
This tool extends the functionality of [OpenShift must-gather](https://github.com/openshift/must-gather)
6-
to specifically target and retrieve data related to power monitoring,
7-
including support for both the upstream Kepler Operator and the
8-
Power Monitoring Operator.
3+
`kepler-operator-must-gather` is a tool for collecting diagnostic information about the Kepler Operator and Power Monitoring components within an OpenShift cluster. This tool extends the functionality of [OpenShift must-gather](https://github.com/openshift/must-gather) to specifically target and retrieve data related to power monitoring deployments.
4+
5+
## About
6+
7+
The Kepler Operator must-gather tool collects information about:
8+
9+
- Operator deployment and configuration
10+
- OLM (Operator Lifecycle Manager) resources
11+
- Power monitoring instances (Kepler deployments)
12+
- Prometheus metrics and monitoring configuration
13+
- Pod logs and runtime information
14+
- Hardware information (kernel version, RAPL data)
15+
16+
This tool supports both the upstream **Kepler Operator** and the downstream **Power Monitoring Operator**.
17+
18+
## Prerequisites
19+
20+
- OpenShift CLI (`oc`) installed and configured
21+
- Access to an OpenShift cluster where the operator is deployed
22+
- Appropriate RBAC permissions to collect cluster resources
923

1024
## Usage
1125

12-
To run the must-gather, use one of the following
13-
commands, depending on the operator and namespace where it is deployed
26+
### Using the image from the operator deployment
1427

15-
### Using the image from the Operator deployment
28+
To run must-gather using the operator's current image:
1629

1730
```sh
18-
oc adm must-gather --image=$(oc -n <namespace> get deployment.apps/kepler-operator-controller -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') -- /usr/bin/gather --operator <operator-name> --ns <namespace>
31+
oc adm must-gather \
32+
--image=$(oc -n <namespace> get deployment.apps/kepler-operator-controller \
33+
-o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \
34+
-- /usr/bin/gather --operator <operator-name> --ns <namespace>
1935
```
2036

21-
Replace `<namespace>` with the namespace where the operator is deployed, and
22-
`<operator-name>` with the name of the operator(e.g. `kepler-operator` or `power-monitoring-operator`).
37+
**Parameters:**
38+
39+
- `<namespace>`: The namespace where the operator is deployed (e.g., `openshift-operators`)
40+
- `<operator-name>`: The name of the operator (e.g., `kepler-operator` or `power-monitoring-operator`)
41+
42+
**Example:**
43+
44+
```sh
45+
oc adm must-gather \
46+
--image=$(oc -n openshift-operators get deployment.apps/kepler-operator-controller \
47+
-o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \
48+
-- /usr/bin/gather --operator kepler-operator --ns openshift-operators
49+
```
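
If the `jsonpath` query matches nothing, the command substitution expands to an empty string and `--image=` is passed without a value. A minimal sketch of a wrapper that resolves the image first and stops early when it is empty follows; the function name `run_must_gather` and its positional arguments are hypothetical and not part of the must-gather tooling.

```sh
# Hypothetical wrapper (not part of the must-gather tooling): resolve the
# operator image first, then run must-gather only if an image was found.
run_must_gather() {
  ns="${1:-openshift-operators}"
  operator="${2:-kepler-operator}"

  # Same query as in the command above; fall back to empty on failure.
  image=$(oc -n "$ns" get deployment.apps/kepler-operator-controller \
    -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') || image=""

  if [ -z "$image" ]; then
    echo "could not resolve the operator image in namespace $ns" >&2
    return 1
  fi

  oc adm must-gather --image="$image" \
    -- /usr/bin/gather --operator "$operator" --ns "$ns"
}
```

Calling `run_must_gather openshift-operators kepler-operator` would then behave like the one-line command, with a clearer failure when the deployment or container is missing.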
2350

2451
### Using a specific image
2552

53+
To use a specific must-gather image:
54+
55+
```sh
56+
oc adm must-gather \
57+
--image=quay.io/sustainable_computing_io/kepler-operator:v1alpha1 \
58+
-- /usr/bin/gather --operator <operator-name> --ns <namespace>
59+
```
60+
61+
**Example:**
62+
63+
```sh
64+
oc adm must-gather \
65+
--image=quay.io/sustainable_computing_io/kepler-operator:v1alpha1 \
66+
-- /usr/bin/gather --operator kepler-operator --ns openshift-operators
67+
```
68+
69+
### Command-line options
70+
71+
The `gather` script supports the following options:
72+
73+
```sh
74+
Options:
75+
--operator | -o Specify the name of the operator that is deployed
76+
Default: kepler-operator
77+
78+
--ns | -n Namespace where the operator is deployed
79+
Default: openshift-operators
80+
81+
--dest-dir | -d Gather collection path
82+
Default: /must-gather
83+
84+
--help | -h Display help information
85+
```
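
The defaults above determine what is collected when an option is omitted. The following is a minimal sketch of how such options and defaults could be parsed. It is not the actual `gather` script, and the function name `parse_gather_args` and the variable names are hypothetical.

```sh
# Hypothetical sketch, NOT the real gather script: parse the documented
# options and fall back to the documented defaults.
parse_gather_args() {
  OPERATOR="kepler-operator"
  NS="openshift-operators"
  DEST_DIR="/must-gather"

  while [ "$#" -gt 0 ]; do
    case "$1" in
      --help|-h)
        echo "usage: gather [--operator NAME] [--ns NAMESPACE] [--dest-dir PATH]"
        return 0 ;;
      --operator|-o|--ns|-n|--dest-dir|-d)
        # Every remaining option takes exactly one value.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --operator|-o) OPERATOR="$2" ;;
          --ns|-n)       NS="$2" ;;
          --dest-dir|-d) DEST_DIR="$2" ;;
        esac
        shift 2 ;;
      *)
        echo "unknown option: $1" >&2
        return 1 ;;
    esac
  done
}
```

With this sketch, `parse_gather_args -o power-monitoring-operator` would leave `NS` and `DEST_DIR` at their defaults.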
86+
87+
## Collected Information
88+
89+
### Operator Information
90+
91+
- Catalog source, subscription, and install plan
92+
- ClusterServiceVersion (CSV)
93+
- Operator deployment configuration
94+
- Operator pod details and logs
95+
96+
### OLM Resources
97+
98+
- All OLM-managed resources related to the operator
99+
- Summary of operator lifecycle management state
100+
101+
### Power Monitor Components
102+
103+
- PowerMonitor custom resource and internals
104+
- DaemonSet configuration
105+
- ConfigMap settings
106+
- ServiceAccount and SecurityContextConstraints
107+
- Events in the power monitoring namespace
108+
109+
### Pod-Level Information
110+
111+
For each Kepler pod:
112+
113+
- Pod specification and status
114+
- Container logs
115+
- Kernel version information
116+
- RAPL (Running Average Power Limit) capabilities
117+
- Hardware power monitoring capabilities
118+
119+
### Monitoring Information (if available)
120+
121+
- Prometheus rules and configuration
122+
- Active targets
123+
- Time-series database (TSDB) status
124+
- Runtime information from Prometheus replicas
125+
126+
## Troubleshooting
127+
128+
### Common issues
129+
130+
**Issue:** `Error from server (NotFound): deployments.apps "kepler-operator-controller" not found`
131+
132+
**Solution:** Verify the operator is installed and the namespace is correct:
133+
26134
```sh
27-
oc adm must-gather --image=quay.io/sustainable_computing_io/kepler-operator:v1alpha1 -- /usr/bin/gather --operator <operator-name> --ns <namespace>
135+
oc get deployment -n <namespace>
136+
```
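
When the namespace is unknown, searching every namespace for the deployment can help. This is a sketch under the assumption that the deployment keeps the name `kepler-operator-controller` used elsewhere in this guide; the helper name `find_operator_namespace` is hypothetical.

```sh
# Hypothetical helper: print each namespace that contains a deployment with
# the given name (default: kepler-operator-controller).
find_operator_namespace() {
  name="${1:-kepler-operator-controller}"
  oc get deployment --all-namespaces \
    --field-selector "metadata.name=${name}" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}'
}
```

An empty result suggests the operator is not installed under that deployment name.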
137+
138+
**Issue:** `error: insufficient permissions`
139+
140+
**Solution:** Ensure you have cluster-admin privileges or appropriate RBAC permissions to run must-gather.
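
A quick way to check access before running must-gather is `oc auth can-i`. The sketch below approximates cluster-admin by testing wildcard access across all namespaces, which is an assumption; this guide does not list the exact permissions must-gather needs, and the function name is hypothetical.

```sh
# Hypothetical check: approximate "cluster-admin" as wildcard access to all
# verbs and resources in all namespaces.
check_must_gather_access() {
  if oc auth can-i '*' '*' --all-namespaces >/dev/null 2>&1; then
    echo "cluster-wide access: yes"
  else
    echo "cluster-wide access: no"
    return 1
  fi
}
```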
141+
142+
**Issue:** `cannot gather UWM details; skipping gathering monitoring info`
143+
144+
**Solution:** This is informational. The must-gather will still collect other resources. User Workload Monitoring may not be configured in your cluster.
145+
146+
### Viewing must-gather logs
147+
148+
Check the `gather-debug.log` file in the output directory for detailed information about the collection process:
28149

150+
```sh
151+
cat must-gather.local.<timestamp>/gather-debug.log
29152
```
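
To skim a long debug log, it can help to filter for error lines. The sketch below finds `gather-debug.log` under the newest `must-gather.local.*` directory and prints matching lines. It assumes failures are logged with the word "error", which this guide does not confirm, and the helper name `show_gather_errors` is hypothetical.

```sh
# Hypothetical helper: print lines mentioning "error" from gather-debug.log
# in the most recent must-gather.local.* directory under the given path.
show_gather_errors() {
  base="${1:-.}"

  # Newest matching directory first; empty if none exists.
  latest=$(ls -1dt "$base"/must-gather.local.* 2>/dev/null | head -n 1) || true
  if [ -z "$latest" ]; then
    echo "no must-gather.local.* directory found in $base" >&2
    return 1
  fi

  # The log may sit in a subdirectory, so search for it by name.
  find "$latest" -name gather-debug.log \
    -exec grep -i -n -H "error" {} +
}
```

Run it from the directory where `oc adm must-gather` was invoked, or pass that directory as the first argument.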
30153

31-
Running these commands will collect and store information in a newly created directory, based on the specified operator and namespace.
154+
## Development
155+
156+
### Collection Scripts
157+
158+
Data collection scripts are located in:
159+
160+
- `gather` - Main collection script
161+
- `utils` - Utility functions for logging and command execution
162+
163+
### Testing Changes
164+
165+
To test must-gather changes locally:
166+
167+
1. Build the operator image with your changes:
168+
169+
```sh
170+
make operator-build operator-push IMG_BASE=<your-registry>
171+
```
172+
173+
2. Run must-gather with your custom image:
174+
175+
```sh
176+
oc adm must-gather \
177+
--image=<your-registry>/kepler-operator:<tag> \
178+
-- /usr/bin/gather --operator kepler-operator --ns openshift-operators
179+
```
180+
181+
3. Verify the collected data in the output directory
182+
183+
## Additional Resources
184+
185+
- [OpenShift must-gather documentation](https://docs.openshift.com/container-platform/latest/support/gathering-cluster-data.html)
186+
187+
## Contributing
188+
189+
For issues or improvements related to must-gather:
190+
191+
- Report issues: [GitHub Issues](https://github.com/sustainable-computing-io/kepler-operator/issues)
192+
- Submit pull requests: [GitHub Pull Requests](https://github.com/sustainable-computing-io/kepler-operator/pulls)
