|
1 | | -# Power Monitoring Must-Gather |
| 1 | +# Kepler Operator Must-Gather |
2 | 2 |
|
3 | | -The Power Monitoring `must-gather` tool is designed to collect |
4 | | -information about power monitoring components within an OpenShift cluster. |
5 | | -This tool extends the functionality of [OpenShift must-gather](https://github.com/openshift/must-gather) |
6 | | -to specifically target and retrieve data related to power monitoring, |
7 | | -including support for both the upstream Kepler Operator and the |
8 | | -Power Monitoring Operator. |
| 3 | +`kepler-operator-must-gather` is a tool for collecting diagnostic information about the Kepler Operator and Power Monitoring components within an OpenShift cluster. This tool extends the functionality of [OpenShift must-gather](https://github.com/openshift/must-gather) to specifically target and retrieve data related to power monitoring deployments. |
| 4 | + |
| 5 | +## About |
| 6 | + |
| 7 | +The Kepler Operator must-gather tool collects information about: |
| 8 | + |
| 9 | +- Operator deployment and configuration |
| 10 | +- OLM (Operator Lifecycle Manager) resources |
| 11 | +- Power monitoring instances (Kepler deployments) |
| 12 | +- Prometheus metrics and monitoring configuration |
| 13 | +- Pod logs and runtime information |
| 14 | +- Hardware information (kernel version, RAPL data) |
| 15 | + |
| 16 | +This tool supports both the upstream **Kepler Operator** and the downstream **Power Monitoring Operator**. |
| 17 | + |
| 18 | +## Prerequisites |
| 19 | + |
| 20 | +- OpenShift CLI (`oc`) installed and configured |
| 21 | +- Access to an OpenShift cluster where the operator is deployed |
| 22 | +- Appropriate RBAC permissions to collect cluster resources |
9 | 23 |
|
10 | 24 | ## Usage |
11 | 25 |
|
12 | | -To run the must-gather, use one of the following |
13 | | -commands, depending on the operator and namespace where it is deployed |
| 26 | +### Using the image from the operator deployment |
14 | 27 |
|
15 | | -### Using the image from the Operator deployment |
| 28 | +To run must-gather using the operator's current image: |
16 | 29 |
|
17 | 30 | ```sh |
18 | | -oc adm must-gather --image=$(oc -n <namespace> get deployment.apps/kepler-operator-controller -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') -- /usr/bin/gather --operator <operator-name> --ns <namespace> |
| 31 | +oc adm must-gather \ |
| 32 | + --image=$(oc -n <namespace> get deployment.apps/kepler-operator-controller \ |
| 33 | + -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \ |
| 34 | + -- /usr/bin/gather --operator <operator-name> --ns <namespace> |
19 | 35 | ``` |
20 | 36 |
|
21 | | -Replace `<namespace>` with the namespace where the operator is deployed, and |
22 | | -`<operator-name>` with the name of the operator(e.g. `kepler-operator` or `power-monitoring-operator`). |
| 37 | +**Parameters:** |
| 38 | + |
| 39 | +- `<namespace>`: The namespace where the operator is deployed (e.g., `openshift-operators`) |
| 40 | +- `<operator-name>`: The name of the operator (e.g., `kepler-operator` or `power-monitoring-operator`) |
| 41 | + |
| 42 | +**Example:** |
| 43 | + |
| 44 | +```sh |
| 45 | +oc adm must-gather \ |
| 46 | + --image=$(oc -n openshift-operators get deployment.apps/kepler-operator-controller \ |
| 47 | + -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}') \ |
| 48 | + -- /usr/bin/gather --operator kepler-operator --ns openshift-operators |
| 49 | +``` |
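If the image lookup returns nothing (for example, because the namespace is wrong), the command substitution in the commands above hands an empty `--image=` value to `oc adm must-gather`. The following is a minimal sketch of a wrapper that resolves the image first and stops early when it is empty. The function name `run_must_gather` and its variable names are hypothetical and not part of the repository.

```sh
# Hypothetical helper, not shipped with the operator.
# Usage: run_must_gather <namespace> <operator-name>
run_must_gather() {
  rmg_ns="$1"
  rmg_operator="$2"

  # Resolve the image of the "manager" container from the operator deployment.
  rmg_image="$(oc -n "$rmg_ns" get deployment.apps/kepler-operator-controller \
    -o jsonpath='{.spec.template.spec.containers[?(@.name == "manager")].image}')" || return 1

  # Stop early instead of passing an empty --image to must-gather.
  if [ -z "$rmg_image" ]; then
    echo "could not resolve operator image in namespace $rmg_ns" >&2
    return 1
  fi

  oc adm must-gather --image="$rmg_image" \
    -- /usr/bin/gather --operator "$rmg_operator" --ns "$rmg_ns"
}
```

After sourcing the function, a call such as `run_must_gather openshift-operators kepler-operator` should behave like the example above, with the added check on the resolved image.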
23 | 50 |
|
24 | 51 | ### Using a specific image |
25 | 52 |
|
| 53 | +To use a specific must-gather image: |
| 54 | + |
| 55 | +```sh |
| 56 | +oc adm must-gather \ |
| 57 | + --image=quay.io/sustainable_computing_io/kepler-operator:v1alpha1 \ |
| 58 | + -- /usr/bin/gather --operator <operator-name> --ns <namespace> |
| 59 | +``` |
| 60 | + |
| 61 | +**Example:** |
| 62 | + |
| 63 | +```sh |
| 64 | +oc adm must-gather \ |
| 65 | + --image=quay.io/sustainable_computing_io/kepler-operator:v1alpha1 \ |
| 66 | + -- /usr/bin/gather --operator kepler-operator --ns openshift-operators |
| 67 | +``` |
| 68 | + |
| 69 | +### Command-line options |
| 70 | + |
| 71 | +The `gather` script supports the following options: |
| 72 | + |
| 73 | +```sh |
| 74 | +Options: |
| 75 | + --operator | -o Specify the name of the operator that is deployed |
| 76 | + Default: kepler-operator |
| 77 | + |
| 78 | + --ns | -n Namespace where the operator is deployed |
| 79 | + Default: openshift-operators |
| 80 | + |
| 81 | + --dest-dir | -d Gather collection path |
| 82 | + Default: /must-gather |
| 83 | + |
| 84 | + --help | -h Display help information |
| 85 | +``` |
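To illustrate how the defaults listed above interact with explicitly passed flags, here is a rough sketch of option parsing in shell. It is an illustration only: the actual `gather` script may be structured differently, and the names `parse_args`, `OPERATOR`, `NS`, and `DEST_DIR` are assumptions made for this example.

```sh
# Illustrative sketch only; the real gather script may parse options differently.
OPERATOR="kepler-operator"   # defaults taken from the option list above
NS="openshift-operators"
DEST_DIR="/must-gather"

parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --operator|-o|--ns|-n|--dest-dir|-d)
        # Every option except --help takes a value.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --operator|-o) OPERATOR="$2" ;;
          --ns|-n)       NS="$2" ;;
          --dest-dir|-d) DEST_DIR="$2" ;;
        esac
        shift 2
        ;;
      --help|-h)
        echo "usage: gather [--operator NAME] [--ns NAMESPACE] [--dest-dir PATH]"
        return 0
        ;;
      *)
        echo "unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}
```

In this sketch, calling `parse_args -o power-monitoring-operator` changes only `OPERATOR`, while `NS` and `DEST_DIR` keep their defaults.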
| 86 | + |
| 87 | +## Collected Information |
| 88 | + |
| 89 | +### Operator Information |
| 90 | + |
| 91 | +- Catalog source, subscription, and install plan |
| 92 | +- ClusterServiceVersion (CSV) |
| 93 | +- Operator deployment configuration |
| 94 | +- Operator pod details and logs |
| 95 | + |
| 96 | +### OLM Resources |
| 97 | + |
| 98 | +- All OLM-managed resources related to the operator |
| 99 | +- Summary of operator lifecycle management state |
| 100 | + |
| 101 | +### Power Monitor Components |
| 102 | + |
| 103 | +- PowerMonitor custom resource and internals |
| 104 | +- DaemonSet configuration |
| 105 | +- ConfigMap settings |
| 106 | +- ServiceAccount and SecurityContextConstraints |
| 107 | +- Events in the power monitoring namespace |
| 108 | + |
| 109 | +### Pod-Level Information |
| 110 | + |
| 111 | +For each Kepler pod: |
| 112 | + |
| 113 | +- Pod specification and status |
| 114 | +- Container logs |
| 115 | +- Kernel version information |
| 116 | +- RAPL (Running Average Power Limit) capabilities |
| 117 | +- Hardware power monitoring capabilities |
| 118 | + |
| 119 | +### Monitoring Information (if available) |
| 120 | + |
| 121 | +- Prometheus rules and configuration |
| 122 | +- Active targets |
| 123 | +- Time-series database (TSDB) status |
| 124 | +- Runtime information from Prometheus replicas |
| 125 | + |
| 126 | +## Troubleshooting |
| 127 | + |
| 128 | +### Common issues |
| 129 | + |
| 130 | +**Issue:** `Error from server (NotFound): deployments.apps "kepler-operator-controller" not found` |
| 131 | + |
| 132 | +**Solution:** Verify the operator is installed and the namespace is correct: |
| 133 | + |
26 | 134 | ```sh |
27 | | -oc adm must-gather --image=quay.io/sustainable_computing_io/kepler-operator:v1alpha1 -- /usr/bin/gather --operator <operator-name> --ns <namespace> |
| 135 | +oc get deployment -n <namespace> |
| 136 | +``` |
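If the namespace is unknown, one option is to search every namespace for the deployment by name. The sketch below assumes the current user may list deployments cluster-wide. The function name `find_operator_namespace` is hypothetical.

```sh
# Hypothetical helper: print the namespace(s) that contain the operator deployment.
find_operator_namespace() {
  oc get deployment --all-namespaces \
    --field-selector metadata.name=kepler-operator-controller \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}'
}
```

An empty result suggests that the operator is not installed, or that its deployment uses a different name.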
| 137 | + |
| 138 | +**Issue:** `error: insufficient permissions` |
| 139 | + |
| 140 | +**Solution:** Ensure you have cluster-admin privileges or appropriate RBAC permissions to run must-gather. |
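One quick way to check for cluster-admin-like access before running must-gather is `oc auth can-i`. The sketch below tests for wildcard permissions across all namespaces, which is stricter than must-gather may actually need, so treat a negative result as a hint and not as proof. The function name `has_cluster_admin` is hypothetical.

```sh
# Hypothetical pre-flight check: succeeds only if the current user may
# perform any verb on any resource in every namespace.
has_cluster_admin() {
  [ "$(oc auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ]
}
```

For example: `has_cluster_admin || echo "current user lacks cluster-wide wildcard permissions" >&2`.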
| 141 | + |
| 142 | +**Issue:** `cannot gather UWM details; skipping gathering monitoring info` |
| 143 | + |
| 144 | +**Solution:** This message is informational. User Workload Monitoring (UWM) may not be configured in your cluster, so must-gather skips the monitoring data and still collects all other resources.
| 145 | + |
| 146 | +### Viewing must-gather logs |
| 147 | + |
| 148 | +Check the `gather-debug.log` file in the output directory for detailed information about the collection process: |
28 | 149 |
|
| 150 | +```sh |
| 151 | +cat must-gather.local.<timestamp>/gather-debug.log |
29 | 152 | ``` |
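For larger collections it can help to scan the debug log for problems before reading it in full. The following sketch locates every `gather-debug.log` under `must-gather.local.*` directories and counts the lines that mention "error". It assumes the log may sit in a nested subdirectory and that failures are logged with the word "error"; neither is confirmed by this document. The function name `summarize_debug_logs` is hypothetical.

```sh
# Hypothetical helper: report how many lines in each gather-debug.log mention
# "error" (case-insensitive). Searches the given directory, default: current.
summarize_debug_logs() {
  find "${1:-.}" -type f -name 'gather-debug.log' -path '*must-gather.local.*' |
    while IFS= read -r sdl_log; do
      # grep -c prints 0 and exits non-zero when nothing matches; keep going.
      sdl_count="$(grep -ci 'error' "$sdl_log" || true)"
      printf '%s: %s line(s) mentioning "error"\n' "$sdl_log" "$sdl_count"
    done
}
```

Run `summarize_debug_logs` from the directory where `oc adm must-gather` was invoked, or pass that directory as the first argument.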
30 | 153 |
|
31 | | -Running these commands will collect and store information in a newly created directory, based on the specified operator and namespace. |
| 154 | +## Development |
| 155 | + |
| 156 | +### Collection Scripts |
| 157 | + |
| 158 | +Data collection is implemented in the following scripts:
| 159 | + |
| 160 | +- `gather` - Main collection script |
| 161 | +- `utils` - Utility functions for logging and command execution |
| 162 | + |
| 163 | +### Testing Changes |
| 164 | + |
| 165 | +To test must-gather changes locally: |
| 166 | + |
| 167 | +1. Build the operator image with your changes: |
| 168 | + |
| 169 | + ```sh |
| 170 | + make operator-build operator-push IMG_BASE=<your-registry> |
| 171 | + ``` |
| 172 | + |
| 173 | +2. Run must-gather with your custom image: |
| 174 | + |
| 175 | + ```sh |
| 176 | + oc adm must-gather \ |
| 177 | + --image=<your-registry>/kepler-operator:<tag> \ |
| 178 | + -- /usr/bin/gather --operator kepler-operator --ns openshift-operators |
| 179 | + ``` |
| 180 | + |
| 181 | +3. Verify the collected data in the output directory |
| 182 | + |
| 183 | +## Additional Resources |
| 184 | + |
| 185 | +- [OpenShift must-gather documentation](https://docs.openshift.com/container-platform/latest/support/gathering-cluster-data.html) |
| 186 | + |
| 187 | +## Contributing |
| 188 | + |
| 189 | +For issues or improvements related to must-gather: |
| 190 | + |
| 191 | +- Report issues: [GitHub Issues](https://github.com/sustainable-computing-io/kepler-operator/issues) |
| 192 | +- Submit pull requests: [GitHub Pull Requests](https://github.com/sustainable-computing-io/kepler-operator/pulls) |