| 1 | +<!-- Add useful information to your short description that explains what the product is, why a user wants to install and use it, and any additional details the user needs to get started. The following information is an example. Make sure you update this section accordingly. --> |
| 2 | + |
| 3 | +StormForge Optimize Live delivers continuous, autonomous rightsizing for Kubernetes workloads. |
| 4 | + |
| 5 | +The StormForge Agent is a Helm chart that combines the stormforge-agent, which surfaces a minimal set of Kubernetes resource (pod, HPA) metrics, with a Prometheus agent that forwards these metrics to the StormForge Optimize Live SaaS backend. |
| 6 | + |
| 7 | +## Before you begin |
| 8 | + |
| 9 | +<!-- List any prereqs including required permissions, capacity requirements, etc. The following information is an example. Make sure you update this section accordingly. --> |
| 10 | + |
| 11 | +* Sign up for a StormForge Optimize Live account by visiting https://app.stormforge.io |
| 12 | +* Download the StormForge CLI by following the instructions here: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#install-the-stormforge-cli-tool |
| 13 | +* Create the access credential that will contain the input variables required to successfully authenticate and deploy the StormForge Agent: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#generate-an-access-credential |
| 14 | + |
| 15 | + |
| 16 | +## Required resources |
| 17 | + |
| 18 | +<!-- The following information is an example. Make sure you update this section accordingly. --> |
| 19 | + |
| 20 | +To run the software, the following resources are required: |
| 21 | + |
| 22 | + * A Kubernetes cluster > v1.16 |
| 23 | + * The StormForge CLI: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#install-the-stormforge-cli-tool |
| 24 | + * A valid StormForge Optimize Live license: https://app.stormforge.io |
| 25 | + |
| 26 | +## Installing the software |
| 27 | + |
| 28 | +<!-- It is recommended to not include the large table of configuration parameters that are listed on the Create page. --> |
| 29 | +Generating the credentials: |
| 30 | + |
| 31 | +``` |
| 32 | +stormforge auth create AUTH_NAME |
| 33 | +``` |
| 34 | + |
| 35 | +The command generates the following file. Save it locally, e.g. as `AUTH_NAME-credentials.yaml`: |
| 36 | + |
| 37 | +``` |
| 38 | +stormforge: |
| 39 | + address: https://api.stormforge.io/ |
| 40 | +authorization: |
| 41 | + issuer: https://api.stormforge.io/ |
| 42 | + clientID: <CLIENT_ID> # AUTH_NAME |
| 43 | + clientSecret: <CLIENT_SECRET> |
| 44 | +``` |
| 45 | + |
| 46 | +Running the installation (replace `LATEST_VERSION` and `CLUSTER_NAME` in the example with appropriate values): |
| 47 | + |
| 48 | +``` |
| 49 | +helm install stormforge-agent oci://registry.stormforge.io/library/stormforge-agent \ |
| 50 | + --version LATEST_VERSION \ |
| 51 | + --namespace stormforge-system \ |
| 52 | + --create-namespace \ |
| 53 | + --values AUTH_NAME-credentials.yaml \ |
| 54 | + --set stormforge.clusterName=CLUSTER_NAME |
| 55 | +``` |
| 56 | + |
| 57 | +### Parameters |
| 58 | + |
| 59 | +<!-- Add additional H3 level headings as needed for sections that apply to IBM Cloud such as network policy, persistence, cluster topologies, etc. |
| 60 | +### H3 |
| 61 | +### H3 |
| 62 | +--> |
| 63 | +| Parameter | Description | Default | |
| 64 | +|-----------------------|----------------------------------------------------------------------|--------------------------------| |
| 65 | +| `stormforge.address`  | API endpoint for StormForge Optimize Live SaaS                       | `https://api.stormforge.io`    | |
| 66 | +| `authorization.issuer`| Authorization Issuer | `https://api.stormforge.io` | |
| 67 | +| `authorization.clientID` | `clientID` string from the credential YAML. Visit docs.stormforge.io for details | `[]` | |
| 68 | +| `authorization.clientSecret` | `clientSecret` string from the credential YAML. Visit docs.stormforge.io for details | `[]` | |
| 69 | +| `workload.allowNamespaces` | List of specific namespaces to include in Optimize Live's recommendations. The default behavior is all namespaces except "kube-system" | `[]`| |
| 70 | +| `workload.denyNamespaces` | List of specific namespaces to exclude from Optimize Live's recommendations. Note: `workload.allowNamespaces` and `workload.denyNamespaces` are mutually exclusive, with `workload.allowNamespaces` taking precedence. | `[]`| |
| 71 | +| `stormforge.clusterName`| String used to define Cluster Name in the StormForge SaaS UI | `[]`| |
| 72 | + |
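A minimal sketch of how the list-valued `workload.allowNamespaces` parameter above might be supplied through a separate values file. The file name `namespace-values.yaml` and the namespaces `team-a` and `team-b` are hypothetical placeholders, not values taken from the chart.

```shell
# Write an extra values file that limits recommendations to two namespaces.
# Assumption: the file name and namespace names are illustrative only.
cat > namespace-values.yaml <<'EOF'
workload:
  allowNamespaces:
    - team-a
    - team-b
EOF

# Print the file to review what would be passed to Helm.
cat namespace-values.yaml
```

The file could then be passed to the `helm install` command shown earlier with an additional `--values namespace-values.yaml` flag, since Helm accepts more than one `--values` argument.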
| 73 | +## Upgrading to a new version |
| 74 | + |
| 75 | +<!-- Information about how a user can upgrade to a new version when it's available. The following information is an example. Make sure you update this section accordingly. --> |
| 76 | + |
| 77 | +A typical upgrade might look something like the following. |
| 78 | + |
| 79 | +``` |
| 80 | +helm upgrade stormforge-agent oci://registry.stormforge.io/library/stormforge-agent \ |
| 81 | + --version LATEST_VERSION \ |
| 82 | + --namespace stormforge-system \ |
| 83 | + --reuse-values |
| 84 | +``` |
| 85 | +## Uninstalling the software |
| 86 | + |
| 87 | +<!-- Information about how a user can uninstall this product. The following information is an example. Make sure you update this section accordingly. --> |
| 88 | + |
| 89 | +Run the following command to uninstall the Helm chart. The example assumes the release name and namespace used in the installation above: |
| 90 | + |
| 91 | +``` |
| 92 | +helm uninstall stormforge-agent --namespace stormforge-system |
| 93 | +``` |
| 94 | +## Workload Metrics |
| 95 | + |
| 96 | +The StormForge Agent produces the following workload metrics: |
| 97 | + |
| 98 | +| Metric | Source | Why | |
| 99 | +| ------------------------------------------------ | ----------------------------------- | ----------------------------------------------------------------------------- | |
| 100 | +| sf_kube_pod_container_resource_requests | KSM-like/pod-metrics | Track requests for each container | |
| 101 | +| sf_kube_pod_container_resource_limits | KSM-like/pod-metrics | Track limits for each container | |
| 102 | +| sf_kube_replicaset_spec_replicas | KSM-like/replicaset-metrics | Track replicas for replicasets | |
| 103 | +| sf_kube_statefulset_replicas | KSM-like/statefulset-metrics | Track replicas for statefulsets | |
| 104 | +| sf_workload_pod_owner                            | Consolidated metric for ownership   | Provides the pod owner and workload in one metric, replacing KSM kube_pod_owner and kube_replicaset_owner | |
| 105 | +| sf_workload_replicas                             | Consolidated metric for the number of replicas | Provides all replica metrics regardless of the type of pod owner. Should eventually replace KSM-like kube_replicaset_spec_replicas and kube_statefulset_replicas | |
| 106 | +| sf_workload_spec_replicas                        | Consolidated metric for the desired number of replicas | Provides all desired replica metrics regardless of the type of pod owner. Should eventually replace KSM-like kube_replicaset_spec_replicas and kube_statefulset_replicas | |
| 107 | +| sf_workload_status_replicas                      | Consolidated metric for the observed number of replicas | Provides all observed replica metrics regardless of the type of pod owner. Should eventually replace KSM-like kube_replicaset_status_replicas and kube_statefulset_replicas | |
| 108 | +| sf_workload_pod_container_resource_requests      | Consolidated pod metric with requests | Provides all request metrics in a single metric. Should eventually replace KSM-like kube_pod_container_resource_requests | |
| 109 | +| sf_workload_pod_container_resource_limits        | Consolidated pod metric with limits | Provides all limit metrics in a single metric. Should eventually replace KSM-like kube_pod_container_resource_limits | |
| 110 | +| container_cpu_usage_seconds_total                | cadvisor                            | Track CPU usage for each container                                            | |
| 111 | +| container_memory_working_set_bytes | cadvisor | Track memory usage for each container | |
| 112 | +| sf_horizontalpodautoscaler_spec_min_replicas | KSM-like/horizontalpodautoscaler-metrics | Track minimum replicas for each HPA | |
| 113 | +| sf_horizontalpodautoscaler_spec_max_replicas | KSM-like/horizontalpodautoscaler-metrics | Track maximum replicas for each HPA | |
| 114 | +| sf_horizontalpodautoscaler_spec_target_metric | KSM-like/horizontalpodautoscaler-metrics | Track target metric for each HPA | |
| 115 | + |
| 116 | +Individual tenants could have additional metrics. |
| 117 | + |
| 118 | +## Troubleshooting StormForge Agent |
| 119 | + |
| 120 | +### Getting Logs from Prometheus Agent |
| 121 | + |
| 122 | +If no data appears in AMP, check the Prometheus agent logs. In the example below, the agent is running in the `stormforge-system` namespace: |
| 123 | + |
| 124 | +```sh |
| 125 | +kubectl logs -l app.kubernetes.io/name=stormforge-agent --tail=-1 -n stormforge-system -c prom-agent |
| 126 | +``` |
| 127 | + |
| 128 | +If there are no errors, continue with the next steps. |
| 129 | + |
| 130 | +### Verify Prom Targets |
| 131 | + |
| 132 | +After installing the agent, verify that it can actually scrape the workload metrics. In particular, stormforge-agent uses a static URL config, `https://<>:8080/metrics`, which makes it prone to configuration errors. In the example below, the agent is running in the `stormforge-system` namespace: |
| 133 | + |
| 134 | +```sh |
| 135 | +# e.g. |
| 136 | +kubectl expose deploy/stormforge-agent -n stormforge-system |
| 137 | +kubectl port-forward deploy/stormforge-agent 9090:9090 -n stormforge-system |
| 138 | +# http://localhost:9090/targets?search= to validate targets are being collected |
| 139 | +``` |
| 140 | + |
| 141 | +To look at the actual metrics from the perspective of the stormforge-agent: |
| 142 | + |
| 143 | +```sh |
| 144 | +# e.g. |
| 145 | +kubectl expose deploy/stormforge-agent -n stormforge-system |
| 146 | +kubectl port-forward deploy/stormforge-agent 8080:8080 -n stormforge-system |
| 147 | +# http://localhost:8080/metrics to validate metrics are being exposed |
| 148 | +``` |
| 149 | + |
| 150 | +### Checking Prometheus WAL |
| 151 | + |
| 152 | +Scraped data should be present in the Prometheus write-ahead log (WAL). In the example below, the agent is running in the `stormforge-system` namespace: |
| 153 | + |
| 154 | +```sh |
| 155 | +# e.g. |
| 156 | +kubectl exec $(kubectl get pods -n stormforge-system -l app.kubernetes.io/name=stormforge-agent | grep agent | awk '{print $1}') -n stormforge-system -it -c prom-agent -- sh |
| 157 | + |
| 158 | +# inside the pod |
| 159 | +$ promtool tsdb dump data-agent/ | head |
| 160 | + |
| 161 | +# check sf workload metrics |
| 162 | +$ promtool tsdb dump data-agent/ | grep sf_workload | head -5 |
| 163 | + |
| 164 | +# check horizontal metrics |
| 165 | +$ promtool tsdb dump data-agent/ | grep horizontal | head -5 |
| 166 | + |
| 167 | +``` |
| 168 | + |
| 169 | +By default, the WAL holds 30 minutes of data. |
| 170 | + |
| 171 | +### Credentials |
| 172 | + |
| 173 | +Credentials are not authorized; ask for permission: |
| 174 | + |
| 175 | +``` |
| 176 | +# kubectl logs --tail=-1 -n stormforge-system -l app.kubernetes.io/name=stormforge-agent -c prom-agent |
| 177 | + |
| 178 | +ts=2023-02-13T22:21:55.813Z caller=dedupe.go:112 component=remote level=error remote_name=d24ad1 url=https://in.dev-1.dev.gramlabs.dev/prometheus/write msg="non-recoverable error" count=77 exemplarCount=0 err="server returned HTTP status 404 Not Found: {\"message\":null}" |
| 179 | +``` |
| 180 | + |
| 181 | +Bad credentials; double-check the parameters passed during installation (i.e. the secrets): |
| 182 | + |
| 183 | +``` |
| 184 | +# kubectl logs --tail=-1 -n stormforge-system -l app.kubernetes.io/name=stormforge-agent -c prom-agent |
| 185 | + |
| 186 | +ts=2023-02-13T22:25:48.460Z caller=dedupe.go:112 component=remote level=error remote_name=0745da url=https://in.dev-1.dev.gramlabs.dev/prometheus/write msg="non-recoverable error" count=35 exemplarCount=0 err="server returned HTTP status 401 Unauthorized: Authorization malformed or invalid" |
| 187 | +ts=2023-02-13T22:26:03.506Z caller=dedupe.go:112 component=remote level=error remote_name=0745da url=https://in.dev-1.dev.gramlabs.dev/prometheus/write msg="non-recoverable error" count=77 exemplarCount=0 err="server returned HTTP status 401 Unauthorized: Authorization malformed or invalid" |
| 188 | +``` |
| 189 | + |
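To tell the two credential failures above apart at a glance, the log output can be summarized by HTTP status. The following is a minimal sketch, assuming error lines contain the text `server returned HTTP status <code>` as in the samples above; `summarize_remote_write_errors` is a hypothetical helper name, and the two input lines are shortened samples rather than real agent output.

```shell
# Summarize remote-write errors by HTTP status from prom-agent log lines on stdin.
# Assumption: error lines contain "server returned HTTP status <code>".
summarize_remote_write_errors() {
  grep -o 'server returned HTTP status [0-9][0-9][0-9]' \
    | awk '{print $NF}' \
    | sort \
    | uniq -c \
    | awk '{print $2 ": " $1}'
}

# Example with two shortened sample lines.
printf '%s\n' \
  'level=error msg="non-recoverable error" err="server returned HTTP status 401 Unauthorized: Authorization malformed or invalid"' \
  'level=error msg="non-recoverable error" err="server returned HTTP status 404 Not Found"' \
  | summarize_remote_write_errors
```

In practice, the output of the `kubectl logs ... -c prom-agent` command shown above would be piped into the function. In the samples in this section, `404` accompanies unauthorized credentials and `401` accompanies bad credentials.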
| 190 | +### Enable debug logging |
| 191 | + |
| 192 | +Debug logging can be enabled via HTTP requests. |
| 193 | +This makes it practical to turn on debug logging for a short period only. |
| 194 | + |
| 195 | +The default log level is `1` (info). |
| 196 | + |
| 197 | +To change it, port-forward to the agent pod and send a `PUT` request with the desired level: |
| 198 | +``` |
| 199 | +kubectl port-forward -n stormforge-system <stormforge-agent pod> 6060:6060 |
| 200 | + |
| 201 | +# Default info level logging |
| 202 | +curl -X PUT localhost:6060/debug/loglevel -d level=1 |
| 203 | +# Verbose/Debug logging |
| 204 | +curl -X PUT localhost:6060/debug/loglevel -d level=5 |
| 205 | +# Trace logging |
| 206 | +curl -X PUT localhost:6060/debug/loglevel -d level=9 |
| 207 | +``` |
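The numeric levels above can be wrapped in a small helper so that the level is chosen by name. This is a sketch only: `loglevel_number` and `set_agent_loglevel` are hypothetical helper names, the name-to-number mapping is taken from the comments in the example above, and the second function assumes the port-forward on `6060` is already running.

```shell
# Map a level name to the numeric value used in the example above.
loglevel_number() {
  case "$1" in
    info)  echo 1 ;;
    debug) echo 5 ;;
    trace) echo 9 ;;
    *)     echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# Send the chosen level to the agent.
# Assumption: `kubectl port-forward ... 6060:6060` is running in another shell.
set_agent_loglevel() {
  _level="$(loglevel_number "$1")" || return 1
  curl -X PUT localhost:6060/debug/loglevel -d "level=${_level}"
}
```

For a short debugging session, one might run `set_agent_loglevel debug`, reproduce the issue, and then run `set_agent_loglevel info` to return to the default.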