This repository was archived by the owner on May 12, 2025. It is now read-only.

Commit 63ea849

Merge pull request #78 from erwindaria01/ibm-catalog
Initial commit for IBM Cloud Catalog Support
2 parents 57ad881 + ea17f3b commit 63ea849

14 files changed

+1414
-0
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
22
.idea/
33
*.iml
44
.vscode
5+
*.DS_Store

charts/stormforge-agent/Chart.yaml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
apiVersion: v2
2+
appVersion: 2.3.0
3+
description: Bundled StormForge Agent and Prometheus in Agent Mode capturing telemetry
4+
  for StormForge systems.
5+
home: https://www.stormforge.io/
6+
icon: https://app.stormforge.io/img/logo.png
7+
kubeVersion: '>= 1.16.x-0'
8+
maintainers:
9+
- name: grambot
10+
name: stormforge-agent
11+
type: application
12+
version: 2.3.0

charts/stormforge-agent/README.md

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
1+
<!-- Add useful information to your short description that explains what the product is, why a user wants to install and use it, and any additional details the user needs to get started. The following information is an example. Make sure you update this section accordingly. -->
2+
3+
StormForge Optimize Live delivers continuous, autonomous rightsizing for Kubernetes workloads.
4+
5+
The StormForge Agent is a Helm chart that combines the stormforge-agent, which surfaces a minimal set of Kubernetes resource metrics (pods, HPAs), with a Prometheus agent that forwards these metrics to the StormForge Optimize Live SaaS backend.
6+
7+
## Before you begin
8+
9+
<!-- List any prereqs including required permissions, capacity requirements, etc. The following information is an example. Make sure you update this section accordingly. -->
10+
11+
* Sign up for a StormForge Optimize Live account by visiting https://app.stormforge.io
12+
* Download the StormForge CLI by following the instructions here: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#install-the-stormforge-cli-tool
13+
* Create the access credential that will contain the input variables required to successfully authenticate and deploy the StormForge Agent: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#generate-an-access-credential
14+
15+
16+
## Required resources
17+
18+
<!-- The following information is an example. Make sure you update this section accordingly. -->
19+
20+
To run the software, the following resources are required:
21+
22+
* A Kubernetes cluster running v1.16 or later
23+
* The StormForge CLI: https://docs.stormforge.io/optimize-live/getting-started/install-v2/#install-the-stormforge-cli-tool
24+
* A valid StormForge Optimize Live license: https://app.stormforge.io
25+
26+
## Installing the software
27+
28+
<!-- It is recommended to not include the large table of configuration parameters that are listed on the Create page. -->
29+
Generating the credentials:
30+
31+
```
32+
stormforge auth create AUTH_NAME
33+
```
34+
35+
The command generates the following output. Save it locally, e.g. as `AUTH_NAME-credentials.yaml`:
36+
37+
```
38+
stormforge:
39+
  address: https://api.stormforge.io/
40+
authorization:
41+
  issuer: https://api.stormforge.io/
42+
  clientID: <CLIENT_ID> # AUTH_NAME
43+
  clientSecret: <CLIENT_SECRET>
44+
```
45+
46+
Running the installation (replace `LATEST_VERSION` and `CLUSTER_NAME` in the example with appropriate values):
47+
48+
```
49+
helm install stormforge-agent oci://registry.stormforge.io/library/stormforge-agent \
50+
--version LATEST_VERSION \
51+
--namespace stormforge-system \
52+
--create-namespace \
53+
--values AUTH_NAME-credentials.yaml \
54+
--set stormforge.clusterName=CLUSTER_NAME
55+
```
56+
57+
### Parameters
58+
59+
<!-- Add additional H3 level headings as needed for sections that apply to IBM Cloud such as network policy, persistence, cluster topologies, etc.
60+
### H3
61+
### H3
62+
-->
63+
| Parameter | Description | Default |
64+
|-----------------------|----------------------------------------------------------------------|--------------------------------|
65+
| `stormforge.address` | API endpoint for StormForge Optimize Live SaaS | `https://api.stormforge.io` |
66+
| `authorization.issuer`| Authorization Issuer | `https://api.stormforge.io` |
67+
| `authorization.clientID` | `clientID` string from the credential YAML. Visit docs.stormforge.io for details | `[]` |
68+
| `authorization.clientSecret` | `clientSecret` string from the credential YAML. Visit docs.stormforge.io for details | `[]` |
69+
| `workload.allowNamespaces` | Namespaces to include in Optimize Live's recommendations. By default, all namespaces except `kube-system` are included | `[]`|
70+
| `workload.denyNamespaces` | Namespaces to exclude from Optimize Live's recommendations. Note: `workload.allowNamespaces` and `workload.denyNamespaces` are mutually exclusive, with `workload.allowNamespaces` taking precedence. | `[]`|
71+
| `stormforge.clusterName`| String used to define Cluster Name in the StormForge SaaS UI | `[]`|
72+
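
As a rough sketch of how the namespace parameters could be supplied, the snippet below writes a small values file. The nesting under `workload:` is inferred from the dotted parameter name, and the file name and the namespaces `team-a` and `team-b` are placeholders, not values from the chart.

```sh
# Hypothetical file name; the namespaces listed are placeholders.
values_file="${TMPDIR:-/tmp}/stormforge-namespaces.yaml"

# Key layout assumed from the parameter name `workload.allowNamespaces`.
cat > "$values_file" <<'EOF'
workload:
  allowNamespaces:
    - team-a
    - team-b
EOF

# The file can then be passed to the install command above with an
# additional flag:  --values "$values_file"
```

Helm accepts several `--values` flags in one command, so this file can sit alongside `AUTH_NAME-credentials.yaml`.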
73+
## Upgrading to a new version
74+
75+
<!-- Information about how a user can upgrade to a new version when it's available. The following information is an example. Make sure you update this section accordingly. -->
76+
77+
A typical upgrade might look something like the following.
78+
79+
```
80+
helm upgrade stormforge-agent oci://registry.stormforge.io/library/stormforge-agent \
81+
--version LATEST_VERSION \
82+
--namespace stormforge-system \
83+
--reuse-values
84+
```
85+
## Uninstalling the software
86+
87+
<!-- Information about how a user can uninstall this product. The following information is an example. Make sure you update this section accordingly. -->
88+
89+
Run the following command to uninstall the Helm release from your cluster.
90+
91+
```
92+
helm uninstall RELEASE_NAME [...] [flags]
93+
```
94+
## Workload Metrics
95+
96+
The StormForge Agent produces the following workload metrics:
97+
98+
| Metric | Source | Why |
99+
| ------------------------------------------------ | ----------------------------------- | ----------------------------------------------------------------------------- |
100+
| sf_kube_pod_container_resource_requests | KSM-like/pod-metrics | Track requests for each container |
101+
| sf_kube_pod_container_resource_limits | KSM-like/pod-metrics | Track limits for each container |
102+
| sf_kube_replicaset_spec_replicas | KSM-like/replicaset-metrics | Track replicas for replicasets |
103+
| sf_kube_statefulset_replicas | KSM-like/statefulset-metrics | Track replicas for statefulsets |
104+
| sf_workload_pod_owner | Consolidated metric for ownership | With this metric, we have pod owner and workload, replacing KSM kube_pod_owner and kube_replicaset_owner |
105+
| sf_workload_replicas | Consolidated metric for replicas number | With this metric, we have all replica metrics regardless of the pod owner type. Should eventually replace KSM-like kube_replicaset_spec_replicas and kube_statefulset_replicas |
106+
| sf_workload_spec_replicas | Consolidated metric for desired replicas number | With this metric, we have all desired replica metrics regardless of the pod owner type. Should eventually replace KSM-like kube_replicaset_spec_replicas and kube_statefulset_replicas |
107+
| sf_workload_status_replicas | Consolidated metric for observed replicas number | With this metric, we have all observed replica metrics regardless of the pod owner type. Should eventually replace KSM-like kube_replicaset_status_replicas and kube_statefulset_replicas |
108+
| sf_workload_pod_container_resource_requests | Consolidated pod metric with requests | With this metric, we have all requests metrics in a single metric. Should eventually replace KSM-like kube_pod_container_resource_requests |
109+
| sf_workload_pod_container_resource_limits | Consolidated pod metric with limits | With this metric, we have all limits metrics in a single metric. Should eventually replace KSM-like kube_pod_container_resource_limits |
110+
| container_cpu_usage_seconds_total | cadvisor | Track cpu usage for each container |
111+
| container_memory_working_set_bytes | cadvisor | Track memory usage for each container |
112+
| sf_horizontalpodautoscaler_spec_min_replicas | KSM-like/horizontalpodautoscaler-metrics | Track minimum replicas for each HPA |
113+
| sf_horizontalpodautoscaler_spec_max_replicas | KSM-like/horizontalpodautoscaler-metrics | Track maximum replicas for each HPA |
114+
| sf_horizontalpodautoscaler_spec_target_metric | KSM-like/horizontalpodautoscaler-metrics | Track target metric for each HPA |
115+
116+
Individual tenants could have additional metrics.
117+
118+
## Troubleshooting StormForge Agent
119+
120+
### Getting Logs from Prometheus Agent
121+
122+
If no data appears in AMP, check the Prometheus agent logs. In the example below, the agent runs in the `stormforge-system` namespace:
123+
124+
```sh
125+
kubectl logs -l app.kubernetes.io/name=stormforge-agent --tail=-1 -n stormforge-system -c prom-agent
126+
```
127+
128+
If there are no errors, continue with the next steps.
129+
130+
### Verify Prom Targets
131+
132+
After installing the agent, verify that it can actually scrape the workload metrics. In particular, the stormforge-agent scrape target is a static URL, `https://<>:8080/metrics`, which makes it prone to configuration errors. In the example below, the agent runs in the `stormforge-system` namespace:
133+
134+
```sh
135+
# e.g.
136+
kubectl expose deploy/stormforge-agent -n stormforge-system
137+
kubectl port-forward deploy/stormforge-agent 9090:9090 -n stormforge-system
138+
# http://localhost:9090/targets?search= to validate targets are being collected
139+
```
140+
141+
To look at the actual metrics from the perspective of the stormforge-agent:
142+
143+
```sh
144+
# e.g.
145+
kubectl expose deploy/stormforge-agent -n stormforge-system
146+
kubectl port-forward deploy/stormforge-agent 8080:8080 -n stormforge-system
147+
# http://localhost:8080/metrics to validate metrics are being exposed
148+
```
149+
150+
### Checking Prometheus WAL
151+
152+
Scraped data should be present in the Prometheus write-ahead log (WAL). In the example below, the agent runs in the `stormforge-system` namespace:
153+
154+
```sh
155+
# e.g.
156+
kubectl exec $(kubectl get pods -n stormforge-system -l app.kubernetes.io/name=stormforge-agent | grep agent | awk '{print $1}') -n stormforge-system -it -c prom-agent -- sh
157+
158+
# inside the pod
159+
$ promtool tsdb dump data-agent/ | head
160+
161+
# check sf workload metrics
162+
$ promtool tsdb dump data-agent/ | grep sf_workload | head -5
163+
164+
# check horizontal metrics
165+
$ promtool tsdb dump data-agent/ | grep horizontal | head -5
166+
167+
```
168+
169+
By default, the WAL holds 30 minutes of data.
170+
171+
### Credentials
172+
173+
Credentials not authorized (request the required permissions):
174+
175+
```
176+
# kubectl logs --tail=-1 -n stormforge-system -l app.kubernetes.io/name=stormforge-agent -c prom-agent
177+
178+
ts=2023-02-13T22:21:55.813Z caller=dedupe.go:112 component=remote level=error remote_name=d24ad1 url=https://in.dev-1.dev.gramlabs.dev/prometheus/write msg="non-recoverable error" count=77 exemplarCount=0 err="server returned HTTP status 404 Not Found: {\"message\":null}"
179+
```
180+
181+
Bad credentials (double-check the parameters passed during installation, e.g. the secrets):
182+
183+
```
184+
# kubectl logs --tail=-1 -n stormforge-system -l app.kubernetes.io/name=stormforge-agent -c prom-agent
185+
186+
ts=2023-02-13T22:25:48.460Z caller=dedupe.go:112 component=remote level=error remote_name=0745da url=https://in.dev-1.dev.gramlabs.dev/prometheus/write msg="non-recoverable error" count=35 exemplarCount=0 err="server returned HTTP status 401 Unauthorized: Authorization malformed or invalid"
187+
ts=2023-02-13T22:26:03.506Z caller=dedupe.go:112 component=remote level=error remote_name=0745da url=https://in.dev-1.dev.gramlabs.dev/prometheus/write msg="non-recoverable error" count=77 exemplarCount=0 err="server returned HTTP status 401 Unauthorized: Authorization malformed or invalid"
188+
```
189+
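
The two cases above can be told apart by the HTTP status in the log line. A minimal sketch of a filter that does this is shown below. The function name `classify_remote_write_errors` is hypothetical, and it only matches the `HTTP status 401` and `HTTP status 404` strings seen in the sample logs.

```sh
# Hypothetical helper: count remote-write failures on stdin by HTTP status
# and print the matching hint from this section.
classify_remote_write_errors() {
  logs=$(cat)
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  n401=$(printf '%s\n' "$logs" | grep -c 'HTTP status 401' || true)
  n404=$(printf '%s\n' "$logs" | grep -c 'HTTP status 404' || true)
  if [ "$n401" -gt 0 ]; then
    echo "401 x$n401: bad credentials, check the values passed at install time"
  fi
  if [ "$n404" -gt 0 ]; then
    echo "404 x$n404: credentials not authorized, request permission"
  fi
  if [ "$n401" -eq 0 ] && [ "$n404" -eq 0 ]; then
    echo "no 401 or 404 remote-write errors found"
  fi
}

# Usage:
#   kubectl logs --tail=-1 -n stormforge-system \
#     -l app.kubernetes.io/name=stormforge-agent -c prom-agent \
#     | classify_remote_write_errors
```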
190+
### Enable debug logging
191+
192+
Debug logging can be enabled via HTTP requests.
193+
This makes it easier to enable debug logging for a short period.
194+
195+
The default log level is `1` (info).
196+
197+
To change it:
198+
```
199+
kubectl port-forward -n stormforge-system <stormforge-agent pod> 6060:6060
200+
201+
# Default info level logging
202+
curl -X PUT localhost:6060/debug/loglevel -d level=1
203+
# Verbose/Debug logging
204+
curl -X PUT localhost:6060/debug/loglevel -d level=5
205+
# Trace logging
206+
curl -X PUT localhost:6060/debug/loglevel -d level=9
207+
```
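
For scripted use, the level numbers can be wrapped in a small helper so callers pass a name instead of a number. This is a sketch only: the function names `loglevel_for` and `set_loglevel` are hypothetical, and it assumes the port-forward to `6060` shown above is already running.

```sh
# Hypothetical helper: map a level name to the numeric values listed above.
loglevel_for() {
  case "$1" in
    info)  echo 1 ;;
    debug) echo 5 ;;
    trace) echo 9 ;;
    *)     echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper around the curl call from this section.
set_loglevel() {
  level=$(loglevel_for "$1") || return 1
  curl -X PUT localhost:6060/debug/loglevel -d "level=$level"
}

# Usage:
#   set_loglevel debug   # investigate
#   set_loglevel info    # restore the default afterwards
```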
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
---
2+
stormforge:
3+
  clusterName: my-cluster
4+
authorization:
5+
  clientID: "id"
6+
  clientSecret: "secret"
7+
  issuer: "https://api.stormforge.io/"
