This topic includes instructions to help you troubleshoot common issues with the k6 Operator.

## How to troubleshoot

### Test your script locally

Always run your script locally before trying to run it with the k6 Operator:

```bash
k6 run script.js
```

If you're using environment variables or CLI options, pass them in as well:

```bash
MY_ENV_VAR=foo k6 run script.js --tag my_tag=bar
```

This ensures that the script has correct syntax and can be parsed by k6 in the first place. Additionally, running locally can help you check that the configured options do what you expect. If there are any errors or unexpected results in the output of `k6 run`, fix them before deploying the script elsewhere.

### `TestRun` deployment

#### The pods

When you create one `TestRun` Custom Resource (CR) with `parallelism: n`, there are certain repeating patterns:

1. There will be `n + 2` Jobs (with corresponding Pods) created: initializer, starter, and `n` runners.
1. If any of these Jobs didn't result in a Pod being deployed, there must be an issue with that Job. Some commands that can help here:

   ```bash
   kubectl get jobs -A
   kubectl describe job mytest-initializer
   ```

1. If one of the Pods was deployed but finished with `Error`, you can check its logs with the following command:

   ```bash
   kubectl logs mytest-initializer-xxxxx
   ```

If the Pods seem to be working but don't produce the expected result and there's not enough information in the logs, you can use the k6 [verbose option](https://grafana.com/docs/k6/<K6_VERSION>/using-k6/k6-options/#options) in the `TestRun` spec:

```yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: k6-sample
spec:
  parallelism: 2
  script:
    configMap:
      name: 'test'
      file: 'test.js'
  arguments: --verbose
```

#### k6 Operator

Another source of info is the k6 Operator itself. It's deployed as a Kubernetes `Deployment`, with `replicas: 1` by default, and its logs, together with the observations about the Pods from the previous section, usually contain enough information to help you diagnose any issues. With the standard deployment, the logs of the k6 Operator can be checked as follows.
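
Assuming a standard installation (namespace `k6-operator-system` and a Deployment named `k6-operator-controller-manager` with a `manager` container; adjust the names if your setup differs):

```bash
kubectl logs -n k6-operator-system deployment/k6-operator-controller-manager -c manager
```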

#### Inspect `TestRun` resource

After you deploy a `TestRun` CR, you can inspect it the same way as any other resource:

```bash
kubectl describe testrun my-testrun
```

First, check if the spec is as expected. Then, look at the current status:

```yaml
Status:
  Conditions:
    Last Transition Time:  2024-01-17T10:30:01Z
    Message:
    Reason:                CloudTestRunFalse
    Status:                False
    Type:                  CloudTestRun
    Last Transition Time:  2024-01-17T10:29:58Z
    Message:
    Reason:                TestRunPreparation
    Status:                Unknown
    Type:                  TestRunRunning
    Last Transition Time:  2024-01-17T10:29:58Z
    Message:
    Reason:                CloudTestRunAbortedFalse
    Status:                False
    Type:                  CloudTestRunAborted
    Last Transition Time:  2024-01-17T10:29:58Z
    Message:
    Reason:                CloudPLZTestRunFalse
    Status:                False
    Type:                  CloudPLZTestRun
  Stage:  error
```

If `Stage` is equal to `error`, check the logs of the k6 Operator.

Conditions can be used as a source of info as well, but they're a more advanced troubleshooting option that should be used if the previous steps weren't enough to diagnose the issue. Note that conditions that start with the `Cloud` prefix only matter in the setting of k6 Cloud test runs, for example, for cloud output and PLZ test runs.
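
For example, conditions can be waited on with `kubectl`; a minimal sketch, assuming a `TestRun` named `k6-sample`:

```bash
# Block until the TestRunRunning condition becomes True, or time out
kubectl wait testrun/k6-sample --for=condition=TestRunRunning --timeout=120s
```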

### `PrivateLoadZone` deployment

If the `PrivateLoadZone` CR was successfully created in Kubernetes, it should become visible in your account in the Grafana Cloud k6 (GCk6) interface soon afterwards. If it doesn't appear in the UI, there is likely a problem to troubleshoot.

First, go over the [guide](https://grafana.com/docs/grafana-cloud/k6/author-run/private-load-zone-v2/) to double-check that all the steps were completed correctly and successfully.

Unlike a `TestRun` deployment, when a `PrivateLoadZone` is first created, no additional resources are deployed. So the only source for troubleshooting is the logs of the k6 Operator. See the [previous subsection](#k6-operator) for how to access its logs. Any errors there might be a hint to diagnose the issue. Refer to [PrivateLoadZone: subscription error](#privateloadzone-subscription-error) for more details.

### Running tests in `PrivateLoadZone`

Each time a user runs a test in a PLZ, for example with `k6 cloud run script.js`, a corresponding `TestRun` is deployed by the k6 Operator. This `TestRun` is deployed in the same namespace as its `PrivateLoadZone`. If the test misbehaves, for example, it errors out or doesn't produce the expected result, you can check:

1. If there are any messages in the GCk6 UI.
2. If there are any messages in the output of the `k6 cloud run` command.
3. The resources and their logs, the same way as with a [standalone `TestRun` deployment](#testrun-deployment); a quick way to list them is shown below.
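
For example, assuming the `PrivateLoadZone` lives in the namespace `my-plz-namespace` (the name is illustrative):

```bash
# TestRun CRs created for PLZ test runs land in the PrivateLoadZone's namespace
kubectl get testrun -n my-plz-namespace
kubectl get jobs,pods -n my-plz-namespace
```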

If the standalone `k6 inspect --execution-requirements` executes successfully, the problem is likely elsewhere, and the following hints can help:

:information_source: The k6 Operator expects the initializer logs to contain only the output of `k6 inspect`. If there are any other log lines present, the k6 Operator fails to parse it and the test doesn't start. Refer to this [issue](https://github.com/grafana/k6-operator/issues/193) for more details.

Check events in the initializer Job and Pod, as they may contain another hint about what's wrong.
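
A quick way to pull those events, assuming the initializer Job is named `mytest-initializer`:

```bash
kubectl describe job mytest-initializer
kubectl get events --field-selector involvedObject.name=mytest-initializer --sort-by=.lastTimestamp
```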

### Non-existent ServiceAccount

A ServiceAccount can be defined as `serviceAccountName` in a `PrivateLoadZone`, and as `runner.serviceAccountName` in a `TestRun` CRD. If the specified ServiceAccount doesn't exist, the k6 Operator successfully creates Jobs, but the corresponding Pods fail to be deployed, and the k6 Operator waits indefinitely for the Pods to be `Ready`. This error is best seen in the events of the Job:

```bash
kubectl describe job plz-test-xxxxxx-initializer
...
Events:
  Warning  FailedCreate  57s (x4 over 2m7s)  job-controller  Error creating: pods "plz-test-xxxxxx-initializer-" is forbidden: error looking up service account plz-ns/plz-sa: serviceaccount "plz-sa" not found
```

The k6 Operator doesn't try to analyze such scenarios on its own, but you can refer to the following [issue](https://github.com/grafana/k6-operator/issues/260) for improvements.

#### How to fix

To fix this issue, correct the `serviceAccountName` and re-deploy the `TestRun` or `PrivateLoadZone` resource, as shown below.
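
A minimal sketch of the re-deploy, assuming the resource is defined in `my-test.yaml`:

```bash
# Remove the old resource, then apply the corrected manifest
kubectl delete -f my-test.yaml
kubectl apply -f my-test.yaml
```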

### Non-existent `nodeSelector`

A node selector can be defined as `nodeSelector` in a `PrivateLoadZone`, and as `runner.nodeSelector` in the `TestRun` CRD.

This case is very similar to the [ServiceAccount](#non-existent-serviceaccount) one: the Pod creation fails, but the error is slightly different:

```bash
kubectl describe pod plz-test-xxxxxx-initializer-xxxxx
...
Events:
  Warning  FailedScheduling  48s (x5 over 4m6s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.
```

#### How to fix

To fix this issue, correct the `nodeSelector` and re-deploy the `TestRun` or `PrivateLoadZone` resource.

### Insufficient resources

A related problem can happen when the cluster doesn't have sufficient resources to deploy the runners. There's a higher probability of hitting this issue when you set small CPU and memory limits for runners, or use options like `nodeSelector`, `runner.affinity`, or `runner.topologySpreadConstraints` without a set of nodes matching the spec. Alternatively, it can happen if a high number of runners is required for the test (via `parallelism` in the `TestRun` or during a PLZ test run) and cluster autoscaling is capped at a maximum number of nodes, so it can't provide the required resources on time or at all.

This case is somewhat similar to the previous two: the k6 Operator waits indefinitely, and the situation can be monitored with events in Jobs and Pods. If it's possible to fix the insufficient resources on the fly, for example, by adding more nodes, the k6 Operator attempts to continue executing the test run.

### OOM of a runner Pod

If at least one runner Pod OOM-ed, the whole test gets [stuck](https://github.com/grafana/k6-operator/issues/251) and has to be deleted manually:

```bash
kubectl delete -f my-test.yaml
# or
kubectl delete testrun my-test
```

In case of OOM, it makes sense to review the k6 script to understand what kind of resource usage it requires. It may be that the script can be improved to be more performant. Then, set the `spec.runner.resources` in the `TestRun` CRD, or `spec.resources` in the `PrivateLoadZone` CRD, accordingly, for example as sketched below.
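
A minimal sketch of setting runner resources in a `TestRun` (the values are illustrative; size them for your script):

```yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: k6-sample
spec:
  parallelism: 2
  script:
    configMap:
      name: 'test'
      file: 'test.js'
  runner:
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1Gi
```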

### PrivateLoadZone: subscription error

If there's an issue with your Grafana Cloud k6 subscription, there will be a 400 error in the logs, with a message detailing the problem. For example:

```bash
"Received error `(400) You have reached the maximum Number of private load zones your organization is allowed to have. Please contact support if you want to create more.`. Message from server ``"
```

To fix this issue, check your organization settings in Grafana Cloud k6 or contact Support.

### PrivateLoadZone: Wrong token

There can be two major problems with the authentication token:

1. If the token wasn't created, or was created in a wrong location, the k6 Operator can't find the corresponding Secret and logs an error to that effect.
2. If the token contains a corrupted value, or it's not an organizational token, the logs show the following error:

   ```bash
   "Received error `(403) Authentication token incorrect or expired`. Message from server ``"
   ```
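
To create the token Secret in the right place, a minimal sketch, assuming the `PrivateLoadZone` is deployed in the `plz-ns` namespace and references a Secret named `grafana-k6-token` (both names are illustrative; follow the PLZ guide for the exact setup):

```bash
# The Secret must live in the same namespace as the PrivateLoadZone resource
kubectl -n plz-ns create secret generic grafana-k6-token \
  --from-literal=token=$GRAFANA_CLOUD_K6_TOKEN
```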

### PrivateLoadZone: Networking setup

If you see any dial or connection errors in the logs of the k6 Operator, it makes sense to double-check the networking setup. For a `PrivateLoadZone` to operate, outbound traffic to Grafana Cloud k6 [must be allowed](https://grafana.com/docs/grafana-cloud/k6/author-run/private-load-zone-v2/#before-you-begin). One way to check the reachability of Grafana Cloud k6 endpoints is shown below.
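
A minimal sketch using a throwaway DNS-debugging Pod from the Kubernetes docs (the endpoint hostname is illustrative; refer to the PLZ guide for the actual endpoints):

```bash
# Deploy the dnsutils Pod, then resolve a Grafana Cloud k6 endpoint from inside the cluster
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -it dnsutils -- nslookup ingest.k6.io
```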
For more resources on troubleshooting networking, refer to the [Kubernetes docs](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/).

### PrivateLoadZone: Insufficient resources

The PrivateLoadZone insufficient resources problem is similar to the general [insufficient resources issue](#insufficient-resources). But when running a PrivateLoadZone test, the k6 Operator waits only for a timeout period. When the timeout is up, the test is aborted by Grafana Cloud k6 and marked as such, both in the `PrivateLoadZone` and in Grafana Cloud k6. In other words, there's a time limit for fixing this issue without restarting the test run.