The tool runs in the host's network namespace to ensure it has the same access as a user running kubectl on the host.
@@ -128,7 +128,7 @@ There are 2 tests requested in this example config.
 
 `testKind` is required - at present you can only ask for `thruput-latency` or `ttfr`.
 
-`numPolicies`, `numIdlePolicies`, `numServices`, `numPods` specify the standing config desired for this test. Standing config exists simply to "load" the cluster up with config. The number that you can create is limited by your cluster - e.g. you cannot create more standing pods than will fit on your cluster!`numPolicies` creates policies that apply to the test pods. `numIdlePolicies` creates policies that will NOT apply to the test pods.
+`numPolicies`, `numIdlePolicies`, `numServices`, `numPods` specify the standing config desired for this test. Standing config exists simply to "load" the cluster up with config. The number that you can create is limited by your cluster - e.g. you cannot create more standing pods than will fit on your cluster! `numPolicies` creates policies that apply to the test pods. `numIdlePolicies` creates policies that will NOT apply to the test pods.
 
 `leaveStandingConfig` tells the tool whether it should leave or clean up the standing resources it created for this test. It is sometimes useful to leave standing config up between tests, especially if it takes a long time to set up.
 
@@ -151,11 +151,11 @@ external: false
 `direct` is a boolean, which determines whether the test should run a direct pod-to-pod test.
 `service` is a boolean, which determines whether the test should run a pod-to-service-to-pod test.
 `external` is a boolean, which determines whether the test should run a test from wherever this test is being run to an externally exposed service.
-If `external=true`, you must also supply `ExternalIPOrFQDN`, `TestPort` and `ControlPort` (for a thruput-latency test) to tell the test the IP and ports it should connect to. `ExternalIPOrFQDN` is whatever is exposed to the world, and might be a LoadBalancer IP, or a node IP, or something else, depending on how you exposed the service. The test and control ports must be the same as those used on the test server pod (because the test tools were not designed to work in an environment with NAT).
+If `external=true`, you must also supply `ExternalIPOrFQDN`, `TestPort` and `ControlPort` (for a thruput-latency test) to tell the test the IP and ports it should connect to. `ExternalIPOrFQDN` is whatever is exposed to the world, and might be a LoadBalancer IP, or a node IP, or something else, depending on how you exposed the service. The test and control ports must be the same as those used on the test server pod (because the test tools were not designed to work in an environment with NAT).
 
-Note that the tool will NOT expose the services for you, because there are too many different ways to expose services to the world. You will need to expose pods with the label `app: qperf` in the test namespace to the world for this test to work. An example of exposing these pods using NodePorts can be found in `external_service_example.yaml`. If you wanted to change that to use a LoadBalancer, simply change `type: NodePort` to `type: LoadBalancer`.
+Note that the tool will NOT expose the services for you, because there are too many different ways to expose services to the world. You will need to expose pods with the label `app: qperf` in the test namespace to the world for this test to work. An example of exposing these pods using NodePorts can be found in `external_service_example.yaml`. If you wanted to change that to use a LoadBalancer, simply change `type: NodePort` to `type: LoadBalancer`.
 
-For `thruput-latency` tests, you will need to expose 2 ports from those pods: a TCP `TestPort` and a `ControlPort`. You must not map the port numbers between the pod and the external service, but they do NOT need to be consecutive. For example, if you specify TestPort=32221, the pod will listen on port 32221 and whatever method you use to expose that service to the outside world must also use that port number.
+For `thruput-latency` tests, you will need to expose 2 ports from those pods: a TCP `TestPort` and a `ControlPort`. You must not map the port numbers between the pod and the external service, but they do NOT need to be consecutive. For example, if you specify TestPort=32221, the pod will listen on port 32221 and whatever method you use to expose that service to the outside world must also use that port number.
 
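The requirements for an external test can be checked before a run. The Python sketch below is illustrative only: the function name `validate_external_config` and the dict-shaped test definition are assumptions, not the tool's own code. Only the field names `external`, `testKind`, `ExternalIPOrFQDN`, `TestPort` and `ControlPort` come from the text, and it reads the parenthetical above as meaning that `ControlPort` is needed only for `thruput-latency` tests.

```python
def validate_external_config(test):
    """Return a list of problems with a test definition's external settings.

    Hypothetical helper: `test` is assumed to be a dict loaded from the test
    config. This is not part of the tool itself.
    """
    problems = []
    if not test.get("external", False):
        return problems  # nothing to check unless external=true

    if not test.get("ExternalIPOrFQDN"):
        problems.append("ExternalIPOrFQDN is required when external=true")

    # Assumption: TestPort is always needed, ControlPort only for
    # thruput-latency tests.
    required_ports = ["TestPort"]
    if test.get("testKind") == "thruput-latency":
        required_ports.append("ControlPort")

    for name in required_ports:
        port = test.get(name)
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(f"{name} must be a port number between 1 and 65535")
    return problems
```

An empty list means the external settings look complete. The check does not verify that the ports are actually exposed unmapped, which still has to be done on the cluster side.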
 A `ttfr` test may have the following additional config:
 
@@ -164,11 +164,12 @@ A `ttfr` test may have the following additional config:
 TestPodsPerNode: 80
 Rate: 2.5
 ```
+
 The `TestPodsPerNode` setting controls the number of pods it will try to set up on each test node.
 
-The `Rate` is the rate at which it will send requests to set up pods, in pods per second. Note that the achievable rate depends on a number of things, including the `TestPodsPerNode` setting: the tool cannot set up more than `TestPodsPerNode` multiplied by the number of nodes with the test label, so it will stall if all the permitted pods are in the process of starting or terminating. How quickly pods start and terminate in turn depends on the speed of the Kubernetes control plane, kubelet, etc.
+The `Rate` is the rate at which it will send requests to set up pods, in pods per second. Note that the achievable rate depends on a number of things, including the `TestPodsPerNode` setting: the tool cannot set up more than `TestPodsPerNode` multiplied by the number of nodes with the test label, so it will stall if all the permitted pods are in the process of starting or terminating. How quickly pods start and terminate in turn depends on the speed of the Kubernetes control plane, kubelet, etc.
 
-In the event that you ask for a rate higher than the tool can achieve, it will run at the maximum rate it can, while logging warnings that it is "unable to keep up with rate". If the problem is running out of pod slots, it will log that as well, and you can fix it by either increasing the pods per node or giving more nodes the test label.
+In the event that you ask for a rate higher than the tool can achieve, it will run at the maximum rate it can, while logging warnings that it is "unable to keep up with rate". If the problem is running out of pod slots, it will log that as well, and you can fix it by either increasing the pods per node or giving more nodes the test label.
 
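The interaction between `Rate` and `TestPodsPerNode` can be illustrated with a little arithmetic. The Python sketch below is not the tool's implementation: `max_in_flight_pods` and `sustainable_rate` are hypothetical helpers, and the steady-state estimate assumes every test pod holds a slot for a fixed `pod_lifetime_seconds` (from creation request to completed deletion), which is a simplification.

```python
def max_in_flight_pods(test_pods_per_node, labelled_nodes):
    """Upper bound on the number of test pods that can exist at once."""
    return test_pods_per_node * labelled_nodes


def sustainable_rate(requested_rate, test_pods_per_node, labelled_nodes,
                     pod_lifetime_seconds):
    """Estimate the pod creation rate (pods/second) that can be sustained.

    Assumes each pod occupies a slot for pod_lifetime_seconds, so in steady
    state at most slots / lifetime pods can be started per second.
    """
    slots = max_in_flight_pods(test_pods_per_node, labelled_nodes)
    slot_limited_rate = slots / pod_lifetime_seconds
    return min(requested_rate, slot_limited_rate)
```

As a made-up example, with `TestPodsPerNode: 80` and two labelled nodes there are 160 slots. If a pod takes 40 seconds from request to deletion, the slot limit is 4 pods per second and `Rate: 2.5` is sustainable. If it takes 100 seconds, the limit drops to 1.6 pods per second and the tool would fall behind the requested rate.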
 ### Settings which can reconfigure your cluster
 
@@ -335,12 +336,11 @@ An example result from a "thruput-latency" test might look like:
 `ClusterDetails` contains information collected about the cluster at the time of the test.
 `thruput-latency` contains a statistical summary of the raw qperf results - latency and throughput for a direct pod-pod test and via a service. Units are given in the result.
 
-
 ### The "Time To First Response" test
 
-This "time to first response" (TTFR) test spins up a server pod on each node in the cluster, and then spins up client pods on each node in the cluster. The client pods start and send requests to the server pod, and record the amount of time it takes before they get a response. This is sometimes[1] a useful proxy for how long it is taking Calico to program the rules for that pod (since pods start with a deny-all rule and calico-node must program the correct rules before the pod can talk to anything). A better measure of the time it takes Calico to program rules for pods is the `felix_int_dataplane_apply_time_seconds` statistic in the [Felix Prometheus metrics](https://docs.tigera.io/calico/latest/reference/felix/prometheus#common-data-plane-metrics).
+This "time to first response" (TTFR) test spins up a server pod on each node in the cluster, and then spins up client pods on each node in the cluster. The client pods start and send requests to the server pod, and record the amount of time it takes before they get a response. This is sometimes[1] a useful proxy for how long it is taking Calico to program the rules for that pod (since pods start with a deny-all rule and calico-node must program the correct rules before the pod can talk to anything). A better measure of the time it takes Calico to program rules for pods is the `felix_int_dataplane_apply_time_seconds` statistic in the [Felix Prometheus metrics](https://docs.tigera.io/calico/latest/reference/felix/prometheus#common-data-plane-metrics).
 
-[1] If `linuxPolicySetupTimeoutSeconds` is set in the CalicoNetworkSpec in the Installation resource, then pod startup will be delayed until policy is applied. This can be handy if your application pod wants its first request to always succeed. This is a Calico-specific feature that is not part of the CNI spec. See the [Calico documentation](https://docs.tigera.io/calico/latest/reference/configure-cni-plugins#enabling-policy-setup-timeout) for more information on this feature and how to enable it.
+[1] If `linuxPolicySetupTimeoutSeconds` is set in the CalicoNetworkSpec in the Installation resource, then pod startup will be delayed until policy is applied. This can be handy if your application pod wants its first request to always succeed. This is a Calico-specific feature that is not part of the CNI spec. See the [Calico documentation](https://docs.tigera.io/calico/latest/reference/configure-cni-plugins#enabling-policy-setup-timeout) for more information on this feature and how to enable it.
 
 For a "ttfr" test, the tool will:
 
@@ -350,18 +350,19 @@ For a "ttfr" test, the tool will:
 - Wait for those to come up.
 - Create a server pod on each node with the `tigera.io/test-nodepool=default-pool` label
 - Loop round:
-  - creating test pods on those nodes, at the rate defined by `Rate` in the test config
-  - test pods are then checked until they produce a ttfr result in their log, which is read by the tool
-  - and a delete is sent for the test pod.
+  - creating test pods on those nodes, at the rate defined by `Rate` in the test config
+  - test pods are then checked until they produce a ttfr result in their log, which is read by the tool
+  - and a delete is sent for the test pod.
 - ttfr results are recorded
 - Collate results and compute min/max/average/50/75/90/99th percentiles
 - Output that summary into a JSON format results file.
 - Optionally delete the test namespace (which will cause all test resources within it to be deleted)
 - Wait for everything to finish being cleaned up.
 
-This test measures Time to First Response in seconds, i.e. the time between a pod starting up and it getting a response from a server pod on the same node.
+This test measures Time to First Response in seconds, i.e. the time between a pod starting up and it getting a response from a server pod on the same node.
 
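As an illustration of the measurement itself, the following Python sketch times how long a freshly started client takes to get its first successful response. It is not the test pod's actual code: `measure_ttfr`, its parameters, and the polling interval are assumptions made for the example, and the request is passed in as a callable so the sketch stays independent of any particular protocol.

```python
import time


def measure_ttfr(try_request, now=time.monotonic, sleep=time.sleep,
                 interval=0.01, timeout=60.0):
    """Return seconds from start until try_request() first succeeds.

    try_request is a hypothetical callable that returns True on a response
    and False (or raises OSError) while traffic is still blocked.
    Returns None if no response arrives within `timeout` seconds.
    """
    start = now()
    while True:
        try:
            if try_request():
                return now() - start
        except OSError:
            pass  # e.g. connection refused or timed out: keep retrying
        if now() - start >= timeout:
            return None
        sleep(interval)
```

The clock and sleep functions are injectable so the loop can be exercised with a fake clock. The measured value is only as precise as the polling interval, so a small interval matters when the expected TTFR is short.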
 An example result from a "ttfr" test might look like:
+
 ```
 [
 {
@@ -425,4 +426,4 @@ An example result from a "ttfr" test might look like:
 
 `config` contains the configuration requested in the test definition.
 `ClusterDetails` contains information collected about the cluster at the time of the test.
-`ttfr` contains a statistical summary of the raw results. Units are given in the result.
+`ttfr` contains a statistical summary of the raw results. Units are given in the result.
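
To make this kind of statistical summary concrete, here is a minimal Python sketch that collates raw TTFR samples into min/max/average and the 50th/75th/90th/99th percentiles. The function name `summarize`, the output key names, and the nearest-rank percentile method are assumptions for illustration; the tool's actual JSON field names and percentile method may differ.

```python
import math


def percentile(sorted_samples, p):
    """Nearest-rank percentile of an ascending, non-empty list (0 < p <= 100)."""
    rank = math.ceil(p * len(sorted_samples) / 100)
    return sorted_samples[max(rank, 1) - 1]


def summarize(samples):
    """Collate raw samples (seconds) into a summary dict."""
    if not samples:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples)
    summary = {
        "min": ordered[0],
        "max": ordered[-1],
        "average": sum(ordered) / len(ordered),
    }
    for p in (50, 75, 90, 99):
        summary[f"p{p}"] = percentile(ordered, p)
    return summary
```

Nearest-rank always returns one of the observed samples, which keeps the sketch simple. Interpolating methods give slightly different values on small sample sets.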