Traffic Flow Test Scripts

This repository contains the YAML files, Dockerfiles, and test scripts used to test Traffic Flows in an OVN-Kubernetes k8s cluster.

Setting up the environment

The package "kubectl" should be installed.

The recommended python version is 3.11 for running the Traffic Flow tests

python -m venv tft-venv
source tft-venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt

Optional: Developer Environment Setup

If you're planning to contribute or to run tests and linters locally, install the developer dependencies into the environment. These include everything from requirements.txt (the runtime dependencies) plus additional tools such as pytest, black, mypy, and flake8:

python -m venv tft-venv
source tft-venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements-devel.txt

Once installed, you can use:

pytest         # Run test suite
black .        # Format code
...

This step is optional and not required for using the Traffic Flow Test scripts.

Configuration YAML fields:

tft:
  - name: "(1)"
    namespace: "(2)"
    # Test cases can be specified individually, e.g. "1,2,POD_TO_HOST_SAME_NODE,6", or as ranges, e.g. "POD_TO_POD_SAME_NODE-9,15-19"
    test_cases: "(3)"
    duration: "(4)"
    # The location of run artifacts can be specified; default: <working-dir>/ft-logs/
    # logs: "/tmp/ft-logs"
    connections:
      - name: "(5)"
        type: "(6)"
        instances: (7)
        server:
          - name: "(8)"
            persistent: "(9)"
            sriov: "(10)"
            default_network: "(11)"
        client:
          - name: "(12)"
            sriov: "(13)"
            default_network: "(14)"
        plugins:
          - name: (15)
          - name: (15)
        secondary_network_nad: "(16)"
        resource_name: "(17)"
    privileged_pod: (18)
    capabilities_pod: (19)
kubeconfig: (20)
kubeconfig_infra: (20)
dpu_node_host_label: (21)
  1. "name" - This is the name of the test. Any string value to identify the test.
  2. "namespace" - The k8s namespace where the test pods will be run on
  3. "test_cases" - A list of the tests that can be run. This can be either a string that possibly contains ranges (comma separated, ranged separated by '-'), or a YAML list.
    ID Test Name
    1 POD_TO_POD_SAME_NODE
    2 POD_TO_POD_DIFF_NODE
    3 POD_TO_HOST_SAME_NODE
    4 POD_TO_HOST_DIFF_NODE
    5 POD_TO_CLUSTER_IP_TO_POD_SAME_NODE
    6 POD_TO_CLUSTER_IP_TO_POD_DIFF_NODE
    7 POD_TO_CLUSTER_IP_TO_HOST_SAME_NODE
    8 POD_TO_CLUSTER_IP_TO_HOST_DIFF_NODE
    9 POD_TO_NODE_PORT_TO_POD_SAME_NODE
    10 POD_TO_NODE_PORT_TO_POD_DIFF_NODE
    11 POD_TO_NODE_PORT_TO_HOST_SAME_NODE
    12 POD_TO_NODE_PORT_TO_HOST_DIFF_NODE
    13 HOST_TO_HOST_SAME_NODE
    14 HOST_TO_HOST_DIFF_NODE
    15 HOST_TO_POD_SAME_NODE
    16 HOST_TO_POD_DIFF_NODE
    17 HOST_TO_CLUSTER_IP_TO_POD_SAME_NODE
    18 HOST_TO_CLUSTER_IP_TO_POD_DIFF_NODE
    19 HOST_TO_CLUSTER_IP_TO_HOST_SAME_NODE
    20 HOST_TO_CLUSTER_IP_TO_HOST_DIFF_NODE
    21 HOST_TO_NODE_PORT_TO_POD_SAME_NODE
    22 HOST_TO_NODE_PORT_TO_POD_DIFF_NODE
    23 HOST_TO_NODE_PORT_TO_HOST_SAME_NODE
    24 HOST_TO_NODE_PORT_TO_HOST_DIFF_NODE
    25 POD_TO_EXTERNAL
    26 HOST_TO_EXTERNAL
    27 POD_TO_POD_2ND_INTERFACE_SAME_NODE
    28 POD_TO_POD_2ND_INTERFACE_DIFF_NODE
    29 POD_TO_POD_MULTI_NETWORK_POLICY
  4. "duration" - The duration that each individual test will run for.
  5. "name" - This is the connection name. Any string value to identify the connection.
  6. "type" - Supported types of connections are iperf-tcp, iperf-udp, netperf-tcp-stream, netperf-tcp-rr, ib-write-bw, ib-read-bw, ib-send-bw
  7. "instances" - The number of instances that would be created. Default is "1"
  8. "name" - The node name of the server.
  9. "persistent" - Whether to have the server pod persist after the test. Takes in "true/false"
  10. "sriov" - Whether SRIOV should be used for the server pod. Takes in "true/false"
  11. "default_network" - (Optional) The name of the default network that the sriov pod would use.
  12. "name" - The node name of the client.
  13. "sriov" - Whether SRIOV should be used for the client pod. Takes in "true/false"
  14. "default_network" - (Optional) The name of the default network that the sriov pod would use. 14a. "args" - (Optional) Extra command-line arguments to pass to the test tool (iperf3, simple-tcp-server-client). Supported for iperf-tcp, iperf-udp, and simple test types. Can be a string or list of strings.
  15. "name" - (Optional) list of plugin names
    Name Description
    measure_cpu Measure CPU Usage
    measure_power Measure Power Usage
    validate_offload Verify OvS Offload
  16. "secondary_network_nad" - (Optional) - The name of the secondary network for multi-homing and multi-networkpolicies tests. For tests except 27-29, the primary network will be used if unspecified (the default which is None). For mandatory tests 27-29 it defaults to "tft-secondary" if not set.
  17. "resource_name" - (Optional) - The resource name for tests that require resource limit and requests to be set. This field is optional and will default to None if not set, but if secondary network nad is defined, traffic flow test tool will try to autopopulate resource_name based on the secondary+network_nad provided.
  18. "privileged_pod" - (Optional) - Whether to run test pods as privileged. Defaults to false. Can be set at test level or per-node (server/client).
  19. "capabilities_pod" - (Optional) - Linux capabilities for test pods. Format: {"add": ["NET_ADMIN", "SYS_TIME"]}. Can be set at test level (applies to all pods) or per-node (server/client) for fine-grained control. Per-node settings take precedence over test-level settings.
  20. "kubeconfig", "kubeconfig_infra": if set to non-empty strings, then these are the KUBECONFIG files. "kubeconfig_infra" must be set for DPU cluster mode. If both are empty, the configs are detected based on the files we find at /root/kubeconfig.*.
  21. "dpu_node_host_label": (Required for DPU mode) The label on DPU nodes that identifies which host worker node they belong to. For NVIDIA DPUs, use provisioning.dpu.nvidia.com/host.

DPU Mode

When running with a DPU (Data Processing Unit) cluster, the validate_offload plugin needs to query VF representors from the DPU cluster rather than the host. This is because in DPU environments, VF representors reside on the DPU where OVS/OVN runs.

Configuration

To enable DPU mode, configure the following in your config.yaml:

kubeconfig: /path/to/tenant-cluster.kubeconfig
kubeconfig_infra: /path/to/dpu-cluster.kubeconfig
dpu_node_host_label: "provisioning.dpu.nvidia.com/host"

The dpu_node_host_label specifies which label on DPU nodes identifies the corresponding host worker node. For example, with NVIDIA DPUs, each DPU node has a label like:

provisioning.dpu.nvidia.com/host: worker-node-name

The plugin uses this label to find the correct DPU node for each worker node.
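
To verify the mapping on your cluster, you can list the DPU cluster's nodes with that label shown as a column (the label key follows the NVIDIA example above):

kubectl --kubeconfig /path/to/dpu-cluster.kubeconfig get nodes -L provisioning.dpu.nvidia.com/host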

How It Works

  1. DPU Node Discovery: The plugin queries DPU nodes by label to find the DPU corresponding to each worker node.

  2. VF Info from Pod: Gets the VF index and PF index from the pod using standard Linux sysfs interfaces (vendor-agnostic).

  3. VF Representor Lookup: Uses devlink port show on the DPU to find the VF representor by matching pfnum and vfnum (vendor-agnostic).

  4. Ethtool Stats: Runs ethtool -S on the VF representor from the DPU tools pod to verify hardware offload. A rough sketch of these steps is shown below.
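
For reference, the shell sketch below walks through steps 2-4 by hand. The interface name net1 and representor name pf0vf4 are hypothetical, and the exact devlink output varies by vendor and kernel:

# Step 2: from inside the pod, resolve the VF's PCI address via sysfs
# (net1 is a hypothetical SR-IOV interface name).
readlink /sys/class/net/net1/device        # e.g. ../../../0000:03:02.4

# Step 3: on the DPU, list devlink ports and find the pcivf entry
# whose pfnum/vfnum match the pod's VF.
devlink port show | grep 'flavour pcivf'

# Step 4: on the DPU, read statistics from the matching VF representor
# (pf0vf4 is a hypothetical representor name).
ethtool -S pf0vf4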

Running the tests

Simply run the Python application like so:

./tft.py config.yaml
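
The kubeconfigs from the configuration file can also be overridden on the command line; the paths below are hypothetical (see ./tft.py --help for the exact options):

./tft.py config.yaml --kubeconfig /root/kubeconfig.tenant --kubeconfig-infra /root/kubeconfig.infra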

Example: iperf UDP with custom bandwidth

By default, iperf-udp tests use -u -b 25G options. You can customize the bandwidth or add other iperf3 options using the args parameter on the client and/or server:

tft:
  - name: "UDP Test with custom bandwidth"
    namespace: "default"
    test_cases: "1"
    duration: "30"
    connections:
      - name: "Connection_1"
        type: "iperf-udp"
        instances: 1
        server:
          - name: "worker-1"
        client:
          - name: "worker-2"
            args: "-b 10G"  # Override the default 25G bandwidth

You can also pass multiple options:

        client:
          - name: "worker-2"
            args: "-b 10G --parallel 4"  # Custom bandwidth and 4 parallel streams

Or as a list:

        client:
          - name: "worker-2"
            args:
              - "-b"
              - "10G"
              - "--parallel"
              - "4"

Environment variables

  • TFT_TEST_IMAGE specifies the test image. Defaults to ghcr.io/ovn-kubernetes/kubernetes-traffic-flow-tests:latest. This is mainly for development and manual testing, to inject another container image. Used for all test types except ib-* tests.
  • TFT_RDMA_TEST_IMAGE specifies the RDMA test image containing the perftest tools (ib_write_bw, etc.). If not set, it is automatically derived from TFT_TEST_IMAGE by adding an -rdma suffix (e.g., image:tag becomes image-rdma:tag). Used automatically for the ib-* test types.
  • TFT_IMAGE_PULL_POLICY specifies the image pull policy. One of IfNotPresent, Always, Never. Defaults to IfNotPresent, unless $TFT_TEST_IMAGE is set (in which case it defaults to Always).
  • TFT_PRIVILEGED_POD sets whether test pods are privileged. This overrides the settings from the configuration YAML.
  • TFT_MANIFESTS_OVERRIDES to specify an overrides directory for manifests. If not set, the default is "manifests/overrides". If set to empty, no overrides are used. You can place your own variants of the files from "manifests" directory and they will be preferred.
  • TFT_MANIFESTS_YAMLS to specify the output directory for rendered manifests. This defaults to "manifests/yamls".
  • TFT_KUBECONFIG, TFT_KUBECONFIG_INFRA to override the kubeconfigs from the configuration file. See also the "--kubeconfig" and "--kubeconfig-infra" command line options.
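
For example, to run the tests with a custom image and a forced pull policy (the image name and tag here are hypothetical):

TFT_TEST_IMAGE=quay.io/example/tft-dev:latest \
TFT_IMAGE_PULL_POLICY=Always \
./tft.py config.yaml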

File Transfer via magic-wormhole

It is sometimes cumbersome to transfer files between machines; magic-wormhole helps with that. Unfortunately, it is not packaged in RHEL/Fedora. You can install it with pip install magic-wormhole, or:

python3 -m venv /opt/magic-wormhole-venv && \
( source /opt/magic-wormhole-venv/bin/activate && \
  pip install --upgrade pip && \
  pip install magic-wormhole ) && \
ln -s /opt/magic-wormhole-venv/bin/wormhole /usr/bin/

wormhole is installed in the kubernetes-traffic-flow-tests container. From inside the container you can issue wormhole send $FILE. Or you can run:

podman run --rm -ti -v /:/host -v .:/pwd:Z -w /pwd ghcr.io/ovn-kubernetes/kubernetes-traffic-flow-tests:latest wormhole send $FILE

This will print a code, which you use on the receiving end via wormhole receive $CODE. Or:

podman run --rm -ti -v .:/pwd:Z -w /pwd ghcr.io/ovn-kubernetes/kubernetes-traffic-flow-tests:latest wormhole receive $CODE

Use ktoolbox-netdev

Use ktoolbox's netdev command to collect interface information:

podman run --privileged --network=host ghcr.io/ovn-kubernetes/kubernetes-traffic-flow-tests:latest ktoolbox-netdev
podman run --privileged --network=host ghcr.io/ovn-kubernetes/kubernetes-traffic-flow-tests:latest sh -c 'ktoolbox-netdev | yq -P -C' | less -R

Debugging Tests using Simple Exec Script

When a TFT test fails, it cleans up the broken environment. That can make debugging cumbersome.

One possible approach is to use the "simple" test type with the "--exec" parameter. The "simple" test type runs the scripts/simple-tcp-server-client.py script. Check the --help output for the --exec options (and --exec-insecure, --exec-args, --exec-arg). In exec mode, the simple script behaves differently: it downloads an external script and executes that instead. That script can do anything, and you can tweak it to be useful for debugging.

There is already a default script, scripts/simple-exec.sh. You can take that script as a starting point and tweak it (or use your own script).

If you use scripts/simple-exec.sh, then by default it will call its calling script, simple-tcp-server-client.py, again, albeit with some steps that might be useful for debugging. In particular, if the simple-tcp-server-client.py call fails, the script will hang, which allows you to enter the pod and investigate the problem yourself.

If a non-empty first parameter is provided to scripts/simple-exec.sh, it is expected to be a URL from which to download a simple-tcp-server-client.py-like script, which is then invoked instead of the simple-tcp-server-client.py script from the tft container.

This allows you to run arbitrary code without needing to rebuild the tft container. As a first step, you can pass your own --exec script, either based on scripts/simple-exec.sh or whatever suits you.

If you use the unmodified scripts/simple-exec.sh, then by default it will call back into scripts/simple-tcp-server-client.py from inside the container, which then runs the actual traffic flow test. If you wish, you can also provide your own patched variant of that latter script instead of using the one from the container.
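
As a rough illustration, a custom --exec script following the same calling convention might look like the sketch below. This is not the actual scripts/simple-exec.sh, and the in-container path of simple-tcp-server-client.py is an assumption:

#!/bin/bash
# Sketch of a simple-exec.sh-style script: "$1" may be a URL to a
# simple-tcp-server-client.py replacement; the remaining arguments are
# passed through to that script.
url="$1"
shift
script="/usr/bin/simple-tcp-server-client.py"  # assumed in-container path
if [ -n "$url" ]; then
    curl -k -o /tmp/simple-tcp-server-client.py "$url"
    script="/tmp/simple-tcp-server-client.py"
fi
if ! python3 "$script" "$@"; then
    # On failure, hang so the pod stays up for interactive debugging.
    sleep infinity
fi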

For example, consider the following configuration.

--- c/tft-config.yaml
+++ i/tft-config.yaml
@@ -1,21 +1,24 @@
 tft:
   - name: "Test 1"
     namespace: "default"
     test_cases: "1"
     duration: "30"
+    privileged_pod: true
     connections:
       - name: "Connection_1"
-        type: "iperf-udp"
+        type: "simple"
         instances: 1
         server:
           - name: "$worker"
             sriov: "true"
+            args: "--num-clients 0 --exec https://example.com/tft-test/simple-exec.sh --exec-insecure -E https://example.com/tft-test/simple-tcp-server-client.py"
         client:
           - name: "$worker"
             sriov: "true"
+            args: "--exec https://example.com/tft-test/simple-exec.sh --exec-insecure -E https://example.com/tft-test/simple-tcp-server-client.py"

In the above example, the server side will first download and exec https://example.com/tft-test/simple-exec.sh, with one parameter: the URL https://example.com/tft-test/simple-tcp-server-client.py. If that script behaves like the scripts/simple-exec.sh from our tree, it will take the first argument, download it, and execute that script as if it were a "simple-tcp-server-client.py" script. Note how parameters like --num-clients 0 are passed all the way down to that last Python script. This leaves you with two scripts that you can tweak to your needs and update easily, while being based on default implementations that are useful without modification.
