Commit 9bc5c80

Merge pull request #5338 from flavio-fernandes/network-qos-guide

Add network QoS guide to docs navigation

2 parents f9e5482 + 6902456

File tree: 2 files changed, +332 −1 lines changed

docs/features/network-qos-guide.md

Lines changed: 329 additions & 0 deletions
# Guide to Using Network QoS

## Contents

1. [Overview](#1-overview)
2. [Create a Secondary Network (NAD)](#2-create-a-secondary-network)
3. [Define a NetworkQoS Policy](#3-define-a-networkqos-policy)
4. [Create Sample Pods and Verify the Configuration](#4-create-sample-pods-and-verify-the-configuration)
5. [Explain the NetworkQoS Object](#5-explain-the-networkqos-object)

## **1 Overview**

Differentiated Services Code Point (DSCP) marking and egress bandwidth metering let you prioritize or police specific traffic flows. The new **NetworkQoS** Custom Resource Definition (CRD) in [ovn-kubernetes](https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/dist/templates/k8s.ovn.org_networkqoses.yaml.j2) makes both features available to Kubernetes users on **all** pod interfaces—primary or secondary—without touching pod manifests.

This guide provides a step-by-step example of how to use this feature. Before you begin, ensure that you have a Kubernetes cluster configured with the ovn-kubernetes CNI. Since the examples use network attachments, you must run the cluster with multiple-network support enabled. In a kind cluster, you would use the following flags:

```bash
cd contrib
./kind-helm.sh -nqe -mne ; # --enable-network-qos --enable-multi-network
```
## **2 Create a Secondary Network**

File: nad.yaml

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-stream
  namespace: default
  labels: # label needed for NetworkQoS selector
    nad-type: ovn-kubernetes-nqos
spec:
  config: |2
    {
      "cniVersion": "1.0.0",
      "name": "ovn-stream",
      "type": "ovn-k8s-cni-overlay",
      "topology": "layer3",
      "subnets": "10.245.0.0/16/24",
      "mtu": 1300,
      "master": "eth1",
      "netAttachDefName": "default/ovn-stream"
    }
```

*Why the label?* `NetworkQoS` uses a label selector to find matching NADs. Without at least one label, the selector cannot match.
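To build intuition for the selector semantics, here is a minimal sketch (illustrative Python, not ovn-kubernetes code): a `matchLabels` selector matches an object only if every key/value pair in the selector appears in the object's labels, and an empty selector matches everything.

```python
def match_labels(selector: dict, labels: dict) -> bool:
    """A matchLabels selector matches iff every selector pair
    appears in the object's labels; {} matches everything."""
    return all(labels.get(k) == v for k, v in selector.items())

nad_labels = {"nad-type": "ovn-kubernetes-nqos"}

print(match_labels({"nad-type": "ovn-kubernetes-nqos"}, nad_labels))  # True
print(match_labels({"nad-type": "ovn-kubernetes-nqos"}, {}))          # False: unlabeled NAD never matches
print(match_labels({}, nad_labels))                                   # True: empty selector matches all
```

This is why the NAD above must carry at least one label: an unlabeled NAD cannot satisfy any non-empty `matchLabels` selector.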
## **3 Define a NetworkQoS Policy**

File: nqos.yaml

```yaml
apiVersion: k8s.ovn.org/v1alpha1
kind: NetworkQoS
metadata:
  name: qos-external
  namespace: default
spec:
  networkSelectors:
  - networkSelectionType: NetworkAttachmentDefinitions
    networkAttachmentDefinitionSelector:
      namespaceSelector: {} # any namespace
      networkSelector:
        matchLabels:
          nad-type: ovn-kubernetes-nqos
  podSelector:
    matchLabels:
      nqos-app: bw-limited
  priority: 10 # higher value wins in a tie-break
  egress:
  - dscp: 20
    bandwidth:
      burst: 100 # kilobits
      rate: 20000 # kbps
    classifier:
      to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.11.12.13/32
          - 172.16.0.0/12
          - 192.168.0.0/16
```

A full CRD template lives [here](https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/dist/templates/k8s.ovn.org_networkqoses.yaml.j2).

The `egress` field is a list, allowing you to define multiple markings and bandwidth limits based on different classifiers.

Note that this configuration applies only to traffic on the NAD interfaces matched by the network selector, and only for pods that carry the label `nqos-app: bw-limited`.
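The `ipBlock` plus `except` semantics can be modeled with Python's standard `ipaddress` module. This sketch (a conceptual model of the classifier above, not ovn-kubernetes code) treats a destination as matching if it falls inside `cidr` but outside every `except` entry:

```python
import ipaddress

# Values taken from the qos-external classifier above
CIDR = ipaddress.ip_network("0.0.0.0/0")
EXCEPT = [ipaddress.ip_network(c) for c in
          ("10.11.12.13/32", "172.16.0.0/12", "192.168.0.0/16")]

def classifier_matches(dst: str) -> bool:
    """Destination matches the `to` classifier if it is inside
    cidr and outside every except block."""
    ip = ipaddress.ip_address(dst)
    return ip in CIDR and not any(ip in net for net in EXCEPT)

print(classifier_matches("8.8.8.8"))      # True: public address
print(classifier_matches("172.20.1.5"))   # False: inside 172.16.0.0/12
print(classifier_matches("10.11.12.13"))  # False: explicitly excepted
```

Note that the secondary-network pod subnet (10.245.0.0/16) is not in the `except` list, so pod-to-pod traffic on that network is subject to this rule, which is what the tests below rely on.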
```bash
$ kubectl create -f nad.yaml && \
  kubectl create -f nqos.yaml

networkattachmentdefinition.k8s.cni.cncf.io/ovn-stream created
networkqos.k8s.ovn.org/qos-external created
```

At this point, the new resource type is registered and `kubectl get networkqoses` reports the policy status:

```bash
$ kubectl api-resources -owide | head -1 ; \
  kubectl api-resources -owide | grep NetworkQoS
NAME           SHORTNAMES   APIVERSION             NAMESPACED   KIND         VERBS                                                        CATEGORIES
networkqoses                k8s.ovn.org/v1alpha1   true         NetworkQoS   delete,deletecollection,get,list,patch,create,update,watch

$ kubectl get networkqoses qos-external -n default -owide
NAME           STATUS
qos-external   NetworkQoS Destinations applied
```
## **4 Create Sample Pods and Verify the Configuration**

### **4.1 Launch Test Pods**

To test this, let's create pods using a helper function that allows us to add labels to them.

File: create_pod.source

```bash
create_pod() {
    local pod_name=${1:-pod0}
    local node_name=${2:-ovn-worker}
    local extra_labels=${3:-}

    NAMESPACE=$(kubectl config view --minify --output 'jsonpath={..namespace}')
    NAMESPACE=${NAMESPACE:-default}

    if ! kubectl get pod "$pod_name" -n "$NAMESPACE" &>/dev/null; then
        echo "Creating pod $pod_name in namespace $NAMESPACE..."

        # Prepare labels block
        labels_block="    name: $pod_name"
        if [[ -n "$extra_labels" ]]; then
            # Convert JSON string to YAML-compatible lines
            while IFS="=" read -r k v; do
                labels_block+="
    $k: $v"
            done < <(echo "$extra_labels" | jq -r 'to_entries|map("\(.key)=\(.value)")|.[]')
        fi

        # Generate the manifest
        cat <<EOF | kubectl apply -n "$NAMESPACE" -f -
apiVersion: v1
kind: Pod
metadata:
  name: $pod_name
  labels:
$labels_block
  annotations:
    k8s.v1.cni.cncf.io/networks: ovn-stream@eth1
spec:
  nodeSelector:
    kubernetes.io/hostname: $node_name
  containers:
  - name: $pod_name
    image: ghcr.io/nicolaka/netshoot:v0.13
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
EOF
    else
        echo "Pod $pod_name already exists."
    fi
}
```

```bash
$ create_pod pod0 && \
  create_pod pod1 ovn-worker '{"nqos-app":"bw-limited"}' && \
  create_pod pod2 ovn-worker2 '{"foo":"bar","nqos-app":"bw-limited"}' && \
  echo pods created

extract_pod_ip_from_annotation() {
    local pod_name="$1"
    local namespace="${2:-default}"
    local interface="${3:-eth1}"

    kubectl get pod "$pod_name" -n "$namespace" -o json |
        jq -r '.metadata.annotations["k8s.v1.cni.cncf.io/network-status"]' |
        jq -r --arg iface "$interface" '.[] | select(.interface == $iface) | .ips[0]'
}
```
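For reference, the jq pipeline above (extract the `k8s.v1.cni.cncf.io/network-status` annotation, then select the entry for a given interface) can be sketched in Python. The annotation value below is illustrative sample data, not captured from a real pod:

```python
import json

# Illustrative network-status annotation value: a JSON string, as stored
# on the pod by Multus. The real value comes from `kubectl get pod -o json`.
network_status = json.dumps([
    {"name": "ovn-kubernetes", "interface": "eth0", "ips": ["10.244.1.5"]},
    {"name": "default/ovn-stream", "interface": "eth1", "ips": ["10.245.4.3"]},
])

def extract_pod_ip(status_json: str, interface: str = "eth1") -> str:
    """Return the first IP of the network-status entry whose interface matches."""
    for entry in json.loads(status_json):
        if entry.get("interface") == interface:
            return entry["ips"][0]
    raise KeyError(f"no network-status entry for {interface}")

print(extract_pod_ip(network_status))          # 10.245.4.3 (secondary network)
print(extract_pod_ip(network_status, "eth0"))  # 10.244.1.5 (primary network)
```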
```bash
NAMESPACE=$(kubectl config view --minify --output 'jsonpath={..namespace}') ; NAMESPACE=${NAMESPACE:-default}
DST_IP_POD0=$(extract_pod_ip_from_annotation pod0 $NAMESPACE eth1)
DST_IP_POD1=$(extract_pod_ip_from_annotation pod1 $NAMESPACE eth1)
DST_IP_POD2=$(extract_pod_ip_from_annotation pod2 $NAMESPACE eth1)

# Let's see the NAD IP addresses of the pods created
$ echo pod0 has ip $DST_IP_POD0 ; \
  echo pod1 has ip $DST_IP_POD1 ; \
  echo pod2 has ip $DST_IP_POD2

pod0 has ip 10.245.4.4
pod1 has ip 10.245.4.3
pod2 has ip 10.245.2.3
```
### **4.2 Checking Bandwidth**

`qos-external` limits **only** traffic on pods that carry `nqos-app=bw-limited`. That means:

* **pod1 → pod0**: *unlimited* (no matching label on pod0)
* **pod1 → pod2**: *rate-limited* to ≈ 20 Mbit/s

Follow these steps to verify it with `iperf3`.

```bash
# 1) Start an iperf server inside pod0 and pod2 (runs forever in background)
kubectl -n default exec pod0 -- iperf3 -s -p 5201 &
kubectl -n default exec pod2 -- iperf3 -s -p 5201 &

# 2) From pod1 → pod0 (EXPECTED ≈ line rate)
kubectl -n default exec pod1 -- iperf3 -c "$DST_IP_POD0" -p 5201 -R -t 10

# 3) From pod1 → pod2 (EXPECTED ≈ 20 Mbit/s)
kubectl -n default exec pod1 -- iperf3 -c "$DST_IP_POD2" -p 5201 -R -t 10
```

Sample output:

```
# to pod0 (unlimited)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  37.2 GBytes  31.9 Gbits/sec  607     sender
[  5]   0.00-10.00  sec  37.2 GBytes  31.9 Gbits/sec          receiver

# to pod2 (rate-limited)
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  20.8 MBytes  17.4 Mbits/sec  4056    sender
[  5]   0.00-10.00  sec  20.8 MBytes  17.4 Mbits/sec          receiver
```

The sharp drop confirms that `NetworkQoS` is enforcing the **20 Mbit/s** rate limit only for pods matching the selector.
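As a sanity check on the numbers: a 20 000 kbps policer caps a 10-second run at roughly 25 MBytes of transfer, and the observed 20.8 MBytes / 17.4 Mbit/s plausibly sits below that ceiling because the policer drops packets and TCP backs off (note the high retransmit count). A quick back-of-envelope:

```python
rate_kbps = 20_000   # NetworkQoS bandwidth.rate
duration_s = 10      # iperf3 -t 10

# Theoretical ceiling: kilobits/s * seconds -> megabytes (decimal units)
max_mbytes = rate_kbps * duration_s / 8 / 1000
print(f"ceiling: {max_mbytes:.1f} MBytes")   # ceiling: 25.0 MBytes

observed_mbytes = 20.8   # from the iperf3 sample output above
print(f"achieved: {observed_mbytes / max_mbytes:.0%} of the cap")  # achieved: 83% of the cap
```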
### **4.3 Packet Capture**

Generate ICMP traffic and observe DSCP markings in Geneve outer headers using `tcpdump -envvi eth0 geneve` inside the worker node's network namespace. Only flows involving label-matched pods (those with `nqos-app=bw-limited`) will show `tos 0x50` (DSCP 20).
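The relationship between the DSCP value in the policy (20) and the `tos 0x50` seen in the captures is fixed: DSCP occupies the upper six bits of the IP TOS/traffic-class byte, so TOS = DSCP << 2.

```python
dscp = 20          # from the NetworkQoS egress rule
tos = dscp << 2    # DSCP sits in the upper 6 bits of the TOS byte
print(hex(tos))    # 0x50, as seen in the tcpdump output below

# And back: recover the DSCP value from a captured TOS byte
print(0x50 >> 2)   # 20
```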
```bash
# Run ping commands in the background, so we can look at packets they generate

# pod0 to pod2
nohup kubectl exec -i pod0 -- ping -c 3600 -q $DST_IP_POD2 >/dev/null 2>&1 &
# pod1 to pod2
nohup kubectl exec -i pod1 -- ping -c 3600 -q $DST_IP_POD2 >/dev/null 2>&1 &

sudo dnf install -y --quiet tcpdump ; # Install tcpdump, if needed

IPNS=$(docker inspect --format '{{.State.Pid}}' ovn-worker)
sudo nsenter -t ${IPNS} -n tcpdump -envvi eth0 geneve
```
```
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
```

**Pod0 to Pod2**: Notice that since pod0 does not have the label to match against NetworkQoS, its TOS is 0. However, pod2's response is DSCP marked (tos 0x50), since pod2 matches the NetworkQoS criteria with the label `nqos-app: bw-limited`.

```
12:46:30.755551 02:42:ac:12:00:06 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 156: (tos 0x0, ttl 64, id 26896, offset 0, flags [DF], proto UDP (17), length 142)
    172.18.0.6.38210 > 172.18.0.5.geneve: [bad udp cksum 0x58bb -> 0xc87d!] Geneve, Flags [C], vni 0x12, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 00090006]
        0a:58:0a:f5:02:01 > 0a:58:0a:f5:02:03, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 61037, offset 0, flags [DF], proto ICMP (1), length 84)
    10.245.4.4 > 10.245.2.3: ICMP echo request, id 14, seq 44, length 64

12:46:30.755694 02:42:ac:12:00:05 > 02:42:ac:12:00:06, ethertype IPv4 (0x0800), length 156: (tos 0x50, ttl 64, id 46220, offset 0, flags [DF], proto UDP (17), length 142)
    172.18.0.5.38210 > 172.18.0.6.geneve: [bad udp cksum 0x58bb -> 0xc47d!] Geneve, Flags [C], vni 0x12, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 0004000a]
        0a:58:0a:f5:04:01 > 0a:58:0a:f5:04:04, ethertype IPv4 (0x0800), length 98: (tos 0x50, ttl 63, id 45002, offset 0, flags [none], proto ICMP (1), length 84)
    10.245.2.3 > 10.245.4.4: ICMP echo reply, id 14, seq 44, length 64
```

**Pod1 to Pod2**: Traffic is marked both ways (both pods have the matching label).

```
12:46:30.497289 02:42:ac:12:00:06 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 156: (tos 0x50, ttl 64, id 26752, offset 0, flags [DF], proto UDP (17), length 142)
    172.18.0.6.7856 > 172.18.0.5.geneve: [bad udp cksum 0x58bb -> 0x3f10!] Geneve, Flags [C], vni 0x12, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 00090006]
        0a:58:0a:f5:02:01 > 0a:58:0a:f5:02:03, ethertype IPv4 (0x0800), length 98: (tos 0x50, ttl 63, id 21760, offset 0, flags [DF], proto ICMP (1), length 84)
    10.245.4.3 > 10.245.2.3: ICMP echo request, id 14, seq 56, length 64

12:46:30.497381 02:42:ac:12:00:05 > 02:42:ac:12:00:06, ethertype IPv4 (0x0800), length 156: (tos 0x50, ttl 64, id 46019, offset 0, flags [DF], proto UDP (17), length 142)
    172.18.0.5.7856 > 172.18.0.6.geneve: [bad udp cksum 0x58bb -> 0x3b11!] Geneve, Flags [C], vni 0x12, proto TEB (0x6558), options [class Open Virtual Networking (OVN) (0x102) type 0x80(C) len 8 data 0004000a]
        0a:58:0a:f5:04:01 > 0a:58:0a:f5:04:03, ethertype IPv4 (0x0800), length 98: (tos 0x50, ttl 63, id 3850, offset 0, flags [none], proto ICMP (1), length 84)
    10.245.2.3 > 10.245.4.3: ICMP echo reply, id 14, seq 56, length 64
```
## **5 Explain the NetworkQoS Object**

Below is an *abbreviated* map of the CRD schema returned by `kubectl explain networkqos --recursive` (v1alpha1). Use this as a quick reference. For the definitive specification, always consult the `kubectl explain` output or the CRD YAML in the ovn-kubernetes repository.

### **5.1 Top‑level `spec` keys**

| Field | Type | Required | Purpose |
| ----- | ----- | ----- | ----- |
| **podSelector** | `LabelSelector` | No | Selects pods whose traffic will be evaluated by the QoS rules. If empty, all pods in the namespace are selected. |
| **networkSelectors[]** | list `NetworkSelector` | No | Restricts the rule to traffic on specific networks. If absent, the rule matches any interface. *(See §5.2)* |
| **priority** | `int` | **Yes** | Higher number → chosen first when multiple `NetworkQoS` objects match the same packet. |
| **egress[]** | list `EgressRule` | **Yes** | One or more marking / policing rules. Evaluated in the order listed. *(See §5.3)* |

Note the square-bracket notation (`[]`) for **both** `egress` and `networkSelectors`—each is an array in the CRD.

---

### **5.2 Inside a `networkSelectors[]` entry**

Each list element tells the controller **where** the pods' egress traffic must flow in order to apply the rule. Exactly **one** selector type must be set.

| Key | Required | Description |
| :---- | :---- | :---- |
| `networkSelectionType` | **Yes** | Enum that declares which selector below is populated. Common values: `NetworkAttachmentDefinitions`, `DefaultNetwork`, `SecondaryUserDefinedNetworks`, … |
| `networkAttachmentDefinitionSelector` | conditional | Used when `networkSelectionType=NetworkAttachmentDefinitions`. Selects NADs by **namespaceSelector** (required) *and* **networkSelector** (required). Both are ordinary `LabelSelectors`. |
| `secondaryUserDefinedNetworkSelector` | conditional | Used when `networkSelectionType=SecondaryUserDefinedNetworks`. Similar structure: required **namespaceSelector** & **networkSelector**. |
| `clusterUserDefinedNetworkSelector`, `primaryUserDefinedNetworkSelector` | conditional | Additional selector styles, each with required sub‑selectors as per the CRD. |

**Typical usage** – `networkSelectionType: NetworkAttachmentDefinitions` + `networkAttachmentDefinitionSelector`.

---

### **5.3 Inside an `egress[]` rule**

| Field | Type | Required | Description |
| :---- | :---- | :---- | :---- |
| `dscp` | `int` (0 – 63) | **Yes** | DSCP value to stamp on the **inner** IP header. This value determines the traffic priority. |
| `bandwidth.rate` | `int` (kbps) | No | Sustained rate for the token-bucket policer (in kilobits per second). |
| `bandwidth.burst` | `int` (kilobits) | No | Maximum burst size that can accrue (in kilobits). |
| `classifier.to` / `classifier.from` | list `TrafficSelector` | No | CIDRs the packet destination (or source) must match. Each entry is an `ipBlock` supporting an `except` list. |
| `classifier.ports[]` | list | No | List of `{protocol, port}` tuples the packet must match; protocol is `TCP`, `UDP`, or `SCTP`. |

If **all** specified classifier conditions match, the packet gets the DSCP mark and/or bandwidth policer defined above. This allows for fine-grained control over which traffic flows receive QoS treatment.
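To build intuition for how `bandwidth.rate` and `bandwidth.burst` interact, here is a minimal token-bucket sketch (a conceptual model only, not the OVN implementation): tokens accrue at `rate` kbps up to a cap of `burst` kilobits, and a packet is forwarded only if enough tokens are available.

```python
class TokenBucket:
    """Conceptual model of the egress policer: rate in kbps, burst in kilobits."""

    def __init__(self, rate_kbps: int, burst_kbits: int):
        self.rate = rate_kbps
        self.cap = burst_kbits
        self.tokens = float(burst_kbits)  # bucket starts full

    def allow(self, packet_kbits: float, elapsed_s: float) -> bool:
        # Refill at `rate`, never exceeding the burst cap
        self.tokens = min(self.cap, self.tokens + self.rate * elapsed_s)
        if self.tokens >= packet_kbits:
            self.tokens -= packet_kbits
            return True
        return False  # over the limit: packet would be dropped

bucket = TokenBucket(rate_kbps=20_000, burst_kbits=100)  # values from qos-external
print(bucket.allow(12, 0))      # True: a 1500-byte packet (12 kbits) fits in the burst
print(bucket.allow(100, 0))     # False: bucket already partially drained
print(bucket.allow(100, 0.01))  # True: 10 ms refills 200 kbits, capped at 100
```

The small burst (100 kilobits) relative to the rate (20 000 kbps) is why the iperf3 run in §4.2 shows so many retransmits: the policer absorbs almost no transient overshoot.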

mkdocs.yml

Lines changed: 3 additions & 1 deletion

```diff
@@ -126,7 +126,9 @@ nav:
       - MultiNetworkPolicies: features/multiple-networks/multi-network-policies.md
       - MultiNetworkRails: features/multiple-networks/multi-vtep.md
       - Multicast: features/multicast.md
-      - NetworkQoS: features/network-qos.md
+      - NetworkQoS:
+          - Overview: features/network-qos.md
+          - Usage Guide: features/network-qos-guide.md
       - LiveMigration: features/live-migration.md
       - HybridOverlay: features/hybrid-overlay.md
       - Hardware Acceleration:
```
