Commit a6cac97

ovnkube in dpu host mode: advanced gateway detection
This patch introduces a new way to detect the gateway interface when ovnkube runs in DPU-host mode. It adds the value `derive-from-mgmt-port`: when this is specified as the gateway interface, ovnkube identifies the physical function of the device used as the management port and uses one of its network devices as the gateway interface.

Signed-off-by: Alin Gabriel Serdean <[email protected]>
1 parent 4020bd2 commit a6cac97

File tree

7 files changed

+615
-0
lines changed

Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
# Derive-from-Management-Port Gateway Interface Feature
2+
3+
## Overview
4+
5+
The "derive-from-mgmt-port" gateway interface feature is a new capability in OVN-Kubernetes that enables automatic gateway interface resolution in DPU (Data Processing Unit) host mode deployments. This feature automatically discovers and configures the appropriate Physical Function (PF) interface as the gateway interface based on the Virtual Function (VF) used for the management port.
6+
7+
## Problem Statement
8+
9+
In DPU deployments, the host typically has access to Virtual Functions (VFs) for management purposes, while the Physical Functions (PFs) are used for external connectivity. Previously, administrators had to manually specify the gateway interface, which required:
10+
11+
1. Knowledge of the hardware topology
12+
2. Manual mapping of VF to PF relationships
13+
3. Configuration updates when hardware changes
14+
4. Potential for misconfiguration
15+
16+
## Solution
17+
18+
The "derive-from-mgmt-port" feature automates the gateway interface discovery process by:
19+
20+
1. **Automatic Discovery**: Automatically finds the PF interface associated with the management port VF
21+
2. **Hardware Abstraction**: Eliminates the need for manual hardware topology knowledge
22+
3. **Dynamic Configuration**: Adapts to hardware changes automatically
23+
4. **Reduced Configuration**: Simplifies deployment configuration
24+
25+
## Benefits
26+
27+
### For Administrators
28+
29+
- **Simplified Configuration**: No need to manually specify gateway interfaces
30+
- **Reduced Errors**: Eliminates manual mapping errors
31+
- **Hardware Agnostic**: Not tied to a specific vendor; works with SR-IOV capable hardware whose driver exposes the VF-to-PF relationship
32+
- **Dynamic Adaptation**: Automatically adapts to hardware changes
33+
34+
### For Operations
35+
36+
- **Faster Deployment**: Reduced configuration time
37+
- **Consistent Setup**: Standardized gateway interface selection
38+
- **Reduced Maintenance**: Less manual intervention required
39+
- **Better Reliability**: Fewer configuration-related issues
40+
41+
### For Development
42+
43+
- **Cleaner Code**: Centralized gateway interface logic
44+
- **Better Testing**: Comprehensive unit test coverage
45+
- **Extensible Design**: Foundation for future enhancements
46+
47+
## Technical Implementation
48+
49+
### Code Changes
50+
51+
1. **New Constant**: Added `DeriveFromMgmtPort = "derive-from-mgmt-port"` constant in `go-controller/pkg/types/const.go`
52+
2. **Enhanced Logic**: Extended gateway initialization in `go-controller/pkg/node/default_node_network_controller.go`
53+
3. **Comprehensive Testing**: Added unit tests covering success and failure scenarios
54+
55+
### Key Functions
56+
57+
- `getManagementPortNetDev()`: Resolves management port device name
58+
- `GetPciFromNetDevice()`: Retrieves PCI address from network device
59+
- `GetPfPciFromVfPci()`: Resolves PF PCI address from VF PCI address
60+
- `GetNetDevicesFromPci()`: Discovers network devices associated with PCI address
61+
62+
### Error Handling
63+
64+
The implementation includes robust error handling for:
65+
- Missing network devices
66+
- PCI address resolution failures
67+
- SR-IOV operation failures
68+
- Hardware compatibility issues
69+
70+
## Configuration Examples
71+
72+
### Basic Configuration
73+
74+
```bash
75+
--ovnkube-node-mode=dpu-host
76+
--ovnkube-node-mgmt-port-netdev=pf0vf0
77+
--gateway-interface=derive-from-mgmt-port
78+
```
79+
80+
### Helm Configuration
81+
82+
```yaml
83+
ovnkube-node:
  mode: dpu-host
  mgmtPortNetdev: pf0vf0

gateway:
  interface: derive-from-mgmt-port
89+
```
90+
91+
### Configuration File
92+
93+
```ini
94+
[OvnKubeNode]
95+
mode=dpu-host
96+
mgmt-port-netdev=pf0vf0
97+
98+
[Gateway]
99+
interface=derive-from-mgmt-port
100+
```
101+
102+
## Migration Guide
103+
104+
### From Manual Configuration
105+
106+
**Before:**
107+
```bash
108+
--gateway-interface=eth0
109+
```
110+
111+
**After:**
112+
```bash
113+
--gateway-interface=derive-from-mgmt-port
114+
```
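
When the setting lives in a configuration file rather than on the command line, the same migration can be sketched as a small rewrite of the `interface=` key under `[Gateway]`. The function name `set_gateway_interface` and the file path in the usage note are hypothetical; the sketch only assumes the INI layout shown under Configuration File and prints the result to standard output instead of editing in place.

```bash
#!/usr/bin/env bash
# Sketch only: set_gateway_interface is a hypothetical helper, not part of
# OVN-Kubernetes. It prints the config with [Gateway] interface= replaced.

set_gateway_interface() {
    local conf="$1" value="$2"
    awk -v value="$value" '
        # Track whether the current section is [Gateway].
        /^\[/ { in_gw = ($0 == "[Gateway]") }
        # Replace only the interface key inside that section.
        in_gw && /^interface=/ { print "interface=" value; next }
        # Pass every other line through unchanged.
        { print }
    ' "$conf"
}
```

A possible invocation, assuming a file named `ovnkube.conf`, is `set_gateway_interface ovnkube.conf derive-from-mgmt-port > ovnkube.conf.new`, followed by a manual review of the new file before it replaces the old one.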
115+
116+
### Verification Steps
117+
118+
1. Verify SR-IOV configuration is correct
119+
2. Ensure management port device is properly configured
120+
3. Check that PF interfaces are available
121+
4. Monitor logs for successful gateway interface resolution
122+
123+
## Testing
124+
125+
### Unit Tests
126+
127+
Comprehensive unit tests cover:
128+
- Successful gateway interface resolution
129+
- Error handling for missing devices
130+
- PCI address resolution failures
131+
- Network device discovery failures
132+
133+
### Integration Tests
134+
135+
The feature integrates with existing:
136+
- Gateway initialization
137+
- DPU host mode functionality
138+
- SR-IOV operations
139+
- Network configuration
140+
141+
## Future Enhancements
142+
143+
Potential improvements include:
144+
- Support for multiple gateway interfaces
145+
- Enhanced device selection criteria
146+
- Integration with device plugins
147+
- Support for non-SR-IOV hardware
148+
- Advanced error reporting and diagnostics
149+
150+
## Related Documentation
151+
152+
- [DPU Gateway Interface Configuration](dpu-gateway-interface.md)
153+
- [DPU Support](dpu-support.md)
154+
- [Gateway Accelerated Interface Configuration](../design/gateway-accelerated-interface-configuration.md)
155+
- [Configuration Guide](../../getting-started/configuration.md)
156+
157+
## Support
158+
159+
For issues related to this feature:
160+
1. Check the troubleshooting section in the DPU Gateway Interface Configuration guide
161+
2. Verify SR-IOV hardware and driver support
162+
3. Review error messages and logs
163+
4. Consult the OVN-Kubernetes community for additional support
Lines changed: 208 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,208 @@
1+
# DPU Gateway Interface Configuration
2+
3+
## Overview
4+
5+
In DPU (Data Processing Unit) host mode deployments, OVN-Kubernetes can resolve the gateway interface automatically from the PCI address of the management port. This is particularly useful when the management port is a Virtual Function (VF) and the corresponding Physical Function (PF) interface should be selected as the gateway.
6+
7+
## Background
8+
9+
In DPU deployments, the host typically has access to Virtual Functions (VFs) for management purposes, while the Physical Functions (PFs) are used for external connectivity. The "derive-from-mgmt-port" feature allows OVN-Kubernetes to automatically discover and configure the appropriate PF interface as the gateway interface based on the VF used for the management port.
10+
11+
## How It Works
12+
13+
When configured with `--gateway-interface=derive-from-mgmt-port`, OVN-Kubernetes performs the following steps:
14+
15+
1. **Management Port Resolution**: Gets the management port network device name (specified by `--ovnkube-node-mgmt-port-netdev`)
16+
2. **VF PCI Address Retrieval**: Retrieves the PCI address of the management port device (VF)
17+
3. **PF PCI Address Resolution**: Gets the Physical Function (PF) PCI address from the Virtual Function (VF) PCI address
18+
4. **Network Device Discovery**: Retrieves all network devices associated with the PF PCI address
19+
5. **Interface Selection**: Selects the first available network device as the gateway interface
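
The five steps above can be approximated from a shell by walking sysfs, which may help when checking what the node would resolve. This is a minimal sketch, not the OVN-Kubernetes implementation: the function name `derive_gateway_from_mgmt_port` and the `SYSFS_ROOT` override are assumptions made for illustration, and the error strings only imitate the messages quoted in this document.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the documented steps using sysfs links.
# SYSFS_ROOT (default /sys) is a hypothetical override for dry runs.

derive_gateway_from_mgmt_port() {
    local mgmt_dev="$1"
    local sysfs="${SYSFS_ROOT:-/sys}"
    local dev_dir="$sysfs/class/net/$mgmt_dev/device"

    # Steps 1-2: the management port must exist and be backed by a PCI device.
    if [ ! -e "$dev_dir" ]; then
        echo "failed to get PCI address for $mgmt_dev" >&2
        return 1
    fi
    local vf_pci
    vf_pci="$(basename "$(readlink -f "$dev_dir")")"

    # Step 3: a VF exposes a "physfn" link that points at its parent PF.
    if [ ! -e "$dev_dir/physfn" ]; then
        echo "failed to get PF PCI address for VF $vf_pci" >&2
        return 1
    fi
    local pf_pci
    pf_pci="$(basename "$(readlink -f "$dev_dir/physfn")")"

    # Step 4: network devices of the PF are listed under its "net" directory.
    local netdevs=()
    local entry
    for entry in "$sysfs/bus/pci/devices/$pf_pci/net"/*; do
        [ -e "$entry" ] && netdevs+=("$(basename "$entry")")
    done
    if [ "${#netdevs[@]}" -eq 0 ]; then
        echo "no netdevs found for pci address $pf_pci" >&2
        return 1
    fi

    # Step 5: pick the first entry (shell globs expand in sorted order).
    echo "${netdevs[0]}"
}
```

Calling `derive_gateway_from_mgmt_port pf0vf0` prints the first network device found under the PF, or returns non-zero with a message on standard error when one of the lookups fails.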
20+
21+
## Configuration
22+
23+
### Command Line Options
24+
25+
```bash
26+
--ovnkube-node-mode=dpu-host
27+
--ovnkube-node-mgmt-port-netdev=pf0vf0
28+
--gateway-interface=derive-from-mgmt-port
29+
```
30+
31+
### Configuration File
32+
33+
```ini
34+
[OvnKubeNode]
35+
mode=dpu-host
36+
mgmt-port-netdev=pf0vf0
37+
38+
[Gateway]
39+
interface=derive-from-mgmt-port
40+
```
41+
42+
### Helm Configuration
43+
44+
```yaml
45+
ovnkube-node:
  mode: dpu-host
  mgmtPortNetdev: pf0vf0

gateway:
  interface: derive-from-mgmt-port
51+
```
52+
53+
## Example Scenario
54+
55+
Consider a DPU setup with the following configuration:
56+
57+
- **Management port device**: `pf0vf0` (Virtual Function)
58+
- **VF PCI address**: `0000:01:02.3`
59+
- **PF PCI address**: `0000:01:00.0`
60+
- **Available PF interfaces**: `eth0`, `eth1`
61+
62+
With `--gateway-interface=derive-from-mgmt-port`, OVN-Kubernetes will:
63+
64+
1. Start with the management port device `pf0vf0`
65+
2. Get its PCI address `0000:01:02.3`
66+
3. Resolve the PF PCI address to `0000:01:00.0`
67+
4. Find all network devices associated with PF `0000:01:00.0`: `eth0`, `eth1`
68+
5. Select `eth0` (first device) as the gateway interface
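
Both addresses in this scenario use the Linux `domain:bus:device.function` notation. As a small aid for reading them, the sketch below splits an address into its fields; `parse_pci_address` is a hypothetical helper and performs only a loose format check.

```bash
#!/usr/bin/env bash
# Sketch only: parse_pci_address is a hypothetical helper for illustration.
# It checks the textual format, not whether the device exists.

parse_pci_address() {
    local addr="$1"
    # Four hex digits, two hex digits, two hex digits, one function digit 0-7.
    local re='^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$'
    if [[ "$addr" =~ $re ]]; then
        echo "domain=${BASH_REMATCH[1]} bus=${BASH_REMATCH[2]} device=${BASH_REMATCH[3]} function=${BASH_REMATCH[4]}"
    else
        echo "not a PCI address: $addr" >&2
        return 1
    fi
}
```

For the VF address above, `parse_pci_address 0000:01:02.3` prints `domain=0000 bus=01 device=02 function=3`; passing a netdev name such as `pf0vf0` returns non-zero.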
69+
70+
## Requirements
71+
72+
### Hardware Requirements
73+
74+
- SR-IOV capable network interface card
75+
- Virtual Function (VF) and Physical Function (PF) setup
76+
- Management port configured as a VF
77+
78+
### Software Requirements
79+
80+
- SR-IOV utilities available on the system
81+
- OVN-Kubernetes running in DPU host mode
82+
- Proper VF/PF driver support
83+
84+
### Configuration Requirements
85+
86+
- Must be used in DPU host mode (`--ovnkube-node-mode=dpu-host`)
87+
- Management port netdev must be specified (`--ovnkube-node-mgmt-port-netdev`)
88+
- Gateway interface must be set to `derive-from-mgmt-port`
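
The three configuration requirements can be checked before the node starts. The following sketch assumes the values are already available as plain strings; `check_derive_gateway_config` is a hypothetical name, and the check covers only the requirements listed here. If the management port is supplied through a device-plugin resource name instead of a netdev, the second check would need adjusting.

```bash
#!/usr/bin/env bash
# Sketch only: a hypothetical preflight check for the listed requirements.
# Arguments: <node mode> <mgmt port netdev> <gateway interface>

check_derive_gateway_config() {
    local mode="$1" mgmt_netdev="$2" gw_iface="$3"

    # Nothing to validate when the feature is not requested.
    [ "$gw_iface" = "derive-from-mgmt-port" ] || return 0

    if [ "$mode" != "dpu-host" ]; then
        echo "derive-from-mgmt-port requires --ovnkube-node-mode=dpu-host" >&2
        return 1
    fi
    if [ -z "$mgmt_netdev" ]; then
        echo "derive-from-mgmt-port requires --ovnkube-node-mgmt-port-netdev" >&2
        return 1
    fi
}
```

For example, `check_derive_gateway_config dpu-host pf0vf0 derive-from-mgmt-port` succeeds, while the same call with an empty netdev or a different mode returns non-zero and explains why.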
89+
90+
## Error Handling
91+
92+
The system will return an error in the following scenarios:
93+
94+
### No Network Devices Found
95+
96+
```
97+
no netdevs found for pci address 0000:01:00.0
98+
```
99+
100+
**Cause**: The PF PCI address doesn't have any associated network devices.
101+
102+
**Resolution**: Verify that the PF has network interfaces configured and that they are visible to the system.
103+
104+
### PCI Address Resolution Failure
105+
106+
```
107+
failed to get PCI address
108+
```
109+
110+
**Cause**: Unable to retrieve the PCI address from the management port device.
111+
112+
**Resolution**: Ensure the management port device exists and is properly configured.
113+
114+
### PF PCI Address Resolution Failure
115+
116+
```
117+
failed to get PF PCI address
118+
```
119+
120+
**Cause**: Unable to resolve the PF PCI address from the VF PCI address.
121+
122+
**Resolution**: Verify SR-IOV configuration and driver support.
123+
124+
### Network Device Discovery Failure
125+
126+
```
127+
failed to get network devices
128+
```
129+
130+
**Cause**: Unable to retrieve network devices associated with the PF PCI address.
131+
132+
**Resolution**: Check SR-IOV utilities and system configuration.
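
When scanning node logs, the four error fragments above can be mapped back to their suggested resolutions. This is a sketch under the assumption that log lines contain the fragments exactly as quoted in this section; real messages may carry additional context, and `suggest_resolution` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: maps the error fragments quoted in this section to the
# resolutions given next to them. suggest_resolution is hypothetical.

suggest_resolution() {
    local line="$1"
    case "$line" in
        *"no netdevs found for pci address"*)
            echo "Verify that the PF has network interfaces and that they are visible to the system." ;;
        *"failed to get PF PCI address"*)
            echo "Verify SR-IOV configuration and driver support." ;;
        *"failed to get PCI address"*)
            echo "Ensure the management port device exists and is properly configured." ;;
        *"failed to get network devices"*)
            echo "Check SR-IOV utilities and system configuration." ;;
        *)
            # Unknown message: let the caller decide what to do.
            return 1 ;;
    esac
}
```

A line that matches none of the fragments makes the function return non-zero, so it can be used as a filter in a loop over log output.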
133+
134+
## Troubleshooting
135+
136+
### Verify SR-IOV Configuration
137+
138+
```bash
139+
# List Ethernet PCI devices (VFs often show up as "Virtual Function" entries)
140+
lspci | grep -i ethernet
141+
142+
# Check VF configuration
143+
ip link show
144+
145+
# Check PF/VF relationship
146+
ls /sys/bus/pci/devices/*/virtfn*
147+
```
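
The `virtfn*` links can also be turned into an explicit PF-to-VF table, which makes it easier to confirm that the management port VF belongs to the expected PF. A minimal sketch, assuming the standard Linux sysfs layout; `list_pf_vf_pairs` and the `SYSFS_ROOT` override are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: prints one "<PF address> <link name> <VF address>" line per VF.
# SYSFS_ROOT (default /sys) is a hypothetical override for dry runs.

list_pf_vf_pairs() {
    local sysfs="${SYSFS_ROOT:-/sys}"
    local link pf vf
    for link in "$sysfs"/bus/pci/devices/*/virtfn*; do
        # Skip the unexpanded pattern when no PF has VFs enabled.
        [ -e "$link" ] || continue
        # The directory holding the link is the PF; the link target is the VF.
        pf="$(basename "$(dirname "$link")")"
        vf="$(basename "$(readlink -f "$link")")"
        echo "$pf $(basename "$link") $vf"
    done
}
```

An empty result suggests that no VFs are currently enabled on any PF visible to the host.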
148+
149+
### Verify Management Port Device
150+
151+
```bash
152+
# Check if management port device exists
153+
ip link show pf0vf0
154+
155+
# Check PCI address
156+
ethtool -i pf0vf0 | grep bus-info
157+
```
158+
159+
### Debug PCI Address Resolution
160+
161+
```bash
162+
# Get VF PCI address (the "device" link points at the PCI device directory)
basename "$(readlink -f /sys/class/net/pf0vf0/device)"

# Get PF PCI address (the "physfn" link exists only for VFs)
basename "$(readlink -f /sys/class/net/pf0vf0/device/physfn)"
167+
```
168+
169+
## Integration with Existing Features
170+
171+
### Gateway Accelerated Interface
172+
173+
The "derive-from-mgmt-port" feature relies on the configured management port to select the appropriate gateway accelerated interface.
174+
175+
The management port can be specified through one of the following options:
176+
```
177+
--ovnkube-node-mgmt-port-netdev)
178+
OVNKUBE_NODE_MGMT_PORT_NETDEV=$VALUE
179+
```
180+
181+
```
182+
--ovnkube-node-mgmt-port-dp-resource-name)
183+
OVNKUBE_NODE_MGMT_PORT_DP_RESOURCE_NAME=$VALUE
184+
```
185+
186+
`OVNKUBE_NODE_MGMT_PORT_DP_RESOURCE_NAME` has priority over `OVNKUBE_NODE_MGMT_PORT_NETDEV` and is easier to use, since it points to an SR-IOV Device Plugin pool name.
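
The precedence between the two variables can be sketched as follows. This only illustrates the stated priority; `select_mgmt_port_source` and its output format are hypothetical and do not reflect how the ovnkube scripts report the choice.

```bash
#!/usr/bin/env bash
# Sketch only: shows which variable wins when both are set.
# select_mgmt_port_source and its "kind:value" output are hypothetical.

select_mgmt_port_source() {
    if [ -n "${OVNKUBE_NODE_MGMT_PORT_DP_RESOURCE_NAME:-}" ]; then
        # The device plugin resource name takes priority.
        echo "dp-resource-name:${OVNKUBE_NODE_MGMT_PORT_DP_RESOURCE_NAME}"
    elif [ -n "${OVNKUBE_NODE_MGMT_PORT_NETDEV:-}" ]; then
        echo "netdev:${OVNKUBE_NODE_MGMT_PORT_NETDEV}"
    else
        echo "no management port configured" >&2
        return 1
    fi
}
```

With both variables set, the function reports the resource name; with only `OVNKUBE_NODE_MGMT_PORT_NETDEV` set, it falls back to the netdev.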
187+
188+
### Multiple Network Support
189+
190+
This feature works with multiple network support and can be used in environments where pods have multiple interfaces connected to different networks.
191+
192+
## Limitations
193+
194+
- Only available in DPU host mode
195+
- Requires SR-IOV capable hardware
196+
- Limited to the first available network device from the PF
197+
- Depends on proper VF/PF driver support
198+
- May not work with all SR-IOV implementations
199+
200+
## Future Enhancements
201+
202+
Potential improvements to this feature could include:
203+
204+
- Support for selecting specific network devices based on criteria
205+
- Integration with device plugin resources
206+
- Support for multiple gateway interfaces
207+
- Enhanced error reporting and diagnostics
208+
- Support for non-SR-IOV hardware configurations

docs/features/hardware-offload/dpu-support.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,8 @@ These aforementioned parts are expected to be deployed also on two different Kub
4343
#### OVN Kubernetes component on a DPU-Enabled Host
4444
- ovn-node
4545

46+
For detailed configuration of gateway interfaces in DPU host mode, see [DPU Gateway Interface Configuration](dpu-gateway-interface.md).
47+
4648
### DPU Cluster
4749
---
4850
