202511 multi-vrf #21866
base: 202511
Conversation
Introduce infrastructure changes required to converge multiple BGP peers into a minimum number of cEOSLab hosts, via the use of VLANs and VRFs.

Overview of peer convergence:

Converging the total number of peer switches into the fewest possible number of cEOSLab containers reduces the overall resources required to run large numbers of peers. The basic premises behind convergence are as follows:
- cEOSLab peers in docker containers may be converged into a smaller number of host peers.
- The SONiC-facing configuration of each BGP peer may be separated in routing and bridging via the use of VRFs.
- The PTF-facing configuration of each BGP peer may be separated within each VRF via VLAN tagging, enabling the use of a single backplane interface on each host cEOSLab container.
- Each VRF includes a number of interfaces facing either the SONiC DUT or the backplane.
- Changes should be as transparent to the SONiC DUT as possible.

At the time of testbed setup, the ansible topology file for the testbed is modified to include new metadata specific to multi-vrf configuration, and the VMs list is trimmed to include only those containers which will host multiple BGP peerings, separated by VRF. The new metadata includes mappings between host containers and VRFs, backplane VLAN mappings, and BGP session parameters.

VLAN tag 2000 is used as the starting value for all VLANs between the test infrastructure PTF container interfaces and cEOSLab device interfaces.

The IP and IPv6 addresses used to connect the cEOSLab peer and the infrastructure PTF container are generated so as to make the backplane connections clearer, more distinct, and easier to implement. In general, backplane L3 addresses used by the cEOSLab peer end in even numbers, and those used by the PTF container end in odd numbers. All addresses generated for use in backplane connections start with the value 100 (0x64) in the least-significant octet or hextet (depending on the address family). The address changes are mapped and stored in the new multi-vrf metadata in the ansible topology file.

Multiple BGP features, such as local-as and next-hop-peer, are used to aid in the resolution of routes. This is necessary to keep the SONiC DUT as multi-vrf-agnostic as possible.

Known limitations:
- cEOSLab instances do not allow for the creation of interfaces with interface IDs greater than 127 when interfaces are laid out unidimensionally.
- The use of multiple VRFs has not been tested in conjunction with asynchronous ansible tasks.

Signed-off-by: Will Rideout <[email protected]>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@yutongzhang-microsoft for the full-topo you will need to adjust the maxFpNum as set in the testbed.yaml file to 127. Apologies for not mentioning this earlier; I will update the instructions above.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
In order for multi-vrf to support the redeploy-topo CLI command, the VLAN interfaces created in the ptf container must be cleaned up in the topo removal phase.

In addition, when creating the VLAN interfaces in the ptf container during the topo add phase, check for the existence of the VLAN interface first. If it already exists, then clear all IP addresses associated with the interface and skip interface creation. Otherwise, create the VLAN interface as normal. Existence checking must be done because the topo removal phase does not stop the redeployment of the topo if it fails, so we may otherwise end up adding the topo with containers and config in an indeterminate state.

Signed-off-by: Will Rideout <[email protected]>
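For illustration only, the add-phase logic described above is roughly equivalent to the sketch below. It assumes the standard iproute2 `ip` tool is available inside the ptf container; the parent interface name and VLAN ID are placeholders, not the values used by this PR.

```python
import subprocess

def ensure_ptf_vlan_interface(parent="backplane", vlan_id=2000):
    """Create (or reuse) a tagged VLAN interface in the ptf container."""
    ifname = f"{parent}.{vlan_id}"

    # Does the VLAN interface already exist (e.g. left behind by a failed
    # topo removal)?
    exists = subprocess.run(
        ["ip", "link", "show", ifname],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0

    if exists:
        # Reuse the interface, but clear any stale IP addresses first.
        subprocess.run(["ip", "addr", "flush", "dev", ifname], check=True)
    else:
        # Create the tagged interface as normal and bring it up.
        subprocess.run(
            ["ip", "link", "add", "link", parent, "name", ifname,
             "type", "vlan", "id", str(vlan_id)],
            check=True,
        )
        subprocess.run(["ip", "link", "set", ifname, "up"], check=True)
```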
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Fix the fetching of the intf-offset when running ipv6 bgp scale tests on multi-vrf testbeds. The offset is now a member of a dictionary inside the multi-vrf intf_mapping metadata. Signed-off-by: Will Rideout <[email protected]>
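As an illustration of the data-structure change (the key names below are assumptions, not the actual metadata schema):

```python
# Hypothetical shape of the multi-vrf intf_mapping metadata; key names are
# illustrative only.
intf_mapping = {
    "ARISTA01T1": {"offset": 4, "vlan": 2004},
}

# The intf-offset is now nested one level deeper, so it is read as a member
# of the per-VM dictionary rather than as the mapping value itself.
offset = intf_mapping["ARISTA01T1"]["offset"]
```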
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
In bgp/test_bgp_allow_list.py tests, pass vrf information if running on a multi-vrf testbed when getting bgp route information from a peer. Otherwise, use the vrf "default". Signed-off-by: Will Rideout <[email protected]>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
When running bgp traffic-shift tests on multi-vrf testbeds, fetch the current vrf (peer) from nbrhosts metadata, and use it to pass the vrf to bgp show commands. This was verified to fix traffic-shift tests which were failing because they were unable to verify routes on the cEOSLab peers. Signed-off-by: Will Rideout <[email protected]>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
When running qos testing for dscp on multi-vrf testbeds, extract the vm offset from the multi-vrf metadata instead of the shortened vm list in the test topology. This was verified to fix KeyErrors thrown on multi-vrf testbeds in qos/test_qos_dscp_mapping.py. Signed-off-by: Will Rideout <[email protected]>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Description of PR
Converge cEOSLab peer containers via the use of VRFs and VLANs
Type of change
Approach
Converging the total number of peer switches into the fewest possible number of cEOSLab containers reduces the overall resources required to run large numbers of peers. The basic premises behind convergence are as follows:
- cEOSLab peers in docker containers may be converged into a smaller number of host peers.
- The SONiC-facing configuration of each BGP peer may be separated in routing and bridging via the use of VRFs.
- The PTF-facing configuration of each BGP peer may be separated within each VRF via VLAN tagging, enabling the use of a single backplane interface on each host cEOSLab container.
- Each VRF includes a number of interfaces facing either the SONiC DUT or the backplane.
- Changes should be as transparent to the SONiC DUT as possible.
At the time of testbed setup, the ansible topology file for the testbed
is modified to include new metadata specific to multi-vrf configuration,
and the VMs list is trimmed to only include those containers which will
host multiple BGP peerings, separated by VRF. The new metadata includes
mappings between host containers and VRFs, backplane VLAN mappings, and
BGP session parameters.
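For illustration only, the new metadata might be organized along these lines (written here as a Python literal; every key name and value below is an assumption, not the schema actually written into the topology file):

```python
# Purely illustrative layout of the multi-vrf metadata added to the ansible
# topology file; all key names and values are assumptions.
multi_vrf_metadata = {
    "ARISTA01T1": {                        # converged cEOSLab host container
        "vrfs": {
            "VRF-PEER-0001": {
                "backplane_vlan": 2000,    # VLAN toward the PTF container
                "dut_interfaces": ["Ethernet1"],
                "bgp": {"asn": 64600, "peer_asn": 65100},
            },
            "VRF-PEER-0002": {
                "backplane_vlan": 2001,
                "dut_interfaces": ["Ethernet2"],
                "bgp": {"asn": 64601, "peer_asn": 65100},
            },
        },
    },
}
```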
VLAN tag 2000 is used as the starting value for all VLANs between the
test infrastructure PTF container interfaces and cEOSLab device
interfaces.
The IP and IPv6 addresses used to connect the cEOSLab peer and the infrastructure PTF container are generated so as to make the backplane connections clearer, more distinct, and easier to implement. In general, backplane L3 addresses used by the cEOSLab peer end in even numbers, and those used by the PTF container end in odd numbers. All addresses generated for use in backplane connections start with the value 100 (0x64) in the least-significant octet or hextet (depending on the address family). The address changes are mapped and stored in the new multi-vrf metadata in the ansible topology file.
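A minimal sketch of the VLAN-tag and addressing conventions described above; the prefixes and the exact per-peer layout are assumptions made for illustration, not the values generated by the PR.

```python
import ipaddress

BACKPLANE_VLAN_BASE = 2000   # first VLAN tag on the PTF<->cEOSLab backplane
BACKPLANE_HOST_BASE = 100    # 0x64: starting least-significant octet/hextet

def backplane_addresses(peer_index,
                        v4_prefix="10.10.246.0/24",
                        v6_prefix="fc0a:2000::/64"):
    """Return (vlan, ceos_v4, ptf_v4, ceos_v6, ptf_v6) for one converged peer.

    Even host values land on the cEOSLab side, odd values on the PTF side,
    matching the convention described above. The prefixes are placeholders.
    """
    vlan = BACKPLANE_VLAN_BASE + peer_index
    host = BACKPLANE_HOST_BASE + 2 * peer_index          # even -> cEOSLab
    v4 = ipaddress.ip_network(v4_prefix).network_address
    v6 = ipaddress.ip_network(v6_prefix).network_address
    return vlan, v4 + host, v4 + host + 1, v6 + host, v6 + host + 1

# Example: the first peer would get VLAN 2000, 10.10.246.100/.101, and
# fc0a:2000::64/::65.
print(backplane_addresses(0))
```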
Multiple BGP features, such as local-as and next-hop-peer, are used to aid in the resolution of routes. This is necessary to keep the SONiC DUT as multi-vrf-agnostic as possible.
Enabling multi-VRF mode:
Multi-VRF mode may be enabled by setting the attribute use_converged_peers: true in the testbed definition found in sonic-mgmt/ansible/testbed.yaml. This file is read by the TestbedProcessing.py script, which sets global variables indicating to other ansible tasks and libraries that the testbed is to be started in multi-VRF mode.
In addition, the value of max_fp_nums must be adjusted such that each cEOSLab docker container has enough resources to run all the new BGP sessions in each vrf. This can be done dynamically, of course; however, for the full-scale topologies the maximum supported by cEOSLab, 127, must be used.
Known limitations:
- cEOSLab instances do not allow for the creation of interfaces with interface IDs greater than 127 when interfaces are laid out unidimensionally.
- The use of multiple VRFs has not been tested in conjunction with asynchronous ansible tasks.