
Conversation

@wrideout-arista (Contributor) commented Jan 8, 2026

Description of PR

Converge cEOSLab peer containers via the use of VRFs and VLANs

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Approach

Converging the total number of peer switches into the fewest possible
number of cEOSLab containers reduces the overall resources required to
run large numbers of peers. The basic premises behind convergence are
as follows:

- cEOSLab peers in docker containers may be converged into a smaller
  number of host peers.
- The SONiC-facing configuration of each BGP peer may be kept separate,
  in terms of routing and bridging, via the use of VRFs.
- The PTF-facing configuration of each BGP peer may be separated within
  each VRF via VLAN tagging, enabling the use of a single backplane
  interface on each host cEOSLab container.
- Each VRF includes a number of interfaces, each facing either the
  SONiC DUT or the backplane.
- Changes should be as transparent to the SONiC DUT as possible.
At the time of testbed setup, the ansible topology file for the testbed
is modified to include new metadata specific to multi-vrf configuration,
and the VMs list is trimmed to only include those containers which will
host multiple BGP peerings, separated by VRF. The new metadata includes
mappings between host containers and VRFs, backplane VLAN mappings, and
BGP session parameters.
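
For illustration only, that metadata might look roughly like the following in the topology file; every key name here is hypothetical, since the actual schema is not shown in this description:

```yaml
# Hypothetical sketch of the multi-VRF metadata; real key names may differ.
converged_peers:
  VM0100:                     # host cEOSLab container
    vrfs:
      ARISTA01T0:
        backplane_vlan: 2000  # VLAN toward the PTF backplane interface
        bgp:
          asn: 64001
          peer_asn: 65100
      ARISTA02T0:
        backplane_vlan: 2001
        bgp:
          asn: 64002
          peer_asn: 65100
```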

VLAN tag 2000 is used as the starting value for all VLANs between the
test infrastructure PTF container interfaces and cEOSLab device
interfaces.
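
As a minimal sketch of the allocation (only the base value 2000 comes from this description; the function itself is hypothetical):

```python
BACKPLANE_VLAN_BASE = 2000  # starting VLAN tag between PTF and cEOSLab interfaces

def backplane_vlan_tag(peer_index: int) -> int:
    """Give each converged peer its own backplane VLAN tag, counting up from 2000."""
    return BACKPLANE_VLAN_BASE + peer_index
```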

The IP and IPv6 addresses used to connect the cEOSLab peer and the
infrastructure PTF container are generated so that the backplane
connections are unambiguous, easy to recognize, and easy to implement.
In general, backplane L3 addresses used by the cEOSLab peer end in even
numbers, and those used by the PTF container end in odd numbers. All
addresses generated for use in backplane connections start at the value
100 (0x64) in the least-significant octet or hextet (depending on the
address family). The address assignments are mapped and stored in the
new multi-vrf metadata in the ansible topology file.
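
A minimal sketch of an allocation scheme consistent with this description, assuming hypothetical backplane prefixes and a per-peer pairing index:

```python
import ipaddress

# Hypothetical backplane prefixes; the actual subnets are not shown here.
BACKPLANE_V4 = ipaddress.IPv4Network("10.200.0.0/24")
BACKPLANE_V6 = ipaddress.IPv6Network("fc00:200::/64")

def backplane_pair(index: int):
    """Return ((ceos_v4, ptf_v4), (ceos_v6, ptf_v6)) for backplane pair `index`.

    The least-significant octet/hextet starts at 100 (0x64); the cEOSLab
    side takes the even value and the PTF side the following odd value.
    """
    ceos_v4 = BACKPLANE_V4.network_address + (100 + 2 * index)
    ceos_v6 = BACKPLANE_V6.network_address + (0x64 + 2 * index)
    return (ceos_v4, ceos_v4 + 1), (ceos_v6, ceos_v6 + 1)
```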

Multiple BGP features, such as local-as and next-hop-peer, are used in
order to aid in the resolution of routes. This is necessary to keep the
SONiC DUT as multi-vrf-agnostic as possible.
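
For illustration, a hedged sketch of how those knobs could appear in a cEOSLab BGP configuration; the VRF name, addresses, and AS numbers are invented, and the actual rendered config may differ:

```
router bgp 64600
   vrf ARISTA01T0
      neighbor 10.0.0.1 remote-as 65100
      neighbor 10.0.0.1 local-as 64001 no-prepend replace-as
      neighbor 10.0.0.1 next-hop-peer
```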

Enabling multi-VRF mode:

Multi-VRF mode may be enabled by setting the attribute `use_converged_peers: true` in the testbed definition found in sonic-mgmt/ansible/testbed.yaml. This file is read by the TestbedProcessing.py script, which sets global variables indicating to other ansible tasks and libraries that the testbed is to be started in multi-VRF mode.

In addition, the value of `max_fp_num` must be adjusted so that each cEOSLab docker container has enough front-panel interfaces to carry all of the new BGP sessions in each VRF. This can be done dynamically; however, for full-scale topologies the maximum supported by cEOSLab, 127, must be used. A hedged sketch of the relevant knobs follows (exact placement within the testbed definition may differ from the real schema):
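
```yaml
# sonic-mgmt/ansible/testbed.yaml (illustrative snippet)
- conf-name: my-multi-vrf-testbed
  topo: t1-isolated-d448u15-lag
  use_converged_peers: true   # start the testbed in multi-VRF mode
  max_fp_num: 127             # cEOSLab maximum; required for full-scale topologies
```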

Known limitations:

- cEOSLab instances do not allow for the creation of interfaces with
  interface-IDs greater than 127, when interfaces are laid out
  unidimensionally.
- The use of multiple VRFs has not been tested in conjunction with
  asynchronous ansible tasks.

Introduce infrastructure changes required to converge multiple BGP peers
into a minimum number of cEOSLab hosts, via the use of VLANs and VRFs.

Signed-off-by: Will Rideout <[email protected]>
@mssonicbld (Collaborator)

/azp run

@wrideout-arista marked this pull request as draft January 8, 2026 14:51
@github-actions bot requested review from r12f and sdszhang January 8, 2026 14:51
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wrideout-arista mentioned this pull request Jan 8, 2026
@wrideout-arista (Contributor, Author)

> Hi, @wrideout-arista , while deploying, we met such an issue:
>
> ```
> TASK [vm_set : Bind topology t1-isolated-d448u15-lag to VMs. base vm = VM77200] ****************************************************************************
> Tuesday 13 January 2026 01:25:56 +0000 (0:00:00.095) 0:04:33.602 *******
> fatal: [STR4-ACS-SERV-77]: FAILED! => {"changed": false, "msg": "Wrong vlans parameter for hostname ARISTA01T0, vm VM77200. Too many vlans. Maximum is 4"}
> ```
>
> It seems that the parameter max_fp_num is using the default value. Can you check if any changes are missing in this PR?

@yutongzhang-microsoft for the full topo you will need to adjust the maxFpNum as set in the testbed.yaml file to 127. Apologies for not mentioning this earlier; I will update the instructions above.

@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

In order for multi-vrf to support the redeploy-topo CLI command, the
VLAN interfaces created in the ptf container must be cleaned up in the
topo removal phase.  In addition, when creating the VLAN interfaces in
the ptf container during the topo add phase, check for the existence of
the VLAN interface first.  If it already exists, then clear all IP
addresses associated with the interface and skip interface creation.
Otherwise, create the VLAN interface as normal.

Existence checking must be done because the topo removal phase does not
stop the redeployment of the topo if it fails, so we may otherwise end
up adding the topo with containers and config in an indeterminate state.

Signed-off-by: Will Rideout <[email protected]>
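
A hedged sketch of that add-phase logic, written as ansible tasks; the task names and variables are illustrative, not the PR's actual tasks:

```yaml
# Illustrative only; the real tasks and variable names may differ.
- name: Check whether the backplane VLAN interface already exists
  command: ip link show {{ vlan_iface }}
  register: vlan_iface_check
  failed_when: false

- name: Flush stale IP addresses left behind by a failed topo removal
  command: ip addr flush dev {{ vlan_iface }}
  when: vlan_iface_check.rc == 0

- name: Create the VLAN interface as normal when it does not exist
  command: ip link add link {{ parent_iface }} name {{ vlan_iface }} type vlan id {{ vlan_id }}
  when: vlan_iface_check.rc != 0
```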
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Fix the fetching of the intf-offset when running ipv6 bgp scale tests on
multi-vrf testbeds.  The offset is now a member of a dictionary inside
the multi-vrf intf_mapping metadata.

Signed-off-by: Will Rideout <[email protected]>
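
Roughly, the lookup changed shape along these lines; only the name intf_mapping comes from the commit message, and the surrounding keys are assumptions:

```python
# Hypothetical sketch of the intf-offset lookup on multi-vrf testbeds.
intf_mapping = multivrf_metadata["intf_mapping"]   # assumed container

# Before: the mapping value was the offset itself.
# offset = intf_mapping[vm_name]

# After: the offset is a member of a per-VM dictionary.
offset = intf_mapping[vm_name]["offset"]           # assumed key name
```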
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

In bgp/test_bgp_allow_list.py tests, pass vrf information when getting
bgp route information from a peer if running on a multi-vrf testbed.
Otherwise, use the vrf "default".

Signed-off-by: Will Rideout <[email protected]>
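
A hedged sketch of the pattern (the helper and its arguments are hypothetical; only the fallback to "default" is from the commit message):

```python
# Hypothetical sketch of a vrf-aware route lookup on a cEOSLab peer.
def get_peer_routes(nbrhost, prefix, vrf=None):
    vrf = vrf if vrf is not None else "default"  # non-multi-vrf testbeds
    # EosHost-style wrapper around Arista's eos_command ansible module.
    return nbrhost.eos_command(
        commands=[f"show ip bgp {prefix} vrf {vrf} | json"]
    )
```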
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

When running bgp traffic-shift tests on multi-vrf testbeds, fetch the
current vrf (peer) from nbrhosts metadata, and use it to pass the vrf to
bgp show commands.

This was verified to fix traffic-shift tests which were failing because
they were unable to verify routes on the cEOSLab peers.

Signed-off-by: Will Rideout <[email protected]>
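
Along these hypothetical lines (the metadata key names may differ in the real change):

```python
# Hypothetical sketch: take the per-peer VRF from nbrhosts metadata and
# pass it through to the bgp show commands.
peer = nbrhosts[peer_name]
vrf = peer.get("vrf", "default")                     # assumed key name
peer["host"].eos_command(commands=[f"show ip bgp summary vrf {vrf}"])
```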
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wrideout-arista marked this pull request as ready for review January 23, 2026 00:46
When running qos testing for dscp on multi-vrf testbeds, extract the vm
offset from the multi-vrf metadata instead of the shortened vm list in
the test topology.

This was verified to fix KeyErrors on multi-vrf testbeds thrown in
qos/test_qos_dscp_mapping.py.

Signed-off-by: Will Rideout <[email protected]>
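
A hedged sketch of the described change; the VMs/vm_offset layout follows the usual topology files, while the multi-vrf keys are assumptions:

```python
# Hypothetical sketch of the vm-offset lookup in qos/test_qos_dscp_mapping.py.
topo = tbinfo["topo"]["properties"]["topology"]

if use_converged_peers:
    # The trimmed VMs list no longer holds every peer, so read the offset
    # from the multi-vrf metadata instead.
    vm_offset = multivrf_metadata["intf_mapping"][peer_name]["offset"]  # assumed keys
else:
    vm_offset = topo["VMs"][peer_name]["vm_offset"]
```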
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
