Commit a147159

refresh opergroup lab (#189)
1 parent 7667900 commit a147159

5 files changed (+46 -37 lines)


docs/tutorials/programmability/event-handler/oper-group/lab.md

Lines changed: 12 additions & 11 deletions
@@ -11,35 +11,36 @@ As always, this tutorial will be backed up by a lab that readers can effortlessl
3. L2 EVPN service[^1] configured across the leaves of the fabric
4. A telemetry stack to demonstrate oper-group operations in action.

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:0,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=2.1, title='', page=0) }}-

## Physical topology

-On a physical layer topology interconnections are layed down as follows:
-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:5,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+On a physical layer topology interconnections are laid down as follows:
+
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=2.1, title='', page=5) }}-

Each client is dual-homed to corresponding leaves; To achieve that, interfaces `eth1` and `eth2` are formed into a `bond0` interface.
-On the leaves side, the access interface `Ethernet-1/1`` is part of a LAG interface that is "stretched" between a pair of leaves, forming a logical construct similar to MC-LAG.
+On the leaves side, the access interface `Ethernet-1/1` is part of a LAG interface that is "stretched" between a pair of leaves, forming a logical construct similar to MC-LAG.

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:6,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=3, title='', page=6) }}-

## Fabric underlay

In the underlay of a fabric leaves and spines run eBGP protocol to enable leaves to exchange reachability information for their `system0` interfaces.

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:7,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=3, title='', page=7) }}-

eBGP peerings are formed between each leaf and spine pair.

## Fabric overlay

To support BGP EVPN service, in the overlay iBGP peerings with EVPN address family are established from each leaf to each spine, with spines acting as route reflectors.

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:8,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=3, title='', page=8) }}-

From the EVPN service standpoint, the mac-vrf instance named `vrf-1` is created on leaves and `ES-1` ethernet segment is formed from a LAG interface.

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:9,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=3, title='', page=9) }}-

Ethernet segments are configured to be in an all-active mode to make sure that every access link is utilized in the fabric.

@@ -63,12 +64,12 @@ git clone https://github.com/srl-labs/opergroup-lab.git && cd opergroup-lab
Lab repository contains startup configuration files for the fabric nodes, as well as necessary files for the telemetry stack to come up online operational. To deploy the lab:

```
-containerlab deploy -t opergroup.clab.yml
+containerlab deploy
```

-This will stand up a lab with an already pre-configured fabric using startup configs contained within [`configs`](https://github.com/srl-labs/opergroup-lab/tree/main/configs) directory.
+This will bring up a lab with an already pre-configured fabric using startup configs contained within [`configs`](https://github.com/srl-labs/opergroup-lab/tree/main/configs) directory.

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:10,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=3, title='', page=10) }}-

The deployed lab starts up in a pre-provisioned step, where underlay/overlay configuration has already been done. We proceed with oper-group use case exploration in the next chapter of this tutorial.

docs/tutorials/programmability/event-handler/oper-group/oper-group-cfg.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
comments: true
---

-<script type="text/javascript" src="https://cdn.jsdelivr.net/gh/hellt/drawio-js@main/embed2.js" async></script>
+<script type="text/javascript" src="https://viewer.diagrams.net/js/viewer-static.min.js" async></script>

Now that we are [aware of a potential traffic blackholing](problem-statement.md#traffic-loss-scenario) that may happen in the all-active EVPN-based fabrics it is time to meet one of the remediation tactics.

docs/tutorials/programmability/event-handler/oper-group/oper-group-intro.md

Lines changed: 5 additions & 7 deletions
@@ -14,36 +14,34 @@ tags:
| **Resource requirements** | :fontawesome-solid-microchip: 4 vCPU <br/>:fontawesome-solid-memory: 6 GB |
| **Lab** | [srl-labs/opergroup-lab][lab] |
| **Main ref documents** | [Event Handler Guide][eh-guide] |
-| **Version information**[^1] | [`containerlab:0.26.1`][clab-install], [`srlinux:22.3.2`][srlinux-container], [`docker-ce:20.10.2`][docker-install] |
+| **Version information**[^1] | [`containerlab:0.65.1`][clab-install], [`srlinux:24.10.2`][srlinux-container], [`docker-ce:26.0.0`][docker-install] |

[lab]: https://github.com/srl-labs/opergroup-lab
[clab-install]: https://containerlab.dev/install/
[srlinux-container]: https://github.com/nokia/srlinux-container-image
[docker-install]: https://docs.docker.com/engine/install/
-[refdoc1]: https://nokia.com
+[eh-guide]: https://documentation.nokia.com/srlinux/24-10/books/event-handler/event-handler-overview.html

One of the most common use cases that can be covered with the Event Handler framework is known as "Operational group" or "Oper-group" for short. An oper-group feature covers several use cases, but in essence, it creates a relationship between logical elements of a network node so that they become aware of each other - forming a logical group.

In the data center space oper-group feature can tackle the problem of traffic black-holing when leaves lose all connectivity to the spine layer. Consider the following simplified Clos topology where clients are multi-homed to leaves:

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:1,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=2.1, title='', page=1) }}-

With EVPN [all-active multihoming](https://documentation.nokia.com/srlinux/22-3/SR_Linux_Book_Files/Advanced_Solutions_Guide/evpn-l2-multihome.html#ariaid-title22) enabled in fabric traffic from `client1` is load-balanced over the links attached to the upstream leaves and propagates via fabric to its destination.

Since all links of a client' bond interface are active, traffic is hashed to each of the constituent links and thus utilizes all available bandwidth. A problem occurs when a leaf looses connectivity to all upstream spines, as illustrated below:

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:2,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=2.1, title='', page=2) }}-

When `leaf1` loses its uplinks, traffic from `client1` still gets sent to it since the client is not aware of any link loss problems happening on the leaf. This results in traffic blackholing on `leaf1`.

To remedy this particular failure scenario an oper-group can be used. The idea here is to make a logical grouping between certain uplink and downlink interfaces on the leaves so that downlinks would share fate with uplink status. In our example, oper-group can be configured in such a way that leaves will shutdown their downlink interfaces should they detect that uplinks went down. This operational group's workflow depicted below:

-<div class="mxgraph" style="max-width:100%;border:1px solid transparent;margin:0 auto; display:block;" data-mxgraph="{&quot;page&quot;:3,&quot;zoom&quot;:3,&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;check-visible-state&quot;:true,&quot;resize&quot;:true,&quot;url&quot;:&quot;https://raw.githubusercontent.com/srl-labs/learn-srlinux/diagrams/opergroup.drawio&quot;}"></div>
+-{{ diagram(url='srl-labs/learn-srlinux/diagrams/opergroup.drawio',zoom=3, title='', page=3) }}-

When a leaf loses its uplinks, the oper-group gets notified about that fact and reacts accordingly by operationally disabling the access link towards the client. Once the leaf's downlink transitions to a `down` state, the client's bond interface stops using that particular interface for hashing, and traffic moves over to healthy links. In our example, the client stops sending to `leaf1` and everything gets sent over to `leaf2`.

In this tutorial, we will see how SR Linux's Event Handler framework enables oper-group capability.

[^1]: the following versions have been used to create this tutorial. The newer versions might work; please pin the version to the mentioned ones if they don't.
-
-[eh-guide]: https://documentation.nokia.com/srlinux/22-6/SR_Linux_Book_Files/Event_Handler_Guide/eh-overview.html
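
To make the fate-sharing idea described in this intro more concrete, here is a minimal, illustrative sketch of the kind of MicroPython script the Event Handler could run for an oper-group: it counts how many monitored uplinks are operationally up and, when too few remain, asks for the access links to be brought down. The option names (`required-up-uplinks`, `down-links`) and the `set-ephemeral-path` action are assumptions made for this sketch only; the exact input/output schema and the lab's real script are documented in the Event Handler Guide and the opergroup-lab repository.

```python
import json


def event_handler_main(in_json_str):
    """Entry point invoked by the SR Linux event handler (illustrative sketch)."""
    state = json.loads(in_json_str)

    # Monitored paths, e.g. the oper-state of the uplink interfaces.
    paths = state.get("paths", [])
    # User-supplied options from the event-handler instance configuration (assumed names).
    options = state.get("options", {})

    required_up = int(options.get("required-up-uplinks", 1))
    down_links = options.get("down-links", [])

    # Count uplinks that are operationally up.
    up_cnt = sum(1 for p in paths if p.get("value") == "up")

    # Share fate: access links go down when too few uplinks are up, back up otherwise.
    new_state = "down" if up_cnt < required_up else "up"

    actions = [
        {"set-ephemeral-path": {"path": "interface {} oper-state".format(link), "value": new_state}}
        for link in down_links
    ]

    return json.dumps({"actions": actions})
```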

docs/tutorials/programmability/event-handler/oper-group/problem-statement.md

Lines changed: 26 additions & 16 deletions
@@ -6,6 +6,7 @@ Before we meet the Event Handler framework of SR Linux and leverage it to config
As was mentioned in the [introduction](oper-group-intro.md), without oper-group feature traffic loss can occur should any leaf lose all its uplinks. Let's lab a couple of scenarios that highlight a problem that oper-group is set to remedy.

## Healthy fabric scenario
+
The startup configuration that our lab is equipped with gets our fabric to a state where traffic can be exchanged between clients. Users can verify that by running a simple iperf-based traffic test.

In our lab, `client2` runs iperf3 server, while `client1` acts as a client. With the following command we can run a single stream of TCP data with a bitrate of 200 Kbps:

@@ -35,20 +36,22 @@ Connecting to host 192.168.100.2, port 5201
[ 5] 0.00-10.00 sec 363 KBytes 298 Kbits/sec 0 sender
[ 5] 0.00-10.00 sec 363 KBytes 298 Kbits/sec receiver
```
-In addition to iperf results, users can monitor the throughput of `leaf1/2`` links using grafana dashboard:
+
+In addition to iperf results, users can monitor the throughput of `leaf1/2` links using grafana dashboard:
[![grafana](https://gitlab.com/rdodin/pics/-/wikis/uploads/99b290ba11971cc683f221655336ff23/image.png)](https://gitlab.com/rdodin/pics/-/wikis/uploads/99b290ba11971cc683f221655336ff23/image.png)

This visualization tells us that `client1` hashed its single stream[^1] over `client1:eth2` interface that connects to `leaf2:e1-1`. On the "Leaf2 e1-1 throughput" panel in the bottom right we see incoming traffic that indicates data is flowing in via this interface.

Next, we see that `leaf2` used its `e1-50` interface to send data over to a spine layer, through which it reaches `client2` side[^2].

### Load balancing on the client side
-Next, it is interesting to verify that client can utilize both links in its `bond0` interface since our L2 EVPN service uses an all-active multihoming mode for the ethernet segment. To test that we need to tell iperf to use at least two parallel streams; that is what `-P` flag is for.

-With the following command we start two parallel streams, 200 Kbps bitrate each, and this time for 20 seconds.
+Next, it is interesting to verify that client can utilize both links in its `bond0` interface since our L2 EVPN service uses an all-active multihoming mode for the ethernet segment. To test that we need to tell iperf to use eight parallel streams; that is what `-P` flag is for.
+
+With the following command we start eight parallel streams, 50 Kbps bitrate each, and this time for 20 seconds.

```bash
-docker exec -it client1 iperf3 -c 192.168.100.2 -b 200K -P2 -t 20
+docker exec -it client1 iperf3 -c 192.168.100.2 -b 50K -P8 -t 20
```

Our telemetry visualization makes it clear that client-side load balancing is indeed happening as both leaves receive traffic on their `e-1/1` interface.

@@ -57,40 +60,47 @@ Our telemetry visualization makes it clear that client-side load balancing is in

`leaf1` and `leaf2` both chose to use their `e1-49` interface to send the traffic to the spine layer.

-??? "Load balancing in the fabric?"
-You may have noticed that when we sent two parallel streams client hashed two streams over two links in its bond interface. But then leaves used a single uplink interface towards the fabric. This is due to the fact that each leaf got a single "stream" and thus a single uplink interface was utilized.
+/// details | Load balancing in the fabric?
+You may have noticed that when we send a few streams (for example two parallel streams), the client may hash the two streams over two links in its bond interface. But then the leaves use a single uplink interface towards the fabric. This is because each leaf gets a single "stream" and thus utilizes a single uplink interface.

-We can see ECMP in the fabric happening if we send more streams, for example, eight of them:
-```bash
-docker exec -it client1 iperf3 -c 192.168.100.2 -b 200K -P 8 -t 20
-```
+We can see ECMP in the fabric happening if we send more streams, for example, ten of them:

-That way leaves will have more streams to handle and they will load balance the streams nicely as shown in [this picture](https://gitlab.com/rdodin/pics/-/wikis/uploads/85bd945ff272db2da4d4cd1132c47803/image.png).
+```bash
+docker exec -it client1 iperf3 -c 192.168.100.2 -b 20K -P 10 -t 20
+```
+
+That way leaves will have more streams to handle and they will load balance the streams nicely as shown in [this picture](https://gitlab.com/rdodin/pics/-/wikis/uploads/85bd945ff272db2da4d4cd1132c47803/image.png).
+///

## Traffic loss scenario
+
Now to the interesting part. What happens if one of the leaves suddenly loses all its uplinks while traffic is mid-flight? Will traffic be re-routed to healthy leaf? Will it be dropped? Let's lab it out.

We will send 4 streams for 40 seconds long and somewhere in the middle we will execute `set-uplinks.sh` script which administratively disables uplinks on a given leaf:

1. Start the traffic generators
+
```bash
-docker exec -it client1 iperf3 -c 192.168.100.2 -b 200K -P 4 -t 40
+docker exec -it client1 iperf3 -c 192.168.100.2 -b 50K -P 8 -t 40
```
+
2. Wait ~20s for graphs to form shape
3. Put down both uplinks on `leaf1`
+
```bash
bash set-uplinks.sh leaf1 "{49..50}" disable
```
+
4. Monitor the traffic distribution
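
For readers curious what `set-uplinks.sh` boils down to: the idea is simply to toggle the admin-state of a leaf's uplink interfaces. Below is a rough, illustrative Python equivalent of that operation over gNMI; the node name, port, and credentials are assumptions based on common containerlab/SR Linux defaults, and the real helper is the shell script shipped with the lab repository.

```python
# Illustrative stand-in for `set-uplinks.sh leaf1 "{49..50}" disable` using pygnmi.
from pygnmi.client import gNMIclient

# Assumed values: containerlab-style node name, SR Linux default gNMI port and credentials.
TARGET = ("clab-opergroup-leaf1", 57400)

with gNMIclient(target=TARGET, username="admin", password="NokiaSrl1!", skip_verify=True) as gc:
    updates = [
        ("/interface[name=ethernet-1/{}]".format(i), {"admin-state": "disable"})
        for i in (49, 50)
    ]
    gc.set(update=updates)
```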

Here is a video demonstrating this workflow:

<video width="100%" controls><source src="https://gitlab.com/rdodin/pics/-/wikis/uploads/140a5861e85014aa329804e8cecdb6c8/2022-05-06_14-54-41.mp4" type="video/mp4"></video>

-Let's see what exactly is happening there.
+Let's see what exactly is happening there.

-* [00:00 - 00:15] We started four streams 200Kbps bitrate each, summing up to 800Kbps. Those for streams were evenly distributed over the two links of a bond interface of our `client1`.
-Both leaves report 400 Kbps of traffic detected on their `e1-1` interface, so each leaf handles two streams each.
+* [00:00 - 00:15] We started eight streams, 50 Kbps bitrate each, summing up to 400 Kbps. Those streams were evenly distributed over the two links of a bond interface of our `client1`.
+Both leaves report the same amount of traffic detected on their `e1-1` interface, so each leaf handles an equal share of the streams.
Leaves then load balance these two streams over their two uplinks. We see that both `e1-49` and `e1-50` report outgoing bitrate to be ~200Kbps, which is a bitrate of a single stream we configured. That way every uplink on our leaves is utilized and handling a stream of data.
* [00:34 - 01:00] At this very moment, we execute `bash set-uplinks.sh leaf1 disable` putting uplinks on `leaf1` administratively down. The bottom left panel immediately indicates that the operational status of both uplinks went down.
But pay close attention to what is happening with traffic throughput. Traffic rate on `leaf1` access interface drops immediately, as TCP sessions of the streams it was handling stopped to receive ACKs.

@@ -99,4 +109,4 @@ Let's see what exactly is happening there.
This scenario opens the stage for oper-group, as this feature provides means to make sure that a client won't use a link that is connected to a leaf that has no means to forward traffic to the fabric.

[^1]: iperf3 sends data as a single stream, until `-P` flag is set.
-[^2]: when you start traffic for the first time, you might wonder why a leaf that is not used for traffic forwarding gets some traffic on its uplink interface for a brief moment as shown [here](https://twitter.com/ntdvps/status/1522265449265864706). Check out this link to see why is this happening.
+[^2]: when you start traffic for the first time, you might wonder why a leaf that is not used for traffic forwarding gets some traffic on its uplink interface for a brief moment as shown [here](https://twitter.com/ntdvps/status/1522265449265864706). Check out this link to see why is this happening.

macros/main.py

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ def define_env(env):
    """

    @env.macro
-    def diagram(url, page, title, zoom=2):
+    def diagram(url, page: int, title: str, zoom: int = 2):
        """
        Diagram macro
        """
@@ -45,7 +45,7 @@ def video(url):
4545
"""
4646

4747
return video_tmpl
48-
48+
4949
@env.macro
5050
def youtube(url):
5151
"""
