Commit 1b7dff9

HLD for port FEC FLR support (#1948)
* [FEC FLR] Add initial version of HLD for FEC FLR support in SONiC

  This document provides information about the implementation of Port Forward Error Correction (FEC) Frame Loss Ratio (FLR) support in SONiC.
* Update HLD
* Update HLD to include predicted FEC FLR design
* Add configurable interval factor and interleaving mapping.
* Minor nit corrections
* Address review comments
  1) Add images to help understand predicted FEC FLR better.
  2) Display accuracy (R^2) information along with predicted FEC FLR value.
  3) Add small note on using window from 16 to 20. Update FEC FLR formula considering interleaving factor to include MFC.
* Address review comments
  1) portstat CLI fields renaming, FEC_FLR --> FLR(O) and FEC_FLR_PREDICTED --> FLR(P)
  2) Rename "fec-flr-interval-factor" to "flr-interval-factor"
  3) CLI output explanation is updated to highlight that at least 2 non-zero bin values are needed for predicted FLR computation.
1 parent 0de5d21 commit 1b7dff9

4 files changed

+293
-0
lines changed

doc/port_fec_flr/port_fec_flr.md

Lines changed: 293 additions & 0 deletions
@@ -0,0 +1,293 @@
# FEC FLR support in SONiC #

## Table of Contents
- [Revision](#revision)
- [Scope](#scope)
- [Definitions/Abbreviations](#definitionsabbreviations)
- [1 Overview](#1-overview)
- [2 Requirements](#2-requirements)
- [2.1 Functional Requirements](#21-functional-requirements)
- [2.2 CLI Requirements](#22-cli-requirements)
- [3 Architecture Design](#3-architecture-design)
- [4 High-Level Design](#4-high-level-design)
- [4.1 Assumptions](#41-assumptions)
- [4.2 SAI counters used](#42-sai-counters-used)
- [4.3 SAI API](#43-sai-api)
- [4.4 FEC interleaving](#44-fec-interleaving)
- [4.5 Observed FEC FLR](#45-observed-fec-flr)
- [4.6 Predicted FEC FLR](#46-predicted-fec-flr)
- [5 Sample CLI Output](#5-sample-cli-output)
- [6 Acknowledgements](#6-acknowledgements)

### Revision

| Rev | Date | Author | Change Description |
|:---:|:-----------:|:----------------------:|-----------------------------------|
| 0.1 | 19-Mar-2025 | Pandurangan R S, Vinod Kumar Jammala (Arista Networks)| Initial version |
| 0.2 | 07-Jul-2025 | Apoorv Sachan, Pandurangan R S, Vinod Kumar Jammala (Arista Networks)| Add predicted FEC FLR |

### Scope

This document describes the implementation of Port Forward Error Correction (FEC) Frame Loss Ratio (FLR) support in SONiC.

### Definitions/Abbreviations

| Term | Definition / Abbreviation |
|---------|-----------------------------------------------------------------------|
| CER | Codeword Error Ratio |
| FEC | Forward Error Correction |
| FLR | Frame Loss Ratio |

## 1 Overview
Frame Loss Ratio (FLR) is a key performance metric that measures the fraction of frames lost relative to the total number of frames transmitted over a network link.

FLR is expressed as:

FLR = (Total Transmitted Frames - Total Received Frames) / Total Transmitted Frames

Using Forward Error Correction (FEC) statistics, the receiving device can estimate the Codeword Error Ratio (CER), and the FEC FLR is then derived from the CER.

## 2 Requirements
### 2.1 Functional Requirements
This HLD introduces the following enhancements:
- Calculation of FEC FLR at a configurable interval.
- Storing per-interface FEC FLR in the Redis DB for telemetry streaming.
- Enhancement of the `show interfaces counters fec-stats` CLI to include FEC FLR statistics.

### 2.2 CLI Requirements

* The existing `show interfaces counters fec-stats` command will be enhanced to include the following FEC FLR columns:
    - FLR(O), to display observed FEC FLR values.
    - FLR(P), to display predicted FEC FLR values.
* A new `counterpoll port` sub-command will be introduced to configure the FEC FLR interval factor:
    - `counterpoll port flr-interval-factor FLR_INTERVAL_FACTOR`
    - The default value of FLR_INTERVAL_FACTOR will be 120.

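To give a feel for what the interval factor means in wall-clock terms: this HLD states that FEC FLR is recomputed once every `POLL_INTERVAL * FLR_INTERVAL_FACTOR`. The sketch below is only illustrative. The helper name is hypothetical, and it assumes the port counter poll interval is held in milliseconds; if the interval is already in seconds, the division should be dropped.

```python
def flr_compute_period_seconds(poll_interval_ms: int,
                               flr_interval_factor: int = 120) -> float:
    """Return how often (in seconds) FEC FLR would be recomputed.

    Assumption: poll_interval_ms is the port counter poll interval in
    milliseconds. The default factor of 120 comes from this HLD.
    """
    if poll_interval_ms <= 0 or flr_interval_factor <= 0:
        raise ValueError("interval and factor must be positive")
    return (poll_interval_ms / 1000.0) * flr_interval_factor


if __name__ == "__main__":
    # With a 1000 ms poll interval and the default factor of 120,
    # FLR would be recomputed every 120 seconds.
    print(flr_compute_period_seconds(1000))
```

A larger factor gives each computation more codewords to work with, at the cost of slower updates.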
## 3 Architecture Design

There are no changes to the current SONiC architecture.

## 4 High-Level Design

* SWSS changes:

    + port_flr.lua

        This new Lua script will:
        - Read the counters already available in COUNTER_DB: SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES, SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES, and SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si, where Si counts codewords with i symbol errors and i ranges from 0 to 15 for RS-544 FEC.
        - Compute both observed and predicted FEC FLR per port.
        - Store the computed FEC FLR values and the previous Redis counter values back into the Redis DB.
        - Perform the FEC FLR computation on each port once every `port_stat POLL_INTERVAL * FLR_INTERVAL_FACTOR` seconds, where FLR_INTERVAL_FACTOR is retrieved from the FLEX_COUNTER_DB.

    + portsorch.cpp
        - Link the new "port_flr.lua" script as a plugin to the existing PORT_STAT_COUNTER_FLEX_COUNTER_GROUP, alongside "port_rates.lua".

    + flexcounterorch.cpp
        - Enhance "FlexCounterOrch" to propagate FLR_INTERVAL_FACTOR from CONFIG_DB to FLEX_COUNTER_DB.

* Utilities Common changes:

    + portstat.py:
        - Enhance the `portstat` command with the `-f` option (used by the CLI command `show interfaces counters fec-stats`) to include the FLR(O) and FLR(P) columns.

    + counterpoll/main.py:
        - Add a new argument `flr-interval-factor` to the existing `counterpoll port` command.

```
root@sonic:~$ counterpoll port --help
Usage: counterpoll port [OPTIONS] COMMAND [ARGS]...

  Port counter commands

Options:
  --help  Show this message and exit.

Commands:
  disable              Disable port counter query
  enable               Enable port counter query
  interval             Set port counter query interval
  flr-interval-factor  Set port fec flr interval factor


root@sonic:~$ counterpoll port flr-interval-factor --help
Usage: counterpoll port flr-interval-factor [OPTIONS] FLR_INTERVAL_FACTOR

  Set port fec flr interval factor

Options:
  --help  Show this message and exit.
```

### 4.1 Assumptions

SAI provides access to the following counters for each interface:
- SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES, which represents the number of uncorrectable FEC codewords.
    - Returns "not supported" if the counter is unavailable for an interface.
- SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES, which represents the number of correctable FEC codewords.
    - Returns "not supported" if the counter is unavailable for an interface.
- SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si, which represents the number of codewords with i symbol errors.
    - Returns "not supported" if the counter is unavailable for an interface.


### 4.2 SAI counters used

The following Redis DB entries will be accessed for the FEC FLR calculations:

| Redis DB | Table | Entries | New, RW | Format | Description |
|--------------|-------------|------------------|--------|----------------|----------------|
| COUNTER_DB | COUNTERS_PORT_NAME_MAP | oid | R | string | Name to oid mapping |
| COUNTER_DB | COUNTERS | SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES | R | number | Total number of uncorrectable codewords |
| COUNTER_DB | COUNTERS | SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES | R | number | Total number of correctable codewords |
| COUNTER_DB | COUNTERS | SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si | R | number | Total number of codewords with i symbol errors |
| COUNTER_DB | RATES | FEC_FLR | New, RW | floating | Calculated observed FEC FLR |
| COUNTER_DB | RATES | FEC_FLR_PREDICTED | New, RW | floating | Calculated predicted FEC FLR |
| COUNTER_DB | RATES | SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES_last | New, RW | number | Last uncorrectable codewords |
| COUNTER_DB | RATES | SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES_last | New, RW | number | Last correctable codewords |
| COUNTER_DB | RATES | SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last | New, RW | number | Last codewords with i symbol errors |


### 4.3 SAI API

No changes to the SAI API are required, and no new SAI objects are accessed.

### 4.4 FEC interleaving
With the FEC interleaving factor (X) incorporated, and as per the [IEEE 802.3df Logic Ad Hoc](https://www.ieee802.org/3/df/public/adhoc/logic/22_0630/opsasnick_3df_logic_220630a.pdf), FEC FLR is expressed as

FEC_FLR = CER * (1 + X * MFC)/MFC, where MFC (MAC frames per codeword) is 8 in the case of RS-544 FEC. Thus,

For X=1 (no interleaving), FEC_FLR = 1.125 * CER <br>
For X=2, FEC_FLR = 2.125 * CER <br>
For X=4, FEC_FLR = 4.125 * CER
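
The multipliers above follow directly from the formula. As a minimal sketch (the function and constant names are hypothetical, not taken from the implementation), the conversion from CER to FEC FLR can be written as:

```python
MFC_RS544 = 8  # MAC frames per codeword for RS-544 FEC, per this HLD


def flr_from_cer(cer: float, interleaving_factor: int,
                 mfc: int = MFC_RS544) -> float:
    """Apply FEC_FLR = CER * (1 + X * MFC) / MFC."""
    if interleaving_factor < 1 or mfc < 1:
        raise ValueError("interleaving factor and MFC must be >= 1")
    return cer * (1 + interleaving_factor * mfc) / mfc


if __name__ == "__main__":
    # Reproduces the 1.125, 2.125 and 4.125 multipliers for X = 1, 2, 4.
    for x in (1, 2, 4):
        print(x, flr_from_cer(1.0, x))
```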

To include the interleaving factor in the FEC FLR computation, a new SAI port attribute will be required to retrieve the underlying port interleaving factor.
Until such an attribute is available, the interleaving factor can be derived from the following mapping of port speed to interleaving factor:

| Port Speed | No. of lanes | FEC interleaving factor(X) |
|------------|--------------|----------------------------|
| 1600G | 8 | 4 |
| 800G | 8 | 4 |
| 400G | 8 | 2 |
| 400G | 4 | 2 |
| 200G | 4 | 2 |
| 200G | 2 | 2 |
| 100G | 2 | 2 |
| 100G | 1 | 1 or 2 (autonegotiated) |

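One way to express the interim mapping is a lookup keyed by speed and lane count. This is a sketch only: the names are hypothetical, and the handling of the autonegotiated 100G single-lane case (returning `None` unless the caller supplies the negotiated value) is an assumption, since the HLD does not say how that value is obtained.

```python
from typing import Optional

# (port speed in Gbps, number of lanes) -> FEC interleaving factor X,
# copied from the mapping table in this HLD.
INTERLEAVING_BY_SPEED_AND_LANES = {
    (1600, 8): 4,
    (800, 8): 4,
    (400, 8): 2,
    (400, 4): 2,
    (200, 4): 2,
    (200, 2): 2,
    (100, 2): 2,
}


def interleaving_factor(speed_gbps: int, lanes: int,
                        negotiated: Optional[int] = None) -> Optional[int]:
    """Look up X; return None when it cannot be determined from the table."""
    if (speed_gbps, lanes) == (100, 1):
        # 100G over one lane is "1 or 2 (autonegotiated)": the table alone
        # cannot decide, so rely on a caller-supplied value (assumption).
        return negotiated if negotiated in (1, 2) else None
    return INTERLEAVING_BY_SPEED_AND_LANES.get((speed_gbps, lanes))


if __name__ == "__main__":
    print(interleaving_factor(800, 8))
    print(interleaving_factor(100, 1, negotiated=2))
```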
### 4.5 Observed FEC FLR

```
Step 1: Calculate the observed CER per interval
        Observed CER is expressed as CER = Uncorrectable FEC codewords / Total FEC codewords received, which can be expanded to

        CER = Uncorrectable FEC codewords / (Uncorrectable FEC codewords + Codewords with no symbol errors + Correctable FEC codewords)

        where, Uncorrectable FEC codewords = SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES - SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES_last
               Codewords with no symbol errors = SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0 - SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0_last
               Correctable FEC codewords = SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES - SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES_last


Step 2: Calculate FEC FLR from the CER, taking the interleaving factor (X) into account
        If X=1, FEC_FLR = 1.125 * CER
        If X=2, FEC_FLR = 2.125 * CER
        If X=4, FEC_FLR = 4.125 * CER


Step 3: After each computation, store the latest values of the following fields in the COUNTER_DB:RATES table

        FEC_FLR, SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES_last, SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES_last and SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0_last

```

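The observed FLR steps can be sketched in Python as follows. This is not the `port_flr.lua` implementation: the function name and the use of plain dictionaries in place of Redis reads are assumptions for illustration, and returning `0.0` when no codewords were received in the interval is one possible choice that matches the `0` shown in the CLI for insufficient data.

```python
UNCORR = "SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES"
CORR = "SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES"
S0 = "SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0"


def observed_fec_flr(curr: dict, last: dict,
                     interleaving_factor: int, mfc: int = 8) -> float:
    """Compute observed FEC FLR for one interval from counter snapshots.

    curr/last map counter names to cumulative values; a counter missing
    from `last` is treated as 0 (first interval).
    """
    # Step 1: per-interval deltas and observed CER.
    uncorrectable = curr[UNCORR] - last.get(UNCORR, 0)
    correctable = curr[CORR] - last.get(CORR, 0)
    error_free = curr[S0] - last.get(S0, 0)
    total = uncorrectable + error_free + correctable
    if total <= 0:
        return 0.0  # nothing received in this interval (assumed behavior)
    cer = uncorrectable / total

    # Step 2: scale by the interleaving factor, (1 + X * MFC) / MFC.
    return cer * (1 + interleaving_factor * mfc) / mfc


if __name__ == "__main__":
    current = {UNCORR: 10, CORR: 190, S0: 9800}
    # 10 uncorrectable out of 10000 codewords gives CER = 1e-3.
    print(observed_fec_flr(current, {}, interleaving_factor=2))
```

Step 3 would then write the result and the current counter values back as the `_last` fields; that part is omitted here because it depends on the Redis access layer.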
### 4.6 Predicted FEC FLR

The goal is to estimate FEC FLR by extrapolating from the observed codeword error distribution.
```
Step 1: Prepare the codeword error index vector (x)

        x = { 1, 2, ..., max_correctable_cw_symbol_errors }

        where, max_correctable_cw_symbol_errors = 15 in case of RS-544

        For each index i in vector x, codeword_errors[i] represents the number of codewords with i symbol errors in the
        current interval, i.e. SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si - SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last.
```


The codeword error ratio typically follows an exponential decay curve, as shown in the image below.
![Exponential decay curve of CER](./img/Exponential_decay_curve_of_CER.png)

```
Step 2: Compute the logarithmic codeword error ratio vector (y)

        Applying a logarithm to the codeword error ratio transforms the exponential decay curve into a
        linear pattern, making it suitable for linear regression modeling.

        For each index i in vector x, compute the logarithm of the codeword error ratio y[i] as follows

        y[i] = log10( codeword_errors[i] / total_codewords )
        where, total_codewords is the total number of codewords in the current interval,
               i.e. Σ from i=0 to 15 of (SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si - SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last)
```


The image below shows the linear pattern of the codeword error ratio (CER) after applying a logarithm.
![Logarithm curve of CER](./img/Logarithm_curve_of_CER.png)

```
Step 3: Perform linear regression to arrive at the slope and intercept

        slope = (n * Σ(x*y) - Σx * Σy) / (n * Σ(x²) - (Σx)²)
        intercept = (Σy - slope * Σx) / n
        where, n: number of data points (length of x or y vector)

        This gives the best-fit line, y = slope * x + intercept.
```


The image below shows the linear regression line along with the logarithmic curve of the codeword error ratio (CER).
![Logarithm curve of CER and Linear regression fit](./img/Logarithm_curve_of_CER_and_Linear_regression_fit.png)

```
Step 4: Compute the extrapolated CER

        Using the linear regression line, the predicted CER for an index representing j symbol errors is
        predicted_cer_j = 10 ^ ( j * slope + intercept )

        The predicted CER for a window of codewords with uncorrectable symbol errors is calculated as:
        predicted_cer = Σ from j=16 to 20 of predicted_cer_j

        Note: The uncorrectable symbol error window runs from 16 to 20 because for values above 20, the predicted CER becomes insignificant.


Step 5: Compute FLR from the extrapolated CER, taking the interleaving factor (X) into account
        If X=1, FEC_FLR_PREDICTED = 1.125 * predicted_cer
        If X=2, FEC_FLR_PREDICTED = 2.125 * predicted_cer
        If X=4, FEC_FLR_PREDICTED = 4.125 * predicted_cer


Step 6: Store FEC_FLR_PREDICTED and SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last in the COUNTER_DB:RATES table
```

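Steps 1 to 5 can be put together as a short Python sketch. It is illustrative rather than the actual Lua plugin: the function name and return shape are hypothetical. Two details are assumptions layered on top of the steps: bins with a zero count are skipped (their logarithm is undefined, and the CLI notes elsewhere in this document require at least 2 non-zero bins), and the reported accuracy is taken to be the R² of the fit.

```python
import math


def predicted_fec_flr(bin_deltas, interleaving_factor, mfc=8,
                      max_correctable=15, window=range(16, 21)):
    """Return (predicted FLR, R^2) for one interval.

    bin_deltas[i] is the number of codewords with i symbol errors seen in
    the interval (i = 0..15 for RS-544). Returns (0.0, None) when there is
    not enough data to fit a line.
    """
    total = sum(bin_deltas)
    if total <= 0:
        return 0.0, None

    # Steps 1-2: (x, y) points, y = log10(codeword error ratio).
    # Assumption: zero-count bins are skipped.
    points = [(i, math.log10(bin_deltas[i] / total))
              for i in range(1, max_correctable + 1) if bin_deltas[i] > 0]
    n = len(points)
    if n < 2:
        return 0.0, None

    # Step 3: least-squares slope and intercept.
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n

    # Goodness of fit (R^2), assumed to be the displayed accuracy.
    mean_y = sy / n
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot

    # Step 4: extrapolate into the uncorrectable window (16..20).
    predicted_cer = sum(10 ** (j * slope + intercept) for j in window)

    # Step 5: apply the interleaving multiplier.
    flr = predicted_cer * (1 + interleaving_factor * mfc) / mfc
    return flr, r_squared
```

For example, an interval of 10^9 codewords with 1000, 100 and 10 codewords in bins 1, 2 and 3 lies exactly on the line y = -x - 5, so the extrapolated CER over the 16 to 20 window is about 1.1111e-21.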
## 5 Sample CLI Output
```
root@sonic:~$ portstat -f
      IFACE    STATE    FEC_CORR    FEC_UNCORR    FEC_SYMBOL_ERR    FEC_PRE_BER    FEC_POST_BER    FLR(O)    FLR(P) (Accuracy)
-----------  -------  ----------  ------------  ----------------  -------------  --------------  --------  -------------------
  Ethernet0        U           0             0                 0       0.00e+00        0.00e+00         0                    0
  Ethernet8        U           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet16        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet24        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet32        U           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet40        D          21             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet48        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet56        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet64        U       1,334             0                 4       0.00e+00        0.00e+00         0                    0
 Ethernet72        U      28,531             0                31       0.00e+00        0.00e+00         0       2.68e-09 (79%)
 Ethernet80        U      25,890             0                25       0.00e+00        0.00e+00         0       6.03e-09 (79%)
 Ethernet88        U      21,909             0                49       0.00e+00        0.00e+00         0                    0
 Ethernet96        U       5,635             0                 8       0.00e+00        0.00e+00         0                    0
Ethernet104        U      21,141             0                 7       0.00e+00        0.00e+00         0       7.08e-09 (79%)
```

If FEC is not supported on an interface, the FLR(O) and FLR(P) fields display `N/A` for that entry. If there is insufficient data to compute the FEC FLR (for example, when the link is performing well and fewer than 2 bins have non-zero values, which is the minimum needed to predict FLR), both the observed (FLR(O)) and predicted (FLR(P)) fields display `0`. `0` is used instead of `0.00e+00` because it is clearer and more concise in CLI output.

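The display rules above could be captured in a small formatting helper. This is a sketch under assumptions: the function name is hypothetical, `None` is used here to stand for "FEC not supported", and the accuracy is assumed to be a fraction (such as R²) rendered as a whole percentage, as in the sample output.

```python
def format_flr(value, accuracy=None):
    """Format an FLR value for the FLR(O) / FLR(P) columns.

    value: None if FEC is unsupported, otherwise a float.
    accuracy: optional fit accuracy in [0, 1], shown only for non-zero values.
    """
    if value is None:
        return "N/A"
    if value == 0:
        return "0"  # shorter and clearer than 0.00e+00
    text = f"{value:.2e}"
    if accuracy is not None:
        text += f" ({accuracy:.0%})"
    return text


if __name__ == "__main__":
    print(format_flr(None))
    print(format_flr(0.0))
    print(format_flr(2.68e-09, 0.79))
```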
## 6 Acknowledgements
Thanks to Prince and Cameron from Microsoft for sharing the details of the predicted FEC FLR algorithm and the mapping of port speed to interleaving factor.
