# FEC FLR support in SONiC #

## Table of Contents
- [Revision](#revision)
- [Scope](#scope)
- [Definitions/Abbreviations](#definitionsabbreviations)
- [1 Overview](#1-overview)
- [2 Requirements](#2-requirements)
  - [2.1 Functional Requirements](#21-functional-requirements)
  - [2.2 CLI Requirements](#22-cli-requirements)
- [3 Architecture Design](#3-architecture-design)
- [4 High-Level Design](#4-high-level-design)
  - [4.1 Assumptions](#41-assumptions)
  - [4.2 SAI Counters Used](#42-sai-counters-used)
  - [4.3 SAI API](#43-sai-api)
  - [4.4 FEC interleaving](#44-fec-interleaving)
  - [4.5 Observed FEC FLR](#45-observed-fec-flr)
  - [4.6 Predicted FEC FLR](#46-predicted-fec-flr)
- [5 Sample CLI Output](#5-sample-cli-output)
- [6 Acknowledgements](#6-acknowledgements)

### Revision

| Rev | Date | Author | Change Description |
|:---:|:-----------:|:----------------------:|-----------------------------------|
| 0.1 | 19-Mar-2025 | Pandurangan R S, Vinod Kumar Jammala (Arista Networks) | Initial version |
| 0.2 | 07-Jul-2025 | Apoorv Sachan, Pandurangan R S, Vinod Kumar Jammala (Arista Networks) | Add predicted FEC FLR |

### Scope

This document describes the implementation of port Forward Error Correction (FEC) Frame Loss Ratio (FLR) support in SONiC.

### Definitions/Abbreviations

| Term | Definition / Abbreviation |
|---------|-----------------------------------------------------------------------|
| CER | Codeword Error Ratio |
| FEC | Forward Error Correction |
| FLR | Frame Loss Ratio |
| MFC | MAC Frames per Codeword |

## 1 Overview
Frame Loss Ratio (FLR) is a key performance metric that measures the fraction of transmitted frames that are lost on a network link.

FLR is expressed as:

    FLR = (Total Transmitted Frames - Total Received Frames) / Total Transmitted Frames

From the Forward Error Correction (FEC) counters, the receiving device can compute the Codeword Error Ratio (CER). The FEC FLR is then derived from the CER.

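Here is a minimal Python sketch of the FLR definition above. The function name `frame_loss_ratio` is hypothetical. Returning 0.0 when no frames were transmitted is an assumption made here to avoid division by zero.

```python
def frame_loss_ratio(tx_frames: int, rx_frames: int) -> float:
    """FLR = (transmitted - received) / transmitted; 0.0 if nothing was sent (assumption)."""
    if tx_frames <= 0:
        return 0.0
    return (tx_frames - rx_frames) / tx_frames


# 1,000,000 frames sent, 999,990 received: 10 frames lost
print(frame_loss_ratio(1_000_000, 999_990))  # 1e-05
```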
## 2 Requirements
### 2.1 Functional Requirements
This HLD introduces the following enhancements:
- Calculation of FEC FLR at a configurable interval.
- Storage of per-interface FEC FLR in the Redis DB for telemetry streaming.
- Enhancement of the `show interfaces counters fec-stats` CLI to include FEC FLR statistics.

### 2.2 CLI Requirements

* The existing `show interfaces counters fec-stats` command will be enhanced to include the following FEC FLR columns:
  - FLR(O), to display observed FEC FLR values.
  - FLR(P), to display predicted FEC FLR values.
* A new `counterpoll port` sub-command will be introduced to configure the FEC FLR interval factor:
  - `counterpoll port flr-interval-factor FLR_INTERVAL_FACTOR`
  - The default value of FLR_INTERVAL_FACTOR is 120.

## 3 Architecture Design

There are no changes to the current SONiC architecture.

## 4 High-Level Design

* SWSS changes:

  + port_flr.lua

    This new Lua script will:
    - Read the following counters, which are already available in COUNTER_DB: SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES, SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES, and SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si. Si counts codewords with i symbol errors, where i ranges from 0 to 15 for RS-544 FEC.
    - Compute both observed and predicted FEC FLR per port.
    - Store the computed FEC FLR values back into the Redis DB. It also stores the counter values used in this computation as the `_last` entries.
    - Perform the FEC FLR computation on each port once every FLR_INTERVAL_FACTOR polling cycles of the port_stat counter group, i.e., every `POLL_INTERVAL * FLR_INTERVAL_FACTOR`. FLR_INTERVAL_FACTOR is retrieved from FLEX_COUNTER_DB.

  + portsorch.cpp
    - Attach the new "port_flr.lua" script as a plugin to the existing PORT_STAT_COUNTER_FLEX_COUNTER_GROUP, alongside "port_rates.lua".

  + flexcounterorch.cpp
    - Enhance "FlexCounterOrch" to propagate FLR_INTERVAL_FACTOR from CONFIG_DB to FLEX_COUNTER_DB.

* sonic-utilities changes:

  + portstat.py:
    - Enhance the `-f` option of the `portstat` command (used by the CLI command `show interfaces counters fec-stats`) to include the FLR(O) and FLR(P) columns.

  + counterpoll/main.py:
    - Add a new `flr-interval-factor` sub-command to the existing `counterpoll port` command.

    ```
    root@sonic:~$ counterpoll port --help
    Usage: counterpoll port [OPTIONS] COMMAND [ARGS]...

      Port counter commands

    Options:
      --help  Show this message and exit.

    Commands:
      disable              Disable port counter query
      enable               Enable port counter query
      interval             Set port counter query interval
      flr-interval-factor  Set port fec flr interval factor


    root@sonic:~$ counterpoll port flr-interval-factor --help
    Usage: counterpoll port flr-interval-factor [OPTIONS] FLR_INTERVAL_FACTOR

      Set port fec flr interval factor

    Options:
      --help  Show this message and exit.
    ```

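The interval gating described for port_flr.lua can be modeled as a simple poll counter. The Python sketch below is hypothetical:
- The class `FlrIntervalGate` is not part of SONiC.
- Keeping the counter in memory is a simplification. A Lua plugin invoked on every poll would have to persist this state between runs, for example in Redis.
- The 1000 ms poll interval in the comment is only an example value.

```python
class FlrIntervalGate:
    """Hypothetical model: allow the FLR computation once every `factor` port_stat polls."""

    def __init__(self, flr_interval_factor: int = 120):
        if flr_interval_factor < 1:
            raise ValueError("FLR_INTERVAL_FACTOR must be a positive integer")
        self.factor = flr_interval_factor
        self.polls_since_last_run = 0

    def on_poll(self) -> bool:
        """Called on every port_stat poll; returns True when FLR should be computed."""
        self.polls_since_last_run += 1
        if self.polls_since_last_run >= self.factor:
            self.polls_since_last_run = 0
            return True
        return False


# With an example POLL_INTERVAL of 1000 ms and the default factor of 120,
# FLR is computed on every 120th poll, i.e. roughly every 120 seconds.
gate = FlrIntervalGate(120)
runs = [poll for poll in range(1, 361) if gate.on_poll()]
print(runs)  # [120, 240, 360]
```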
### 4.1 Assumptions

SAI provides the following per-port statistics for each interface:
- SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES, the number of uncorrectable FEC codewords.
  - SAI returns a "not supported" status if this statistic is not available on an interface.
- SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES, the number of correctable FEC codewords.
  - SAI returns a "not supported" status if this statistic is not available on an interface.
- SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si, the number of codewords with i symbol errors (i = 0 to 15 for RS-544 FEC).
  - SAI returns a "not supported" status if this statistic is not available on an interface.

### 4.2 SAI Counters Used

The following Redis DB entries are accessed for the FEC FLR calculations:

| Redis DB | Table | Entry | New / Access | Format | Description |
|------------|------------------------|-----------------------------------------------------|--------------|--------|---------------------------------------------------|
| COUNTER_DB | COUNTERS_PORT_NAME_MAP | oid | R | string | Port name to OID mapping |
| COUNTER_DB | COUNTERS | SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES | R | number | Total number of uncorrectable codewords |
| COUNTER_DB | COUNTERS | SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES | R | number | Total number of correctable codewords |
| COUNTER_DB | COUNTERS | SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si | R | number | Total number of codewords with i symbol errors |
| COUNTER_DB | RATES | FEC_FLR | New, RW | float | Calculated observed FEC FLR |
| COUNTER_DB | RATES | FEC_FLR_PREDICTED | New, RW | float | Calculated predicted FEC FLR |
| COUNTER_DB | RATES | SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES_last | New, RW | number | Uncorrectable codeword count at the previous computation |
| COUNTER_DB | RATES | SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES_last | New, RW | number | Correctable codeword count at the previous computation |
| COUNTER_DB | RATES | SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last | New, RW | number | Count of codewords with i symbol errors at the previous computation |

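For reference, the helper below enumerates the full set of per-port fields for RS-544 (S0 through S15):
- the fields read from the COUNTERS table, and
- the matching `_last` fields kept in the RATES table.

The helper `fec_flr_counter_fields` is a hypothetical illustration of the naming pattern in the table above, not SONiC code.

```python
def fec_flr_counter_fields(max_symbol_errors: int = 15):
    """Return (COUNTERS fields, RATES *_last fields) used for the FEC FLR computation."""
    counters = [
        "SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES",
        "SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES",
    ]
    # One bin per symbol-error count: S0 .. S15 for RS-544
    counters += [f"SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S{i}"
                 for i in range(max_symbol_errors + 1)]
    last_fields = [name + "_last" for name in counters]
    return counters, last_fields


counters, last_fields = fec_flr_counter_fields()
print(len(counters))    # 18
print(last_fields[-1])  # SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S15_last
```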
### 4.3 SAI API

There are no changes to the SAI API, and no new SAI objects are accessed.

### 4.4 FEC interleaving
As per the [IEEE 802.3df Logic Ad Hoc](https://www.ieee802.org/3/df/public/adhoc/logic/22_0630/opsasnick_3df_logic_220630a.pdf), with the FEC interleaving factor (X) incorporated, FEC FLR is expressed as:

    FEC_FLR = CER * (1 + X * MFC) / MFC

where MFC (MAC frames per codeword) is 8 for RS-544 FEC. Thus:

- For X=1 (no interleaving), FEC_FLR = 1.125 * CER
- For X=2, FEC_FLR = 2.125 * CER
- For X=4, FEC_FLR = 4.125 * CER

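The multipliers above follow directly from the formula. The short Python check below reproduces them with MFC = 8 for RS-544. The helper name `flr_multiplier` is hypothetical.

```python
MFC_RS544 = 8  # MAC frames per codeword (MFC) for RS-544 FEC


def flr_multiplier(interleave: int, mfc: int = MFC_RS544) -> float:
    """Return k such that FEC_FLR = k * CER, where k = (1 + X * MFC) / MFC."""
    return (1 + interleave * mfc) / mfc


for x in (1, 2, 4):
    print(f"X={x}: FEC_FLR = {flr_multiplier(x)} * CER")
# X=1: FEC_FLR = 1.125 * CER
# X=2: FEC_FLR = 2.125 * CER
# X=4: FEC_FLR = 4.125 * CER
```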
To include the interleaving factor in the FEC FLR computation, a new SAI port attribute will be required to retrieve the port's underlying interleaving factor.
Until such an attribute is available, the interleaving factor can be derived from the following mapping of port speed to interleaving factor:

| Port Speed | No. of lanes | FEC interleaving factor (X) |
|------------|--------------|-----------------------------|
| 1600G | 8 | 4 |
| 800G | 8 | 4 |
| 400G | 8 | 2 |
| 400G | 4 | 2 |
| 200G | 4 | 2 |
| 200G | 2 | 2 |
| 100G | 2 | 2 |
| 100G | 1 | 1 or 2 (autonegotiated) |

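One way to encode this table is a lookup keyed on (speed, lanes), as in the sketch below. The names are hypothetical. Returning `None` for 100G over a single lane is an assumption: the table only says X is autonegotiated to 1 or 2 there, so the real value must come from another source.

```python
from typing import Optional

# (port speed in Gbps, number of lanes) -> FEC interleaving factor X
INTERLEAVE_BY_SPEED_LANES = {
    (1600, 8): 4,
    (800, 8): 4,
    (400, 8): 2,
    (400, 4): 2,
    (200, 4): 2,
    (200, 2): 2,
    (100, 2): 2,
}


def interleave_factor(speed_gbps: int, lanes: int) -> Optional[int]:
    """Return X for a known speed/lane combination, else None.

    100G on 1 lane is deliberately absent: X is autonegotiated (1 or 2),
    so it cannot be derived from speed and lane count alone.
    """
    return INTERLEAVE_BY_SPEED_LANES.get((speed_gbps, lanes))


print(interleave_factor(800, 8))  # 4
print(interleave_factor(100, 1))  # None
```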
### 4.5 Observed FEC FLR

```
Step 1: Calculate the observed CER for the interval
    Observed CER is expressed as CER = Uncorrectable FEC codewords / Total FEC codewords received, which expands to

    CER = Uncorrectable FEC codewords / (Uncorrectable FEC codewords + Codewords with no symbol errors + Correctable FEC codewords)

    where, Uncorrectable FEC codewords     = SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES - SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES_last
           Codewords with no symbol errors = SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0 - SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0_last
           Correctable FEC codewords       = SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES - SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES_last


Step 2: Calculate FEC FLR from CER, taking the interleaving factor (X) into account
    If X=1, FEC_FLR = 1.125 * CER
    If X=2, FEC_FLR = 2.125 * CER
    If X=4, FEC_FLR = 4.125 * CER


Step 3: After each computation, store the latest values of the following in the COUNTER_DB RATES table

    FEC_FLR, SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES_last, SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES_last and SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0_last
```

### 4.6 Predicted FEC FLR

The goal is to estimate FEC FLR by extrapolating from the observed distribution of codeword symbol errors.
```
Step 1: Prepare the codeword error index vector (x)

    x = { 1, 2, ..., max_correctable_cw_symbol_errors }

    where, max_correctable_cw_symbol_errors = 15 for RS-544

    For each index i in vector x, codeword_errors[i] is the number of codewords with i symbol errors in the
    current interval, i.e. SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si - SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last.
```

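A sketch of Step 1, again using plain dictionaries in place of COUNTER_DB. The function name is hypothetical, and treating a missing counter or `_last` value as 0 is an assumption. Bin 0 is also computed because Step 2 needs it for the total codeword count.

```python
MAX_CORRECTABLE_SYMBOL_ERRORS = 15  # RS-544
CW_ERR = "SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S{}"


def codeword_error_deltas(counters: dict, last: dict) -> dict:
    """Return {i: codewords with i symbol errors in this interval} for i = 0..15."""
    deltas = {}
    for i in range(MAX_CORRECTABLE_SYMBOL_ERRORS + 1):
        key = CW_ERR.format(i)
        # Missing counters or *_last values are treated as 0 in this sketch.
        deltas[i] = counters.get(key, 0) - last.get(key + "_last", 0)
    return deltas


# Index vector x covers the bins with at least one symbol error: 1..15.
x = list(range(1, MAX_CORRECTABLE_SYMBOL_ERRORS + 1))

counters = {CW_ERR.format(0): 2_000_000, CW_ERR.format(1): 1_500, CW_ERR.format(2): 40}
last = {CW_ERR.format(0) + "_last": 1_000_000, CW_ERR.format(1) + "_last": 500}
deltas = codeword_error_deltas(counters, last)
print(deltas[0], deltas[1], deltas[2], deltas[3])  # 1000000 1000 40 0
```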
The codeword error ratio typically follows an exponential decay curve, as shown in the image below.

```
Step 2: Compute the logarithmic codeword error ratio vector (y)

    Applying a logarithm to the codeword error ratio transforms the exponential decay curve into a
    linear pattern, which makes it suitable for linear regression modeling.

    For each index i in vector x, compute the logarithm of the codeword error ratio y[i] as follows

      y[i] = log10( codeword_errors[i] / total_codewords )
      where, total_codewords is the total number of codewords in the interval,
      i.e. Σ from i=0 to 15 of (SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si - SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last)

    Since log10(0) is undefined, only indices with codeword_errors[i] > 0 contribute data points.
```

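Step 2 might look like this in Python. It takes the per-bin deltas from Step 1 as a dictionary and returns the (x, y) points for the regression. The function name is hypothetical. Skipping zero bins follows from log10(0) being undefined.

```python
import math


def log_cer_points(deltas: dict, max_symbol_errors: int = 15):
    """Return (xs, ys) with ys[k] = log10(codeword_errors[i] / total_codewords).

    total_codewords sums bins 0..max_symbol_errors. Only bins 1..max_symbol_errors
    with a non-zero count are returned, because log10(0) is undefined.
    """
    total = sum(deltas.get(i, 0) for i in range(max_symbol_errors + 1))
    xs, ys = [], []
    if total <= 0:
        return xs, ys
    for i in range(1, max_symbol_errors + 1):
        count = deltas.get(i, 0)
        if count > 0:
            xs.append(i)
            ys.append(math.log10(count / total))
    return xs, ys


deltas = {0: 999_000, 1: 900, 2: 90, 3: 9, 4: 1}  # 1,000,000 codewords in total
xs, ys = log_cer_points(deltas)
print(xs)                         # [1, 2, 3, 4]
print([round(y, 2) for y in ys])  # [-3.05, -4.05, -5.05, -6.0]
```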
The image below shows the linear pattern of the codeword error ratio (CER) after applying a logarithm.

```
Step 3: Perform linear regression to obtain the slope and intercept

    slope     = (n * Σ(x*y) - Σx * Σy) / (n * Σ(x²) - (Σx)²)
    intercept = (Σy - slope * Σx) / n
    where, n: number of data points (length of the x or y vector)

    This gives the best-fit line, y = slope * x + intercept.
```

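Step 3 is ordinary least-squares regression. The Python function below transcribes the slope and intercept formulas directly; its name is hypothetical. It rejects fewer than two points, matching the note in section 5 that at least 2 non-zero bins are needed to predict FLR.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (Step 3 formulas)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Points lying exactly on y = -x - 2 recover slope -1 and intercept -2.
print(linear_regression([1, 2, 3, 4], [-3.0, -4.0, -5.0, -6.0]))  # (-1.0, -2.0)
```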
The image below shows the linear regression line along with the logarithmic curve of the codeword error ratio (CER).

```
Step 4: Compute the extrapolated CER

    Using the linear regression line, the predicted CER for an index representing j symbol errors is
      predicted_cer_j = 10 ^ ( j * slope + intercept )

    The predicted CER for the window of codewords with uncorrectable symbol errors is calculated as:
      predicted_cer = Σ from j=16 to 20 of predicted_cer_j

    Note: The window of uncorrectable symbol errors is limited to 16 through 20 because, beyond 20, the predicted CER becomes insignificant.


Step 5: Compute FLR from the extrapolated CER, taking the interleaving factor (X) into account
    If X=1, FEC_FLR_PREDICTED = 1.125 * predicted_cer
    If X=2, FEC_FLR_PREDICTED = 2.125 * predicted_cer
    If X=4, FEC_FLR_PREDICTED = 4.125 * predicted_cer


Step 6: Store FEC_FLR_PREDICTED and SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_Si_last in the COUNTER_DB RATES table
```

## 5 Sample CLI Output
```
root@sonic:~$ portstat -f
      IFACE    STATE    FEC_CORR    FEC_UNCORR    FEC_SYMBOL_ERR    FEC_PRE_BER    FEC_POST_BER    FLR(O)    FLR(P) (Accuracy)
-----------  -------  ----------  ------------  ----------------  -------------  --------------  --------  -------------------
  Ethernet0        U           0             0                 0       0.00e+00        0.00e+00         0                    0
  Ethernet8        U           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet16        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet24        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet32        U           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet40        D          21             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet48        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet56        X           0             0                 0       0.00e+00        0.00e+00         0                    0
 Ethernet64        U       1,334             0                 4       0.00e+00        0.00e+00         0                    0
 Ethernet72        U      28,531             0                31       0.00e+00        0.00e+00         0       2.68e-09 (79%)
 Ethernet80        U      25,890             0                25       0.00e+00        0.00e+00         0       6.03e-09 (79%)
 Ethernet88        U      21,909             0                49       0.00e+00        0.00e+00         0                    0
 Ethernet96        U       5,635             0                 8       0.00e+00        0.00e+00         0                    0
Ethernet104        U      21,141             0                 7       0.00e+00        0.00e+00         0       7.08e-09 (79%)
```

If FEC is not supported on an interface, the FLR(O) and FLR(P) fields display `N/A` for that entry.

If there is insufficient data to compute the FEC FLR, both the observed (FLR(O)) and predicted (FLR(P)) fields display `0`. An example is a link that is performing well and has fewer than 2 bins with non-zero values for predicting FLR. `0` is used instead of `0.00e+00` because it is clearer and more concise in CLI output and matches user expectations.

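The display rules in the paragraph above could be captured by a small formatting helper like the hypothetical `format_flr` below.
- The `N/A` and `0` cases come from the text.
- Representing "not supported" as `None` is an assumption.
- Two-digit scientific notation for non-zero values mirrors the sample output.
- The accuracy percentage is passed through unchanged, because its computation is not described in this document.

```python
from typing import Optional


def format_flr(value: Optional[float], accuracy_pct: Optional[int] = None) -> str:
    """Render an FLR cell: 'N/A' if unsupported, '0' if no data, else e.g. '2.68e-09'."""
    if value is None:  # FEC not supported on this interface (assumed encoding)
        return "N/A"
    if value == 0:     # insufficient data: show a plain 0 rather than 0.00e+00
        return "0"
    text = f"{value:.2e}"
    if accuracy_pct is not None:
        text += f" ({accuracy_pct}%)"
    return text


print(format_flr(None))          # N/A
print(format_flr(0.0))           # 0
print(format_flr(2.68e-09, 79))  # 2.68e-09 (79%)
```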
## 6 Acknowledgements
Thanks to Prince and Cameron from Microsoft for sharing the details of the predicted FEC FLR algorithm and the mapping of port speed to interleaving factor.