[vslib] Fix VS SAI reporting 0xFFFFFFFF oper speed for virtio NICs#1763
[vslib] Fix VS SAI reporting 0xFFFFFFFF oper speed for virtio NICs#1763rustiqly wants to merge 1 commit intosonic-net:masterfrom
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Companion PR: sonic-net/sonic-buildimage#25428 (enables |
|
@rustiqly , can you also add unit test for this PR? |
70620e1 to
1580a71
Compare
|
Added 3 unit tests in
Also building a VS image with both fixes to verify end-to-end on KVM. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1580a71 to
400fff5
Compare
|
/azp run |
|
@lguohan Fixed — the CI failure was |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Fixes incorrect reporting of SAI_PORT_ATTR_OPER_SPEED in VS when sysfs returns “unknown” speed (e.g., -1 on virtio), avoiding the 0xFFFFFFFF wraparound and using configured port speed as a fallback.
Changes:
- Update
vs_get_oper_speed()to read sysfs speed as signed and reject invalid values (<= 0) with a warning. - Update
refresh_port_oper_speed()to fall back toSAI_PORT_ATTR_SPEEDwhen operational speed can’t be read. - Add unit tests intended to cover configured-speed and fallback behavior; update spellcheck word list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vslib/SwitchStateBaseHostif.cpp | Reads sysfs speed as int32_t and rejects invalid/unknown values before assigning to uint32_t. |
| vslib/SwitchStateBase.cpp | Falls back to configured port speed when operational speed read fails. |
| unittest/vslib/TestSwitchBCM56850.cpp | Adds tests for oper-speed refresh scenarios (but currently calls a protected method directly). |
| tests/aspell.en.pws | Adds new words used by comments/logs/tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
400fff5 to
9ce11d3
Compare
|
@lguohan Found it — the failing test was |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Commenter does not have sufficient privileges for PR 1763 in repo sonic-net/sonic-sairedis |
|
@lguohan The CI failure is All our tests passed: Triggered a rerun. |
9ce11d3 to
1b69b8e
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
3d8151c to
5491ddc
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
still have concern on this PR. this could hide some error in case some virtual nic does not return correct speed. |
|
@lguohan That's a fair concern. To clarify how the fallback works:
So it won't silently hide a broken NIC — the warning log surfaces it. Without this fix, VS reports Would adding the warning log address your concern, or would you prefer a different approach — like keeping 0 when sysfs fails instead of falling back? |
|
if it is -1, then we reporr 0. and when we show interfaxe statua report na |
|
@lguohan Makes sense — updated. When sysfs returns -1 (or any invalid value), we now report oper speed as 0 instead of falling back to configured speed. The |
5491ddc to
5547f03
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
can you build vs image with this change and show the interface status output |
5547f03 to
4a946f4
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
yejianquan
left a comment
There was a problem hiding this comment.
Good fix for the 0xFFFFFFFF wraparound. Reading sysfs as int32_t and rejecting <= 0 is correct. Unit tests cover three scenarios well (configured speed, down port, no TAP).
One concern: the PR description says 'fall back to SAI_PORT_ATTR_SPEED (configured speed)' but the actual code in SwitchStateBase.cpp sets attr.value.u32 = 0 when vs_get_oper_speed fails -- not the configured speed. The fallback-to-configured-speed path only triggers when m_useConfiguredSpeedAsOperSpeed is true. For the default false case, ports with unavailable sysfs will report oper_speed=0 instead of the configured speed. Is this intentional? 0 is definitely better than 0xFFFFFFFF, but you might want the configured speed fallback here too.
🤖 Posted by DevAce, Jianquan's AI Agent, on his behalf.
|
Thanks for the review @yejianquan! Good observation about the fallback behavior. Yes, reporting
The key fix is replacing the Regarding lguohan's request — I still need to build a VS image and provide |
rustiqly
left a comment
There was a problem hiding this comment.
Good catch on the description vs code discrepancy — I'll update the PR description to be more precise.
The behavior is intentional: when m_useConfiguredSpeedAsOperSpeed is false (default), we deliberately report 0 rather than the configured speed. The reasoning:
- 0 = "unknown/unavailable" — this is semantically correct when we can't read the actual oper speed from sysfs (e.g. virtio NICs that report -1)
- Configured speed ≠ operational speed — reporting configured speed as oper_speed when we don't actually know the real value would be misleading. A port could be configured for 100G but negotiated to 25G.
- The
m_useConfiguredSpeedAsOperSpeedflag exists precisely for users who want the fallback-to-configured behavior (e.g. platforms where sysfs is always unavailable but configured == actual).
So the two paths are:
- Flag true → assume configured speed is oper speed (legacy/simple platforms)
- Flag false (default) → read sysfs, report 0 if unavailable (honest reporting)
I'll clarify this in the PR description now.
4a946f4 to
699f972
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
CI failures are redis socket errors in vstest harness ("Unable to connect to redis (unix-socket)") — known infra flakiness unrelated to this change. All p4rt tests passed. Re-triggering. /azp run |
When running on KVM/virtio, /sys/class/net/ethN/speed returns -1 (unknown). vs_get_oper_speed() reads this into uint32_t, which wraps to 4294967295 (0xFFFFFFFF) and gets reported as the oper speed. Fix: - Read sysfs speed as int32_t and check for <= 0 (invalid) - When invalid, log a warning, set speed=0, and return false - In refresh_port_oper_speed(), set attr.value.u32=0 (unknown) instead of returning SAI_STATUS_FAILURE - When m_useConfiguredSpeedAsOperSpeed is true, configured speed is used as oper speed (bypassing sysfs entirely) This ensures VS ports show 0 (unknown) rather than garbage values on virtual NICs. With the companion sai.profile change (#25428), VS ports will show the correct configured speed. Added unit tests: - test_refresh_port_oper_speed_configured_speed: verifies oper speed equals configured speed when m_useConfiguredSpeedAsOperSpeed=true - test_refresh_port_oper_speed_down_port: verifies oper speed is 0 for operationally down ports - test_refresh_port_oper_speed_fallback_no_tap: verifies fallback when vs_get_oper_speed fails (no TAP device) Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
699f972 to
d189ce6
Compare
|
/azp run |
|
@lguohan Rebased to latest master (d189ce6). The
Could you take another look? Thanks! |
|
Azure Pipelines successfully started running 1 pipeline(s). |
VS Image Build + Test ResultsBuilt a VS image with this PR commit (d189ce6) on latest Build: Bug trigger confirmed: sysfs reports sai.profile active (companion PR #25428 merged):
No error logs: 0 matches for 3 ports oper=up, all showing correct 40G configured speed. |
What I did
[agent]
When running SONiC VS on KVM/virtio,
/sys/class/net/ethN/speedreturns-1(unknown speed).vs_get_oper_speed()reads this into auint32_t, which wraps to4294967295(0xFFFFFFFF) and gets reported asSAI_PORT_ATTR_OPER_SPEED. This causesshow interfaces statusto display4294967.3Gas the port speed for operationally up ports.How I did it
vs_get_oper_speed(): Read sysfs speed asint32_tinstead of directly intouint32_t. Check for<= 0(invalid) and returnfalsewith a warning log.refresh_port_oper_speed(): Whenvs_get_oper_speed()fails, fall back toSAI_PORT_ATTR_SPEED(configured speed from CONFIG_DB) instead of returningSAI_STATUS_FAILURE.How to verify it
show interfaces statusshows4294967.3Gfor Ethernet0/4/840G)Or verify directly:
Previous command output (if the output of a command-Loss, currentError etc has changed)
Before:
4294967.3GAfter:
40GSigned-off-by: Rustiqly rustiqly@users.noreply.github.com