BROADCOM_LEGACY_SAI_COMPAT: Fix sai_get_stats_ext crash on Tomahawk-1 (BCM56960) legacy platforms#1789
BROADCOM_LEGACY_SAI_COMPAT: Fix sai_get_stats_ext crash on Tomahawk-1 (BCM56960) legacy platforms#1789lipxu wants to merge 5 commits intosonic-net:masterfrom
Conversation
…_st_capability at runtime Signed-off-by: Liping Xu <xuliping@microsoft.com>
Signed-off-by: Liping Xu <xuliping@microsoft.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Review: Two items to address
|
|
Thanks for the review @Gfrom2016! Item 1 — Merge conflict with #1788: This is intentional — #1789 is stacked on top of #1788's branch to avoid a cherry-pick conflict on \VendorSai.cpp\ (both PRs modify the same function). The plan is:
Will rebase after #1788 lands. Item 2 — \SAI_STATS_EXT_SWITCH_SUPPORTED=0\ in buildimage: This key is present — it is in a separate companion PR sonic-net/sonic-buildimage#26014, which is the buildimage counterpart specifically for this PR (#1789). sonic-net/sonic-buildimage#26013 is the companion for PR #1788 (the \SAI_STATS_ST_CAPABILITY_SUPPORTED\ fix) and intentionally only contains that key. Summary of PR pairing:
I will update the PR description to make the buildimage companion link explicit. |
Signed-off-by: Liping Xu <xuliping@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…le keys Add unit tests covering: - isSwitchStatsExtSupported() returns true by default (no sai.profile key) - isSwitchStatsExtSupported() returns false when SAI_STATS_EXT_SWITCH_SUPPORTED=0 - apiInitialize() handles SAI_STATS_ST_CAPABILITY_SUPPORTED=0 without error Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Liping Xu <xuliping@microsoft.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Liping Xu <xuliping@microsoft.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Gfrom2016
left a comment
There was a problem hiding this comment.
Good design -- virtual method in SaiInterface.h keeps the change isolated to VendorSai. FlexCounter change is surgical (only COUNTER_TYPE_SWITCH context affected). 3 unit tests covering default, disabled, and st_capability scenarios.
Confirmed merge order: #1788 first, then rebase #1789.
LGTM -- approving.
Problem
On Arista 7060cx (BCM56960_B1 / Tomahawk-1,
broadcom-legacyplatform), syncd crashes during FlexCounter polling with a SIGSEGV when collecting switch counters.Root cause: PR #1775 added
context->use_sai_stats_ext = truefor theCOUNTER_TYPE_SWITCHFlexCounter context, forcingsai_get_stats_extto be used instead ofsai_get_statsfor switch objects. While this is required for TH5, on TH1 (broadcom-legacy)sai_get_stats_extfor switch objects hits uninitialized internal state in the legacy SAI binary -> SIGSEGV.Crash confirmed at the same address
0x8ab6120in libsai.so via GDB analysis.Fix
Add a runtime guard controlled by
sai.profilekeySAI_STATS_EXT_SWITCH_SUPPORTED. If set to0,FlexCounter::createCounterContext()usessai_get_statsinstead ofsai_get_stats_extfor switch objects.Implementation:
meta/SaiInterface.h: Addvirtual bool isSwitchStatsExtSupported() const { return true; }(default enabled for all platforms)syncd/VendorSai.h: Declare override +bool m_switchStatsExtSupportedprivate membersyncd/VendorSai.cpp: InapiInitialize(), readSAI_STATS_EXT_SWITCH_SUPPORTEDfromsai.profile; implement accessorsyncd/FlexCounter.cpp: Usem_vendorSai->isSwitchStatsExtSupported()forCOUNTER_TYPE_SWITCHcontextXGS platforms (TH2/TH3/TH4/TH5) are unaffected - they do not set this key so
sai_get_stats_extremains enabled.All changes are tagged
BROADCOM_LEGACY_SAI_COMPATfor future searchability.Changes
meta/SaiInterface.h: +7 lines - virtualisSwitchStatsExtSupported()with defaulttruesyncd/VendorSai.h: +4 lines - override declaration + private membersyncd/VendorSai.cpp: +24 lines - sai.profile key read inapiInitialize()+ method implementationsyncd/FlexCounter.cpp: -1/+3 lines - conditionaluse_sai_stats_extforCOUNTER_TYPE_SWITCHTesting
SAI_STATS_EXT_SWITCH_SUPPORTED=0insai.profile-> syncd starts and FlexCounter runs without crashsai_get_stats_extstill used for switch countersRelated
sai_query_stats_st_capabilityfix (BROADCOM_LEGACY_SAI_COMPAT Issue 1); merge BROADCOM_LEGACY_SAI_COMPAT: Fix sai_query_stats_st_capability crash on Tomahawk-1 (BCM56960) legacy platforms #1788 first, then rebase this PRsai.profilekey change: BROADCOM_LEGACY_SAI_COMPAT: Fix sai_get_stats_ext crash on TH1 legacy image sonic-buildimage#26014 (SAI_STATS_EXT_SWITCH_SUPPORTED=0)