BROADCOM_LEGACY_SAI_COMPAT: Fix sai_get_stats_ext crash on Tomahawk-1 (BCM56960) legacy platforms #1789

Open
lipxu wants to merge 5 commits into sonic-net:master from lipxu:fix/brcm-legacy-compat-master-issue2

@lipxu lipxu commented Mar 11, 2026

Problem

On Arista 7060cx (BCM56960_B1 / Tomahawk-1, broadcom-legacy platform), syncd crashes during FlexCounter polling with a SIGSEGV when collecting switch counters.

Root cause: PR #1775 added context->use_sai_stats_ext = true for the COUNTER_TYPE_SWITCH FlexCounter context, forcing sai_get_stats_ext to be used instead of sai_get_stats for switch objects. This is required for TH5, but on TH1 (broadcom-legacy) calling sai_get_stats_ext on a switch object hits uninitialized internal state in the legacy SAI binary and raises SIGSEGV.

Crash confirmed at the same address 0x8ab6120 in libsai.so via GDB analysis.

Fix

Add a runtime guard controlled by sai.profile key SAI_STATS_EXT_SWITCH_SUPPORTED. If set to 0, FlexCounter::createCounterContext() uses sai_get_stats instead of sai_get_stats_ext for switch objects.
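A minimal sketch of how such a guard value could be derived, assuming the profile has already been loaded into a key/value map. `ProfileMap` and `readSwitchStatsExtSupported` are hypothetical names used for illustration only, not the identifiers in syncd/VendorSai.cpp. The only behavior taken from this description is that an absent key means enabled and `0` means disabled; treating every other value as enabled is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the parsed sai.profile contents.
using ProfileMap = std::map<std::string, std::string>;

// Returns false only when the key is present and set to "0".
// An absent key keeps sai_get_stats_ext enabled (the default for all platforms).
bool readSwitchStatsExtSupported(const ProfileMap& profile)
{
    const auto it = profile.find("SAI_STATS_EXT_SWITCH_SUPPORTED");

    if (it == profile.end())
    {
        return true; // key not set: default enabled
    }

    return it->second != "0"; // assumption: only "0" disables the ext path
}
```

Defaulting to enabled keeps platforms that never set the key on their current behavior, so only a legacy sai.profile has to change.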

Implementation:

  • meta/SaiInterface.h: Add virtual bool isSwitchStatsExtSupported() const { return true; } (default enabled for all platforms)
  • syncd/VendorSai.h: Declare override + bool m_switchStatsExtSupported private member
  • syncd/VendorSai.cpp: In apiInitialize(), read SAI_STATS_EXT_SWITCH_SUPPORTED from sai.profile; implement accessor
  • syncd/FlexCounter.cpp: Use m_vendorSai->isSwitchStatsExtSupported() for COUNTER_TYPE_SWITCH context
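The four pieces above fit together roughly as follows. This is a simplified sketch, not the actual diff: the classes are reduced stand-ins for the real ones, `CounterContext` and `createSwitchCounterContext` are hypothetical reductions of the FlexCounter code, and passing the flag through the constructor is an assumption made to keep the example self-contained (in the PR the value is read in apiInitialize()).

```cpp
// Simplified stand-in for the interface in meta/SaiInterface.h.
class SaiInterface
{
public:
    virtual ~SaiInterface() = default;

    // Default: sai_get_stats_ext is assumed to work for switch objects.
    virtual bool isSwitchStatsExtSupported() const { return true; }
};

// Simplified stand-in for syncd/VendorSai, holding the value read from sai.profile.
class VendorSai : public SaiInterface
{
public:
    // Assumption: the flag is injected here instead of being read in apiInitialize().
    explicit VendorSai(bool switchStatsExtSupported)
        : m_switchStatsExtSupported(switchStatsExtSupported) {}

    bool isSwitchStatsExtSupported() const override { return m_switchStatsExtSupported; }

private:
    bool m_switchStatsExtSupported;
};

// Hypothetical reduction of the counter context to the one field that matters here.
struct CounterContext
{
    bool use_sai_stats_ext = false;
};

// Sketch of the COUNTER_TYPE_SWITCH branch of createCounterContext():
// the previously hard-coded "true" becomes a query on the SAI object.
CounterContext createSwitchCounterContext(const SaiInterface& sai)
{
    CounterContext context;
    context.use_sai_stats_ext = sai.isSwitchStatsExtSupported();
    return context;
}
```

With `VendorSai legacy(false)` the resulting context falls back to sai_get_stats, while any implementation that does not override the method keeps sai_get_stats_ext.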

XGS platforms (TH2/TH3/TH4/TH5) are unaffected - they do not set this key so sai_get_stats_ext remains enabled.

All changes are tagged BROADCOM_LEGACY_SAI_COMPAT for future searchability.

Changes

  • meta/SaiInterface.h: +7 lines - virtual isSwitchStatsExtSupported() with default true
  • syncd/VendorSai.h: +4 lines - override declaration + private member
  • syncd/VendorSai.cpp: +24 lines - sai.profile key read in apiInitialize() + method implementation
  • syncd/FlexCounter.cpp: -1/+3 lines - conditional use_sai_stats_ext for COUNTER_TYPE_SWITCH

Testing

  • Arista 7060cx (BCM56960_B1, broadcom-legacy): SAI_STATS_EXT_SWITCH_SUPPORTED=0 in sai.profile -> syncd starts and FlexCounter runs without crash
  • TH5 (XGS): No sai.profile key set -> sai_get_stats_ext still used for switch counters

Related

lipxu added 2 commits March 11, 2026 01:55
…_st_capability at runtime

Signed-off-by: Liping Xu <xuliping@microsoft.com>
Signed-off-by: Liping Xu <xuliping@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Gfrom2016
Contributor

Review: Two items to address

  1. Merge conflict with PR #1788 (BROADCOM_LEGACY_SAI_COMPAT: Fix sai_query_stats_st_capability crash on Tomahawk-1 (BCM56960) legacy platforms): This PR's VendorSai.cpp diff includes the identical HAVE_SAI_QUERY_STATS_ST_CAPABILITY guard block from #1788. If both merge independently, there will be a git conflict. Suggest either combining them into one PR, or merging #1788 first and rebasing this one on top.

  2. Companion buildimage PR sonic-buildimage#26013 (BROADCOM_LEGACY_SAI_COMPAT: Allow platforms to disable sai_query_stats_st_capability at runtime) is missing SAI_STATS_EXT_SWITCH_SUPPORTED=0: The sai.profile changes in #26013 only add SAI_STATS_ST_CAPABILITY_SUPPORTED=0 (for the #1788 fix), but do not add SAI_STATS_EXT_SWITCH_SUPPORTED=0 (needed by this PR). Without it, the sai_get_stats_ext crash on TH1 switch objects would still happen after all three PRs merge. Please update #26013 to include this key as well.

@lipxu
Author

lipxu commented Mar 11, 2026

Thanks for the review @Gfrom2016!

Item 1: Merge conflict with #1788

This is intentional. #1789 is stacked on top of #1788's branch to avoid a cherry-pick conflict on `VendorSai.cpp` (both PRs modify the same function). The plan is:

  1. Merge #1788 into master first
  2. Rebase this branch (`fix/brcm-legacy-compat-master-issue2`) onto the updated master and force-push
  3. The `HAVE_SAI_QUERY_STATS_ST_CAPABILITY` block will then appear as pre-existing context in the diff, not as added lines, so there is no conflict

Will rebase after #1788 lands.

Item 2: `SAI_STATS_EXT_SWITCH_SUPPORTED=0` in buildimage

This key is present. It is in a separate companion PR, sonic-net/sonic-buildimage#26014, which is the buildimage counterpart specifically for this PR (#1789). sonic-net/sonic-buildimage#26013 is the companion for PR #1788 (the `SAI_STATS_ST_CAPABILITY_SUPPORTED` fix) and intentionally only contains that key.

Summary of PR pairing:

| sairedis PR | buildimage PR | sai.profile key |
| --- | --- | --- |
| #1788 | sonic-buildimage#26013 | `SAI_STATS_ST_CAPABILITY_SUPPORTED=0` |
| #1789 (this PR) | sonic-buildimage#26014 | `SAI_STATS_EXT_SWITCH_SUPPORTED=0` |

I will update the PR description to make the buildimage companion link explicit.

Signed-off-by: Liping Xu <xuliping@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

…le keys

Add unit tests covering:
- isSwitchStatsExtSupported() returns true by default (no sai.profile key)
- isSwitchStatsExtSupported() returns false when SAI_STATS_EXT_SWITCH_SUPPORTED=0
- apiInitialize() handles SAI_STATS_ST_CAPABILITY_SUPPORTED=0 without error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Liping Xu <xuliping@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Liping Xu <xuliping@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@Gfrom2016 Gfrom2016 left a comment

Good design -- virtual method in SaiInterface.h keeps the change isolated to VendorSai. FlexCounter change is surgical (only COUNTER_TYPE_SWITCH context affected). 3 unit tests covering default, disabled, and st_capability scenarios.

Confirmed merge order: #1788 first, then rebase #1789.

LGTM -- approving.
