Conversation

@jianyuewu
Contributor

@jianyuewu jianyuewu commented Dec 31, 2025

Dependency

For the 202511 branch, there is still one dependency:

  1. The cherry-pick-from-master merge conflict will be resolved automatically after merging this PR: [Mellanox] Fix issue: sfp.get_temperature_info cannot detect SFP replacement #24688

HW-MGMT version 7.0050.2930 has been merged into the 202511 branch, so it is no longer an external dependency.

Why I did it

At 40°C ambient temperature with the current FW+SW, some modules have a >7.6% probability of reaching 75°C, which triggers false temperature warnings.
This PR implements vendor-specific temperature threshold support to eliminate these false warnings while maintaining accurate temperature telemetry for monitoring purposes.

How I did it

Implemented a new API for vendor-specific temperature offset adjustments (a caching sketch follows this list):

  1. New API:

    • Added a get_vendor_info() API with caching support.
  2. Smart module detection:

    • Cache vendor information (manufacturer + part number) for each module.
    • Skip redundant vendor info updates when the same module is replugged.
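
Below is a minimal Python sketch of the caching idea described above, for illustration only: get_vendor_info() is the API this PR adds, but the class layout, the EEPROM helper, the cached fields and the sample values are assumptions, not the actual platform implementation.

    # Sketch of the caching idea only; helper names and cached fields are illustrative.
    class Sfp:
        def __init__(self, index):
            self.index = index
            self._vendor_info_cache = None  # (manufacturer, part_number, serial)

        def _read_eeprom_vendor_fields(self):
            # The real code would read the module EEPROM (vendor name, part
            # number, serial number); stubbed here with values from the sample
            # log further below so the sketch stays self-contained.
            return ("NVIDIA", "MCP4Y10-N001", "SN-PLACEHOLDER")

        def get_vendor_info(self):
            """Return (manufacturer, part_number, serial), cached while the module stays plugged."""
            if self._vendor_info_cache is None:
                self._vendor_info_cache = self._read_eeprom_vendor_fields()
            return self._vendor_info_cache

        def invalidate_vendor_info(self):
            """Clear the cache on removal so a replug re-reads the EEPROM."""
            self._vendor_info_cache = None

Caching avoids re-reading the module EEPROM on every polling cycle; the cache is dropped only when the module is removed.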

How to verify it

  1. Plug in an optical module -> verify that vendor info is sent to HW-MGMT (a syslog-checking helper is sketched after this list).
  2. Unplug and replug the same module -> verify that no redundant vendor info update occurs.
  3. Replace it with a different module -> verify that the new vendor info is sent.
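
For steps 1 and 3, a quick check is to look for the thermalctld NOTICE line shown in the sample output further below. The helper here is only a hypothetical sketch: the syslog path and message format are assumptions based on that sample.

    # Hypothetical helper: scan syslog for the vendor-info NOTICE emitted by
    # thermalctld (pattern based on the sample output later in this description).
    import re

    def find_vendor_info_updates(syslog_path="/var/log/syslog"):
        pattern = re.compile(r"thermalctld: Module (\d+) vendor info updated "
                             r"- manufacturer: (\S+) part_number: (\S+)")
        with open(syslog_path) as f:
            return [m.groups() for line in f if (m := pattern.search(line))]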

Which release branch to backport (provide reason below if selected)

  • 202412
  • 202511

Tested branch (Please provide the tested image version)

202412

A picture of a cute animal (not mandatory but encouraged)

    /\_/\  
   ( o.o ) 
    > ^ <
   /|   |\
  (_|   |_)
   Cool Cat~

On first detection or module replacement, if the serial number (SN) has changed,
call vendor_data_set_module() with the manufacturer (MFG) and part number (PN)
to send the vendor info to hw-management.

Sample log output:
NOTICE pmon#thermalctld: Module 0 vendor info updated - manufacturer: NVIDIA part_number: MCP4Y10-N001
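
A hedged sketch of that update-on-change flow follows, assuming get_vendor_info() also exposes the serial number. vendor_data_set_module() is the hook named above, but its import path and exact signature are assumptions, and the standard Python logger stands in for the SONiC logger so the snippet stays self-contained.

    import logging

    logger = logging.getLogger("thermalctld")
    _last_serial = {}  # module index -> serial number of the last module seen

    def vendor_data_set_module(module_index, manufacturer, part_number):
        # Stand-in for the hw-management hook that consumes the vendor info;
        # the real signature may differ.
        pass

    def update_module_vendor_info(module_index, sfp):
        """Send MFG/PN to hw-management only when the plugged module changed."""
        manufacturer, part_number, serial = sfp.get_vendor_info()
        if _last_serial.get(module_index) == serial:
            return  # same module replugged: skip the redundant update
        _last_serial[module_index] = serial
        vendor_data_set_module(module_index, manufacturer, part_number)
        logger.info("Module %d vendor info updated - manufacturer: %s part_number: %s",
                    module_index, manufacturer, part_number)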

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>
@jianyuewu jianyuewu requested a review from lguohan as a code owner December 31, 2025 06:50
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Contributor Author

++ cat /tmp/tmp.RaKi3fI9Wm
WARNING: Image format was not specified for './sonic-installer.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
kvm: -serial telnet:127.0.0.1:9000,server: info: QEMU waiting for connection on: disconnected:telnet:127.0.0.1:9000,server=on
[W][17:04:19.190765] pw.conf      | [          conf.c: 1182 try_load_conf()] can't load config client.conf: No such file or directory
[E][17:04:19.190780] pw.conf      | [          conf.c: 1215 pw_conf_load_conf_for_context()] can't load config client.conf: No such file or directory
+ on_exit
+ rm -f /tmp/tmp.RaKi3fI9Wm
[  FAIL LOG END  ] [ target/sonic-vs.img.gz ]
make: *** [slave.mk:1450: target/sonic-vs.img.gz] Error 1
make[1]: *** [Makefile.work:621: target/sonic-vs.img.gz] Error 2
make[1]: Leaving directory '/data/vss/_work/1/s'
make: *** [Makefile:51: target/sonic-vs.img.gz] Error 2

##[error]Bash exited with code '2'.
Finishing: Build sonic image

It seems the failure is not related to this change: in the VM, client.conf could not be loaded (No such file or directory).

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Contributor Author

@r12f @prgeor Could you help review this PR for the Innolight feature? Thank you~ BTW, the 202511 branch also depends on #24688.

@jianyuewu jianyuewu marked this pull request as ready for review January 12, 2026 10:47
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik liat-grozovik merged commit 20b3670 into sonic-net:master Jan 13, 2026
13 checks passed
SahilChaudhari pushed a commit to SahilChaudhari/sonic-buildimage-sonic-net that referenced this pull request Jan 21, 2026
