[Mellanox] feed module info to hw-management #24957
Merged
liat-grozovik merged 9 commits into sonic-net:master from jianyuewu:master_innolight_thermal_algo on Jan 13, 2026
On first detection or module replacement, if the serial number (SN) has changed, call vendor_data_set_module() with the manufacturer (MFG) and part number (PN) to send the vendor info to hw-management.

Sample output:
NOTICE pmon#thermalctld: Module 0 vendor info updated \ - manufacturer: NVIDIA part_number: MCP4Y10-N001

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>
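The update rule in the commit message above can be sketched in Python as follows. This is a minimal illustration, not the code merged in this PR: the `VendorInfo` and `VendorInfoFeeder` names are hypothetical, and the `send` callable is a stand-in for vendor_data_set_module(), whose real signature is not shown on this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class VendorInfo:
    """Hypothetical container for the three fields named in the commit message."""
    serial_number: str
    manufacturer: str
    part_number: str


class VendorInfoFeeder:
    """Forward MFG/PN for a slot only when that slot's serial number changes."""

    def __init__(self, send: Callable[[int, str, str], None]) -> None:
        # `send` stands in for vendor_data_set_module(); its real
        # signature is an assumption here.
        self._send = send
        self._last_sn: Dict[int, str] = {}

    def on_module_present(self, index: int, info: VendorInfo) -> bool:
        """Return True if vendor info was forwarded, False if it was skipped."""
        if self._last_sn.get(index) == info.serial_number:
            return False  # same module replugged: nothing to update
        self._send(index, info.manufacturer, info.part_number)
        self._last_sn[index] = info.serial_number
        return True


# Usage: record calls instead of talking to hw-management.
sent: List[Tuple[int, str, str]] = []
feeder = VendorInfoFeeder(lambda i, mfg, pn: sent.append((i, mfg, pn)))

module_a = VendorInfo("SN-A", "NVIDIA", "MCP4Y10-N001")
module_b = VendorInfo("SN-B", "NVIDIA", "MCP4Y10-N001")

first_seen = feeder.on_module_present(0, module_a)   # first detection
replugged = feeder.on_module_present(0, module_a)    # same module again
replaced = feeder.on_module_present(0, module_b)     # different serial number
```

The last-seen serial number is kept when a module is unplugged, so plugging the same module back in does not trigger a second update.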
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Junchao-Mellanox
approved these changes
Dec 31, 2025
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Author
++ cat /tmp/tmp.RaKi3fI9Wm
WARNING: Image format was not specified for './sonic-installer.img' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
kvm: -serial telnet:127.0.0.1:9000,server: info: QEMU waiting for connection on: disconnected:telnet:127.0.0.1:9000,server=on
[W][17:04:19.190765] pw.conf | [ conf.c: 1182 try_load_conf()] can't load config client.conf: No such file or directory
[E][17:04:19.190780] pw.conf | [ conf.c: 1215 pw_conf_load_conf_for_context()] can't load config client.conf: No such file or directory
+ on_exit
+ rm -f /tmp/tmp.RaKi3fI9Wm
[ FAIL LOG END ] [ target/sonic-vs.img.gz ]
make: *** [slave.mk:1450: target/sonic-vs.img.gz] Error 1
make[1]: *** [Makefile.work:621: target/sonic-vs.img.gz] Error 2
make[1]: Leaving directory '/data/vss/_work/1/s'
make: *** [Makefile:51: target/sonic-vs.img.gz] Error 2
##[error]Bash exited with code '2'.
Finishing: Build sonic image
The failure does not seem to be related to this change; the VM build reports "client.conf: No such file or directory".
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Author
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
liat-grozovik
approved these changes
Jan 13, 2026
SahilChaudhari pushed a commit to SahilChaudhari/sonic-buildimage-sonic-net that referenced this pull request on Jan 21, 2026

- Why I did it
At 40°C ambient temperature with current FW+SW, some modules have >7.6% probability of reaching 75°C, which triggers false temperature warnings. This PR implements vendor-specific temperature threshold support to eliminate false warnings while maintaining accurate temperature telemetry for monitoring purposes.

- How I did it
Implemented new API for vendor-specific temperature offset adjustments:
New API: Add get_vendor_info() API with caching support.
Smart Module Detection: Cache vendor information (Manufacturer + Part Number) for each module. Skip redundant vendor info updates when the same module is replugged.

- How to verify it
Plug in optical module -> Verify vendor info sent to Nvidia API.
Unplug and replug same module -> Verify no redundant vendor info update.
Replace with different module -> Verify new vendor info sent.
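The "get_vendor_info() API with caching support" named above could look roughly like the sketch below. Only the method name comes from this PR; the `Module` class, the `read_eeprom` callable, the dictionary return type, and `invalidate_vendor_info()` are assumptions made for illustration, not the merged implementation.

```python
from typing import Callable, Dict, List, Optional


class Module:
    """Hypothetical module object; only get_vendor_info() is named in the PR."""

    def __init__(self, read_eeprom: Callable[[], Dict[str, str]]) -> None:
        self._read_eeprom = read_eeprom  # assumed EEPROM reader, injected for testing
        self._vendor_info: Optional[Dict[str, str]] = None

    def get_vendor_info(self) -> Dict[str, str]:
        """Return manufacturer and part number, reading the EEPROM at most once."""
        if self._vendor_info is None:
            raw = self._read_eeprom()
            self._vendor_info = {
                "manufacturer": raw.get("manufacturer", "").strip(),
                "part_number": raw.get("part_number", "").strip(),
            }
        # Return a copy so callers cannot mutate the cached value.
        return dict(self._vendor_info)

    def invalidate_vendor_info(self) -> None:
        """Drop the cache, for example after the module is replaced."""
        self._vendor_info = None


# Usage: count how often the (fake) EEPROM is actually read.
eeprom_reads: List[int] = []


def fake_eeprom() -> Dict[str, str]:
    eeprom_reads.append(1)
    return {"manufacturer": "NVIDIA  ", "part_number": "MCP4Y10-N001"}


module = Module(fake_eeprom)
first = module.get_vendor_info()
second = module.get_vendor_info()  # served from the cache
```

Caching here trades freshness for fewer EEPROM reads, which is why the sketch includes an explicit invalidation hook for the module-replacement case.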
Dependency
For the 202511 branch, the one dependency was HW-MGMT version 7.0050.2930. It has been merged into the 202511 branch, so it is no longer an external dependency.
Why I did it
At 40°C ambient temperature with current FW+SW, some modules have >7.6% probability of reaching 75°C, which triggers false temperature warnings.
This PR implements vendor-specific temperature threshold support to eliminate false warnings while maintaining accurate temperature telemetry for monitoring purposes.
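How I did it
Implemented new API for vendor-specific temperature offset adjustments:
New API: Add get_vendor_info() API with caching support.
Smart Module Detection: Cache vendor information (Manufacturer + Part Number) for each module. Skip redundant vendor info updates when the same module is replugged.
How to verify it
Plug in optical module -> Verify vendor info sent to Nvidia API.
Unplug and replug same module -> Verify no redundant vendor info update.
Replace with different module -> Verify new vendor info sent.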
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
202412