Soft lockup detection in Linux through dmesg log parsing and sending telemetry #3573
Open
adityagarg0911 wants to merge 10 commits into Azure:develop from
added 9 commits
March 3, 2026 11:45
…porting. Also add unit test cases for this feature.
…gargaditya/kernel_soft_lockup_detection
nagworld9
reviewed
Mar 17, 2026
    if not found_timestamp:
        logger.periodic_warn(
            logger.EVERY_HOUR,
Contributor
Since you run soft lockup detection every 6 hours, I don't think this periodic logging timer is needed. We also try to avoid the logging timer, because we have seen issues with this logic in the past. If you do need to limit logging, consider custom logic to avoid frequent logging.
Author
Originally, I kept it this way in case we decrease the interval to less than an hour in the future.
But this makes sense. I'll change it to logger.warn, since the interval is greater than an hour. Thanks.
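For reference, the "custom logic" option mentioned in the review comment could look roughly like the sketch below. This is not the agent's logger and not code from this PR: `RateLimitedWarner`, its parameters, and the `emit`/`clock` hooks are hypothetical names chosen for illustration, and only the Python standard library is used.

```python
import time


class RateLimitedWarner:
    """Emit a warning for a given key at most once per `min_interval` seconds.

    Hypothetical helper for illustration; not part of the Azure Linux Agent.
    """

    def __init__(self, min_interval, emit=print, clock=time.monotonic):
        self._min_interval = min_interval
        self._emit = emit            # callable that actually writes the message
        self._clock = clock          # injectable clock, which makes testing easy
        self._last_emitted = {}      # key -> time of the last emitted message

    def warn(self, key, message):
        """Return True if the message was emitted, False if it was suppressed."""
        now = self._clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self._min_interval:
            return False
        self._last_emitted[key] = now
        self._emit(message)
        return True
```

With a 6-hour detection interval, a plain warning call is enough, as agreed in the thread; a helper like this would only matter if the interval were later reduced.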
        log_event=False
    )
except Exception as e:
    logger.periodic_warn(

try:
    return run_command(['dmesg'], track_process=False, timeout=self._DMESG_TIMEOUT)
except Exception as e:
    logger.periodic_warn(

    json.dump(state, f)
except Exception as e:
    logger.periodic_warn(
        logger.EVERY_HOUR,
Description
Add kernel soft lockup monitoring to the Azure Linux Agent. This new feature periodically parses dmesg output to detect CPU soft lockup events (BUG: soft lockup - CPU#N stuck for Xs!), aggregates them by CPU, and reports summarized telemetry to Azure. This helps detect and diagnose VM health issues caused by CPUs being stuck in kernel code.
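As a rough illustration of the parse-and-aggregate step described above (not the code in this PR), the sketch below matches the quoted message format and groups events by CPU. The function name `summarize_soft_lockups`, the summary fields `count` and `max_stuck_seconds`, and the sample log lines are assumptions made for this example.

```python
import re
from collections import defaultdict

# Matches the message quoted in the description:
#   BUG: soft lockup - CPU#N stuck for Xs!
_SOFT_LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def summarize_soft_lockups(dmesg_text):
    """Return {cpu: {"count": int, "max_stuck_seconds": int}} (hypothetical shape)."""
    summary = defaultdict(lambda: {"count": 0, "max_stuck_seconds": 0})
    for line in dmesg_text.splitlines():
        match = _SOFT_LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft lockup line
        cpu = int(match.group(1))
        seconds = int(match.group(2))
        entry = summary[cpu]
        entry["count"] += 1
        entry["max_stuck_seconds"] = max(entry["max_stuck_seconds"], seconds)
    return dict(summary)


# Illustrative input only; these lines are made up for the example.
sample = "\n".join([
    "[ 100.000001] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:123]",
    "[ 128.000002] watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [kworker/2:1:123]",
    "[ 130.000003] watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [swapper/5:0]",
    "[ 131.000004] eth0: link up",
])
result = summarize_soft_lockups(sample)
```

Here `result` maps CPU 2 to two events with a maximum of 48 seconds and CPU 5 to one event of 23 seconds; a summary of this kind is what would then be reported as telemetry.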