Commit ff84b93

Merge branch 'development' into alex_pcie_update
2 parents 1fef495 + a3b9a20 commit ff84b93

4 files changed: 134 additions & 33 deletions

.github/workflows/update-plugin-docs.yml

Lines changed: 24 additions & 8 deletions
@@ -1,9 +1,10 @@
-# Workflow to run plugin documentation generation then commit the updated changes
+# Workflow to run plugin documentation generation then create a PR with the updated changes
 
 name: Plugin Documentation Generator
 
 permissions:
 contents: write
+pull-requests: write
 
 on:
 workflow_dispatch:
@@ -15,8 +16,15 @@ jobs:
 runs-on: [ self-hosted ]
 # To disable this workflow, set DISABLE_AUTO_DOCS to 'true' in repository variables
 if: vars.DISABLE_AUTO_DOCS != 'true'
+env:
+HOME: /tmp/github-actions-home
 
 steps:
+- name: Setup HOME directory
+run: |
+mkdir -p /tmp/github-actions-home
+export HOME=/tmp/github-actions-home
+
 - name: Checkout repository
 uses: actions/checkout@v4
 with:
@@ -37,10 +45,18 @@ jobs:
 source venv/bin/activate
 pre-commit run --files docs/PLUGIN_DOC.md || true
 
-- name: Commit and push changes
-run: |
-git config user.name "github-actions[bot]"
-git config user.email "github-actions[bot]@users.noreply.github.com"
-git add docs/PLUGIN_DOC.md
-git diff --staged --quiet || git commit --no-verify -m "docs: Update plugin documentation [automated]"
-git push
+- name: Create Pull Request
+uses: peter-evans/create-pull-request@v6
+with:
+token: ${{ secrets.GITHUB_TOKEN }}
+commit-message: "docs: Update plugin documentation [automated]"
+committer: "github-actions[bot] <github-actions[bot]@users.noreply.github.com>"
+author: "github-actions[bot] <github-actions[bot]@users.noreply.github.com>"
+branch: automated-plugin-docs-update
+delete-branch: true
+title: "docs: Update plugin documentation [automated]"
+body: |
+Automated plugin documentation update generated by workflow.
+
+This PR was automatically created by the Plugin Documentation Generator workflow.
+labels: documentation,automated

docs/PLUGIN_DOC.md

Lines changed: 34 additions & 15 deletions
@@ -4,7 +4,7 @@
 
 | Plugin | Collection | Analysis | DataModel | Collector | Analyzer |
 | --- | --- | --- | --- | --- | --- |
-| AmdSmiPlugin | amd-smi firmware --json<br>amd-smi list --json<br>amd-smi partition --json<br>amd-smi process --json<br>amd-smi static -g all --json<br>amd-smi version --json | **Analyzer Args:**<br>- `check_static_data`: bool<br>- `expected_gpu_processes`: Optional[int]<br>- `expected_max_power`: Optional[int]<br>- `expected_driver_version`: Optional[str]<br>- `expected_memory_partition_mode`: Optional[str]<br>- `expected_compute_partition_mode`: Optional[str]<br>- `expected_pldm_version`: Optional[str]<br>- `l0_to_recovery_count_error_threshold`: Optional[int]<br>- `l0_to_recovery_count_warning_threshold`: Optional[int]<br>- `vendorid_ep`: Optional[str]<br>- `vendorid_ep_vf`: Optional[str]<br>- `devid_ep`: Optional[str]<br>- `devid_ep_vf`: Optional[str]<br>- `sku_name`: Optional[str] | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
+| AmdSmiPlugin | firmware --json<br>list --json<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>static -g all --json<br>static -g {gpu_id} --json<br>version --json | **Analyzer Args:**<br>- `check_static_data`: bool<br>- `expected_gpu_processes`: Optional[int]<br>- `expected_max_power`: Optional[int]<br>- `expected_driver_version`: Optional[str]<br>- `expected_memory_partition_mode`: Optional[str]<br>- `expected_compute_partition_mode`: Optional[str]<br>- `expected_pldm_version`: Optional[str]<br>- `l0_to_recovery_count_error_threshold`: Optional[int]<br>- `l0_to_recovery_count_warning_threshold`: Optional[int]<br>- `vendorid_ep`: Optional[str]<br>- `vendorid_ep_vf`: Optional[str]<br>- `devid_ep`: Optional[str]<br>- `devid_ep_vf`: Optional[str]<br>- `sku_name`: Optional[str]<br>- `expected_xgmi_speed`: Optional[list[float]]<br>- `analysis_range_start`: Optional[datetime.datetime]<br>- `analysis_range_end`: Optional[datetime.datetime] | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
 | BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'<br>wmic bios get SMBIOSBIOSVersion /Value | **Analyzer Args:**<br>- `exp_bios_version`: list[str]<br>- `regex_match`: bool | [BiosDataModel](#BiosDataModel-Model) | [BiosCollector](#Collector-Class-BiosCollector) | [BiosAnalyzer](#Data-Analyzer-Class-BiosAnalyzer) |
 | CmdlinePlugin | cat /proc/cmdline | **Analyzer Args:**<br>- `required_cmdline`: Union[str, list]<br>- `banned_cmdline`: Union[str, list] | [CmdlineDataModel](#CmdlineDataModel-Model) | [CmdlineCollector](#Collector-Class-CmdlineCollector) | [CmdlineAnalyzer](#Data-Analyzer-Class-CmdlineAnalyzer) |
 | DeviceEnumerationPlugin | lscpu \| grep Socket \| awk '{ print $2 }'<br>powershell -Command "(Get-WmiObject -Class Win32_Processor \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'VGA\\|Display\\|3D' \| wc -l<br>powershell -Command "(wmic path win32_VideoController get name \| findstr AMD \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'Virtual Function' \| wc -l<br>powershell -Command "(Get-VMHostPartitionableGpu \| Measure-Object).Count" | **Analyzer Args:**<br>- `cpu_count`: Optional[list[int]]<br>- `gpu_count`: Optional[list[int]]<br>- `vf_count`: Optional[list[int]] | [DeviceEnumerationDataModel](#DeviceEnumerationDataModel-Model) | [DeviceEnumerationCollector](#Collector-Class-DeviceEnumerationCollector) | [DeviceEnumerationAnalyzer](#Data-Analyzer-Class-DeviceEnumerationAnalyzer) |
@@ -14,7 +14,7 @@
 | JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
 | KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
 | KernelModulePlugin | cat /proc/modules<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
-| MemoryPlugin | free -b<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
+| MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
 | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
 | OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
 | PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
@@ -42,25 +42,29 @@ Class for collection of inband tool amd-smi data.
 
 - **AMD_SMI_EXE**: `amd-smi`
 - **SUPPORTED_OS_FAMILY**: `{<OSFamily.LINUX: 3>}`
-- **CMD_VERSION**: `amd-smi version --json`
-- **CMD_LIST**: `amd-smi list --json`
-- **CMD_PROCESS**: `amd-smi process --json`
-- **CMD_PARTITION**: `amd-smi partition --json`
-- **CMD_FIRMWARE**: `amd-smi firmware --json`
-- **CMD_STATIC**: `amd-smi static -g all --json`
+- **CMD_VERSION**: `version --json`
+- **CMD_LIST**: `list --json`
+- **CMD_PROCESS**: `process --json`
+- **CMD_PARTITION**: `partition --json`
+- **CMD_FIRMWARE**: `firmware --json`
+- **CMD_STATIC**: `static -g all --json`
+- **CMD_STATIC_GPU**: `static -g {gpu_id} --json`
+- **CMD_RAS**: `ras --cper --folder={folder}`
 
 ### Provides Data
 
 AmdSmiDataModel
 
 ### Commands
 
-- amd-smi firmware --json
-- amd-smi list --json
-- amd-smi partition --json
-- amd-smi process --json
-- amd-smi static -g all --json
-- amd-smi version --json
+- firmware --json
+- list --json
+- partition --json
+- process --json
+- ras --cper --folder={folder}
+- static -g all --json
+- static -g {gpu_id} --json
+- version --json
 
 ## Collector Class BiosCollector
 
@@ -300,6 +304,7 @@ Collect memory usage details
 
 - **CMD_WINDOWS**: `wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value`
 - **CMD**: `free -b`
+- **CMD_LSMEM**: `/usr/bin/lsmem`
 
 ### Provides Data
 
@@ -308,6 +313,7 @@ MemoryDataModel
 ### Commands
 
 - free -b
+- /usr/bin/lsmem
 - wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value
 
 ## Collector Class NvmeCollector
@@ -646,10 +652,15 @@ Data model for amd-smi data.
 - **gpu_list**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiListItem]]`
 - **partition**: `Optional[nodescraper.plugins.inband.amdsmi.amdsmidata.Partition]`
 - **process**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Processes]]`
+- **topology**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Topo]]`
 - **firmware**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Fw]]`
 - **bad_pages**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.BadPages]]`
 - **static**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiStatic]]`
 - **metric**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiMetric]]`
+- **xgmi_metric**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.XgmiMetrics]]`
+- **xgmi_link**: `Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.XgmiLinks]]`
+- **cper_data**: `Optional[list[nodescraper.models.datamodel.FileModel]]`
+- **amdsmitst_data**: `nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiTstData`
 
 ## BiosDataModel Model
 
@@ -763,6 +774,7 @@ Data model for journal logs
 
 - **mem_free**: `str`
 - **mem_total**: `str`
+- **lsmem_output**: `Optional[dict]`
 
 ## NvmeDataModel Model
 
@@ -915,7 +927,11 @@ Data model for in band syslog logs
 
 ## Data Analyzer Class AmdSmiAnalyzer
 
-**Bases**: ['DataAnalyzer']
+### Description
+
+Check AMD SMI Application data for PCIe, ECC errors, CPER data, and analyze amdsmitst metrics
+
+**Bases**: ['CperAnalysisTaskMixin', 'DataAnalyzer']
 
 **Link to code**: [amdsmi_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/amdsmi/amdsmi_analyzer.py)
 
@@ -1213,6 +1229,9 @@ Check sysctl matches expected sysctl details
 - **devid_ep**: `Optional[str]`
 - **devid_ep_vf**: `Optional[str]`
 - **sku_name**: `Optional[str]`
+- **expected_xgmi_speed**: `Optional[list[float]]`
+- **analysis_range_start**: `Optional[datetime.datetime]`
+- **analysis_range_end**: `Optional[datetime.datetime]`
 
 ## Analyzer Args Class BiosAnalyzerArgs
 
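In the PLUGIN_DOC.md changes to AmdSmiCollector, every `CMD_*` value loses its `amd-smi` prefix and two templates gain placeholders (`{gpu_id}`, `{folder}`), while `AMD_SMI_EXE` stays `amd-smi`. The diff does not show how the collector turns these into full command lines. The sketch below assumes, for illustration only, that the executable is prepended and placeholders are filled with `str.format`. `build_amdsmi_cmd` is a hypothetical helper, not a nodescraper function, and `/tmp/cper` is a made-up folder.

```python
# CMD_* values are copied from the PLUGIN_DOC.md diff; the assembly logic is an assumption.
AMD_SMI_EXE = "amd-smi"
CMD_VERSION = "version --json"
CMD_STATIC_GPU = "static -g {gpu_id} --json"
CMD_RAS = "ras --cper --folder={folder}"


def build_amdsmi_cmd(template: str, **params: object) -> str:
    """Hypothetical helper: fill {placeholders} in a subcommand template and prefix the executable.

    str.format raises KeyError if the template names a placeholder that is not passed in params.
    """
    return f"{AMD_SMI_EXE} {template.format(**params)}"


if __name__ == "__main__":
    print(build_amdsmi_cmd(CMD_VERSION))                  # amd-smi version --json
    print(build_amdsmi_cmd(CMD_STATIC_GPU, gpu_id=0))     # amd-smi static -g 0 --json
    print(build_amdsmi_cmd(CMD_RAS, folder="/tmp/cper"))  # amd-smi ras --cper --folder=/tmp/cper
```

Keeping the executable separate from the subcommand templates lets a single template such as `static -g {gpu_id} --json` be reused per GPU; whether the collector does exactly this is not visible in this commit.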
nodescraper/plugins/inband/package/package_analyzer.py

Lines changed: 58 additions & 10 deletions
@@ -44,7 +44,7 @@ def regex_version_data(
 package_data: dict[str, str],
 key_search: re.Pattern[str],
 value_search: Optional[Pattern[str]],
-) -> bool:
+) -> tuple[bool, list[tuple[str, str, str]]]:
 """Searches the package values for the key and value search patterns
 
 Args:
@@ -53,10 +53,12 @@ def regex_version_data(
 value_search (Optional[Pattern[str]]): a compiled regex pattern to search for the package version, if None then any version is accepted
 
 Returns:
-bool: A boolean indicating if the value was found
+tuple: (value_found, version_mismatches) where value_found is a bool and
+version_mismatches is a list of (package_name, expected_pattern, found_version) tuples
 """
 
 value_found = False
+version_mismatches = []
 for name, version in package_data.items():
 self.logger.debug("Package data: %s, %s", name, version)
 key_search_res = key_search.search(name)
@@ -66,6 +68,7 @@ def regex_version_data(
 continue
 value_search_res = value_search.search(version)
 if not value_search_res:
+version_mismatches.append((name, value_search.pattern, version))
 self._log_event(
 EventCategory.APPLICATION,
 f"Package {key_search.pattern} Version Mismatch, Expected {value_search.pattern} but found {version}",
@@ -77,7 +80,7 @@ def regex_version_data(
 "found_version": version,
 },
 )
-return value_found
+return value_found, version_mismatches
 
 def package_regex_search(
 self, package_data: dict[str, str], exp_package_data: dict[str, Optional[str]]
@@ -87,16 +90,23 @@ def package_regex_search(
 Args:
 package_data (dict[str, str]): a dictionary of package names and versions
 exp_package_data (dict[str, Optional[str]]): a dictionary of expected package names and versions
+
+Returns:
+tuple: (not_found_keys, regex_errors, version_mismatches) containing lists of errors
 """
 not_found_keys = []
+regex_errors = []
+version_mismatches = []
+
 for exp_key, exp_value in exp_package_data.items():
 try:
 if exp_value is not None:
 value_search = re.compile(exp_value)
 else:
 value_search = None
 key_search = re.compile(exp_key)
-except re.error:
+except re.error as e:
+regex_errors.append((exp_key, exp_value, str(e)))
 self._log_event(
 EventCategory.RUNTIME,
 f"Regex Compile Error either {exp_key} {exp_value}",
@@ -108,10 +118,13 @@ def package_regex_search(
 )
 continue
 
-key_found = self.regex_version_data(package_data, key_search, value_search)
+key_found, mismatches = self.regex_version_data(package_data, key_search, value_search)
+
+# Collect version mismatches
+version_mismatches.extend(mismatches)
 
 if not key_found:
-not_found_keys.append(exp_key)
+not_found_keys.append((exp_key, exp_value))
 self._log_event(
 EventCategory.APPLICATION,
 f"Package {exp_key} not found in the package list",
@@ -123,7 +136,8 @@ def package_regex_search(
 "found_version": None,
 },
 )
-return not_found_keys
+
+return not_found_keys, regex_errors, version_mismatches
 
 def package_exact_match(
 self, package_data: dict[str, str], exp_package_data: dict[str, Optional[str]]
@@ -190,9 +204,43 @@ def analyze_data(
 return self.result
 
 if args.regex_match:
-not_found_keys = self.package_regex_search(data.version_info, args.exp_package_ver)
-self.result.message = f"Packages not found: {not_found_keys}"
-self.result.status = ExecutionStatus.ERROR
+not_found_keys, regex_errors, version_mismatches = self.package_regex_search(
+data.version_info, args.exp_package_ver
+)
+
+# Adding details for err message
+error_parts = []
+if not_found_keys:
+packages_detail = ", ".join(
+[
+f"'{pkg}' (expected version: {ver if ver else 'any'})"
+for pkg, ver in not_found_keys
+]
+)
+error_parts.append(f"Packages not found: {packages_detail}")
+
+if regex_errors:
+regex_detail = ", ".join(
+[f"'{pkg}' pattern (version: {ver})" for pkg, ver, _ in regex_errors]
+)
+error_parts.append(f"Regex compile errors: {regex_detail}")
+
+if version_mismatches:
+version_detail = ", ".join(
+[
+f"'{pkg}' (expected: {exp}, found: {found})"
+for pkg, exp, found in version_mismatches
+]
+)
+error_parts.append(f"Version mismatches: {version_detail}")
+
+total_errors = len(not_found_keys) + len(regex_errors) + len(version_mismatches)
+if total_errors > 0:
+self.result.message = f"{'; '.join(error_parts)}"
+self.result.status = ExecutionStatus.ERROR
+else:
+self.result.message = "All packages found and versions matched"
+self.result.status = ExecutionStatus.OK
 else:
 self.logger.info("Expected packages: %s", list(args.exp_package_ver.keys()))
 not_found_match, not_found_version = self.package_exact_match(
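With this change, `package_regex_search` returns three lists instead of one: packages not found, regex compile errors, and version mismatches. The standalone sketch below reproduces that return contract outside the analyzer class so it can be run on its own. It is not the nodescraper implementation: `_log_event` calls are omitted, and it assumes a package counts as found whenever its name matches the key pattern, regardless of version, since the lines between the key search and the `continue` in `regex_version_data` fall outside the diff.

```python
import re
from typing import Optional

NotFound = list[tuple[str, Optional[str]]]            # (expected key, expected version pattern)
RegexErrors = list[tuple[str, Optional[str], str]]    # (key, version pattern, re.error text)
Mismatches = list[tuple[str, str, str]]               # (package name, expected pattern, found version)


def package_regex_search(
    package_data: dict[str, str], exp_package_data: dict[str, Optional[str]]
) -> tuple[NotFound, RegexErrors, Mismatches]:
    """Sketch of the three-list return contract; logging is left out."""
    not_found_keys: NotFound = []
    regex_errors: RegexErrors = []
    version_mismatches: Mismatches = []

    for exp_key, exp_value in exp_package_data.items():
        try:
            # A None version pattern means any installed version is accepted.
            value_search = re.compile(exp_value) if exp_value is not None else None
            key_search = re.compile(exp_key)
        except re.error as e:
            regex_errors.append((exp_key, exp_value, str(e)))
            continue

        key_found = False
        for name, version in package_data.items():
            if not key_search.search(name):
                continue
            key_found = True  # assumption: a name match alone counts as "found"
            if value_search is not None and not value_search.search(version):
                version_mismatches.append((name, value_search.pattern, version))

        if not key_found:
            not_found_keys.append((exp_key, exp_value))

    return not_found_keys, regex_errors, version_mismatches
```

Returning plain tuples keeps the caller in charge of message formatting, which matches how `analyze_data` consumes the three lists in the diff.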

test/unit/plugin/test_package_analyzer.py

Lines changed: 18 additions & 0 deletions
@@ -89,4 +89,22 @@ def test_data_version_regex(package_analyzer, default_data_lib):
 regex_match=True,
 )
 res = package_analyzer.analyze_data(default_data_lib, args=args)
+assert res.status == ExecutionStatus.OK
+assert res.message == "All packages found and versions matched"
+
+
+def test_data_multiple_errors_regex(package_analyzer, default_data_lib):
+"""Test that detailed error messages are shown for multiple package errors"""
+args = PackageAnalyzerArgs(
+exp_package_ver={
+"missing-package": None,
+"test-ubuntu-package\\.x86_64": "2\\.\\d+",
+"another-missing": "1\\.0",
+},
+regex_match=True,
+)
+res = package_analyzer.analyze_data(default_data_lib, args=args)
 assert res.status == ExecutionStatus.ERROR
+assert "missing-package" in res.message
+assert "another-missing" in res.message
+assert len(res.events) == 3
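The new `test_data_multiple_errors_regex` only checks that both missing package names appear in `res.message`. That message is assembled in `analyze_data` from the three error lists, and the formatting step can be isolated as below; the f-strings are copied from the diff. `format_package_errors` is a hypothetical name, and its `(ok, message)` return pair stands in for setting `self.result.status` and `self.result.message`.

```python
from typing import Optional


def format_package_errors(
    not_found_keys: list[tuple[str, Optional[str]]],
    regex_errors: list[tuple[str, Optional[str], str]],
    version_mismatches: list[tuple[str, str, str]],
) -> tuple[bool, str]:
    """Build the regex-mode analyzer message; returns (ok, message)."""
    error_parts = []
    if not_found_keys:
        packages_detail = ", ".join(
            f"'{pkg}' (expected version: {ver if ver else 'any'})"
            for pkg, ver in not_found_keys
        )
        error_parts.append(f"Packages not found: {packages_detail}")
    if regex_errors:
        regex_detail = ", ".join(
            f"'{pkg}' pattern (version: {ver})" for pkg, ver, _ in regex_errors
        )
        error_parts.append(f"Regex compile errors: {regex_detail}")
    if version_mismatches:
        version_detail = ", ".join(
            f"'{pkg}' (expected: {exp}, found: {found})"
            for pkg, exp, found in version_mismatches
        )
        error_parts.append(f"Version mismatches: {version_detail}")

    # Equivalent to the diff's total_errors > 0 check: each non-empty list adds one part.
    if error_parts:
        return False, "; ".join(error_parts)
    return True, "All packages found and versions matched"
```

For example, passing `[("missing-package", None)]` as the only non-empty list yields `(False, "Packages not found: 'missing-package' (expected version: any)")`.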
