44
55| Plugin | Collection | Analysis | DataModel | Collector | Analyzer |
66| --- | --- | --- | --- | --- | --- |
7- | AmdSmiPlugin | amd-smi firmware --json<br >amd-smi list --json<br >amd-smi partition --json<br >amd-smi process --json<br >amd-smi static -g all --json<br >amd-smi version --json | ** Analyzer Args:** <br >- ` check_static_data ` : bool<br >- ` expected_gpu_processes ` : Optional[ int] <br >- ` expected_max_power ` : Optional[ int] <br >- ` expected_driver_version ` : Optional[ str] <br >- ` expected_memory_partition_mode ` : Optional[ str] <br >- ` expected_compute_partition_mode ` : Optional[ str] <br >- ` expected_pldm_version ` : Optional[ str] <br >- ` l0_to_recovery_count_error_threshold ` : Optional[ int] <br >- ` l0_to_recovery_count_warning_threshold ` : Optional[ int] <br >- ` vendorid_ep ` : Optional[ str] <br >- ` vendorid_ep_vf ` : Optional[ str] <br >- ` devid_ep ` : Optional[ str] <br >- ` devid_ep_vf ` : Optional[ str] <br >- ` sku_name ` : Optional[ str] | [ AmdSmiDataModel] ( #AmdSmiDataModel-Model ) | [ AmdSmiCollector] ( #Collector-Class-AmdSmiCollector ) | [ AmdSmiAnalyzer] ( #Data-Analyzer-Class-AmdSmiAnalyzer ) |
7+ | AmdSmiPlugin | firmware --json<br>list --json<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>static -g all --json<br>static -g {gpu_id} --json<br>version --json | **Analyzer Args:**<br>- `check_static_data`: bool<br>- `expected_gpu_processes`: Optional[int]<br>- `expected_max_power`: Optional[int]<br>- `expected_driver_version`: Optional[str]<br>- `expected_memory_partition_mode`: Optional[str]<br>- `expected_compute_partition_mode`: Optional[str]<br>- `expected_pldm_version`: Optional[str]<br>- `l0_to_recovery_count_error_threshold`: Optional[int]<br>- `l0_to_recovery_count_warning_threshold`: Optional[int]<br>- `vendorid_ep`: Optional[str]<br>- `vendorid_ep_vf`: Optional[str]<br>- `devid_ep`: Optional[str]<br>- `devid_ep_vf`: Optional[str]<br>- `sku_name`: Optional[str]<br>- `expected_xgmi_speed`: Optional[list[float]]<br>- `analysis_range_start`: Optional[datetime.datetime]<br>- `analysis_range_end`: Optional[datetime.datetime] | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
88| BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'<br >wmic bios get SMBIOSBIOSVersion /Value | ** Analyzer Args:** <br >- ` exp_bios_version ` : list[ str] <br >- ` regex_match ` : bool | [ BiosDataModel] ( #BiosDataModel-Model ) | [ BiosCollector] ( #Collector-Class-BiosCollector ) | [ BiosAnalyzer] ( #Data-Analyzer-Class-BiosAnalyzer ) |
99| CmdlinePlugin | cat /proc/cmdline | ** Analyzer Args:** <br >- ` required_cmdline ` : Union[ str, list] <br >- ` banned_cmdline ` : Union[ str, list] | [ CmdlineDataModel] ( #CmdlineDataModel-Model ) | [ CmdlineCollector] ( #Collector-Class-CmdlineCollector ) | [ CmdlineAnalyzer] ( #Data-Analyzer-Class-CmdlineAnalyzer ) |
1010| DeviceEnumerationPlugin | lscpu \| grep Socket \| awk '{ print $2 }'<br >powershell -Command "(Get-WmiObject -Class Win32_Processor \| Measure-Object).Count"<br >lspci -d {vendorid_ep}: \| grep -i 'VGA\\ | Display\\ | 3D' \| wc -l<br >powershell -Command "(wmic path win32_VideoController get name \| findstr AMD \| Measure-Object).Count"<br >lspci -d {vendorid_ep}: \| grep -i 'Virtual Function' \| wc -l<br >powershell -Command "(Get-VMHostPartitionableGpu \| Measure-Object).Count" | ** Analyzer Args:** <br >- ` cpu_count ` : Optional[ list[ int]] <br >- ` gpu_count ` : Optional[ list[ int]] <br >- ` vf_count ` : Optional[ list[ int]] | [ DeviceEnumerationDataModel] ( #DeviceEnumerationDataModel-Model ) | [ DeviceEnumerationCollector] ( #Collector-Class-DeviceEnumerationCollector ) | [ DeviceEnumerationAnalyzer] ( #Data-Analyzer-Class-DeviceEnumerationAnalyzer ) |
1414| JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [ JournalData] ( #JournalData-Model ) | [ JournalCollector] ( #Collector-Class-JournalCollector ) | - |
1515| KernelPlugin | sh -c 'uname -a'<br >wmic os get Version /Value | ** Analyzer Args:** <br >- ` exp_kernel ` : Union[ str, list] <br >- ` regex_match ` : bool | [ KernelDataModel] ( #KernelDataModel-Model ) | [ KernelCollector] ( #Collector-Class-KernelCollector ) | [ KernelAnalyzer] ( #Data-Analyzer-Class-KernelAnalyzer ) |
1616| KernelModulePlugin | cat /proc/modules<br >wmic os get Version /Value | ** Analyzer Args:** <br >- ` kernel_modules ` : dict[ str, dict] <br >- ` regex_filter ` : list[ str] | [ KernelModuleDataModel] ( #KernelModuleDataModel-Model ) | [ KernelModuleCollector] ( #Collector-Class-KernelModuleCollector ) | [ KernelModuleAnalyzer] ( #Data-Analyzer-Class-KernelModuleAnalyzer ) |
17- | MemoryPlugin | free -b<br >wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [ MemoryDataModel] ( #MemoryDataModel-Model ) | [ MemoryCollector] ( #Collector-Class-MemoryCollector ) | [ MemoryAnalyzer] ( #Data-Analyzer-Class-MemoryAnalyzer ) |
17+ | MemoryPlugin | free -b<br >/usr/bin/lsmem<br >wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [ MemoryDataModel] ( #MemoryDataModel-Model ) | [ MemoryCollector] ( #Collector-Class-MemoryCollector ) | [ MemoryAnalyzer] ( #Data-Analyzer-Class-MemoryAnalyzer ) |
1818| NvmePlugin | nvme smart-log {dev}<br >nvme error-log {dev} --log-entries=256<br >nvme id-ctrl {dev}<br >nvme id-ns {dev}{ns}<br >nvme fw-log {dev}<br >nvme self-test-log {dev}<br >nvme get-log {dev} --log-id=6 --log-len=512<br >nvme telemetry-log {dev} --output-file={dev}_ {f_name} | - | [ NvmeDataModel] ( #NvmeDataModel-Model ) | [ NvmeCollector] ( #Collector-Class-NvmeCollector ) | - |
1919| OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/* release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br >cat /etc/* release \| grep VERSION_ID<br >wmic os get Version /value<br >wmic os get Caption /Value | ** Analyzer Args:** <br >- ` exp_os ` : Union[ str, list] <br >- ` exact_match ` : bool | [ OsDataModel] ( #OsDataModel-Model ) | [ OsCollector] ( #Collector-Class-OsCollector ) | [ OsAnalyzer] ( #Data-Analyzer-Class-OsAnalyzer ) |
20- | PackagePlugin | dnf list --installed<br >dpkg-query -W<br >pacman -Q<br >cat /etc/* release<br >wmic product get name,version | ** Analyzer Args:** <br >- ` exp_package_ver ` : Dict[ str, Optional[ str]] <br >- ` regex_match ` : bool | [ PackageDataModel] ( #PackageDataModel-Model ) | [ PackageCollector] ( #Collector-Class-PackageCollector ) | [ PackageAnalyzer] ( #Data-Analyzer-Class-PackageAnalyzer ) |
21- | PciePlugin | lspci -d {vendor_id}: -nn<br >lspci -x<br >lspci -xxxx<br >lspci -PP<br >lspci -PP -d {vendor_id}:{dev_id}<br >lspci -vt <br >lspci -vvv | ** Analyzer Args:** <br >- ` exp_speed ` : int<br >- ` exp_width ` : int<br >- ` exp_sriov_count ` : int<br >- ` exp_gpu_count_override ` : Optional[ int] <br >- ` exp_max_payload_size ` : Union[ Dict[ int, int] , int, NoneType] <br >- ` exp_max_rd_req_size ` : Union[ Dict[ int, int] , int, NoneType] <br >- ` exp_ten_bit_tag_req_en ` : Union[ Dict[ int, int] , int, NoneType] | [ PcieDataModel] ( #PcieDataModel-Model ) | [ PcieCollector] ( #Collector-Class-PcieCollector ) | [ PcieAnalyzer] ( #Data-Analyzer-Class-PcieAnalyzer ) |
20+ | PackagePlugin | dnf list --installed<br >dpkg-query -W<br >pacman -Q<br >cat /etc/* release<br >wmic product get name,version | ** Analyzer Args:** <br >- ` exp_package_ver ` : Dict[ str, Optional[ str]] <br >- ` regex_match ` : bool<br >- ` rocm_regex ` : Optional[ str] <br >- ` enable_rocm_regex ` : bool | [ PackageDataModel] ( #PackageDataModel-Model ) | [ PackageCollector] ( #Collector-Class-PackageCollector ) | [ PackageAnalyzer] ( #Data-Analyzer-Class-PackageAnalyzer ) |
21+ | PciePlugin | lspci -d {vendor_id}: -nn<br >lspci -x<br >lspci -xxxx<br >lspci -PP<br >lspci -PP -d {vendor_id}:{dev_id}<br >lspci -vvv <br >lspci -vvvt | ** Analyzer Args:** <br >- ` exp_speed ` : int<br >- ` exp_width ` : int<br >- ` exp_sriov_count ` : int<br >- ` exp_gpu_count_override ` : Optional[ int] <br >- ` exp_max_payload_size ` : Union[ Dict[ int, int] , int, NoneType] <br >- ` exp_max_rd_req_size ` : Union[ Dict[ int, int] , int, NoneType] <br >- ` exp_ten_bit_tag_req_en ` : Union[ Dict[ int, int] , int, NoneType] | [ PcieDataModel] ( #PcieDataModel-Model ) | [ PcieCollector] ( #Collector-Class-PcieCollector ) | [ PcieAnalyzer] ( #Data-Analyzer-Class-PcieAnalyzer ) |
2222| ProcessPlugin | top -b -n 1<br >rocm-smi --showpids<br >top -b -n 1 -o %CPU | ** Analyzer Args:** <br >- ` max_kfd_processes ` : int<br >- ` max_cpu_usage ` : float | [ ProcessDataModel] ( #ProcessDataModel-Model ) | [ ProcessCollector] ( #Collector-Class-ProcessCollector ) | [ ProcessAnalyzer] ( #Data-Analyzer-Class-ProcessAnalyzer ) |
2323| RocmPlugin | {rocm_path}/opencl/bin/* /clinfo<br >env \| grep -Ei 'rocm\| hsa\| hip\| mpi\| openmp\| ucx\| miopen'<br >ls /sys/class/kfd/kfd/proc/<br >grep -i -E 'rocm' /etc/ld.so.conf.d/* <br >{rocm_path}/bin/rocminfo<br >ls -v -d /opt/rocm* <br >ls -v -d /opt/rocm-[ 3-7] * \| tail -1<br >ldconfig -p \| grep -i -E 'rocm'<br >/opt/rocm/.info/version-rocm<br >/opt/rocm/.info/version | ** Analyzer Args:** <br >- ` exp_rocm ` : Union[ str, list] <br >- ` exp_rocm_latest ` : str | [ RocmDataModel] ( #RocmDataModel-Model ) | [ RocmCollector] ( #Collector-Class-RocmCollector ) | [ RocmAnalyzer] ( #Data-Analyzer-Class-RocmAnalyzer ) |
2424| StoragePlugin | sh -c 'df -lH -B1 \| grep -v 'boot''<br >wmic LogicalDisk Where DriveType="3" Get DeviceId,Size,FreeSpace | - | [ StorageDataModel] ( #StorageDataModel-Model ) | [ StorageCollector] ( #Collector-Class-StorageCollector ) | [ StorageAnalyzer] ( #Data-Analyzer-Class-StorageAnalyzer ) |
@@ -42,25 +42,29 @@ Class for collection of inband tool amd-smi data.
4242
4343- ** AMD_SMI_EXE** : ` amd-smi `
4444- ** SUPPORTED_OS_FAMILY** : ` {<OSFamily.LINUX: 3>} `
45- - ** CMD_VERSION** : ` amd-smi version --json `
46- - ** CMD_LIST** : ` amd-smi list --json `
47- - ** CMD_PROCESS** : ` amd-smi process --json `
48- - ** CMD_PARTITION** : ` amd-smi partition --json `
49- - ** CMD_FIRMWARE** : ` amd-smi firmware --json `
50- - ** CMD_STATIC** : ` amd-smi static -g all --json `
45+ - ** CMD_VERSION** : ` version --json `
46+ - ** CMD_LIST** : ` list --json `
47+ - ** CMD_PROCESS** : ` process --json `
48+ - ** CMD_PARTITION** : ` partition --json `
49+ - ** CMD_FIRMWARE** : ` firmware --json `
50+ - ** CMD_STATIC** : ` static -g all --json `
51+ - ** CMD_STATIC_GPU** : ` static -g {gpu_id} --json `
52+ - ** CMD_RAS** : ` ras --cper --folder={folder} `
5153
5254### Provides Data
5355
5456AmdSmiDataModel
5557
5658### Commands
5759
58- - amd-smi firmware --json
59- - amd-smi list --json
60- - amd-smi partition --json
61- - amd-smi process --json
62- - amd-smi static -g all --json
63- - amd-smi version --json
60+ - firmware --json
61+ - list --json
62+ - partition --json
63+ - process --json
64+ - ras --cper --folder={folder}
65+ - static -g all --json
66+ - static -g {gpu_id} --json
67+ - version --json
6468
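The `CMD_*` attributes in this hunk no longer carry the `amd-smi` prefix, while `AMD_SMI_EXE` remains a separate attribute. The sketch below shows one way the two could be combined into a full command line. It is a minimal illustration, not the collector's actual code: `build_command` is a hypothetical helper, and the GPU id and folder values are made up for the example.

```python
# Constants copied from the class variables listed in the diff.
AMD_SMI_EXE = "amd-smi"
CMD_VERSION = "version --json"
CMD_STATIC_GPU = "static -g {gpu_id} --json"
CMD_RAS = "ras --cper --folder={folder}"


def build_command(template: str, **params: str) -> str:
    """Hypothetical helper: fill placeholders, then prefix the executable."""
    return f"{AMD_SMI_EXE} {template.format(**params)}"


# Illustrative values only; the real collector chooses its own ids and paths.
version_cmd = build_command(CMD_VERSION)
static_cmd = build_command(CMD_STATIC_GPU, gpu_id="0")
ras_cmd = build_command(CMD_RAS, folder="/tmp/cper")
```

Keeping the executable separate from the subcommand templates would let the binary name or path change in one place, which is a plausible motivation for the change, though the diff itself does not state one.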
6569## Collector Class BiosCollector
6670
@@ -300,6 +304,7 @@ Collect memory usage details
300304
301305- ** CMD_WINDOWS** : ` wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value `
302306- ** CMD** : ` free -b `
307+ - ** CMD_LSMEM** : ` /usr/bin/lsmem `
303308
304309### Provides Data
305310
@@ -308,6 +313,7 @@ MemoryDataModel
308313### Commands
309314
310315- free -b
316+ - /usr/bin/lsmem
311317- wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value
312318
313319## Collector Class NvmeCollector
@@ -422,7 +428,7 @@ class for collection of PCIe data only supports Linux OS type.
422428
423429 This class will collect important PCIe data from the system running the commands
424430 - `lspci -vvv` : Verbose collection of PCIe data
425- - `lspci -vt `: Tree view of PCIe data
431+ - `lspci -vvvt `: Verbose tree view of PCIe data
426432 - `lspci -PP`: Path view of PCIe data for the GPUs
427433 - If system interaction level is set to STANDARD or higher, the following commands will be run with sudo:
428434 - `lspci -xxxx`: Hex view of PCIe data for the GPUs
@@ -442,7 +448,7 @@ class for collection of PCIe data only supports Linux OS type.
442448
443449- ** SUPPORTED_OS_FAMILY** : ` {<OSFamily.LINUX: 3>} `
444450- ** CMD_LSPCI_VERBOSE** : ` lspci -vvv `
445- - ** CMD_LSPCI_TREE ** : ` lspci -vt `
451+ - ** CMD_LSPCI_VERBOSE_TREE ** : ` lspci -vvvt `
446452- ** CMD_LSPCI_PATH** : ` lspci -PP `
447453- ** CMD_LSPCI_HEX_SUDO** : ` lspci -xxxx `
448454- ** CMD_LSPCI_HEX** : ` lspci -x `
@@ -460,8 +466,8 @@ PcieDataModel
460466- lspci -xxxx
461467- lspci -PP
462468- lspci -PP -d {vendor_id}:{dev_id}
463- - lspci -vt
464469- lspci -vvv
470+ - lspci -vvvt
465471
466472## Collector Class ProcessCollector
467473
@@ -646,10 +652,15 @@ Data model for amd-smi data.
646652- ** gpu_list** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiListItem]] `
647653- ** partition** : ` Optional[nodescraper.plugins.inband.amdsmi.amdsmidata.Partition] `
648654- ** process** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Processes]] `
655+ - ** topology** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Topo]] `
649656- ** firmware** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.Fw]] `
650657- ** bad_pages** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.BadPages]] `
651658- ** static** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiStatic]] `
652659- ** metric** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiMetric]] `
660+ - ** xgmi_metric** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.XgmiMetrics]] `
661+ - ** xgmi_link** : ` Optional[list[nodescraper.plugins.inband.amdsmi.amdsmidata.XgmiLinks]] `
662+ - ** cper_data** : ` Optional[list[nodescraper.models.datamodel.FileModel]] `
663+ - ** amdsmitst_data** : ` nodescraper.plugins.inband.amdsmi.amdsmidata.AmdSmiTstData `
653664
654665## BiosDataModel Model
655666
@@ -763,6 +774,7 @@ Data model for journal logs
763774
764775- ** mem_free** : ` str `
765776- ** mem_total** : ` str `
777+ - ** lsmem_output** : ` Optional[dict] `
766778
767779## NvmeDataModel Model
768780
@@ -798,6 +810,8 @@ Pacakge data contains the package data for the system
798810### Model annotations and fields
799811
800812- ** version_info** : ` dict[str, str] `
813+ - ** rocm_regex** : ` str `
814+ - ** enable_rocm_regex** : ` bool `
801815
802816## PcieDataModel Model
803817
@@ -915,7 +929,11 @@ Data model for in band syslog logs
915929
916930## Data Analyzer Class AmdSmiAnalyzer
917931
918- ** Bases** : [ 'DataAnalyzer']
932+ ### Description
933+
934+ Check AMD SMI Application data for PCIe, ECC errors, CPER data, and analyze amdsmitst metrics
935+
936+ ** Bases** : [ 'CperAnalysisTaskMixin', 'DataAnalyzer']
919937
920938** Link to code** : [ amdsmi_analyzer.py] ( https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/amdsmi/amdsmi_analyzer.py )
921939
@@ -1213,6 +1231,9 @@ Check sysctl matches expected sysctl details
12131231- ** devid_ep** : ` Optional[str] `
12141232- ** devid_ep_vf** : ` Optional[str] `
12151233- ** sku_name** : ` Optional[str] `
1234+ - ** expected_xgmi_speed** : ` Optional[list[float]] `
1235+ - ** analysis_range_start** : ` Optional[datetime.datetime] `
1236+ - ** analysis_range_end** : ` Optional[datetime.datetime] `
12161237
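The new `analysis_range_start` and `analysis_range_end` arguments are both `Optional[datetime.datetime]`, which suggests a time window applied to timestamped records. The sketch below shows how such a window might be evaluated. The helper `in_analysis_range` is hypothetical, and treating both bounds as inclusive and `None` as "unbounded" is an assumption; the diff does not show how the analyzer actually uses these fields.

```python
import datetime
from typing import Optional


def in_analysis_range(
    timestamp: datetime.datetime,
    analysis_range_start: Optional[datetime.datetime] = None,
    analysis_range_end: Optional[datetime.datetime] = None,
) -> bool:
    """Hypothetical helper: True if timestamp lies inside the optional window."""
    # Assumption: a missing bound means that side of the window is open.
    if analysis_range_start is not None and timestamp < analysis_range_start:
        return False
    if analysis_range_end is not None and timestamp > analysis_range_end:
        return False
    return True


start = datetime.datetime(2025, 1, 1, 0, 0, 0)
end = datetime.datetime(2025, 1, 2, 0, 0, 0)
inside = in_analysis_range(datetime.datetime(2025, 1, 1, 12, 0, 0), start, end)
before = in_analysis_range(datetime.datetime(2024, 12, 31, 23, 0, 0), start, end)
unbounded = in_analysis_range(datetime.datetime(2024, 12, 31, 23, 0, 0))
```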
12171238## Analyzer Args Class BiosAnalyzerArgs
12181239
@@ -1303,6 +1324,8 @@ Check sysctl matches expected sysctl details
13031324
13041325- ** exp_package_ver** : ` Dict[str, Optional[str]] `
13051326- ** regex_match** : ` bool `
1327+ - ** rocm_regex** : ` Optional[str] `
1328+ - ** enable_rocm_regex** : ` bool `
13061329
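The added `rocm_regex` and `enable_rocm_regex` arguments sit alongside the `version_info: dict[str, str]` field of `PackageDataModel`. One plausible reading is that the regex selects ROCm-related packages when the flag is on. The sketch below illustrates that reading only: `select_rocm_packages`, the sample package names, and the pattern `^rocm` are all hypothetical, and the real analyzer may apply the regex differently.

```python
import re
from typing import Optional


def select_rocm_packages(
    version_info: dict[str, str],
    rocm_regex: Optional[str],
    enable_rocm_regex: bool,
) -> dict[str, str]:
    """Hypothetical helper: keep packages whose name matches rocm_regex."""
    # Assumption: with the flag off or no pattern set, nothing is selected.
    if not enable_rocm_regex or not rocm_regex:
        return {}
    pattern = re.compile(rocm_regex)
    return {
        name: version
        for name, version in version_info.items()
        if pattern.search(name)
    }


# Made-up package data for illustration.
sample = {"rocm-core": "6.2.0", "hip-runtime-amd": "6.2.0", "bash": "5.1"}
matched = select_rocm_packages(sample, r"^rocm", True)
disabled = select_rocm_packages(sample, r"^rocm", False)
```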
13071330## Analyzer Args Class PcieAnalyzerArgs
13081331