77| AmdSmiPlugin | firmware --json<br>list --json<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>static -g all --json<br>static -g {gpu_id} --json<br>version --json | **Analyzer Args:**<br>- `check_static_data`: bool<br>- `expected_gpu_processes`: Optional[int]<br>- `expected_max_power`: Optional[int]<br>- `expected_driver_version`: Optional[str]<br>- `expected_memory_partition_mode`: Optional[str]<br>- `expected_compute_partition_mode`: Optional[str]<br>- `expected_pldm_version`: Optional[str]<br>- `l0_to_recovery_count_error_threshold`: Optional[int]<br>- `l0_to_recovery_count_warning_threshold`: Optional[int]<br>- `vendorid_ep`: Optional[str]<br>- `vendorid_ep_vf`: Optional[str]<br>- `devid_ep`: Optional[str]<br>- `devid_ep_vf`: Optional[str]<br>- `sku_name`: Optional[str]<br>- `expected_xgmi_speed`: Optional[list[float]]<br>- `analysis_range_start`: Optional[datetime.datetime]<br>- `analysis_range_end`: Optional[datetime.datetime] | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
88| BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'<br>wmic bios get SMBIOSBIOSVersion /Value | **Analyzer Args:**<br>- `exp_bios_version`: list[str]<br>- `regex_match`: bool | [BiosDataModel](#BiosDataModel-Model) | [BiosCollector](#Collector-Class-BiosCollector) | [BiosAnalyzer](#Data-Analyzer-Class-BiosAnalyzer) |
99| CmdlinePlugin | cat /proc/cmdline | **Analyzer Args:**<br>- `required_cmdline`: Union[str, list]<br>- `banned_cmdline`: Union[str, list] | [CmdlineDataModel](#CmdlineDataModel-Model) | [CmdlineCollector](#Collector-Class-CmdlineCollector) | [CmdlineAnalyzer](#Data-Analyzer-Class-CmdlineAnalyzer) |
10- | DeviceEnumerationPlugin | lscpu \| grep Socket \| awk '{ print $2 }'<br>powershell -Command "(Get-WmiObject -Class Win32_Processor \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'VGA\\|Display\\|3D' \| wc -l<br>powershell -Command "(wmic path win32_VideoController get name \| findstr AMD \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'Virtual Function' \| wc -l<br>powershell -Command "(Get-VMHostPartitionableGpu \| Measure-Object).Count" | **Analyzer Args:**<br>- `cpu_count`: Optional[list[int]]<br>- `gpu_count`: Optional[list[int]]<br>- `vf_count`: Optional[list[int]] | [DeviceEnumerationDataModel](#DeviceEnumerationDataModel-Model) | [DeviceEnumerationCollector](#Collector-Class-DeviceEnumerationCollector) | [DeviceEnumerationAnalyzer](#Data-Analyzer-Class-DeviceEnumerationAnalyzer) |
11- | DimmPlugin | sh -c 'dmidecode -t 17 \| tr -s " " \| grep -v "Volatile\\|None\\|Module" \| grep Size' 2>/dev/null<br>wmic memorychip get Capacity | - | [DimmDataModel](#DimmDataModel-Model) | [DimmCollector](#Collector-Class-DimmCollector) | - |
10+ | DeviceEnumerationPlugin | powershell -Command "(Get-WmiObject -Class Win32_Processor \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'VGA\\|Display\\|3D' \| wc -l<br>powershell -Command "(wmic path win32_VideoController get name \| findstr AMD \| Measure-Object).Count"<br>lscpu<br>lshw<br>lspci -d {vendorid_ep}: \| grep -i 'Virtual Function' \| wc -l<br>powershell -Command "(Get-VMHostPartitionableGpu \| Measure-Object).Count" | **Analyzer Args:**<br>- `cpu_count`: Optional[list[int]]<br>- `gpu_count`: Optional[list[int]]<br>- `vf_count`: Optional[list[int]] | [DeviceEnumerationDataModel](#DeviceEnumerationDataModel-Model) | [DeviceEnumerationCollector](#Collector-Class-DeviceEnumerationCollector) | [DeviceEnumerationAnalyzer](#Data-Analyzer-Class-DeviceEnumerationAnalyzer) |
11+ | DimmPlugin | sh -c 'dmidecode -t 17 \| tr -s " " \| grep -v "Volatile\\|None\\|Module" \| grep Size' 2>/dev/null<br>dmidecode<br>wmic memorychip get Capacity | - | [DimmDataModel](#DimmDataModel-Model) | [DimmCollector](#Collector-Class-DimmCollector) | - |
1212| DkmsPlugin | dkms status<br>dkms --version | **Analyzer Args:**<br>- `dkms_status`: Union[str, list]<br>- `dkms_version`: Union[str, list]<br>- `regex_match`: bool | [DkmsDataModel](#DkmsDataModel-Model) | [DkmsCollector](#Collector-Class-DkmsCollector) | [DkmsAnalyzer](#Data-Analyzer-Class-DkmsAnalyzer) |
1313| DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? 
(\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
1414| JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
1515| KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
1616| KernelModulePlugin | cat /proc/modules<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
17- | MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
17+ | MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
1818| NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
1919| OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
2020| PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool<br>- `rocm_regex`: Optional[str]<br>- `enable_rocm_regex`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
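
The DmesgPlugin row above lists built-in regexes next to human-readable labels. As a rough illustration of how such a label-to-pattern table can be applied to `dmesg` output, here is a minimal Python sketch. The four patterns are copied from the table (with the table's `\|` pipe escapes removed). The `scan_dmesg` helper, the dictionary layout, the choice to compile without flags, and the sample log lines are all assumptions made for illustration, not the analyzer's actual implementation.

```python
import re

# Patterns copied from the DmesgPlugin table row (pipe escapes removed).
# Compiling them without flags is an assumption of this sketch.
BUILTIN_REGEXES = {
    "Out of memory error": re.compile(r"(?:oom_kill_process.*)|(?:Out of memory.*)"),
    "I/O Page Fault": re.compile(r"IO_PAGE_FAULT"),
    "Real Time throttling activated": re.compile(r"sched: RT throttling activated.*"),
    "ACPI Error": re.compile(r"ACPI Error"),
}


def scan_dmesg(text):
    """Hypothetical helper: return (label, matched text) for each pattern hit."""
    hits = []
    for line in text.splitlines():
        for label, pattern in BUILTIN_REGEXES.items():
            match = pattern.search(line)
            if match:
                hits.append((label, match.group(0)))
    return hits


# Invented sample lines, loosely shaped like `dmesg --time-format iso -x` output.
SAMPLE = "\n".join(
    [
        "kern  :err   : 2024-01-01T00:00:01,000000+00:00 Out of memory: Killed process 1234 (python)",
        "kern  :info  : 2024-01-01T00:00:02,000000+00:00 usb 1-1: new high-speed USB device",
        "kern  :warn  : 2024-01-01T00:00:03,000000+00:00 sched: RT throttling activated",
    ]
)

if __name__ == "__main__":
    for label, matched in scan_dmesg(SAMPLE):
        print(f"{label}: {matched}")
```

Scanning line by line keeps the sketch short, but it would not work for the multi-line patterns in the table (for example the MCE Error regex, which spans several lines), so a real analyzer would likely search the whole buffer instead.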
@@ -125,9 +125,10 @@ Collect CPU and GPU count
125125
126126### Class Variables
127127
128- - **CMD_CPU_COUNT_LINUX**: `lscpu | grep Socket | awk '{ print $2 }'`
129128- **CMD_GPU_COUNT_LINUX**: `lspci -d {vendorid_ep}: | grep -i 'VGA\|Display\|3D' | wc -l`
130129- **CMD_VF_COUNT_LINUX**: `lspci -d {vendorid_ep}: | grep -i 'Virtual Function' | wc -l`
130+ - **CMD_LSCPU_LINUX**: `lscpu`
131+ - **CMD_LSHW_LINUX**: `lshw`
131132- **CMD_CPU_COUNT_WINDOWS**: `powershell -Command "(Get-WmiObject -Class Win32_Processor | Measure-Object).Count"`
132133- **CMD_GPU_COUNT_WINDOWS**: `powershell -Command "(wmic path win32_VideoController get name | findstr AMD | Measure-Object).Count"`
133134- **CMD_VF_COUNT_WINDOWS**: `powershell -Command "(Get-VMHostPartitionableGpu | Measure-Object).Count"`
@@ -138,10 +139,11 @@ DeviceEnumerationDataModel
138139
139140### Commands
140141
141- - lscpu | grep Socket | awk '{ print $2 }'
142142- powershell -Command "(Get-WmiObject -Class Win32_Processor | Measure-Object).Count"
143143- lspci -d {vendorid_ep}: | grep -i 'VGA\|Display\|3D' | wc -l
144144- powershell -Command "(wmic path win32_VideoController get name | findstr AMD | Measure-Object).Count"
145+ - lscpu
146+ - lshw
145147- lspci -d {vendorid_ep}: | grep -i 'Virtual Function' | wc -l
146148- powershell -Command "(Get-VMHostPartitionableGpu | Measure-Object).Count"
147149
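
This hunk drops the `lscpu | grep Socket | awk '{ print $2 }'` pipeline and collects raw `lscpu` output instead, so the socket count has to come from parsing the text. The diff does not show how the collector does that. The sketch below is one possible approach in Python, with a hypothetical helper name (`socket_count_from_lscpu`) and an invented sample. It reads the `Socket(s):` field, the same field the removed pipeline selected.

```python
def socket_count_from_lscpu(lscpu_output):
    """Hypothetical helper: return the `Socket(s)` value from lscpu text, or None."""
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Socket(s)":
            try:
                return int(value.strip())
            except ValueError:
                return None  # field present but not a plain integer
    return None  # field missing entirely


# Invented sample, shaped like typical `lscpu` key/value output.
SAMPLE_LSCPU = "\n".join(
    [
        "Architecture:            x86_64",
        "CPU(s):                  128",
        "Thread(s) per core:      2",
        "Core(s) per socket:      32",
        "Socket(s):               2",
    ]
)

if __name__ == "__main__":
    print(socket_count_from_lscpu(SAMPLE_LSCPU))
```

Matching the exact key rather than a substring avoids accidentally picking up other fields that mention sockets, such as `Core(s) per socket`.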
@@ -159,6 +161,7 @@ Collect data on installed DIMMs
159161
160162- **CMD_WINDOWS**: `wmic memorychip get Capacity`
161163- **CMD**: `sh -c 'dmidecode -t 17 | tr -s " " | grep -v "Volatile\|None\|Module" | grep Size' 2>/dev/null`
164+ - **CMD_DMIDECODE_FULL**: `dmidecode`
162165
163166### Provides Data
164167
@@ -167,6 +170,7 @@ DimmDataModel
167170### Commands
168171
169172- sh -c 'dmidecode -t 17 | tr -s " " | grep -v "Volatile\|None\|Module" | grep Size' 2>/dev/null
173+ - dmidecode
170174- wmic memorychip get Capacity
171175
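
The DIMM collector now also runs a plain `dmidecode` alongside the filtered `dmidecode -t 17` pipeline. The Python sketch below shows how the same filtering could be reproduced on the full dump: keep only type 17 (Memory Device) blocks, then keep `Size` lines that do not mention `Volatile`, `None`, or `Module`. The helper name `dimm_sizes` and the sample text are hypothetical, and whether the collector post-processes the full output at all is not stated in this diff.

```python
def dimm_sizes(dmidecode_output):
    """Hypothetical helper mirroring the shell filter on full dmidecode text."""
    excluded = ("Volatile", "None", "Module")
    sizes = []
    # dmidecode separates structures with blank lines.
    for block in dmidecode_output.split("\n\n"):
        if "DMI type 17" not in block:
            continue  # not a Memory Device structure
        for line in block.splitlines():
            # Collapses all whitespace runs (slightly broader than `tr -s " "`).
            squeezed = " ".join(line.split())
            if "Size" in squeezed and not any(word in squeezed for word in excluded):
                sizes.append(squeezed)
    return sizes


# Invented sample, shaped like dmidecode's handle/block layout.
SAMPLE_DMIDECODE = "\n".join(
    [
        "Handle 0x0000, DMI type 0, 26 bytes",
        "BIOS Information",
        "\tROM Size: 32 MB",
        "",
        "Handle 0x0011, DMI type 17, 92 bytes",
        "Memory Device",
        "\tSize: 64 GB",
        "\tVolatile Size: 64 GB",
        "\tCache Size: None",
        "",
        "Handle 0x0012, DMI type 17, 92 bytes",
        "Memory Device",
        "\tSize: No Module Installed",
    ]
)

if __name__ == "__main__":
    print(dimm_sizes(SAMPLE_DMIDECODE))
```

Restricting to type 17 blocks matters on the full dump because other structures, such as the BIOS information block, also carry `Size` fields.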
172176## Collector Class DkmsCollector
@@ -693,6 +697,8 @@ Data model for amd-smi data.
693697- **cpu_count**: `Optional[int]`
694698- **gpu_count**: `Optional[int]`
695699- **vf_count**: `Optional[int]`
700+ - **lscpu_output**: `Optional[str]`
701+ - **lshw_output**: `Optional[str]`
696702
697703## DimmDataModel Model
698704
@@ -1303,6 +1309,17 @@ Check sysctl matches expected sysctl details
13031309- **kernel_modules**: `dict[str, dict]`
13041310- **regex_filter**: `list[str]`
13051311
1312+ ## Analyzer Args Class MemoryAnalyzerArgs
1313+
1314+ **Bases**: ['AnalyzerArgs']
1315+
1316+ **Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/memory/analyzer_args.py)
1317+
1318+ ### Annotations / fields
1319+
1320+ - **ratio**: `float`
1321+ - **memory_threshold**: `str`
1322+
13061323## Analyzer Args Class OsAnalyzerArgs
13071324
13081325**Bases**: ['AnalyzerArgs']