1313| DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
1414| JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
1515| KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
16- | KernelModulePlugin | cat /proc/modules<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
17- | MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
16+ | KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
17+ | MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
18+ | NetworkPlugin | ip addr show<br>ip neighbor show<br>ip route show<br>ip rule show | - | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
1819| NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
1920| OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/\*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/\*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
2021| PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/\*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool<br>- `rocm_regex`: Optional[str]<br>- `enable_rocm_regex`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
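
To try out a few of the built-in DmesgPlugin patterns listed in the table, the sketch below runs them over some lines with Python's `re` module. It uses only patterns that appear untruncated in the table, with the table's `\|` escape written as a plain `|`. The `scan` helper, the sample lines, and the use of `re.search` without flags are assumptions for illustration; how DmesgAnalyzer actually compiles and applies its patterns is not shown here.

```python
import re

# Patterns copied from the DmesgPlugin row (only entries shown in full).
BUILT_IN = {
    "Out of memory error": r"(?:oom_kill_process.*)|(?:Out of memory.*)",
    "I/O Page Fault": r"IO_PAGE_FAULT",
    "Kernel Panic": r"\bkernel panic\b.*",
    "ACPI BIOS Error": r"ACPI BIOS Error",
}


def scan(lines):
    """Return (label, matched_text) pairs for every pattern hit (hypothetical helper)."""
    hits = []
    for line in lines:
        for label, pattern in BUILT_IN.items():
            match = re.search(pattern, line)
            if match:
                hits.append((label, match.group(0)))
    return hits


# Made-up sample lines, not captured from a real system.
SAMPLE = [
    "kern  :err   : 2024-01-01T00:00:00,000000+00:00 Out of memory: Killed process 1234 (python)",
    "kern  :info  : 2024-01-01T00:00:01,000000+00:00 usb 1-1: new high-speed USB device",
    "kern  :emerg : 2024-01-01T00:00:02,000000+00:00 kernel panic - not syncing",
]

for label, text in scan(SAMPLE):
    print(f"{label}: {text}")
```

Because no flags are passed, matching here is case-sensitive; whether the real analyzer ignores case is not stated in this document.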
@@ -284,6 +285,7 @@ Read kernel modules and associated parameters
284285
285286- **CMD_WINDOWS**: `wmic os get Version /Value`
286287- **CMD**: `cat /proc/modules`
288+ - **CMD_MODINFO_AMDGPU**: `modinfo amdgpu`
287289
288290### Provides Data
289291
@@ -292,6 +294,7 @@ KernelModuleDataModel
292294### Commands
293295
294296- cat /proc/modules
297+ - modinfo amdgpu
295298- wmic os get Version /Value
296299
297300## Collector Class MemoryCollector
@@ -308,7 +311,8 @@ Collect memory usage details
308311
309312- **CMD_WINDOWS**: `wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value`
310313- **CMD**: `free -b`
311- - **CMD_LSMEM**: `/usr/bin/lsmem`
314+ - **CMD_LSMEM**: `lsmem`
315+ - **CMD_NUMACTL**: `numactl -H`
312316
313317### Provides Data
314318
@@ -317,9 +321,38 @@ MemoryDataModel
317321### Commands
318322
319323- free -b
320- - /usr/bin/lsmem
324+ - lsmem
325+ - numactl -H
321326- wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value
322327
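
As a rough illustration of how `free -b` output could be reduced to the `mem_total` and `mem_free` string fields of MemoryDataModel, here is a minimal sketch. The `parse_free_b` function and the sample text are hypothetical; the actual parsing in MemoryCollector may differ.

```python
def parse_free_b(output: str) -> dict:
    """Extract total and free bytes (as strings) from the 'Mem:' row of `free -b`."""
    lines = output.strip().splitlines()
    header = lines[0].split()  # column names: total, used, free, ...
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = line.split()[1:]  # drop the 'Mem:' label
            row = dict(zip(header, values))
            return {"mem_total": row["total"], "mem_free": row["free"]}
    raise ValueError("no 'Mem:' row found")


# Made-up sample in the usual `free -b` layout.
SAMPLE_FREE = """\
               total        used        free      shared  buff/cache   available
Mem:     16000000000  4000000000  8000000000   100000000  4000000000 11000000000
Swap:     2000000000           0  2000000000
"""

print(parse_free_b(SAMPLE_FREE))
```

The values stay as strings because the data model declares both fields as `str`.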
328+ ## Collector Class NetworkCollector
329+
330+ ### Description
331+
332+ Collect network configuration details using ip command
333+
334+ **Bases**: ['InBandDataCollector']
335+
336+ **Link to code**: [network_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/network/network_collector.py)
337+
338+ ### Class Variables
339+
340+ - **CMD_ADDR**: `ip addr show`
341+ - **CMD_ROUTE**: `ip route show`
342+ - **CMD_RULE**: `ip rule show`
343+ - **CMD_NEIGHBOR**: `ip neighbor show`
344+
345+ ### Provides Data
346+
347+ NetworkDataModel
348+
349+ ### Commands
350+
351+ - ip addr show
352+ - ip neighbor show
353+ - ip route show
354+ - ip rule show
355+
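
The four NetworkCollector commands can be exercised outside the tool with a small wrapper like the one below. This is only a sketch: the `COMMANDS` mapping reuses the class-variable names from the list above, while `run_shell`, `collect`, and the injectable `runner` parameter are hypothetical and not part of node-scraper's API.

```python
import subprocess
from typing import Callable, Dict

# Command strings as listed for NetworkCollector.
COMMANDS = {
    "CMD_ADDR": "ip addr show",
    "CMD_ROUTE": "ip route show",
    "CMD_RULE": "ip rule show",
    "CMD_NEIGHBOR": "ip neighbor show",
}


def run_shell(cmd: str) -> str:
    """Run one command and return stdout; '' if it fails or the binary is missing."""
    try:
        result = subprocess.run(
            cmd.split(), capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout if result.returncode == 0 else ""


def collect(runner: Callable[[str], str] = run_shell) -> Dict[str, str]:
    """Run every command through `runner` and key the raw output by variable name."""
    return {name: runner(cmd) for name, cmd in COMMANDS.items()}


if __name__ == "__main__":
    for name, output in collect().items():
        print(f"{name}: {len(output.splitlines())} lines")
```

Passing a fake `runner` makes the sketch testable on hosts without the `ip` utility.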
323356## Collector Class NvmeCollector
324357
325358### Description
@@ -769,9 +802,14 @@ Data model for journal logs
769802### Model annotations and fields
770803
771804- **kernel_modules**: `dict`
805+ - **amdgpu_modinfo**: `Optional[nodescraper.plugins.inband.kernel_module.kernel_module_data.ModuleInfo]`
772806
773807## MemoryDataModel Model
774808
809+ ### Description
810+
811+ Memory data model
812+
775813**Link to code**: [memorydata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/memory/memorydata.py)
776814
777815**Bases**: ['DataModel']
@@ -780,7 +818,25 @@ Data model for journal logs
780818
781819- **mem_free**: `str`
782820- **mem_total**: `str`
783- - **lsmem_output**: `Optional[dict]`
821+ - **lsmem_data**: `Optional[nodescraper.plugins.inband.memory.memorydata.LsmemData]`
822+ - **numa_topology**: `Optional[nodescraper.plugins.inband.memory.memorydata.NumaTopology]`
823+
824+ ## NetworkDataModel Model
825+
826+ ### Description
827+
828+ Complete network configuration data
829+
830+ **Link to code**: [networkdata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/network/networkdata.py)
831+
832+ **Bases**: ['DataModel']
833+
834+ ### Model annotations and fields
835+
836+ - **interfaces**: `List[nodescraper.plugins.inband.network.networkdata.NetworkInterface]`
837+ - **routes**: `List[nodescraper.plugins.inband.network.networkdata.Route]`
838+ - **rules**: `List[nodescraper.plugins.inband.network.networkdata.RoutingRule]`
839+ - **neighbors**: `List[nodescraper.plugins.inband.network.networkdata.Neighbor]`
784840
785841## NvmeDataModel Model
786842