Commit 1117434

Merge branch 'development' into jaspal_ethtool
2 parents 158b766 + 07fe1a1 commit 1117434

1 file changed, +61 -5 lines changed

docs/PLUGIN_DOC.md

Lines changed: 61 additions & 5 deletions
@@ -13,8 +13,9 @@
 | DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
 | JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
 | KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
-| KernelModulePlugin | cat /proc/modules<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
-| MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
+| KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
+| MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
+| NetworkPlugin | ip addr show<br>ip neighbor show<br>ip route show<br>ip rule show | - | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
 | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
 | OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
 | PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool<br>- `rocm_regex`: Optional[str]<br>- `enable_rocm_regex`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
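The DmesgPlugin row lists the built-in regexes applied to kernel log lines. As a rough illustration of that kind of line-by-line matching, here is a minimal Python sketch that compiles a few of the regexes shown in full in the table (the `\|` there is Markdown table escaping, so the real alternation is a plain `|`) and reports which lines hit them. The names `BUILTIN_REGEXES` and `scan_dmesg`, the flag-free compilation, and the sample log lines are assumptions for illustration; this is not DmesgAnalyzer's actual code.

```python
import re

# A subset of the DmesgPlugin built-in regexes, copied from the table (entries
# truncated with "..." in the docs are deliberately left out).
# The table writes "\|" for Markdown; in Python the alternation is a plain "|".
BUILTIN_REGEXES = {
    "Out of memory error": r"(?:oom_kill_process.*)|(?:Out of memory.*)",
    "I/O Page Fault": r"IO_PAGE_FAULT",
    "Real Time throttling activated": r"sched: RT throttling activated.*",
    "Page Fault": r"page fault for address.*",
}

# Compiled once, with no flags (the analyzer's real flags are not documented here).
COMPILED = {name: re.compile(rx) for name, rx in BUILTIN_REGEXES.items()}


def scan_dmesg(lines):
    """Return (description, matched_text) for every regex hit, in input order."""
    hits = []
    for line in lines:
        for name, rx in COMPILED.items():
            match = rx.search(line)
            if match:
                hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical lines in the ISO format produced by `dmesg --time-format iso`.
    sample = [
        "2024-01-01T00:00:00,000000+00:00 sched: RT throttling activated",
        "2024-01-01T00:00:01,000000+00:00 AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000]",
        "2024-01-01T00:00:02,000000+00:00 usb 1-1: new high-speed USB device",
    ]
    for name, text in scan_dmesg(sample):
        print(f"{name}: {text}")
```

Matching every regex against every line keeps the sketch simple; a real analyzer may also need multi-line patterns (such as the ACA and MCE entries) that a per-line loop cannot express.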
@@ -284,6 +285,7 @@ Read kernel modules and associated parameters
 
 - **CMD_WINDOWS**: `wmic os get Version /Value`
 - **CMD**: `cat /proc/modules`
+- **CMD_MODINFO_AMDGPU**: `modinfo amdgpu`
 
 ### Provides Data
 
@@ -292,6 +294,7 @@ KernelModuleDataModel
 ### Commands
 
 - cat /proc/modules
+- modinfo amdgpu
 - wmic os get Version /Value
 
 ## Collector Class MemoryCollector
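This commit adds `modinfo amdgpu` to the KernelModuleCollector commands and an `amdgpu_modinfo` field typed as `ModuleInfo`. The fields of `ModuleInfo` are not shown in this diff, so the following is only a hypothetical sketch of turning `modinfo`'s `key: value` output into a structure: repeated keys such as `firmware` are gathered into lists, and `parm` lines are split into parameter name and description. `ParsedModinfo`, `parse_modinfo`, and the trimmed sample output are illustrative, not node-scraper's.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedModinfo:
    """Hypothetical container; the real ModuleInfo fields are not shown in this diff."""
    fields: dict = field(default_factory=dict)  # key -> list of values (keys may repeat)
    parms: dict = field(default_factory=dict)   # parameter name -> description


def parse_modinfo(output: str) -> ParsedModinfo:
    """Parse `modinfo <module>` text output, one `key: value` pair per line."""
    info = ParsedModinfo()
    for raw in output.splitlines():
        if ":" not in raw:
            continue  # skip blank or unexpected lines
        # Split on the first colon only: values such as aliases contain colons too.
        key, value = (part.strip() for part in raw.split(":", 1))
        if key == "parm":
            # parm values look like "name:description (type)".
            name, _, desc = value.partition(":")
            info.parms[name.strip()] = desc.strip()
        else:
            info.fields.setdefault(key, []).append(value)
    return info


# Illustrative, trimmed output; real `modinfo amdgpu` output is much longer.
SAMPLE_MODINFO = """\
filename:       /lib/modules/6.8.0/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko
license:        GPL and additional rights
description:    AMD GPU
firmware:       amdgpu/example_a.bin
firmware:       amdgpu/example_b.bin
parm:           vramlimit:Restrict VRAM for testing, in megabytes (int)
"""

if __name__ == "__main__":
    info = parse_modinfo(SAMPLE_MODINFO)
    print(info.fields["description"], info.fields["firmware"], info.parms)
```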
@@ -308,7 +311,8 @@ Collect memory usage details
 
 - **CMD_WINDOWS**: `wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value`
 - **CMD**: `free -b`
-- **CMD_LSMEM**: `/usr/bin/lsmem`
+- **CMD_LSMEM**: `lsmem`
+- **CMD_NUMACTL**: `numactl -H`
 
 ### Provides Data
 
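`CMD_LSMEM` now relies on `lsmem` being on the search path rather than at `/usr/bin/lsmem`, and the model field changes from `lsmem_output: Optional[dict]` to a typed `LsmemData`. The `LsmemData` layout is not part of this diff. As a sketch under that caveat, default `lsmem` output (a column header, one row per memory range, then `key: value` summary lines) could be parsed as below; `parse_lsmem` and the sample output are hypothetical, and only the default column set is handled.

```python
def parse_lsmem(output: str) -> dict:
    """Parse default `lsmem` output into range rows and summary values.

    Assumes the default columns (RANGE SIZE STATE REMOVABLE BLOCK); other
    column selections would need a different row test.
    """
    columns = []
    ranges = []
    summary = {}
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("RANGE"):
            # Header row: lowercase column names become dict keys.
            columns = [col.lower() for col in stripped.split()]
        elif columns and stripped.startswith("0x"):
            # Data row: one memory range, whitespace-separated like the header.
            ranges.append(dict(zip(columns, stripped.split())))
        elif ":" in stripped:
            # Summary rows such as "Memory block size:  128M".
            key, value = stripped.split(":", 1)
            summary[key.strip()] = value.strip()
    return {"ranges": ranges, "summary": summary}


# Illustrative output: 8G online in two ranges made of 128M blocks.
SAMPLE_LSMEM = """\
RANGE                                  SIZE  STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G online       yes  0-15
0x0000000100000000-0x000000027fffffff    6G online       yes 32-79

Memory block size:       128M
Total online memory:       8G
Total offline memory:      0B
"""

if __name__ == "__main__":
    print(parse_lsmem(SAMPLE_LSMEM))
```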
@@ -317,9 +321,38 @@ MemoryDataModel
 ### Commands
 
 - free -b
-- /usr/bin/lsmem
+- lsmem
+- numactl -H
 - wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value
 
+## Collector Class NetworkCollector
+
+### Description
+
+Collect network configuration details using ip command
+
+**Bases**: ['InBandDataCollector']
+
+**Link to code**: [network_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/network/network_collector.py)
+
+### Class Variables
+
+- **CMD_ADDR**: `ip addr show`
+- **CMD_ROUTE**: `ip route show`
+- **CMD_RULE**: `ip rule show`
+- **CMD_NEIGHBOR**: `ip neighbor show`
+
+### Provides Data
+
+NetworkDataModel
+
+### Commands
+
+- ip addr show
+- ip neighbor show
+- ip route show
+- ip rule show
+
 ## Collector Class NvmeCollector
 
 ### Description
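The new NetworkCollector runs four plain-text `ip` commands, and NetworkDataModel stores lists of `Route`, `RoutingRule`, `Neighbor`, and `NetworkInterface` objects. How the collector parses that text is not shown here. A minimal, hypothetical sketch for `ip route show` output treats the first token as the destination and the rest as `keyword value` pairs, recording any other bare word (such as `linkdown`) as a flag. The `KEYWORDS` set is an assumed common subset, not the full iproute2 grammar, and route types such as `blackhole` or `unreachable` in the first position would need extra handling.

```python
# Attributes assumed to take exactly one value in `ip route show` output.
# A common subset only; iproute2 can print more than this.
KEYWORDS = {"via", "dev", "proto", "scope", "src", "metric", "table"}


def parse_ip_route(output: str) -> list:
    """Parse `ip route show` text into one dict per route line."""
    routes = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # First token is the destination prefix (or "default").
        route = {"destination": tokens[0]}
        i = 1
        while i < len(tokens):
            token = tokens[i]
            if token in KEYWORDS and i + 1 < len(tokens):
                route[token] = tokens[i + 1]
                i += 2
            else:
                # Bare words such as "linkdown" or "onlink" are kept as flags.
                route.setdefault("flags", []).append(token)
                i += 1
        routes.append(route)
    return routes


# Illustrative routing table.
SAMPLE_ROUTES = """\
default via 192.168.1.1 dev eth0 proto dhcp metric 100
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.10 metric 100
10.0.0.0/8 dev eth1 proto kernel scope link src 10.0.0.5 linkdown
"""

if __name__ == "__main__":
    for route in parse_ip_route(SAMPLE_ROUTES):
        print(route)
```

The same keyword/value loop would carry over to `ip neighbor show` lines with a different keyword set (for example `dev` and `lladdr`), which is one reason to keep the keyword set as data rather than hard-coding each field.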
@@ -769,9 +802,14 @@ Data model for journal logs
 ### Model annotations and fields
 
 - **kernel_modules**: `dict`
+- **amdgpu_modinfo**: `Optional[nodescraper.plugins.inband.kernel_module.kernel_module_data.ModuleInfo]`
 
 ## MemoryDataModel Model
 
+### Description
+
+Memory data model
+
 **Link to code**: [memorydata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/memory/memorydata.py)
 
 **Bases**: ['DataModel']
@@ -780,7 +818,25 @@ Data model for journal logs
 
 - **mem_free**: `str`
 - **mem_total**: `str`
-- **lsmem_output**: `Optional[dict]`
+- **lsmem_data**: `Optional[nodescraper.plugins.inband.memory.memorydata.LsmemData]`
+- **numa_topology**: `Optional[nodescraper.plugins.inband.memory.memorydata.NumaTopology]`
+
+## NetworkDataModel Model
+
+### Description
+
+Complete network configuration data
+
+**Link to code**: [networkdata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/network/networkdata.py)
+
+**Bases**: ['DataModel']
+
+### Model annotations and fields
+
+- **interfaces**: `List[nodescraper.plugins.inband.network.networkdata.NetworkInterface]`
+- **routes**: `List[nodescraper.plugins.inband.network.networkdata.Route]`
+- **rules**: `List[nodescraper.plugins.inband.network.networkdata.RoutingRule]`
+- **neighbors**: `List[nodescraper.plugins.inband.network.networkdata.Neighbor]`
 
 ## NvmeDataModel Model
 
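MemoryDataModel gains a `numa_topology` field, fed by the new `numactl -H` command. The `NumaTopology` schema is not visible in this diff, so this is a sketch only: it reads the per-node `cpus`, `size`, and `free` lines and the `node distances` matrix from typical `numactl -H` output. `NumaNodeSketch`, `parse_numactl_h`, and the two-node sample are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NumaNodeSketch:
    """Hypothetical per-node record; not the real NumaTopology model."""
    node_id: int
    cpus: list = field(default_factory=list)
    size_mb: int = 0
    free_mb: int = 0


# Matches lines such as "node 0 cpus: 0 1 2 3" or "node 1 free: 61870 MB".
NODE_LINE = re.compile(r"^node (\d+) (cpus|size|free):\s*(.*)$")


def parse_numactl_h(output: str):
    """Return ({node_id: NumaNodeSketch}, {node_id: [distances]}) from `numactl -H`."""
    nodes = {}
    distances = {}
    in_distances = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("node distances"):
            in_distances = True
            continue
        if in_distances:
            if line.startswith("node"):
                continue  # column header listing the node ids
            src, _, row = line.partition(":")
            distances[int(src)] = [int(x) for x in row.split()]
            continue
        match = NODE_LINE.match(line)
        if not match:
            continue  # e.g. "available: 2 nodes (0-1)"
        node_id, kind, value = int(match.group(1)), match.group(2), match.group(3)
        node = nodes.setdefault(node_id, NumaNodeSketch(node_id))
        if kind == "cpus":
            # Memory-only nodes can print an empty cpu list.
            node.cpus = [int(cpu) for cpu in value.split()]
        elif kind == "size":
            node.size_mb = int(value.split()[0])  # reported as "<n> MB"
        else:
            node.free_mb = int(value.split()[0])
    return nodes, distances


# Illustrative two-socket output.
SAMPLE_NUMACTL = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 64325 MB
node 0 free: 60112 MB
node 1 cpus: 4 5 6 7
node 1 size: 64508 MB
node 1 free: 61870 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""

if __name__ == "__main__":
    nodes, dist = parse_numactl_h(SAMPLE_NUMACTL)
    print(nodes, dist)
```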
0 commit comments
