Commit 83d0bd0

Merge branch 'development' into alex_devenum_update
2 parents ee05840 + 720832b commit 83d0bd0

7 files changed, +199 / -16 lines

docs/PLUGIN_DOC.md

Lines changed: 9 additions & 5 deletions
@@ -17,8 +17,8 @@
 | MemoryPlugin | free -b<br>/usr/bin/lsmem<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
 | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
 | OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
-| PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
-| PciePlugin | lspci -d {vendor_id}: -nn<br>lspci -x<br>lspci -xxxx<br>lspci -PP<br>lspci -PP -d {vendor_id}:{dev_id}<br>lspci -vt<br>lspci -vvv | **Analyzer Args:**<br>- `exp_speed`: int<br>- `exp_width`: int<br>- `exp_sriov_count`: int<br>- `exp_gpu_count_override`: Optional[int]<br>- `exp_max_payload_size`: Union[Dict[int, int], int, NoneType]<br>- `exp_max_rd_req_size`: Union[Dict[int, int], int, NoneType]<br>- `exp_ten_bit_tag_req_en`: Union[Dict[int, int], int, NoneType] | [PcieDataModel](#PcieDataModel-Model) | [PcieCollector](#Collector-Class-PcieCollector) | [PcieAnalyzer](#Data-Analyzer-Class-PcieAnalyzer) |
+| PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool<br>- `rocm_regex`: Optional[str]<br>- `enable_rocm_regex`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
+| PciePlugin | lspci -d {vendor_id}: -nn<br>lspci -x<br>lspci -xxxx<br>lspci -PP<br>lspci -PP -d {vendor_id}:{dev_id}<br>lspci -vvv<br>lspci -vvvt | **Analyzer Args:**<br>- `exp_speed`: int<br>- `exp_width`: int<br>- `exp_sriov_count`: int<br>- `exp_gpu_count_override`: Optional[int]<br>- `exp_max_payload_size`: Union[Dict[int, int], int, NoneType]<br>- `exp_max_rd_req_size`: Union[Dict[int, int], int, NoneType]<br>- `exp_ten_bit_tag_req_en`: Union[Dict[int, int], int, NoneType] | [PcieDataModel](#PcieDataModel-Model) | [PcieCollector](#Collector-Class-PcieCollector) | [PcieAnalyzer](#Data-Analyzer-Class-PcieAnalyzer) |
 | ProcessPlugin | top -b -n 1<br>rocm-smi --showpids<br>top -b -n 1 -o %CPU | **Analyzer Args:**<br>- `max_kfd_processes`: int<br>- `max_cpu_usage`: float | [ProcessDataModel](#ProcessDataModel-Model) | [ProcessCollector](#Collector-Class-ProcessCollector) | [ProcessAnalyzer](#Data-Analyzer-Class-ProcessAnalyzer) |
 | RocmPlugin | {rocm_path}/opencl/bin/*/clinfo<br>env \| grep -Ei 'rocm\|hsa\|hip\|mpi\|openmp\|ucx\|miopen'<br>ls /sys/class/kfd/kfd/proc/<br>grep -i -E 'rocm' /etc/ld.so.conf.d/*<br>{rocm_path}/bin/rocminfo<br>ls -v -d /opt/rocm*<br>ls -v -d /opt/rocm-[3-7]* \| tail -1<br>ldconfig -p \| grep -i -E 'rocm'<br>/opt/rocm/.info/version-rocm<br>/opt/rocm/.info/version | **Analyzer Args:**<br>- `exp_rocm`: Union[str, list]<br>- `exp_rocm_latest`: str | [RocmDataModel](#RocmDataModel-Model) | [RocmCollector](#Collector-Class-RocmCollector) | [RocmAnalyzer](#Data-Analyzer-Class-RocmAnalyzer) |
 | StoragePlugin | sh -c 'df -lH -B1 \| grep -v 'boot''<br>wmic LogicalDisk Where DriveType="3" Get DeviceId,Size,FreeSpace | - | [StorageDataModel](#StorageDataModel-Model) | [StorageCollector](#Collector-Class-StorageCollector) | [StorageAnalyzer](#Data-Analyzer-Class-StorageAnalyzer) |
@@ -428,7 +428,7 @@ class for collection of PCIe data only supports Linux OS type.

 This class will collect important PCIe data from the system running the commands
 - `lspci -vvv` : Verbose collection of PCIe data
-- `lspci -vt`: Tree view of PCIe data
+- `lspci -vvvt`: Verbose tree view of PCIe data
 - `lspci -PP`: Path view of PCIe data for the GPUs
 - If system interaction level is set to STANDARD or higher, the following commands will be run with sudo:
 - `lspci -xxxx`: Hex view of PCIe data for the GPUs
@@ -448,7 +448,7 @@ class for collection of PCIe data only supports Linux OS type.

 - **SUPPORTED_OS_FAMILY**: `{<OSFamily.LINUX: 3>}`
 - **CMD_LSPCI_VERBOSE**: `lspci -vvv`
-- **CMD_LSPCI_TREE**: `lspci -vt`
+- **CMD_LSPCI_VERBOSE_TREE**: `lspci -vvvt`
 - **CMD_LSPCI_PATH**: `lspci -PP`
 - **CMD_LSPCI_HEX_SUDO**: `lspci -xxxx`
 - **CMD_LSPCI_HEX**: `lspci -x`
@@ -466,8 +466,8 @@ PcieDataModel
 - lspci -xxxx
 - lspci -PP
 - lspci -PP -d {vendor_id}:{dev_id}
-- lspci -vt
 - lspci -vvv
+- lspci -vvvt

 ## Collector Class ProcessCollector

@@ -810,6 +810,8 @@ Pacakge data contains the package data for the system
 ### Model annotations and fields

 - **version_info**: `dict[str, str]`
+- **rocm_regex**: `str`
+- **enable_rocm_regex**: `bool`

 ## PcieDataModel Model

@@ -1322,6 +1324,8 @@ Check sysctl matches expected sysctl details

 - **exp_package_ver**: `Dict[str, Optional[str]]`
 - **regex_match**: `bool`
+- **rocm_regex**: `Optional[str]`
+- **enable_rocm_regex**: `bool`

 ## Analyzer Args Class PcieAnalyzerArgs

nodescraper/plugins/inband/package/analyzer_args.py

Lines changed: 9 additions & 1 deletion
@@ -34,7 +34,15 @@
 class PackageAnalyzerArgs(AnalyzerArgs):
     exp_package_ver: Dict[str, Optional[str]] = Field(default_factory=dict)
     regex_match: bool = False
+    # rocm_regex is optional and should be specified in plugin_config.json if needed
+    rocm_regex: Optional[str] = None
+    enable_rocm_regex: bool = False

     @classmethod
     def build_from_model(cls, datamodel: PackageDataModel) -> "PackageAnalyzerArgs":
-        return cls(exp_package_ver=datamodel.version_info)
+        # Use custom rocm_regex from collection_args if enable_rocm_regex is true
+        rocm_regex = None
+        if datamodel.enable_rocm_regex and datamodel.rocm_regex:
+            rocm_regex = datamodel.rocm_regex
+
+        return cls(exp_package_ver=datamodel.version_info, rocm_regex=rocm_regex)
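
The gating added to `build_from_model` can be exercised without the rest of the framework. The sketch below is a simplified stand-in, not the project's code: it assumes stdlib dataclasses in place of the pydantic-based `AnalyzerArgs` and `DataModel` base classes, and `PackageDataModelSketch` / `PackageAnalyzerArgsSketch` are hypothetical names. Like the diff, it forwards `rocm_regex` only when `enable_rocm_regex` is true and the pattern is non-empty.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PackageDataModelSketch:
    # Hypothetical stand-in for PackageDataModel (pydantic-based in the real code).
    version_info: Dict[str, str]
    rocm_regex: str = ""
    enable_rocm_regex: bool = False


@dataclass
class PackageAnalyzerArgsSketch:
    # Hypothetical stand-in for PackageAnalyzerArgs.
    exp_package_ver: Dict[str, Optional[str]] = field(default_factory=dict)
    regex_match: bool = False
    rocm_regex: Optional[str] = None
    enable_rocm_regex: bool = False

    @classmethod
    def build_from_model(cls, datamodel: PackageDataModelSketch) -> "PackageAnalyzerArgsSketch":
        # Forward the collected pattern only if it was enabled and is non-empty.
        rocm_regex = None
        if datamodel.enable_rocm_regex and datamodel.rocm_regex:
            rocm_regex = datamodel.rocm_regex
        # As in the diff, enable_rocm_regex itself is not forwarded.
        return cls(exp_package_ver=datamodel.version_info, rocm_regex=rocm_regex)


model = PackageDataModelSketch(
    version_info={"rocm-core": "5.7.0"}, rocm_regex="rocm|hip", enable_rocm_regex=True
)
args = PackageAnalyzerArgsSketch.build_from_model(model)
print(args.rocm_regex, args.enable_rocm_regex)  # rocm|hip False
```

As in the diff, the returned args carry `rocm_regex` but leave `enable_rocm_regex` at its default of `False`, so code that inspects the flag on these args sees `False` even when a pattern was forwarded.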

nodescraper/plugins/inband/package/package_collector.py

Lines changed: 57 additions & 3 deletions
@@ -34,10 +34,11 @@
 from nodescraper.models import TaskResult
 from nodescraper.utils import get_exception_details

+from .analyzer_args import PackageAnalyzerArgs
 from .packagedata import PackageDataModel


-class PackageCollector(InBandDataCollector[PackageDataModel, None]):
+class PackageCollector(InBandDataCollector[PackageDataModel, PackageAnalyzerArgs]):
     """Collecting Package information from the system"""

     DATA_MODEL = PackageDataModel
@@ -181,9 +182,34 @@ def _handle_command_failure(self, command_artifact: CommandArtifact):
         self.result.message = "Failed to run Package Manager command"
         self.result.status = ExecutionStatus.EXECUTION_FAILURE

-    def collect_data(self, args=None) -> tuple[TaskResult, Optional[PackageDataModel]]:
+    def _filter_rocm_packages(self, packages: dict[str, str], rocm_pattern: str) -> dict[str, str]:
+        """Filter ROCm-related packages from a package dictionary.
+
+        This method searches package names for ROCm-related patterns and returns
+        only the matching packages.
+
+        Args:
+            packages (dict[str, str]): Dictionary with package names as keys and versions as values.
+            rocm_pattern (str): Regex pattern to match ROCm-related package names.
+
+        Returns:
+            dict[str, str]: Filtered dictionary containing only ROCm-related packages.
+        """
+        rocm_packages = {}
+        pattern = re.compile(rocm_pattern, re.IGNORECASE)
+        for package_name, version in packages.items():
+            if pattern.search(package_name):
+                rocm_packages[package_name] = version
+        return rocm_packages
+
+    def collect_data(
+        self, args: Optional[PackageAnalyzerArgs] = None
+    ) -> tuple[TaskResult, Optional[PackageDataModel]]:
         """Collect package information from the system.

+        Args:
+            args (Optional[PackageAnalyzerArgs]): Optional arguments containing ROCm regex pattern.
+
         Returns:
             tuple[TaskResult, Optional[PackageDataModel]]: tuple containing the task result and a PackageDataModel instance
             with the collected package information, or None if there was an error.
@@ -205,8 +231,36 @@ def collect_data(self, args=None) -> tuple[TaskResult, Optional[PackageDataModel
             self.result.message = "Unsupported OS"
             self.result.status = ExecutionStatus.NOT_RAN
             return self.result, None
+
+        # Filter and log ROCm packages if on Linux and rocm_regex is provided
+        if self.system_info.os_family == OSFamily.LINUX and packages:
+            # Get ROCm pattern from args if provided
+            rocm_pattern = args.rocm_regex if args else None
+            if rocm_pattern:
+                self.logger.info("Using rocm_pattern: %s", rocm_pattern)
+                rocm_packages = self._filter_rocm_packages(packages, rocm_pattern)
+                if rocm_packages:
+                    self.result.message = (
+                        f"Found {len(rocm_packages)} ROCm-related packages installed"
+                    )
+                    self.result.status = ExecutionStatus.OK
+                    self._log_event(
+                        category=EventCategory.OS,
+                        description=f"Found {len(rocm_packages)} ROCm-related packages installed",
+                        priority=EventPriority.INFO,
+                        data={"rocm_packages": sorted(rocm_packages.keys())},
+                    )
+            else:
+                self.logger.info("No rocm_regex provided, skipping ROCm package filtering")
+
+        # Extract rocm_regex and enable_rocm_regex from args if provided
+        rocm_regex = args.rocm_regex if (args and args.rocm_regex) else ""
+        enable_rocm_regex = getattr(args, "enable_rocm_regex", False) if args else False
+
         try:
-            package_model = PackageDataModel(version_info=packages)
+            package_model = PackageDataModel(
+                version_info=packages, rocm_regex=rocm_regex, enable_rocm_regex=enable_rocm_regex
+            )
         except ValidationError as val_err:
             self._log_event(
                 category=EventCategory.RUNTIME,
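
The filtering step added to `PackageCollector` can be tried in isolation. The sketch below is a standalone approximation, not the collector itself: `filter_rocm_packages` mirrors `_filter_rocm_packages` (case-insensitive `re.search` on package names), and `summarize_rocm_packages` is a hypothetical helper that reproduces the result message and the sorted `rocm_packages` list that the diff passes to `_log_event`.

```python
import re
from typing import Dict, List, Optional, Tuple


def filter_rocm_packages(packages: Dict[str, str], rocm_pattern: str) -> Dict[str, str]:
    """Return only the packages whose names match rocm_pattern (case-insensitive)."""
    pattern = re.compile(rocm_pattern, re.IGNORECASE)
    return {name: ver for name, ver in packages.items() if pattern.search(name)}


def summarize_rocm_packages(
    packages: Dict[str, str], rocm_pattern: Optional[str]
) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical helper: build the result message and event payload.

    Returns None when no pattern is given or nothing matches, mirroring the
    branches in collect_data that leave the result message untouched.
    """
    if not rocm_pattern:
        return None
    rocm_packages = filter_rocm_packages(packages, rocm_pattern)
    if not rocm_packages:
        return None
    message = f"Found {len(rocm_packages)} ROCm-related packages installed"
    return message, sorted(rocm_packages.keys())


installed = {
    "rocm-core": "5.7.0",
    "hip-runtime-amd": "5.7.0",
    "hsa-rocr": "1.9.0",
    "gcc": "11.4.0",
}
print(summarize_rocm_packages(installed, "rocm|hip"))
# ('Found 2 ROCm-related packages installed', ['hip-runtime-amd', 'rocm-core'])
```

In the diff, the collector applies the pattern whenever `args.rocm_regex` is set; `enable_rocm_regex` only affects what is stored on the resulting `PackageDataModel`.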

nodescraper/plugins/inband/package/packagedata.py

Lines changed: 4 additions & 0 deletions
@@ -32,6 +32,10 @@ class PackageDataModel(DataModel):
     Attributes:
         version_info (dict[str, str]): The version information for the package
             Key is the package name and value is the version of the package
+        rocm_regex (str): Regular expression pattern for ROCm package filtering
+        enable_rocm_regex (bool): Whether to use custom ROCm regex from collection_args
     """

     version_info: dict[str, str]
+    rocm_regex: str = ""
+    enable_rocm_regex: bool = False
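
Because both new fields have defaults, call sites that build the model from `version_info` alone keep working. A minimal sketch, assuming a stdlib dataclass as a stand-in for the pydantic `DataModel` (the real class adds validation); `PackageDataModelSketch` is a hypothetical name:

```python
from dataclasses import dataclass


@dataclass
class PackageDataModelSketch:
    # Hypothetical stand-in mirroring the fields declared in packagedata.py.
    version_info: dict[str, str]
    rocm_regex: str = ""  # empty string: no ROCm pattern configured
    enable_rocm_regex: bool = False


# Pre-change style call: only version_info is supplied.
legacy = PackageDataModelSketch(version_info={"gcc": "11.4.0"})

# Post-change style call: the collector also passes the ROCm settings.
configured = PackageDataModelSketch(
    version_info={"rocm-core": "5.7.0"},
    rocm_regex="rocm|hip|hsa|amdgpu",
    enable_rocm_regex=True,
)
print(legacy.rocm_regex == "", legacy.enable_rocm_regex)  # True False
```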

nodescraper/plugins/inband/pcie/pcie_collector.py

Lines changed: 7 additions & 7 deletions
@@ -64,7 +64,7 @@ class PcieCollector(InBandDataCollector[PcieDataModel, None]):

     This class will collect important PCIe data from the system running the commands
     - `lspci -vvv` : Verbose collection of PCIe data
-    - `lspci -vt`: Tree view of PCIe data
+    - `lspci -vvvt`: Verbose tree view of PCIe data
     - `lspci -PP`: Path view of PCIe data for the GPUs
     - If system interaction level is set to STANDARD or higher, the following commands will be run with sudo:
     - `lspci -xxxx`: Hex view of PCIe data for the GPUs
@@ -83,7 +83,7 @@ class PcieCollector(InBandDataCollector[PcieDataModel, None]):
     DATA_MODEL = PcieDataModel

     CMD_LSPCI_VERBOSE = "lspci -vvv"
-    CMD_LSPCI_TREE = "lspci -vt"
+    CMD_LSPCI_VERBOSE_TREE = "lspci -vvvt"
     CMD_LSPCI_PATH = "lspci -PP"
     CMD_LSPCI_HEX_SUDO = "lspci -xxxx"
     CMD_LSPCI_HEX = "lspci -x"
@@ -142,8 +142,8 @@ def show_lspci_verbose(self, sudo=True) -> Optional[str]:
         return self._run_os_cmd(self.CMD_LSPCI_VERBOSE, sudo=sudo)

     def show_lspci_verbose_tree(self, sudo=True) -> Optional[str]:
-        """Show lspci with -vt."""
-        return self._run_os_cmd(self.CMD_LSPCI_TREE, sudo=sudo)
+        """Show lspci with -vvvt (verbose tree view)."""
+        return self._run_os_cmd(self.CMD_LSPCI_VERBOSE_TREE, sudo=sudo)

     def show_lspci_path(self, sudo=True) -> Optional[str]:
         """Show lspci with -PP."""
@@ -548,13 +548,13 @@ def _log_pcie_artifacts(
         self,
         lspci_pp: Optional[str],
         lspci_hex: Optional[str],
-        lspci_tree: Optional[str],
+        lspci_verbose_tree: Optional[str],
         lspci_verbose: Optional[str],
     ):
         """Log the file artifacts for the PCIe data collector."""
         name_log_map = {
             "lspci_hex.txt": lspci_hex,
-            "lspci_tree.txt": lspci_tree,
+            "lspci_verbose_tree.txt": lspci_verbose_tree,
             "lspci_verbose.txt": lspci_verbose,
             "lspci_pp.txt": lspci_pp,
         }
@@ -629,7 +629,7 @@ def _get_pcie_data(
         self._log_pcie_artifacts(
             lspci_pp=lspci_path,
             lspci_hex=lspci_hex,
-            lspci_tree=lspci_verbose_tree,
+            lspci_verbose_tree=lspci_verbose_tree,
             lspci_verbose=lspci_verbose,
         )
         pcie_data = PcieDataModel(
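
The rename changes the on-disk artifact name from `lspci_tree.txt` to `lspci_verbose_tree.txt`. The sketch below shows the name-to-output mapping in isolation; `build_artifact_map` is a hypothetical helper, and dropping `None` entries (commands that produced no output) is an assumption about how `_log_pcie_artifacts` handles missing output, since that part of the method is not shown in the diff.

```python
from typing import Dict, Optional


def build_artifact_map(
    lspci_pp: Optional[str],
    lspci_hex: Optional[str],
    lspci_verbose_tree: Optional[str],
    lspci_verbose: Optional[str],
) -> Dict[str, str]:
    """Map artifact file names to command output, skipping commands with no output."""
    name_log_map = {
        "lspci_hex.txt": lspci_hex,
        "lspci_verbose_tree.txt": lspci_verbose_tree,
        "lspci_verbose.txt": lspci_verbose,
        "lspci_pp.txt": lspci_pp,
    }
    # Assumption: only commands that returned text produce an artifact file.
    return {name: text for name, text in name_log_map.items() if text is not None}


artifacts = build_artifact_map(
    lspci_pp=None,
    lspci_hex="00: 02 10 ...",
    lspci_verbose_tree="-[0000:00]-+-00.0 ...",
    lspci_verbose=None,
)
print(sorted(artifacts))  # ['lspci_hex.txt', 'lspci_verbose_tree.txt']
```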

test/functional/fixtures/package_plugin_config.json

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
     "global_args": {},
     "plugins": {
         "PackagePlugin": {
+            "collection_args": {
+                "rocm_regex": "rocm|hip|hsa|amdgpu",
+                "enable_rocm_regex": true
+            },
             "analysis_args": {
                 "exp_package_ver": {
                     "gcc": "11.4.0"

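The fixture places the ROCm settings under `collection_args`, while expectations stay under `analysis_args`. A minimal sketch of reading them back with the standard `json` module: the inline string reproduces only the keys visible in the diff, closed off so it parses on its own, and `package_collection_args` is a hypothetical helper rather than part of the project.

```python
import json

# Abbreviated copy of the fixture: only the keys visible in the diff,
# with closing braces added so the snippet is valid JSON by itself.
CONFIG_TEXT = """
{
    "global_args": {},
    "plugins": {
        "PackagePlugin": {
            "collection_args": {
                "rocm_regex": "rocm|hip|hsa|amdgpu",
                "enable_rocm_regex": true
            },
            "analysis_args": {
                "exp_package_ver": {"gcc": "11.4.0"}
            }
        }
    }
}
"""


def package_collection_args(config_text: str) -> dict:
    """Hypothetical helper: return PackagePlugin's collection_args, or {} if absent."""
    config = json.loads(config_text)
    return config.get("plugins", {}).get("PackagePlugin", {}).get("collection_args", {})


collection_args = package_collection_args(CONFIG_TEXT)
print(collection_args["rocm_regex"], collection_args["enable_rocm_regex"])  # rocm|hip|hsa|amdgpu True
```
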
test/unit/plugin/test_package_collector.py

Lines changed: 109 additions & 0 deletions
@@ -222,3 +222,112 @@ def test_bad_splits_ubuntu(collector, conn_mock, command_results):
     ]
     res, _ = collector.collect_data()
     assert res.status == ExecutionStatus.OK
+
+
+def test_rocm_package_filtering_custom_regex(collector, conn_mock, command_results):
+    """Test ROCm package filtering with custom regex pattern."""
+    from nodescraper.plugins.inband.package.analyzer_args import PackageAnalyzerArgs
+
+    # Mock Ubuntu system with ROCm packages
+    ubuntu_packages = """rocm-core 5.7.0
+hip-runtime-amd 5.7.0
+hsa-rocr 1.9.0
+amdgpu-dkms 6.3.6
+gcc 11.4.0
+python3 3.10.12"""
+
+    conn_mock.run_command.side_effect = [
+        CommandArtifact(
+            command="",
+            exit_code=0,
+            stdout=command_results["ubuntu_rel"],
+            stderr="",
+        ),
+        CommandArtifact(
+            command="",
+            exit_code=0,
+            stdout=ubuntu_packages,
+            stderr="",
+        ),
+    ]
+
+    # Use custom regex that only matches 'rocm' and 'hip'
+    args = PackageAnalyzerArgs(rocm_regex="rocm|hip")
+    res, data = collector.collect_data(args)
+    assert res.status == ExecutionStatus.OK
+    # Check that ROCm packages are found
+    assert "found 2 rocm-related packages" in res.message.lower()
+    assert data is not None
+
+
+def test_rocm_package_filtering_no_matches(collector, conn_mock, command_results):
+    """Test ROCm package filtering when no ROCm packages are installed."""
+    from nodescraper.plugins.inband.package.analyzer_args import PackageAnalyzerArgs
+
+    # Mock Ubuntu system without ROCm packages
+    ubuntu_packages = """gcc 11.4.0
+python3 3.10.12
+vim 8.2.3995"""
+
+    conn_mock.run_command.side_effect = [
+        CommandArtifact(
+            command="",
+            exit_code=0,
+            stdout=command_results["ubuntu_rel"],
+            stderr="",
+        ),
+        CommandArtifact(
+            command="",
+            exit_code=0,
+            stdout=ubuntu_packages,
+            stderr="",
+        ),
+    ]
+
+    args = PackageAnalyzerArgs(rocm_regex="rocm|hip|hsa")
+    res, data = collector.collect_data(args)
+    assert res.status == ExecutionStatus.OK
+    # No ROCm packages found, so message should not mention them
+    assert "rocm" not in res.message.lower() or res.message == ""
+    assert data is not None
+    assert len(data.version_info) == 3
+
+
+def test_filter_rocm_packages_method(collector):
+    """Test _filter_rocm_packages method directly."""
+    packages = {
+        "rocm-core": "5.7.0",
+        "hip-runtime-amd": "5.7.0",
+        "hsa-rocr": "1.9.0",
+        "amdgpu-dkms": "6.3.6",
+        "gcc": "11.4.0",
+        "python3": "3.10.12",
+    }
+
+    # Test with default-like pattern
+    rocm_pattern = "rocm|hip|hsa|amdgpu"
+    filtered = collector._filter_rocm_packages(packages, rocm_pattern)
+
+    assert len(filtered) == 4
+    assert "rocm-core" in filtered
+    assert "hip-runtime-amd" in filtered
+    assert "hsa-rocr" in filtered
+    assert "amdgpu-dkms" in filtered
+    assert "gcc" not in filtered
+    assert "python3" not in filtered
+
+
+def test_filter_rocm_packages_case_insensitive(collector):
+    """Test that ROCm package filtering is case-insensitive."""
+    packages = {
+        "ROCM-Core": "5.7.0",
+        "HIP-Runtime-AMD": "5.7.0",
+        "gcc": "11.4.0",
+    }
+
+    rocm_pattern = "rocm|hip"
+    filtered = collector._filter_rocm_packages(packages, rocm_pattern)
+
+    assert len(filtered) == 2
+    assert "ROCM-Core" in filtered
+    assert "HIP-Runtime-AMD" in filtered