platform-alteration-hugepages-config: added support for more hugepagesz units. (#3245)

greyerof · web-flow · commit 9ffe712e6401 · 2025-09-25T16:50:17.000+02:00
- Added support for uppercase and lowercase units, and for sizes given in
kilobytes and terabytes.
- Added error handling.
- Updated check description.
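
The conversions this change is meant to support can be sketched as a standalone program. This is a simplified illustration under stated assumptions, not the code in hugepages.go: the helper name `toKB` is hypothetical, and the empty-input guard is an addition of this sketch. As in the change below, output is always in kilobytes, and a value without a unit is taken as bytes and must be a multiple of 1024.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toKB is a hypothetical helper (not the function in hugepages.go) that
// converts a hugepagesz value such as "2M", "1g" or "2048kB" to kilobytes.
// A value without a unit is treated as bytes and must be a multiple of 1024.
func toKB(s string) (int, error) {
	// Drop an optional trailing "B"/"b", as in "2048kB".
	s = strings.TrimRight(s, "Bb")
	if s == "" {
		// Guard added in this sketch so indexing below cannot panic.
		return 0, fmt.Errorf("empty hugepage size")
	}

	last := s[len(s)-1]
	if last >= '0' && last <= '9' {
		// No unit suffix: the value is in bytes.
		n, err := strconv.Atoi(s)
		if err != nil {
			return 0, fmt.Errorf("failed to parse %q: %w", s, err)
		}
		if n%1024 != 0 {
			return 0, fmt.Errorf("size %d is not a multiple of 1024", n)
		}
		return n / 1024, nil
	}

	// Unit suffix present: parse the numeric part before it.
	n, err := strconv.Atoi(s[:len(s)-1])
	if err != nil {
		return 0, fmt.Errorf("failed to parse %q: %w", s, err)
	}

	switch strings.ToUpper(string(last)) {
	case "K":
		return n, nil
	case "M":
		return n * 1024, nil
	case "G":
		return n * 1024 * 1024, nil
	case "T":
		return n * 1024 * 1024 * 1024, nil
	}
	return 0, fmt.Errorf("unsupported hugepage size unit in %q", s)
}

func main() {
	for _, in := range []string{"2M", "1g", "2048kB", "1048576k", "2097152", "1045"} {
		kb, err := toKB(in)
		fmt.Println(in, kb, err)
	}
}
```

With this sketch, "2M", "2048kB" and "2097152" all map to the same 2048 kB size, while "1045" is rejected because it is not a whole number of kilobytes.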
diff --git a/CATALOG.md b/CATALOG.md
@@ -7,7 +7,7 @@ Depending on the workload type, not all tests are required to pass to satisfy be
 
 ## Test cases summary
 
-### Total test cases: 119
+### Total test cases: 120
 
 ### Total suites: 10
 
@@ -22,7 +22,7 @@ Depending on the workload type, not all tests are required to pass to satisfy be
 |operator|12|[operator](#operator)|
 |performance|6|[performance](#performance)|
 |platform-alteration|14|[platform-alteration](#platform-alteration)|
-|preflight|18|[preflight](#preflight)|
+|preflight|19|[preflight](#preflight)|
 
 ### Extended specific tests only: 13
 
@@ -36,11 +36,11 @@ Depending on the workload type, not all tests are required to pass to satisfy be
 |---|---|---|
 |8|1|
 
-### Non-Telco specific tests only: 70
+### Non-Telco specific tests only: 71
 
 |Mandatory|Optional|
 |---|---|---|
-|43|27|
+|43|28|
 
 ### Telco specific tests only: 27
 
@@ -383,7 +383,7 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
 |---|---|
 |Unique ID|access-control-security-context|
 |Description|Checks the security context matches one of the 4 categories|
-|Suggested Remediation|Exception possible if a workload uses mlock(), mlockall(), shmctl(), mmap(); exception will be considered for DPDK applications. Must identify which container requires the capability and document why. If the container had the right configuration of the allowed category from the 4 approved list then the test will pass. The 4 categories are defined in Requirement ID 94118 in the [security context categories](#security-context-categories)|
+|Suggested Remediation|Exception possible if a workload uses mlock(), mlockall(), shmctl(), mmap(); exception will be considered for DPDK applications. Must identify which container requires the capability and document why. If the container had the right configuration of the allowed category from the 4 approved list then the test will pass. The 4 categories are defined in Requirement ID 94118 [here](#security-context-categories)|
 |Best Practice Reference|https://redhat-best-practices-for-k8s.github.io/guide/#k8s-best-practices-linux-capabilities|
 |Exception Process|no exception needed for optional/extended test|
 |Impact Statement|Incorrect security context configurations can weaken container isolation, enable privilege escalation, and create exploitable attack vectors.|
@@ -1639,8 +1639,8 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
 |Property|Description|
 |---|---|
 |Unique ID|platform-alteration-hugepages-config|
-|Description|Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet. First, the "worker" MachineConfig is polled, and the Hugepage settings are extracted. Next, the underlying Nodes are polled for configured HugePages through inspection of /proc/meminfo. The results are compared, and the test passes only if they are the same.|
-|Suggested Remediation|HugePage settings should be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. This ensures that OpenShift is aware of the special MachineConfig requirements, and can provision your workload on a Node that is part of the corresponding MachineConfigSet. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet.|
+|Description|Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are labeled as workers with the standard label "node-role.kubernetes.io/worker". First, the MachineConfig is inspected for hugepage settings in systemd units. If none are found, the MC's .spec.kernelArguments are inspected for hugepage settings. The sizes and page counts are compared, and the test passes only if they are the same as the ones in the node's /sys/kernel/mm/hugepages/hugepages-X folders.|
+|Suggested Remediation|HugePage settings for worker nodes must be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings.|
 |Best Practice Reference|https://redhat-best-practices-for-k8s.github.io/guide/#k8s-best-practices-huge-pages|
 |Exception Process|No exceptions|
 |Impact Statement|Manual hugepage configuration bypasses cluster management, can cause node instability, and creates configuration drift issues.|
@@ -1812,7 +1812,7 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
 |---|---|
 |Unique ID|preflight-BasedOnUbi|
 |Description|Checking if the container's base image is based upon the Red Hat Universal Base Image (UBI)|
-|Suggested Remediation|Change the FROM directive in your Dockerfile or Containerfile to FROM registry.access.redhat.com/ubi8/ubi|
+|Suggested Remediation|Change the FROM directive in your Dockerfile or Containerfile. For the latest list of images and details, refer to: https://catalog.redhat.com/software/base-images|
 |Best Practice Reference|No Doc Link|
 |Exception Process|There is no documented exception process for this.|
 |Impact Statement|Non-UBI base images may lack security updates, enterprise support, and compliance certifications required for production use.|
@@ -1908,6 +1908,23 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
 |Non-Telco|Optional|
 |Telco|Optional|
 
+#### preflight-HasNoProhibitedLabels
+
+|Property|Description|
+|---|---|
+|Unique ID|preflight-HasNoProhibitedLabels|
+|Description|Checking if the labels (name, vendor, maintainer) violate the Red Hat trademark.|
+|Suggested Remediation|Ensure the name, vendor, and maintainer labels on your image do not violate the Red Hat trademark.|
+|Best Practice Reference|No Doc Link|
+|Exception Process|There is no documented exception process for this.|
+|Impact Statement|Misuse of Red Hat trademarks in name, vendor, or maintainer labels creates legal and compliance risks that can block certification and publication.|
+|Tags|common,preflight|
+|**Scenario**|**Optional/Mandatory**|
+|Extended|Optional|
+|Far-Edge|Optional|
+|Non-Telco|Optional|
+|Telco|Optional|
+
 #### preflight-HasNoProhibitedPackages
 
 |Property|Description|
@@ -1947,8 +1964,8 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
 |Property|Description|
 |---|---|
 |Unique ID|preflight-HasRequiredLabel|
-|Description|Checking if the required labels (name, vendor, version, release, summary, description, maintainer) are present in the container metadata and that they do not violate Red Hat trademark.|
-|Suggested Remediation|Add the following labels to your Dockerfile or Containerfile: name, vendor, version, release, summary, description, maintainer and validate that they do not violate Red Hat trademark.|
+|Description|Checking if the required labels (name, vendor, version, release, summary, description, maintainer) are present in the container metadata|
+|Suggested Remediation|Add the following labels to your Dockerfile or Containerfile: name, vendor, version, release, summary, description, maintainer.|
 |Best Practice Reference|No Doc Link|
 |Exception Process|There is no documented exception process for this.|
 |Impact Statement|Missing required labels prevent proper metadata management and can cause deployment and management issues.|
diff --git a/tests/identifiers/identifiers.go b/tests/identifiers/identifiers.go
@@ -748,7 +748,7 @@ func InitCatalog() map[claim.Identifier]claim.TestCaseDescription {
 	TestHugepagesNotManuallyManipulated = AddCatalogEntry(
 		"hugepages-config",
 		common.PlatformAlterationTestKey,
-		`Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet. First, the "worker" MachineConfig is polled, and the Hugepage settings are extracted. Next, the underlying Nodes are polled for configured HugePages through inspection of /proc/meminfo. The results are compared, and the test passes only if they are the same.`, //nolint:lll
+		`Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are labeled as workers with the standard label "node-role.kubernetes.io/worker". First, the MachineConfig is inspected for hugepage settings in systemd units. If none are found, the MC's .spec.kernelArguments are inspected for hugepage settings. The sizes and page counts are compared, and the test passes only if they are the same as the ones in the node's /sys/kernel/mm/hugepages/hugepages-X folders.`, //nolint:lll
 		HugepagesNotManuallyManipulatedRemediation,
 		NoExceptions,
 		TestHugepagesNotManuallyManipulatedDocLink,
diff --git a/tests/identifiers/remediation.go b/tests/identifiers/remediation.go
@@ -59,7 +59,7 @@ const (
 
 	PodHostPIDRemediation = `Set the spec.HostPid parameter to false in the pod configuration. Workloads should avoid accessing host resources - spec.HostPid should be false.`
 
-	HugepagesNotManuallyManipulatedRemediation = `HugePage settings should be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. This ensures that OpenShift is aware of the special MachineConfig requirements, and can provision your workload on a Node that is part of the corresponding MachineConfigSet. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet.`
+	HugepagesNotManuallyManipulatedRemediation = `HugePage settings for worker nodes must be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings.`
 
 	ICMPv4ConnectivityRemediation = `Ensure that the workload is able to communicate via the Default OpenShift network. In some rare cases, workloads may require routing table changes in order to communicate over the Default network. To exclude a particular pod from ICMPv4 connectivity tests, add the redhat-best-practices-for-k8s.com/skip_connectivity_tests label to it. The label value is trivial, only its presence.`
 
diff --git a/tests/platform/hugepages/hugepages.go b/tests/platform/hugepages/hugepages.go
@@ -7,6 +7,7 @@ import (
 	"sort"
 	"strconv"
 	"strings"
+	"unicode"
 
 	"github.com/redhat-best-practices-for-k8s/certsuite/internal/clientsholder"
 	"github.com/redhat-best-practices-for-k8s/certsuite/internal/log"
@@ -63,17 +64,53 @@ type Tester struct {
 	mcSystemdHugepagesByNuma hugepagesByNuma
 }
 
-func hugepageSizeToInt(s string) int {
-	num, _ := strconv.Atoi(s[:len(s)-1])
-	unit := s[len(s)-1]
+// hugepageSizeToInt converts a hugepage size string to an integer.
+// The output is always in kilobytes.
+// It supports the following units: K, M, G, and T.
+// If no unit is provided, the value is treated as bytes and must be a multiple of 1024.
+func hugepageSizeToInt(s string) (int, error) {
+	// Remove any trailing 'B' or 'b' if present, as in "2048kB" or "1MB"
+	s = strings.TrimRight(s, "Bb")
+	lastChar := s[len(s)-1]
+
+	var sizeStr string
+
+	isLastCharDigit := unicode.IsDigit(rune(lastChar))
+	if isLastCharDigit {
+		sizeStr = s
+	} else {
+		// Get the number without the unit suffix letter.
+		sizeStr = s[:len(s)-1]
+	}
+
+	size, err := strconv.Atoi(sizeStr)
+	if err != nil {
+		return 0, fmt.Errorf("failed to parse int %s, err: %w", sizeStr, err)
+	}
+
+	// No unit provided: the value is in bytes, so convert it to kilobytes.
+	if isLastCharDigit {
+		if size%1024 != 0 {
+			return 0, fmt.Errorf("parsed size %d is not a multiple of 1024", size)
+		}
+		return size / 1024, nil
+	}
+
+	// Normalize the unit (last character) to uppercase so "k" and "K" are handled alike.
+	unit := strings.ToUpper(string(lastChar))
+
 	switch unit {
-	case 'M':
-		num *= 1024
-	case 'G':
-		num *= 1024 * 1024
+	case "K":
+		return size, nil
+	case "M":
+		return size * 1024, nil
+	case "G":
+		return size * 1024 * 1024, nil
+	case "T":
+		return size * 1024 * 1024 * 1024, nil
 	}
 
-	return num
+	return 0, fmt.Errorf("unsupported hugepage size unit: %s", s)
 }
 
 func NewTester(node *provider.Node, probePod *corev1.Pod, commander clientsholder.Command) (*Tester, error) {
@@ -173,7 +210,10 @@ func (tester *Tester) TestNodeHugepagesWithMcSystemd() (bool, error) {
 // The total count of hugepages of the size defined in the kernelArguments must match the kernArgs' hugepages value.
 // For other sizes, the sum should be 0.
 func (tester *Tester) TestNodeHugepagesWithKernelArgs() (bool, error) {
-	kernelArgsHpCountBySize, _ := getMcHugepagesFromMcKernelArguments(&tester.node.Mc)
+	kernelArgsHpCountBySize, _, err := getMcHugepagesFromMcKernelArguments(&tester.node.Mc)
+	if err != nil {
+		return false, fmt.Errorf("failed to get kernelArguments hugepages config, err: %w", err)
+	}
 
 	// First, check that all the actual hp sizes across all numas exist in the kernelArguments.
 	for nodeNumaIdx, nodeCountBySize := range tester.nodeHugepagesByNuma {
@@ -286,15 +326,15 @@ func getMcSystemdUnitsHugepagesConfig(mc *provider.MachineConfig) (hugepages hug
 
 func logMcKernelArgumentsHugepages(hugepagesPerSize map[int]int, defhugepagesz int) {
 	var sb strings.Builder
-	sb.WriteString(fmt.Sprintf("MC KernelArguments hugepages config: default_hugepagesz=%d-kB", defhugepagesz))
+	sb.WriteString(fmt.Sprintf("MC KernelArguments hugepages config: default_hugepagesz=%dkB", defhugepagesz))
 	for size, count := range hugepagesPerSize {
 		sb.WriteString(fmt.Sprintf(", size=%dkB - count=%d", size, count))
 	}
 	log.Info("%s", sb.String())
 }
 
 // getMcHugepagesFromMcKernelArguments gets the hugepages params from machineconfig's kernelArguments
-func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesPerSize map[int]int, defhugepagesz int) {
+func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesPerSize map[int]int, defhugepagesz int, err error) {
 	defhugepagesz = RhelDefaultHugepagesz
 	hugepagesPerSize = map[int]int{}
 
@@ -319,13 +359,21 @@ func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesP
 		}
 
 		if key == HugepageszParam && value != "" {
-			hugepagesz = hugepageSizeToInt(value)
+			var err error
+			hugepagesz, err = hugepageSizeToInt(value)
+			if err != nil {
+				return map[int]int{}, defhugepagesz, fmt.Errorf("failed to convert hugepage size (%s) to int, err: %w", value, err)
+			}
 			// Create new map entry for this size
 			hugepagesPerSize[hugepagesz] = 0
 		}
 
 		if key == DefaultHugepagesz && value != "" {
-			defhugepagesz = hugepageSizeToInt(value)
+			var err error
+			defhugepagesz, err = hugepageSizeToInt(value)
+			if err != nil {
+				return map[int]int{}, defhugepagesz, fmt.Errorf("failed to convert hugepage size (%s) to int, err: %w", value, err)
+			}
 			// In case only default_hugepagesz and hugepages values are provided. The actual value should be
 			// parsed next and this default value overwritten.
 			hugepagesPerSize[defhugepagesz] = RhelDefaultHugepages
@@ -339,5 +387,5 @@ func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesP
 	}
 
 	logMcKernelArgumentsHugepages(hugepagesPerSize, defhugepagesz)
-	return hugepagesPerSize, defhugepagesz
+	return hugepagesPerSize, defhugepagesz, nil
 }
diff --git a/tests/platform/hugepages/hugepages_test.go b/tests/platform/hugepages/hugepages_test.go
@@ -132,8 +132,8 @@ func Test_hugepagesFromKernelArgsFunc(t *testing.T) {
 		mc.Spec.KernelArguments = tc.kernelArgs
 
 		// Call the function under test.
-		hugepagesPerSize, defSize := getMcHugepagesFromMcKernelArguments(&mc)
-
+		hugepagesPerSize, defSize, err := getMcHugepagesFromMcKernelArguments(&mc)
+		assert.NoError(t, err)
 		assert.Equal(t, defSize, tc.expectedHugepagesDefSize)
 		assert.Equal(t, hugepagesPerSize, tc.expectedHugepagesPerSize)
 	}
@@ -584,6 +584,14 @@ func TestPositiveMachineConfigKernelArgsHugepages(t *testing.T) {
 									 /host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:16`,
 			mcKernelArgs: []string{"hugepagesz=1G", "hugepages=16", "hugepagesz=2M", "hugepages=256"},
 		},
+		// Node has two numas and one size in kB units.
+		{
+			nodeHugePagesCmdOutput: `/host/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages count:0
+									 /host/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages count:15
+									 /host/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages count:0
+									 /host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:15`,
+			mcKernelArgs: []string{"hugepagesz=1048576kB", "hugepages=30"},
+		},
 	}
 
 	// instantiate the fakeClient so we can mock the output from each command to get the node's hugepages files.
@@ -681,6 +689,24 @@ func TestNegativeMachineConfigKernelArgsHugepages(t *testing.T) {
 			mcKernelArgs:     []string{"hugepagesz=1G", "hugepages=8", "hugepagesz=2M", "hugepages=256"},
 			expectedErrorMsg: "failed to compare machineConfig KernelArguments with node ones, err: total hugepages of size 1048576 will not match (node count=16, expected=8)",
 		},
+		// Node has two numas and one size in kB units but total pages (35) will not match kernelArgs (30).
+		{
+			nodeHugePagesCmdOutput: `/host/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages count:0
+											 /host/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages count:15
+											 /host/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages count:0
+											 /host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:20`,
+			mcKernelArgs:     []string{"hugepagesz=1048576kB", "hugepages=30"},
+			expectedErrorMsg: "failed to compare machineConfig KernelArguments with node ones, err: total hugepages of size 1048576 will not match (node count=35, expected=30)",
+		},
+		// Invalid kernelArgs size: not a multiple of 1024.
+		{
+			nodeHugePagesCmdOutput: `/host/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages count:0
+											 /host/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages count:15
+											 /host/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages count:0
+											 /host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:20`,
+			mcKernelArgs:     []string{"hugepagesz=1045", "hugepages=30"},
+			expectedErrorMsg: "failed to compare machineConfig KernelArguments with node ones, err: failed to get kernelArguments hugepages config, err: failed to convert hugepage size (1045) to int, err: parsed size 1045 is not a multiple of 1024",
+		},
 	}
 
 	// instantiate the fakeClient so we can mock the output from each command to get the node's hugepages files.