Skip to content

Commit 9ffe712

Browse files
authored
platform-alteration-hugepages-config: added support for more hugepagesz units. (#3245)
- Added support for both uppercase and lowercase units and sizes in kilobytes and terabytes. - Added error handling. - Updated check description.
1 parent dfdff00 commit 9ffe712

File tree

5 files changed

+119
-28
lines changed

5 files changed

+119
-28
lines changed

CATALOG.md

Lines changed: 27 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ Depending on the workload type, not all tests are required to pass to satisfy be
77

88
## Test cases summary
99

10-
### Total test cases: 119
10+
### Total test cases: 120
1111

1212
### Total suites: 10
1313

@@ -22,7 +22,7 @@ Depending on the workload type, not all tests are required to pass to satisfy be
2222
|operator|12|[operator](#operator)|
2323
|performance|6|[performance](#performance)|
2424
|platform-alteration|14|[platform-alteration](#platform-alteration)|
25-
|preflight|18|[preflight](#preflight)|
25+
|preflight|19|[preflight](#preflight)|
2626

2727
### Extended specific tests only: 13
2828

@@ -36,11 +36,11 @@ Depending on the workload type, not all tests are required to pass to satisfy be
3636
|---|---|---|
3737
|8|1|
3838

39-
### Non-Telco specific tests only: 70
39+
### Non-Telco specific tests only: 71
4040

4141
|Mandatory|Optional|
4242
|---|---|---|
43-
|43|27|
43+
|43|28|
4444

4545
### Telco specific tests only: 27
4646

@@ -383,7 +383,7 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
383383
|---|---|
384384
|Unique ID|access-control-security-context|
385385
|Description|Checks the security context matches one of the 4 categories|
386-
|Suggested Remediation|Exception possible if a workload uses mlock(), mlockall(), shmctl(), mmap(); exception will be considered for DPDK applications. Must identify which container requires the capability and document why. If the container had the right configuration of the allowed category from the 4 approved list then the test will pass. The 4 categories are defined in Requirement ID 94118 in the [security context categories](#security-context-categories)|
386+
|Suggested Remediation|Exception possible if a workload uses mlock(), mlockall(), shmctl(), mmap(); exception will be considered for DPDK applications. Must identify which container requires the capability and document why. If the container had the right configuration of the allowed category from the 4 approved list then the test will pass. The 4 categories are defined in Requirement ID 94118 [here](#security-context-categories)|
387387
|Best Practice Reference|https://redhat-best-practices-for-k8s.github.io/guide/#k8s-best-practices-linux-capabilities|
388388
|Exception Process|no exception needed for optional/extended test|
389389
|Impact Statement|Incorrect security context configurations can weaken container isolation, enable privilege escalation, and create exploitable attack vectors.|
@@ -1639,8 +1639,8 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
16391639
|Property|Description|
16401640
|---|---|
16411641
|Unique ID|platform-alteration-hugepages-config|
1642-
|Description|Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet. First, the "worker" MachineConfig is polled, and the Hugepage settings are extracted. Next, the underlying Nodes are polled for configured HugePages through inspection of /proc/meminfo. The results are compared, and the test passes only if they are the same.|
1643-
|Suggested Remediation|HugePage settings should be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. This ensures that OpenShift is aware of the special MachineConfig requirements, and can provision your workload on a Node that is part of the corresponding MachineConfigSet. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet.|
1642+
|Description|Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are labeled as workers with the standard label "node-role.kubernetes.io/worker". First, the MachineConfig is inspected for hugepage settings in systemd units. If not, the MC's .spec.kernelArguments are inspected for hugepage settings. The sizes and page numbers are compared, and the test passes only if they are the same than then ones in node's /sys/kernel/mm/hugepages/hugepages-X folders.|
1643+
|Suggested Remediation|HugePage settings for worker nodes must be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings.|
16441644
|Best Practice Reference|https://redhat-best-practices-for-k8s.github.io/guide/#k8s-best-practices-huge-pages|
16451645
|Exception Process|No exceptions|
16461646
|Impact Statement|Manual hugepage configuration bypasses cluster management, can cause node instability, and creates configuration drift issues.|
@@ -1812,7 +1812,7 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
18121812
|---|---|
18131813
|Unique ID|preflight-BasedOnUbi|
18141814
|Description|Checking if the container's base image is based upon the Red Hat Universal Base Image (UBI)|
1815-
|Suggested Remediation|Change the FROM directive in your Dockerfile or Containerfile to FROM registry.access.redhat.com/ubi8/ubi|
1815+
|Suggested Remediation|Change the FROM directive in your Dockerfile or Containerfile, for the latest list of images and details refer to: https://catalog.redhat.com/software/base-images|
18161816
|Best Practice Reference|No Doc Link|
18171817
|Exception Process|There is no documented exception process for this.|
18181818
|Impact Statement|Non-UBI base images may lack security updates, enterprise support, and compliance certifications required for production use.|
@@ -1908,6 +1908,23 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
19081908
|Non-Telco|Optional|
19091909
|Telco|Optional|
19101910

1911+
#### preflight-HasNoProhibitedLabels
1912+
1913+
|Property|Description|
1914+
|---|---|
1915+
|Unique ID|preflight-HasNoProhibitedLabels|
1916+
|Description|Checking if the labels (name, vendor, maintainer) violate Red Hat trademark.|
1917+
|Suggested Remediation|Ensure the name, vendor, and maintainer label on your image do not violate the Red Hat trademark.|
1918+
|Best Practice Reference|No Doc Link|
1919+
|Exception Process|There is no documented exception process for this.|
1920+
|Impact Statement|Misuse of Red Hat trademarks in name, vendor, or maintainer labels creates legal and compliance risks that can block certification and publication.|
1921+
|Tags|common,preflight|
1922+
|**Scenario**|**Optional/Mandatory**|
1923+
|Extended|Optional|
1924+
|Far-Edge|Optional|
1925+
|Non-Telco|Optional|
1926+
|Telco|Optional|
1927+
19111928
#### preflight-HasNoProhibitedPackages
19121929

19131930
|Property|Description|
@@ -1947,8 +1964,8 @@ Test Cases are the specifications used to perform a meaningful test. Test cases
19471964
|Property|Description|
19481965
|---|---|
19491966
|Unique ID|preflight-HasRequiredLabel|
1950-
|Description|Checking if the required labels (name, vendor, version, release, summary, description, maintainer) are present in the container metadata and that they do not violate Red Hat trademark.|
1951-
|Suggested Remediation|Add the following labels to your Dockerfile or Containerfile: name, vendor, version, release, summary, description, maintainer and validate that they do not violate Red Hat trademark.|
1967+
|Description|Checking if the required labels (name, vendor, version, release, summary, description, maintainer) are present in the container metadata|
1968+
|Suggested Remediation|Add the following labels to your Dockerfile or Containerfile: name, vendor, version, release, summary, description, maintainer.|
19521969
|Best Practice Reference|No Doc Link|
19531970
|Exception Process|There is no documented exception process for this.|
19541971
|Impact Statement|Missing required labels prevent proper metadata management and can cause deployment and management issues.|

tests/identifiers/identifiers.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -748,7 +748,7 @@ func InitCatalog() map[claim.Identifier]claim.TestCaseDescription {
748748
TestHugepagesNotManuallyManipulated = AddCatalogEntry(
749749
"hugepages-config",
750750
common.PlatformAlterationTestKey,
751-
`Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet. First, the "worker" MachineConfig is polled, and the Hugepage settings are extracted. Next, the underlying Nodes are polled for configured HugePages through inspection of /proc/meminfo. The results are compared, and the test passes only if they are the same.`, //nolint:lll
751+
`Checks to see that HugePage settings have been configured through MachineConfig, and not manually on the underlying Node. This test case applies only to Nodes that are labeled as workers with the standard label "node-role.kubernetes.io/worker". First, the MachineConfig is inspected for hugepage settings in systemd units. If not, the MC's .spec.kernelArguments are inspected for hugepage settings. The sizes and page numbers are compared, and the test passes only if they are the same than then ones in node's /sys/kernel/mm/hugepages/hugepages-X folders.`, //nolint:lll
752752
HugepagesNotManuallyManipulatedRemediation,
753753
NoExceptions,
754754
TestHugepagesNotManuallyManipulatedDocLink,

tests/identifiers/remediation.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ const (
5959

6060
PodHostPIDRemediation = `Set the spec.HostPid parameter to false in the pod configuration. Workloads should avoid accessing host resources - spec.HostPid should be false.`
6161

62-
HugepagesNotManuallyManipulatedRemediation = `HugePage settings should be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. This ensures that OpenShift is aware of the special MachineConfig requirements, and can provision your workload on a Node that is part of the corresponding MachineConfigSet. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings. This test case applies only to Nodes that are configured with the "worker" MachineConfigSet.`
62+
HugepagesNotManuallyManipulatedRemediation = `HugePage settings for worker nodes must be configured either directly through the MachineConfigOperator or indirectly using the PerformanceAddonOperator. Avoid making changes directly to an underlying Node, and let OpenShift handle the heavy lifting of configuring advanced settings.`
6363

6464
ICMPv4ConnectivityRemediation = `Ensure that the workload is able to communicate via the Default OpenShift network. In some rare cases, workloads may require routing table changes in order to communicate over the Default network. To exclude a particular pod from ICMPv4 connectivity tests, add the redhat-best-practices-for-k8s.com/skip_connectivity_tests label to it. The label value is trivial, only its presence.`
6565

tests/platform/hugepages/hugepages.go

Lines changed: 62 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ import (
77
"sort"
88
"strconv"
99
"strings"
10+
"unicode"
1011

1112
"github.com/redhat-best-practices-for-k8s/certsuite/internal/clientsholder"
1213
"github.com/redhat-best-practices-for-k8s/certsuite/internal/log"
@@ -63,17 +64,53 @@ type Tester struct {
6364
mcSystemdHugepagesByNuma hugepagesByNuma
6465
}
6566

66-
func hugepageSizeToInt(s string) int {
67-
num, _ := strconv.Atoi(s[:len(s)-1])
68-
unit := s[len(s)-1]
67+
// hugepageSizeToInt converts a hugepage size string to an integer.
68+
// The output is always in kilobytes.
69+
// It supports the following units: K, M, G, and T.
70+
// If no unit provided, it returns the size as is.
71+
func hugepageSizeToInt(s string) (int, error) {
72+
// Remove any trailing 'B' or 'b' if present, as in "2048kB" or "1MB"
73+
s = strings.TrimRight(s, "Bb")
74+
lastChar := s[len(s)-1]
75+
76+
var sizeStr string
77+
78+
isLastCharDigit := unicode.IsDigit(rune(lastChar))
79+
if isLastCharDigit {
80+
sizeStr = s
81+
} else {
82+
// Get the number without the unit suffix letter.
83+
sizeStr = s[:len(s)-1]
84+
}
85+
86+
size, err := strconv.Atoi(sizeStr)
87+
if err != nil {
88+
return 0, fmt.Errorf("failed to parse int %s, err: %w", sizeStr, err)
89+
}
90+
91+
// If no unit provided, return the size in kilobytes.
92+
if isLastCharDigit {
93+
if size%1024 != 0 {
94+
return 0, fmt.Errorf("parsed size %d is not a multiple of 1024", size)
95+
}
96+
return size / 1024, nil
97+
}
98+
99+
// Get the unit (last character) and the numeric part
100+
unit := strings.ToUpper(string(lastChar))
101+
69102
switch unit {
70-
case 'M':
71-
num *= 1024
72-
case 'G':
73-
num *= 1024 * 1024
103+
case "K":
104+
return size, nil
105+
case "M":
106+
return size * 1024, nil
107+
case "G":
108+
return size * 1024 * 1024, nil
109+
case "T":
110+
return size * 1024 * 1024 * 1024, nil
74111
}
75112

76-
return num
113+
return 0, fmt.Errorf("unsupported hugepage size unit: %s", s)
77114
}
78115

79116
func NewTester(node *provider.Node, probePod *corev1.Pod, commander clientsholder.Command) (*Tester, error) {
@@ -173,7 +210,10 @@ func (tester *Tester) TestNodeHugepagesWithMcSystemd() (bool, error) {
173210
// The total count of hugepages of the size defined in the kernelArguments must match the kernArgs' hugepages value.
174211
// For other sizes, the sum should be 0.
175212
func (tester *Tester) TestNodeHugepagesWithKernelArgs() (bool, error) {
176-
kernelArgsHpCountBySize, _ := getMcHugepagesFromMcKernelArguments(&tester.node.Mc)
213+
kernelArgsHpCountBySize, _, err := getMcHugepagesFromMcKernelArguments(&tester.node.Mc)
214+
if err != nil {
215+
return false, fmt.Errorf("failed to get kernelArguments hugepages config, err: %w", err)
216+
}
177217

178218
// First, check that all the actual hp sizes across all numas exist in the kernelArguments.
179219
for nodeNumaIdx, nodeCountBySize := range tester.nodeHugepagesByNuma {
@@ -286,15 +326,15 @@ func getMcSystemdUnitsHugepagesConfig(mc *provider.MachineConfig) (hugepages hug
286326

287327
func logMcKernelArgumentsHugepages(hugepagesPerSize map[int]int, defhugepagesz int) {
288328
var sb strings.Builder
289-
sb.WriteString(fmt.Sprintf("MC KernelArguments hugepages config: default_hugepagesz=%d-kB", defhugepagesz))
329+
sb.WriteString(fmt.Sprintf("MC KernelArguments hugepages config: default_hugepagesz=%dkB", defhugepagesz))
290330
for size, count := range hugepagesPerSize {
291331
sb.WriteString(fmt.Sprintf(", size=%dkB - count=%d", size, count))
292332
}
293333
log.Info("%s", sb.String())
294334
}
295335

296336
// getMcHugepagesFromMcKernelArguments gets the hugepages params from machineconfig's kernelArguments
297-
func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesPerSize map[int]int, defhugepagesz int) {
337+
func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesPerSize map[int]int, defhugepagesz int, err error) {
298338
defhugepagesz = RhelDefaultHugepagesz
299339
hugepagesPerSize = map[int]int{}
300340

@@ -319,13 +359,21 @@ func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesP
319359
}
320360

321361
if key == HugepageszParam && value != "" {
322-
hugepagesz = hugepageSizeToInt(value)
362+
var err error
363+
hugepagesz, err = hugepageSizeToInt(value)
364+
if err != nil {
365+
return map[int]int{}, defhugepagesz, fmt.Errorf("failed to convert hugepage size (%s) to int, err: %w", value, err)
366+
}
323367
// Create new map entry for this size
324368
hugepagesPerSize[hugepagesz] = 0
325369
}
326370

327371
if key == DefaultHugepagesz && value != "" {
328-
defhugepagesz = hugepageSizeToInt(value)
372+
var err error
373+
defhugepagesz, err = hugepageSizeToInt(value)
374+
if err != nil {
375+
return map[int]int{}, defhugepagesz, fmt.Errorf("failed to convert hugepage size (%s) to int, err: %w", value, err)
376+
}
329377
// In case only default_hugepagesz and hugepages values are provided. The actual value should be
330378
// parsed next and this default value overwritten.
331379
hugepagesPerSize[defhugepagesz] = RhelDefaultHugepages
@@ -339,5 +387,5 @@ func getMcHugepagesFromMcKernelArguments(mc *provider.MachineConfig) (hugepagesP
339387
}
340388

341389
logMcKernelArgumentsHugepages(hugepagesPerSize, defhugepagesz)
342-
return hugepagesPerSize, defhugepagesz
390+
return hugepagesPerSize, defhugepagesz, nil
343391
}

tests/platform/hugepages/hugepages_test.go

Lines changed: 28 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -132,8 +132,8 @@ func Test_hugepagesFromKernelArgsFunc(t *testing.T) {
132132
mc.Spec.KernelArguments = tc.kernelArgs
133133

134134
// Call the function under test.
135-
hugepagesPerSize, defSize := getMcHugepagesFromMcKernelArguments(&mc)
136-
135+
hugepagesPerSize, defSize, err := getMcHugepagesFromMcKernelArguments(&mc)
136+
assert.NoError(t, err)
137137
assert.Equal(t, defSize, tc.expectedHugepagesDefSize)
138138
assert.Equal(t, hugepagesPerSize, tc.expectedHugepagesPerSize)
139139
}
@@ -584,6 +584,14 @@ func TestPositiveMachineConfigKernelArgsHugepages(t *testing.T) {
584584
/host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:16`,
585585
mcKernelArgs: []string{"hugepagesz=1G", "hugepages=16", "hugepagesz=2M", "hugepages=256"},
586586
},
587+
// Node has two numas and one size in kB units.
588+
{
589+
nodeHugePagesCmdOutput: `/host/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages count:0
590+
/host/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages count:15
591+
/host/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages count:0
592+
/host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:15`,
593+
mcKernelArgs: []string{"hugepagesz=1048576kB", "hugepages=30"},
594+
},
587595
}
588596

589597
// instantiate the fakeClient so we can mock the output from each command to get the node's hugepages files.
@@ -681,6 +689,24 @@ func TestNegativeMachineConfigKernelArgsHugepages(t *testing.T) {
681689
mcKernelArgs: []string{"hugepagesz=1G", "hugepages=8", "hugepagesz=2M", "hugepages=256"},
682690
expectedErrorMsg: "failed to compare machineConfig KernelArguments with node ones, err: total hugepages of size 1048576 will not match (node count=16, expected=8)",
683691
},
692+
// Node has two numas and one size in kB units but total pages (35) will not match kernelArgs (30).
693+
{
694+
nodeHugePagesCmdOutput: `/host/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages count:0
695+
/host/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages count:15
696+
/host/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages count:0
697+
/host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:20`,
698+
mcKernelArgs: []string{"hugepagesz=1048576kB", "hugepages=30"},
699+
expectedErrorMsg: "failed to compare machineConfig KernelArguments with node ones, err: total hugepages of size 1048576 will not match (node count=35, expected=30)",
700+
},
701+
// Invalid kernelArgs size: not a multiple of 1024.
702+
{
703+
nodeHugePagesCmdOutput: `/host/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages count:0
704+
/host/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages count:15
705+
/host/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages count:0
706+
/host/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages count:20`,
707+
mcKernelArgs: []string{"hugepagesz=1045", "hugepages=30"},
708+
expectedErrorMsg: "failed to compare machineConfig KernelArguments with node ones, err: failed to get kernelArguments hugepages config, err: failed to convert hugepage size (1045) to int, err: parsed size 1045 is not a multiple of 1024",
709+
},
684710
}
685711

686712
// instantiate the fakeClient so we can mock the output from each command to get the node's hugepages files.

0 commit comments

Comments
 (0)