Conversation
WalkthroughThis pull request refactors the SR-IOV network configuration handling by introducing a new Changes
Sequence Diagram(s)sequenceDiagram
participant RS as runServiceCmd
participant SC as ServiceConfig
participant PH as HostHelpers/Plugin
RS->>SC: newServiceConfig()
SC->>SC: readConf()
SC->>SC: initSupportedNics()
SC->>SC: phasePre()
SC->>PH: callPlugin("phase")
PH-->>SC: Plugin response
SC->>SC: phasePost()
SC->>RS: Complete execution
sequenceDiagram
participant DM as DaemonManager
participant HM as HostManager
participant SD as Systemd
DM->>HM: Init()
HM->>SD: CleanSriovFilesFromHost(isOpenShift)
DM->>HM: checkSystemdStatus()
HM->>SD: ReadSriovResult()
SD-->>HM: Return result
DM->>HM: apply(...)
HM->>SD: Write/Remove operations
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (11)
🚧 Files skipped from review as they are similar to previous changes (7)
⏰ Context from checks skipped due to timeout of 90000ms (7)
🔇 Additional comments (34)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
Pull Request Test Coverage Report for Build 13677346477Details
💛 - Coveralls |
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
pkg/host/internal/systemd/systemd.go (1)
53-135: 🛠️ Refactor suggestionWriteConfFile method
While creating directories on lines 70-72, consider using explicit permissions instead ofos.ModeDirfor clarity and correctness:- err = os.Mkdir(utils.GetHostExtensionPath(consts.SriovConfBasePath), os.ModeDir) + err = os.MkdirAll(utils.GetHostExtensionPath(consts.SriovConfBasePath), 0o755)
🧹 Nitpick comments (40)
controllers/sriovnetworknodepolicy_controller.go (1)
229-229: Simplified ResourceList empty checkThe condition for checking if
ResourceListis empty has been simplified by removing the nil check. This change assumes thatResourceListis always initialized before reaching this code path.While the simplification is cleaner, make sure that
data.ResourceListcannot be nil to avoid potential nil pointer panics. Consider the following safer pattern:-if len(data.ResourceList) == 0 { +if data == nil || len(data.ResourceList) == 0 {.github/workflows/test.yml (1)
55-55: Fixed typo in step nameCorrected the spelling in the step name from "opensfhit" to "openshift".
pkg/utils/cluster.go (1)
130-133: Improve object modification approach.The code now directly modifies the original object (
obj) rather than working on a copy (newObj). While this change is more efficient as it avoids an extra deep copy, it's important to ensure all callers are aware the original object is modified.This approach is more memory-efficient, but consider adding a comment indicating that the function modifies the passed object, to make this behavior explicit to callers.
pkg/daemon/daemon_suite_test.go (3)
64-64: Consider extracting admission plugin configuration to a constant.The disabled admission plugins are hardcoded directly in the API server configuration.
Consider extracting these plugin names to constants for better maintainability, especially if they might be used elsewhere in tests.
-testEnv.ControlPlane.GetAPIServer().Configure().Set("disable-admission-plugins", "MutatingAdmissionWebhook", "ValidatingAdmissionWebhook") +const disabledAdmissionPlugins = "MutatingAdmissionWebhook,ValidatingAdmissionWebhook" +testEnv.ControlPlane.GetAPIServer().Configure().Set("disable-admission-plugins", disabledAdmissionPlugins)
111-120: Consider extracting infrastructure setup to a helper function.The code directly creates an
Infrastructureobject with hardcoded values.Consider extracting this setup to a helper function for better reusability, especially if similar infrastructure setup might be needed in other tests:
func createTestInfrastructure(client client.Client) error { infra := &openshiftconfigv1.Infrastructure{ ObjectMeta: metav1.ObjectMeta{ Name: "cluster", }, Spec: openshiftconfigv1.InfrastructureSpec{}, Status: openshiftconfigv1.InfrastructureStatus{ ControlPlaneTopology: openshiftconfigv1.HighlyAvailableTopologyMode, }, } return client.Create(context.Background(), infra) }
126-129: Consider handling errors in teardown more gracefully.The test uses Eventually to handle the potential delay in stopping the test environment, but doesn't handle the error case beyond failing the test.
Consider adding additional logging or cleanup steps if the environment stop fails. This would make diagnosing test teardown issues easier.
pkg/daemon/config.go (1)
28-58: Avoid reliance on global variables.Updating global variables (
vars.DisableDrainandvars.FeatureGate) from the controller may lead to unforeseen side effects or race conditions. Consider a refactor to pass these values through function parameters or an injected configuration object for better testability and maintainability.hack/virtual-cluster-redeploy.sh (1)
31-44: Consolidate duplicate environment variables.Many of these environment variables (e.g.,
ADMISSION_CONTROLLERS_ENABLED,SKIP_VAR_SET,OPERATOR_LEADER_ELECTION_ENABLE) are set again under lines 65–72 below. Consider refactoring to avoid duplication and maintain consistent values across the script.pkg/consts/constants.go (1)
17-19: Consider externalizing requeue intervals.
These constants for requeue timings look correct, but consider allowing them to be configurable externally (e.g., via environment variables) to simplify fine-tuning.pkg/daemon/config_test.go (2)
1-21: Improve negative testing coverage.
The import statements and global variables are fine. It might be helpful to introduce negative tests or boundary checks (e.g., invalid config values) to ensure more robust coverage.
64-100: Validate log level boundary conditions.
These tests confirm the runtime log level updates. Adding tests for invalid or out-of-range log levels (if applicable) would further strengthen coverage.pkg/host/types/interfaces.go (1)
210-220: Systemd interface methods look comprehensive.
This interface introduces a robust set of methods for SR-IOV file operations. Pay attention to concurrency and file locking if multiple processes can read/write simultaneously. Provide tests covering partial failures or missing files to ensure resilience.pkg/daemon/status.go (2)
98-126: Check discover logic for platform differences.
updateStatusFromHosthas separate logic forVirtualOpenStackvs. other platforms. Consider edge cases like unrecognized platform types or hybrid scenarios. This function might benefit from logging or error messages if unexpected platforms appear.
128-142: Event recording is clear and robust.
Recording the status transition event keeps auditing simpler. If other transitions occur simultaneously (e.g., drained → rebooting), ensure the event aggregator is not overly spammed.cmd/sriov-network-config-daemon/service_test.go (2)
37-66: Sensible test helper for SR-IOV config.
The newgetTestSriovInterfaceConfigfunction succinctly returns a test configuration. Make sure to extend coverage if additional interface properties become relevant.
69-69: Potential clarity improvement.
testSriovSupportedNicIDslumps together multiple fields in a single string. Consider structuring them in a clearer format, e.g.,vendorID,deviceID,subsystemIDin separate variables or a small struct if that ever becomes complex.cmd/sriov-network-config-daemon/start.go (3)
117-146: Global configuration function.
configGlobalVariablescentralizes environment handling. Consider adding validation or logging for config values that affect system behavior (e.g., concurrency flags).
194-199: Log level initialization.
initLogLeveladjusts logging at runtime. Confirm logging does not revert unexpectedly if config changes post-initialization.
201-358: Refactored Daemon startup logic.
The revised flow (creating clients, customizing manager cache, feature gates, log level, etc.) is well-structured. Consider adding more granular error messages for each init step to aid debugging.pkg/daemon/daemon_test.go (2)
51-55: Constant definitions for wait times.
waitTimeandretryTimeare helpful for controlling test durations. Watch out for excessively long timeouts that might slow CI/CD.
270-277: Helper function for annotations patching.
patchAnnotationis well-implemented, but consider a more direct approach like a single method to handle both reading and writing annotations if needed frequently.pkg/daemon/daemon.go (13)
43-44: Add doc comment for usage
Consider clarifying howdisabledPluginsare used to skip plugin application, for clearer maintainability.
58-68: Validation of input parameters
Consider validating parameters likeclient,hostHelpers, orplatformHelperto prevent potential nil pointer dereferences.
74-75: Check error handling
Ensure theerrvariable is effectively leveraged and consistently checked throughout the method.
78-82: Excessive logging
Using log level 0 for normal operation can be too verbose; consider a higher log level if this is routine.
85-87: Consider logging success
After cleaning SR-IOV files, adding a success log can help confirm the operation completed as expected.
90-90: Refine log message
"Run(): daemon running in systemd mode" might be more consistent if it references the actual method name, e.g. "Init():".
94-94: Better error message context
Include node identification or environment details in this error log if possible for easier troubleshooting.
140-275: Ensure correct error handling and requeue logic
This method is fairly large. Splitting it into smaller, focused functions might improve readability. Also, confirm partial successes do not leave the system in a broken state.
341-430: Break down plugin application steps
This method covers many steps. Consider splitting logic into smaller methods, improving maintainability and testability.
473-507: File write and cleanup
Removing the old result file unconditionally can cause issues if the new config is never fully applied. Consider a backup or a rollback strategy when config fails.
599-624: Validate external side effects
Rebooting is disruptive. Double-check logs, caches, or pending writes are synced before triggering.
644-658: Simplify conditions
Multiple annotation checks could be combined or refactored for readability.
660-683: Enhanced annotation logs
Logging the annotation value in case of errors would aid debugging.cmd/sriov-network-config-daemon/service.go (6)
71-74: Embedding multiple fields
Having bothHostHelpersInterfaceandLoggeras embedded fields can be less explicit. Consider defining them as named fields for clarity.
130-133: Global variable usage
Storing config fields in global variables can introduce tight coupling. Consider replacing with dependency injection.
135-136: Missing success logs
After successfully initializing NICs, a success log might help with debugging.
236-267: Platform-based plugin selection
If new platforms are introduced, consider a dynamic registry or mapping to scale easily.
326-338: Write result file
Consider adding timestamps or a unique signature for improved traceability.
357-387: Timeout logic
Polling with deadlines is a reasonable solution. A future event-driven approach might improve performance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (40)
.github/workflows/test.yml(1 hunks)bindata/manifests/sriov-config-service/kubernetes/sriov-config-post-network-service.yaml(1 hunks)bindata/manifests/sriov-config-service/kubernetes/sriov-config-service.yaml(1 hunks)bindata/manifests/sriov-config-service/openshift/sriov-config-service.yaml(2 hunks)cmd/sriov-network-config-daemon/service.go(8 hunks)cmd/sriov-network-config-daemon/service_test.go(5 hunks)cmd/sriov-network-config-daemon/start.go(5 hunks)controllers/drain_controller.go(2 hunks)controllers/drain_controller_helper.go(3 hunks)controllers/drain_controller_test.go(1 hunks)controllers/helper.go(1 hunks)controllers/sriovnetworknodepolicy_controller.go(2 hunks)go.mod(1 hunks)hack/virtual-cluster-redeploy.sh(1 hunks)main.go(1 hunks)pkg/consts/constants.go(2 hunks)pkg/daemon/config.go(1 hunks)pkg/daemon/config_test.go(1 hunks)pkg/daemon/daemon.go(8 hunks)pkg/daemon/daemon_suite_test.go(1 hunks)pkg/daemon/daemon_test.go(1 hunks)pkg/daemon/event_recorder.go(3 hunks)pkg/daemon/status.go(1 hunks)pkg/daemon/writer.go(0 hunks)pkg/helper/mock/mock_helper.go(5 hunks)pkg/host/internal/network/network.go(1 hunks)pkg/host/internal/systemd/systemd.go(8 hunks)pkg/host/manager.go(5 hunks)pkg/host/mock/mock_host.go(5 hunks)pkg/host/store/store.go(1 hunks)pkg/host/types/interfaces.go(1 hunks)pkg/host/types/types.go(2 hunks)pkg/log/log.go(1 hunks)pkg/plugins/mellanox/mellanox_plugin.go(1 hunks)pkg/utils/cluster.go(1 hunks)pkg/utils/shutdown.go(1 hunks)pkg/vars/vars.go(3 hunks)test/conformance/tests/test_sriov_operator.go(1 hunks)test/util/k8sreporter/reporter.go(2 hunks)test/util/namespaces/namespaces.go(2 hunks)
💤 Files with no reviewable changes (1)
- pkg/daemon/writer.go
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: test-coverage
- GitHub Check: Golangci-lint
- GitHub Check: test
- GitHub Check: build
- GitHub Check: Analyze (go)
🔇 Additional comments (139)
pkg/log/log.go (1)
63-65: Implementation ofGetLogLevel()looks good.This new function provides a clean way to retrieve the operator's log level by converting from Zap's log level format. It complements the existing
SetLogLevel()function and follows the same pattern of using helper functions for the conversion.bindata/manifests/sriov-config-service/kubernetes/sriov-config-service.yaml (1)
10-10: Removed verbosity flags from service execution.The verbosity flags
-v 2 --zap-log-level 2have been removed from the systemd service configuration, simplifying the execution command. This aligns with the PR objective to "Move systemd to host package" and allows log level to be controlled through other mechanisms rather than being hardcoded in the service file.go.mod (1)
153-153: Marked golang.org/x/time as indirect dependency.The dependency has been appropriately marked as indirect, indicating that it's no longer directly imported by the codebase but is still required as a transitive dependency.
test/util/k8sreporter/reporter.go (2)
9-10: Reordered imports for better organization.The imports for
monitoringv1andrbacv1have been reordered. This is a minor style change that doesn't affect functionality.
65-65: Added SriovNetworkPoolConfigList to monitored resources.The
SriovNetworkPoolConfigListhas been added to the list of CRDs that the reporter will handle. This is a good addition that ensures the reporter collects information about this resource type during reporting.pkg/host/store/store.go (1)
177-177: Improved error visibilityChanged from verbose logging (V(2).Error) to standard error logging, which ensures the error message will be more visible in standard log outputs.
bindata/manifests/sriov-config-service/kubernetes/sriov-config-post-network-service.yaml (1)
9-9: Removed hardcoded verbosity flags from systemd serviceThe verbosity flags (
-v 2 --zap-log-level 2) have been removed from the systemd service ExecStart command. This is part of the PR's goal to move systemd to host package, simplifying service configurations.main.go (1)
99-99: Improved error message accuracyChanged error message from "unable to start manager" to "unable to create manager" to more accurately reflect what operation failed. This is a minor but useful clarification.
pkg/utils/shutdown.go (1)
38-38: Simplified conditional checkRemoved the explicit check for
nilfinalizers sincelen(nil)returns 0 for a nil slice in Go. This simplified condition is functionally equivalent and more concise.controllers/drain_controller_helper.go (3)
45-45: Standardization of requeue duration using constantsGood change, replacing hardcoded durations with a constant improves configurability and maintainability.
107-107: Standardization of requeue duration using constantsGood change, replacing hardcoded durations with a constant improves configurability and maintainability.
181-181: Standardization of requeue duration using constantsGood change, replacing hardcoded durations with a constant improves configurability and maintainability. This addresses the TODO comment about making the time configurable.
controllers/drain_controller.go (3)
109-113: Improved error handling for annotation existence checkGood addition of error handling that was previously missing. The variable renaming from
nodeStateExisttocurrentNodeStateExistmakes the code more explicit about what's being checked.
114-114: Adding desireNodeStateExist check for better state trackingGood addition that improves state tracking by explicitly checking for the desired node state annotation.
128-128: Enhanced requeue condition with additional state checksThe requeue condition now properly includes all relevant state checks, ensuring proper synchronization of annotations before proceeding.
pkg/host/internal/network/network.go (1)
78-78: Changed error logging to info logging with verbosityThis changes an error log to a verbosity-based info log. This is appropriate if failing to get an interface name is an expected occurrence during normal operation rather than an actual error condition.
controllers/helper.go (1)
111-111:Details
✅ Verification successful
Simplified predicate check by removing explicit label check
The predicate now returns true for any non-nil object, making it more lenient than before when it specifically checked for a label. This change aligns with the broader refactoring of drain controller logic.
To ensure this change doesn't cause too many unnecessary reconciliations, monitor the reconcile loop frequency in your logs after deployment:
🏁 Script executed:
#!/bin/bash # Check how frequently drain controller reconciles are triggered # This will help verify if the simplified predicate is causing extra processing rg -A 3 "Reconciling Drain" | wc -lLength of output: 39
Verified: Simplified Predicate Check Does Not Trigger Excess Reconciliations
The change in
controllers/helper.go(line 111) now returnse.Object != nil, which simplifies the predicate by no longer checking for a specific label. Based on the log monitoring script, the "Reconciling Drain" message appears only 4 times, indicating that this change does not cause an excessive number of reconciliation loops.No further action is required.
test/conformance/tests/test_sriov_operator.go (1)
71-76: Added test cleanup with AfterAllThe addition of the
Orderedflag to theDescribefunction and theAfterAllblock improves test stability by ensuring proper resource cleanup. This is a good practice that addresses potential resource leaks between test runs.controllers/drain_controller_test.go (1)
431-433: Fixed metadata storage from Labels to AnnotationsThis change updates the
createNodeWithLabelfunction to use Annotations instead of Labels for storing the node state drain information. This is a key architectural change that aligns with how this information is tracked in the rest of the system.The change ensures consistency with how drain state information is stored across the codebase, which previously used a mix of labels and annotations based on the error message in
expectNodeStateAnnotation()on line 328 that referencesnodeState.GetLabels().pkg/host/types/types.go (1)
37-50: Added structured types for SR-IOV configuration and result trackingThe introduction of
SriovConfigandSriovResultstructs enhances code organization by providing clear, typed data structures for SR-IOV configuration and status tracking. This is a good architectural change that improves type safety and maintainability.The YAML tags ensure proper serialization when these structures are used in configuration files or API responses.
controllers/sriovnetworknodepolicy_controller.go (1)
91-91: Replaced hardcoded requeue duration with a constantUsing
constants.DrainControllerRequeueTimeinstead of a hardcoded duration value is a good practice that improves maintainability and consistency across the codebase.pkg/plugins/mellanox/mellanox_plugin.go (1)
215-215: Improved feature management with feature gatesThe code now uses a feature gate check instead of directly checking a variable. This approach provides better flexibility and control for managing features.
test/util/namespaces/namespaces.go (2)
132-149: Good addition of pool cleanup functionalityThis new function follows the same pattern as the other cleanup functions in the file, providing consistent handling of SriovNetworkPoolConfig resources during test cleanup.
186-187: Clean function enhancementThe Clean function is now properly enhanced to include the new pool cleanup functionality, ensuring more thorough resource cleanup in tests.
pkg/host/manager.go (1)
16-16: Well-structured systemd component integrationThe changes properly integrate the systemd interface into the host management framework. This refactoring improves modularity by moving systemd-related functionality into the host package.
Also applies to: 36-36, 50-50, 71-71, 83-83
.github/workflows/test.yml (1)
43-47: Enhanced test coverage with platform-specific testingSplitting the package tests into separate steps for Kubernetes and OpenShift is a good approach to catch platform-specific issues early in the CI process.
pkg/utils/cluster.go (2)
135-142: Check and update annotations on the original object.The code now checks and modifies annotations directly on the original object, which is consistent with the earlier change. The patch is now created using the backup copy of the original state.
The approach is logically sound - keeping the original state for creating the patch while modifying the working object directly.
143-145: Using the original object for patching.The patching process now correctly uses the original state of the object to create a patch. This maintains a consistent approach of tracking what changed from the original state.
pkg/daemon/daemon_suite_test.go (1)
1-136: Well-structured test suite setup.The daemon test suite is well-organized following BDD style with Ginkgo. It properly:
- Sets up logging
- Bootstraps the test environment
- Registers necessary schemes
- Creates required Kubernetes objects
- Provides proper cleanup in AfterSuite
The thorough setup of infrastructure and API objects will enable comprehensive testing of the daemon functionality.
pkg/vars/vars.go (3)
11-11: Add new featuregate package import.Added import for the featuregate package which will be used for feature gate functionality.
77-82: Add configuration variables for node draining and feature gates.Two new variables have been added:
DisableDrainto control whether the daemon drains the node before configurationFeatureGateto interact with feature gatesThese additions enhance the configurability of the system, allowing for more flexible behavior based on deployment needs.
103-103: Initialize feature gate in init function.The FeatureGate variable is now properly initialized during package initialization.
This ensures the feature gate functionality is ready for use as soon as the package is imported.
pkg/daemon/event_recorder.go (3)
8-8: Add runtime package import.Added import for the Kubernetes runtime package, which is needed for the scheme parameter in
NewEventRecorder.
25-35: Update EventRecorder to accept a runtime scheme.The
NewEventRecorderfunction now takes an explicit scheme parameter instead of using a global one, allowing for better flexibility and testability.This change makes the event recorder more configurable and less dependent on global state, which is good practice for modularity and testing.
38-40: Add context parameter to SendEvent method.The
SendEventmethod now accepts a context parameter rather than creating one internally. This allows for propagation of request context, timeouts, and cancellation signals.This is a good practice improvement as it follows the context propagation pattern recommended for Go services, giving callers control over timeouts and cancellation.
bindata/manifests/sriov-config-service/openshift/sriov-config-service.yaml (2)
24-24: Confirm removal of verbosity flags.Previously, the
-vand--zap-log-levelflags provided granular logging. If detailed debug logs are still needed for troubleshooting, consider reintroducing them or adding a separate mechanism to manage logging verbosity.
41-41: Reevaluate post-phase logging requirements.Similarly to the pre-phase service, the removal of debug flags may make it harder to diagnose issues in post-phase operations. Verify that your current logging setup still meets debugging requirements.
pkg/daemon/config.go (3)
17-21: Check concurrency for shared state.
latestFeatureGatesis a map that could be modified if accessed from multiple goroutines. Verify whether concurrency-safe patterns (e.g., synchronization) are needed to avoid data races.
23-26: Constructor looks good.This factory function cleanly initializes fields and fosters clarity. No concerns found.
60-65: Setup method is straightforward.The controller wiring with Manager is standard, and the resource watch is correctly specified. No issues found.
pkg/consts/constants.go (1)
149-157:Details
✅ Verification successful
Verify usage of newly introduced systemd-related constants.
These file path constants appear correct. Ensure they are applied consistently in all environments, and consider verifying their references throughout the codebase.Below is a script to confirm usage of these constants:
🏁 Script executed:
#!/bin/bash # Searches for references to newly introduced systemd constants to ensure consistent usage. rg -A 2 "SriovSystemdConfigPath" rg -A 2 "SriovSystemdResultPath" rg -A 2 "SriovSystemdSupportedNicPath" rg -A 2 "SriovSystemdServiceBinaryPath" rg -A 2 "SriovServicePath" rg -A 2 "SriovPostNetworkServicePath"Length of output: 11933
Usage of systemd-related constants verified and consistent
The search confirms that all newly introduced constants are referenced appropriately across the codebase. In particular:
- SriovSystemdConfigPath and SriovSystemdResultPath are used in file checking, creation, and I/O operations within the systemd module.
- SriovSystemdSupportedNicPath and SriovSystemdServiceBinaryPath are applied consistently for managing NIC support and binary paths.
- SriovServicePath and SriovPostNetworkServicePath are correctly referenced in the daemon module (for service status checks and removals).
No issues were found. Please continue to ensure these constants remain in sync with any environment-specific file system expectations.
pkg/daemon/config_test.go (5)
23-58: Confirm concurrency safety in the manager startup.
Using a wait group in theBeforeAllblock allows synchronization for the controller manager startup. This setup looks correct. Ensure no race conditions occur when multiple test suites run concurrently.
60-62: Check for leftover resources after test.
TheAfterEachblock deletes allSriovOperatorConfigresources, which is good housekeeping. Additionally, confirm no other transient resources (e.g., secrets or configmaps) remain.
102-122: Ensure interplay of drain logic with other components.
SettingDisableDrainworks well here. Check that other code paths (e.g., finalizing a node drain) handle transitions correctly when toggling this flag during mid-drain.
124-158: Feature gates testing approach looks thorough.
Enabling and disabling feature gates is tested well. Consider testing the scenario where a gate is removed altogether, if applicable, to ensure consistent behavior.
160-170: Helper functions maintain clarity in the test suite.
These helper assertions (log level and drain) are straightforward and keep tests readable. Nice approach.pkg/daemon/status.go (2)
21-58: Robust status updates with conflict handling.
TheupdateSyncStatefunction properly retries on conflict. This is good practice. Ensure backing off on repeated failures is acceptable (or consider exponential backoff for repeated conflicts).
60-96: Confirm partial interface changes are handled correctly.
shouldUpdateStatuschecks for full equality, ignoring only the VF subset. Consider whether partial changes to other fields might require an update. This approach appears sound if partial states are never relevant.cmd/sriov-network-config-daemon/service_test.go (14)
9-10: Imports look good.
The addition ofcobraandgomockis appropriate for CLI command handling and mocking in tests.
13-16: Consistent import references.
These references topkg/constsandpkg/host/typesalign well with the revised code structure. No issues found.
71-73: Convenient test result helper.
Returning a pointer tohosttypes.SriovResultimproves clarity when checking sync status and errors.
159-166: Effective device discovery test logic.
Mock expectations look valid. This helps validate "Pre phase - baremetal cluster" flow thoroughly.
171-171: Assert mock usage.
Expect(testCtrl.Satisfied()).To(BeTrue())is a clean way to verify all mock calls were used.
179-184: Parallel coverage for virtual cluster.
These mocks test a similar flow for the virtual platform. Looks consistent with the baremetal path.
193-193: Mock satisfaction check.
No issues with verifying mock calls here either.
203-210: Apply failure path tested.
The code ensures that a plugin error leads to a failed sync status. Great coverage for error handling.
215-215: Mock usage verification.
EnsuringtestCtrl.Satisfied()is still essential after the error scenario.
222-230: Post phase - baremetal logical flow.
The combination of discovering devices, reading result, and writing success status is well tested and consistent with earlier phases.
234-234: Consistent verification of mocks.
All expectations are met again, ensuring a solid test structure.
239-245: Post phase - virtual flow checks.
These calls mirror the baremetal post flow, ensuring parity across platforms.
250-256: Proper error handling for failed pre phase.
The code setsSyncStatusFailedand retains the pre-phase error for debugging. Good approach to bubble up errors.
260-264: Additional device initialization.
The new code forwaitForDevicesInitialization()is valuable. Ensure corner cases (e.g., missing or renamed devices) are tested as well.Also applies to: 270-273
cmd/sriov-network-config-daemon/start.go (6)
26-42: Updated imports for extended functionality.
Addingocpconfigapi,fields, and the runtime packages is consistent with the new approach for scheme registration and caching.
95-96: Explicitly declaring a global scheme.
Definingscheme = runtime.NewScheme()is standard for custom resource registration. Ensure no concurrency concerns if modified from multiple goroutines.
108-115: Initialization of the scheme.
CallingAddToSchemeforclientgoscheme,sriovnetworkv1, andocpconfigapiensures that CRDs and OpenShift APIs are recognized. Good practice.
148-174: Handling kubelet kubeconfig.
useKubeletKubeConfigoverrides in-cluster configs using environment variables. Ensure local fallback or error logging if parsing fails.
176-183: OperatorConfig retrieval.
Fetching the default config from the cluster is a proven pattern; be mindful that missing or invalid CR might crash the daemon.
185-192: Feature gate initialization.
initFeatureGatesproperly loads dynamic gating fromSriovOperatorConfig. This is straightforward and maintainable.pkg/daemon/daemon_test.go (7)
1-1: Switch to dedicated test package.
Renaming todaemon_testimproves clarity by using external black-box testing for the daemon package.
5-7: Added imports for concurrency and Kubernetes client interactions.
Bringing inos,sync,time, and new K8s libraries reflects the expanded test scope.Also applies to: 14-19
22-34: Extended references for test setup.
snclientset,constants,daemon, andfeaturegateare now used. Ensure consistent version alignment if these packages are pinned in go.mod.
35-49: Shared test variables introduced.
Definingctx,cancel,k8sManager,snclient, etc. simplifies test orchestration. Beware of data races if multiple tests run in parallel.
56-269: Ordered test suite structure.
UsingOrderedplusBeforeAll,AfterAll,BeforeEach, andAfterEachsegments fosters clarity in integration tests. The mocks and environment setups appear thorough.
278-309: Node creation test helpers.
createNodesets up a test Node and NodeState pair, streamlining scenario setup. Good approach for repeated usage.
310-334: Daemon constructor for test.
createDaemoncentralizes the creation and setup of the NodeReconciler, ensuring each test scenario is self-contained.pkg/host/mock/mock_host.go (8)
145-158: Well-structured mock method for CleanSriovFilesFromHost
No issues identified in this mock implementation.
740-754: Consistent mock for ReadConfFile
The signature and return types align well with the real method.
800-814: Accurate mock for ReadSriovResult
No correctness or security concerns found.
816-830: Compliant mock for ReadSriovSupportedNics
Implementation appears consistent with the corresponding interface.
873-886: Properly structured mock for RemoveSriovResult
Matches the real method signature with no apparent issues.
1136-1150: Valid mock for WriteConfFile
Implementation syncs with the expected return signature.
1151-1164: Well-defined mock for WriteSriovResult
No revision needed.
1165-1177: Clear mock for WriteSriovSupportedNics
Conforms to the real method’s structure.pkg/helper/mock/mock_helper.go (8)
161-174: Mock for CleanSriovFilesFromHost
Implementation matches the real interface’s signature.
876-890: Mock for ReadConfFile
Signature and returned values are correct.
936-951: Mock for ReadSriovResult
Consistent with the interface and returns the expected types.
952-965: Mock for ReadSriovSupportedNics
No issues identified with returning the slice of strings.
1023-1036: Mock for RemoveSriovResult
Implementation is minimal and precise.
1335-1349: Mock for WriteConfFile
Overall structure and error handling align with the real method.
1350-1363: Mock for WriteSriovResult
Implementation is straightforward with no functional gaps.
1364-1376: Mock for WriteSriovSupportedNics
No complexities beyond simple file-handling logic in the real method.pkg/host/internal/systemd/systemd.go (8)
34-38: New systemd struct and constructor
The empty struct approach is suitable for methods that don't require state.
40-51: ReadConfFile method
The YAML unmarshalling looks correct. Returns well-defined error states.
137-170: WriteSriovResult method
Properly handles file creation, YAML marshaling, and error reporting.
172-201: ReadSriovResult method
Checks the file existence and returns a success flag, which helps clarify usage.
202-215: RemoveSriovResult method
Removes the file and logs appropriately if not found.
217-250: WriteSriovSupportedNics method
Creates the file if needed and writes NIC IDs line by line. Straightforward logic.
252-276: ReadSriovSupportedNics method
Splits file content by newline to generate a slice of NIC IDs. Consider trimming empty lines if needed.
278-316: CleanSriovFilesFromHost method
Removes relevant systemd files; handles OpenShift scenario by skipping removal of service units.pkg/daemon/daemon.go (22)
8-8: No concerns on new import
The new import reference is standard for interacting with Kubernetes core objects.
12-12: No issues with the new controller-runtime import
This facilitates the controller logic.
14-14: Import for controller package
Looks good; no issues found.
16-16: Predicate import
Enables resource filtering logic. No concerns.
22-22: Host types import
No concerns found; this is used for the new host type references.
29-32: Clearly documented struct
The documentation is clear and helpful. TheNodeReconcilerstruct approach looks well-structured for node-level state management.
46-47: Ensure concurrent access is safe
If multiple goroutines might modify or read fromloadedPlugins, consider using concurrency-safe data structures or synchronization.
50-50: Docstring looks good
The docstring is concise and straightforward.
71-73: Accurate docstring
The docstring clearly explains the initialization steps.
96-97: Good error logging
Appropriate error handling when preparing VF representors.
100-115: OpenStack initialization block
Ensure device info creation is safely idempotent to handle repeated runs without introducing duplicate or stale data.
117-119: Validate return of updateStatusFromHost
Check that the returned node status is fully populated. Partial states might lead to unexpected outcomes later.
121-122: Error logging
Properly logged and returned. No issues here.
125-129: Load plugin error handling
If a plugin fails to load, ensure the partial load doesn't cause undefined behavior. Validate or fail entirely as needed.
277-301: Consolidate plugin iteration
The iteration for drain/reboot signals is correct. Just ensure that plugin errors do not leave the node partially configured.
303-339: Systemd service checks
If one of the systemd services is missing, confirm the fallback logic is robust and doesn't silently ignore incomplete setups.
432-471: State drift detection
This logic seems valid. If triggered frequently, consider ways to cache or reduce any heavy computations.
[performance]
510-536: Drain logic
Draining plus rebooting can be tricky. Ensure the node won't get stuck if it needs a reboot but drain is incomplete.
538-597: Pod deletion concurrency
If multiple controllers or goroutines attempt to restart the device plugin, coordinate to prevent race conditions and repeated deletes.
626-642: Check for vendor ID changes
If new device IDs are introduced, incorporate them dynamically or in a config for future-proofing.
685-692: Controller concurrency
The max concurrency is set to 1. Likely needed for hardware changes, so no further concerns.
694-701: Potential concurrency
If multiple reconcile threads read/writedn.lastAppliedGeneration, consider synchronization.cmd/sriov-network-config-daemon/service.go (18)
31-31: Host types import
No issues with referencing host types.
37-37: Utils import
Likely needed for file/path helpers. No concerns.
68-70: ServiceConfig docs
Clear documentation describing the struct’s role in systemd service context.
81-92: Parameter validation
EnsuresetupLogis non-nil to avoid potential nil pointer usage when logging.
111-111: Default log level
Sets log level to 2, which may be appropriate for moderate verbosity.
119-119: New service config
The service configuration is instantiated properly.
125-127: Appropriate error flow
Configuration errors short-circuit and record the failure, ensuring proper feedback.
139-139: Device initialization wait
Helps ensure that devices are ready before config. Good practice for ephemeral hardware states.
142-144: Conditional flow
The phase-based approach is well-structured. No issues.
147-149: Clear error or success handling
Updating the sriov-result status after each phase is consistent and comprehensive.
152-166: Fall back to default config
Gracefully defaults to baremetal if the config file is missing. This is user-friendly.
169-176: Initialize NIC IDs
Ensures the operator is aware of all NICs. No issues.
178-193: Pre-phase checks
Removing stale results before reconfiguring prevents orphaned data. Good practice.
195-208: Post-phase logic
Checking the pre-phase result ensures continuity, preventing mismatched states.
209-234: Plugin-based design
Centralizing plugin calls makes sense. Watch out for partial plugin application if an error occurs mid-process.
269-306: Conditional device discovery
The logic for baremetal vs. OpenStack is clear. Confirm that new platforms also handle device discovery.
308-315: Error logging
Thoroughly records errors, making debugging easier.
317-324: Phase-based status
Setting SyncStatus to InProgress for the pre phase is a neat approach to track partial progress.
| // save init state | ||
| err = dn.HostHelpers.WriteCheckpointFile(ns) | ||
| if err != nil { | ||
| funcLog.Error(err, "failed to write checkpoint file on host") | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Checkpoint file logic
Currently, errors from writing the checkpoint file are not returned. Consider returning the error to reveal critical init-time failures.
if err != nil {
funcLog.Error(err, "failed to write checkpoint file on host")
}
- return nil
+ return err
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // save init state | |
| err = dn.HostHelpers.WriteCheckpointFile(ns) | |
| if err != nil { | |
| funcLog.Error(err, "failed to write checkpoint file on host") | |
| } | |
| return nil | |
| } | |
| // save init state | |
| err = dn.HostHelpers.WriteCheckpointFile(ns) | |
| if err != nil { | |
| funcLog.Error(err, "failed to write checkpoint file on host") | |
| } | |
| return err | |
| } |
696d64f to
e9bf0e6
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cmd/sriov-network-config-daemon/service_test.go (1)
250-257: Good error handling case testing.The test case for error conditions correctly uses the new types while checking expected error behavior.
Note that the commented-out code at lines 256-257 should be removed if it's no longer needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
cmd/sriov-network-config-daemon/service.go(7 hunks)cmd/sriov-network-config-daemon/service_test.go(5 hunks)pkg/consts/constants.go(1 hunks)pkg/daemon/daemon.go(6 hunks)pkg/daemon/daemon_test.go(1 hunks)pkg/helper/mock/mock_helper.go(5 hunks)pkg/host/internal/systemd/systemd.go(12 hunks)pkg/host/manager.go(5 hunks)pkg/host/mock/mock_host.go(5 hunks)pkg/host/types/interfaces.go(1 hunks)pkg/host/types/types.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- pkg/consts/constants.go
- pkg/daemon/daemon_test.go
- pkg/host/manager.go
- pkg/host/types/interfaces.go
- cmd/sriov-network-config-daemon/service.go
- pkg/daemon/daemon.go
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: test-coverage
- GitHub Check: build
- GitHub Check: test
- GitHub Check: Analyze (go)
- GitHub Check: test-coverage
- GitHub Check: test
- GitHub Check: build
🔇 Additional comments (38)
pkg/host/types/types.go (3)
3-6: Appropriate imports for new types.The code properly imports the necessary dependencies for the new
SriovConfigandSriovResulttypes.
37-44: Well-structured SriovConfig type definition.The
SriovConfigtype is clearly defined with appropriate YAML tags and a descriptive comment. This provides a structured way to handle SR-IOV configuration data that will be stored on the host.
46-50: Clean implementation of SriovResult type.The
SriovResulttype provides a clear structure for representing the outcome of SR-IOV operations with appropriate YAML tags for serialization/deserialization.cmd/sriov-network-config-daemon/service_test.go (9)
9-16: Imports updated correctly for new types.The imports are updated to include the new host types package and other dependencies.
37-67: Improved type safety in test configuration.Good refactoring to return a structured
hosttypes.SriovConfigtype instead of raw bytes. This change enhances type safety and readability.
69-69: Enhanced readability for test data.The test data is now stored as a slice of strings instead of a multi-line string, improving readability and maintainability.
71-72: Type-safe test result creation.Function now returns a structured
hosttypes.SriovResulttype, which improves type safety compared to returning raw bytes.
162-166: Updated mock expectations with new types.The test expectations have been correctly updated to use the new structured types.
179-183: Consistent usage of new types across tests.Consistent application of the new types across different test cases.
226-230: Well-updated reader methods.The test expectations for reading results now use the structured types.
239-243: Aligned with virtual cluster case.The expectations for the virtual cluster case consistently use the new types.
259-274: Test for initialization method.This test case properly uses the new
hosttypes.SriovConfigtype and tests the initialization process.pkg/host/internal/systemd/systemd.go (10)
29-29: Proper import of the new types package.The file correctly imports the new types package from the proper path.
34-38: Good implementation of the SystemdInterface.The file defines a
systemdstruct that implements theSystemdInterface, providing a clean way to create new instances with theNew()factory function.
42-51: Updated ReadConfFile to use proper types.The method has been correctly updated to return a pointer to
types.SriovConfigand uses the constants from the consts package for file paths.
56-65: Properly structured WriteConfFile method.The method appropriately constructs a
types.SriovConfigobject with the input parameters.
139-170: Well-refactored WriteSriovResult method.The method has been properly updated to accept a pointer to
types.SriovResultand uses constants for file paths.
175-200: Enhanced ReadSriovResult with better error handling.The method now returns a boolean success flag in addition to the result and error, providing more context to callers.
203-215: Clean RemoveSriovResult implementation.The method has been properly updated to be a method on the
systemdstruct and uses constants for file paths.
220-250: Updated WriteSriovSupportedNics method.The method has been properly updated to be a method on the
systemdstruct and uses constants for file paths.
256-276: Enhanced ReadSriovSupportedNics method.The method has been properly updated to be a method on the
systemdstruct and uses constants for file paths.
282-316: Well-structured CleanSriovFilesFromHost method.The method has been properly updated to be a method on the
systemdstruct and uses constants for file paths. The conditional logic for OpenShift environments is maintained.pkg/host/mock/mock_host.go (8)
145-157: Added mock for CleanSriovFilesFromHost.Correctly adds the mock implementation for the new method.
740-753: Added mock for ReadConfFile.Correctly adds the mock implementation for the new method that returns
*types.SriovConfig.
800-814: Added mock for ReadSriovResult.Correctly adds the mock implementation for the new method that returns both the result and a boolean success flag.
816-829: Added mock for ReadSriovSupportedNics.Correctly adds the mock implementation for the new method.
873-885: Added mock for RemoveSriovResult.Correctly adds the mock implementation for the new method.
1136-1149: Added mock for WriteConfFile.Correctly adds the mock implementation for the new method.
1151-1163: Added mock for WriteSriovResult.Correctly adds the mock implementation for the new method.
1165-1177: Added mock for WriteSriovSupportedNics.Correctly adds the mock implementation for the new method.
pkg/helper/mock/mock_helper.go (8)
161-173: Appropriate mock implementation forCleanSriovFilesFromHost.The mock implementation correctly takes a boolean parameter
isOpenShiftand returns an error, following the same pattern as other mock methods in this file.
876-889: Well-structured mock implementation forReadConfFile.This mock follows the established pattern, correctly returning a
*types.SriovConfigand an error to match the interface requirements.
936-950: Correctly implemented mock forReadSriovResult.The mock method properly returns all expected values:
*types.SriovResult, boolean, and error.
952-965: Properly implemented mock forReadSriovSupportedNics.The mock implementation follows the consistent pattern, returning a slice of strings and an error as expected.
1023-1035: Well-implemented mock forRemoveSriovResult.The mock method correctly returns an error and follows the established pattern in this file.
1335-1348: Properly implemented mock forWriteConfFile.This mock correctly takes a
*v1.SriovNetworkNodeStateparameter and returns both a boolean and an error, consistent with the pattern in this file.
1350-1362: Well-structured mock forWriteSriovResult.The mock implementation correctly takes a
*types.SriovResultparameter and returns an error.
1364-1376: Appropriate mock implementation forWriteSriovSupportedNics.The mock method properly returns an error and follows the established pattern for mocked methods in this file.
this commit also switch the cmd/service to use a struct to not pass variables all around Signed-off-by: Sebastian Sch <sebassch@gmail.com>
e9bf0e6 to
bbc18ec
Compare
Summary by CodeRabbit
New Features
Refactor
Tests