WINC-1635, WINC-1592: enable log rotation for kubelet and kubeproxy services #3766

Open
jrvaldes wants to merge 6 commits into openshift:master from jrvaldes:log-rotation-kubelet

@jrvaldes
Contributor

@jrvaldes jrvaldes commented Feb 6, 2026

This pull request introduces configurable log rotation and flushing for Windows kubelet services, allowing log file size, retention, and flush interval to be set via environment variables. It also adds helper functions to safely parse and use these environment variables, and includes unit tests to ensure their correctness.

Log rotation and flush configuration:

  • Added environment variable support for log file size (SERVICES_LOG_FILE_SIZE), log file age (SERVICES_LOG_FILE_AGE), and log flush interval (SERVICES_LOG_FLUSH_INTERVAL), with sensible defaults, to control log runner behavior in pkg/services/services.go.
  • Updated the kubelet service command generation to use a new helper function, getLogRunnerForCmd, which incorporates the log rotation and flush parameters.
  • Added the log flush interval to the kubelet configuration via the Logging field, using the new logsapi.LoggingConfiguration struct.
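
As a rough illustration of the command wrapping described in the list above, the following Go sketch builds a kube-log-runner command line. The function name `logRunnerForCmd`, the `logRunnerPath` constant, the example paths, and the exact flag names (`-log-file`, `-log-file-size`, `-log-file-age`, `-flush-interval`) are assumptions made for illustration and are not copied from the PR diff.

```go
package main

import (
	"fmt"
	"strings"
)

// logRunnerPath is a hypothetical install location for kube-log-runner.
const logRunnerPath = `C:\k\kube-log-runner.exe`

// logRunnerForCmd returns a command line that runs commandPath under
// kube-log-runner, writing its output to logfilePath.
// The flag names below are assumed, not taken from the PR.
func logRunnerForCmd(commandPath, logfilePath, fileSize, fileAge, flushInterval string) string {
	var b strings.Builder
	// The wrapper binary comes first, followed by its own flags.
	b.WriteString(logRunnerPath)
	fmt.Fprintf(&b, " -log-file=%s", logfilePath)
	fmt.Fprintf(&b, " -log-file-size=%s", fileSize)
	fmt.Fprintf(&b, " -log-file-age=%s", fileAge)
	fmt.Fprintf(&b, " -flush-interval=%s", flushInterval)
	// The wrapped command goes last so the runner executes it.
	b.WriteString(" ")
	b.WriteString(commandPath)
	return b.String()
}

func main() {
	fmt.Println(logRunnerForCmd(`C:\k\kubelet.exe`, `C:\var\log\kubelet\kubelet.log`, "100M", "168h", "5s"))
}
```

The values passed in `main` mirror the defaults mentioned later in the commit messages (100M, 168h, 5s). Keeping the wrapper first and the wrapped command last means the ordering can be asserted in a unit test.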

Helper functions and environment parsing:

  • Introduced getEnvQuantityOrDefault and getEnvDurationOrDefault functions to safely parse quantity and duration environment variables, falling back to defaults if invalid.
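
A minimal sketch of the fall-back-to-default pattern for the duration case, using only the standard library. The split into `parseDurationOrDefault` and `getEnvDurationOrDefault`, the constant names, and the choice to reject non-positive durations are assumptions for illustration. The PR's actual helpers may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

const (
	flushIntervalEnvVar  = "SERVICES_LOG_FLUSH_INTERVAL"
	defaultFlushInterval = 5 * time.Second
)

// parseDurationOrDefault returns def when raw is empty, malformed,
// or not strictly positive (the positivity check is an assumption).
func parseDurationOrDefault(raw string, def time.Duration) time.Duration {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return def
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return def
	}
	return d
}

// getEnvDurationOrDefault reads key from the environment and applies
// the same fallback rules.
func getEnvDurationOrDefault(key string, def time.Duration) time.Duration {
	return parseDurationOrDefault(os.Getenv(key), def)
}

func main() {
	fmt.Println(getEnvDurationOrDefault(flushIntervalEnvVar, defaultFlushInterval))
}
```

Separating the pure parsing step from the environment lookup keeps the parsing logic testable without touching process-wide state.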

Testing:

  • Added unit tests for log runner command generation and environment variable parsing functions in pkg/services/services_test.go, covering various scenarios and edge cases.

Summary by CodeRabbit

  • New Features

    • Automatic log rotation for managed Windows services, configurable via environment variables (file size, age, flush interval)
    • Tooling to enable verbose operator debug logging and reduce service log file sizes (applies changes and ensures restart)
  • Updates

    • Kubelet configuration now includes structured logging with a 5s flush frequency
  • Tests

    • Added comprehensive tests for env parsing and log-runner command construction
  • Documentation

    • README updated with log rotation feature and enablement instructions
  • Chores

    • Dependency declarations adjusted for build consistency

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2026
@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/retitle WINC-1635: enable log rotation for kubelet service

@openshift-ci openshift-ci bot changed the title Log rotation kubelet WINC-1635: enable log rotation for kubelet service Feb 6, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 6, 2026
@openshift-ci-robot

openshift-ci-robot commented Feb 6, 2026

@jrvaldes: This pull request references WINC-1635 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 336fc21 to ad7f5c8 Compare February 6, 2026 16:32
@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test ?

@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test lint

@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test unit

@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test images

@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 4469104 to f6a626f Compare February 6, 2026 17:57
@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test vsphere-e2e-operator

@jrvaldes
Contributor Author

jrvaldes commented Feb 6, 2026

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 06d4c31 to ca06837 Compare February 7, 2026 04:49
@jrvaldes
Contributor Author

jrvaldes commented Feb 7, 2026

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from ca06837 to f994aee Compare February 11, 2026 22:58
@jrvaldes
Contributor Author

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from f994aee to 547445b Compare February 12, 2026 04:15
@jrvaldes
Contributor Author

/test vsphere-e2e-operator

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 547445b to abd7af6 Compare February 12, 2026 13:07
@jrvaldes
Contributor Author

/test vsphere-e2e-operator

@coderabbitai

coderabbitai bot commented Feb 12, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds kubelet Logging configuration (FlushFrequency 5s), promotes k8s.io/component-base to a direct go.mod require, introduces environment-driven log rotation for managed Windows services with helpers, tests and package init-time env parsing, and adds enable_debug_logging() in hack/common.sh to toggle WMCO debug logging.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency change: go.mod | Moved k8s.io/component-base v0.34.4 from indirect to direct require. |
| Kubelet logging: pkg/nodeconfig/nodeconfig.go, pkg/nodeconfig/nodeconfig_test.go | Imported k8s.io/component-base/logs/api/v1 and added Logging to generated KubeletConfiguration (FlushFrequency = 5s, SerializeAsString enabled); test expectation updated from 0s to 5s. |
| Service init vars: pkg/services/init.go | New file: declares package-level vars (logFileSize, logFileAge, flushInterval) and initializes them from env using getEnvQuantity/getEnvDuration, logging errors on parse failures. |
| Service log runner & env parsing: pkg/services/services.go | Added env var constants (SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE, SERVICES_LOG_FLUSH_INTERVAL), helpers (getEnvQuantity, getEnvDuration), and getLogRunnerForCmd to wrap service commands with kube-log-runner flags; updated service command construction to use the wrapper. |
| Service tests: pkg/services/services_test.go | New tests covering getEnvDuration, getEnvQuantity, and getLogRunnerForCmd, including ordering assertions and package-level state preservation/restoration. |
| Test automation helper: hack/common.sh | Added enable_debug_logging() to detect OLM v0/v1 and patch the WMCO subscription or manager container to enable --debugLogging and set SERVICES_LOG_FILE_SIZE=1M. The OLM v1 path applies the env vars in one call and restarts the deployment, with a verification retry loop and a wait for rollout. Unknown OLM versions produce an explicit error. |
| Documentation: README.md | Added "Automatic log rotation for managed Windows services" feature section (appears duplicated in the diff); minor formatting tweaks. |
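
To make the kubelet logging change in the summary above more concrete, here is a hedged Go sketch of how a 5s flush frequency might appear once the kubelet configuration is serialized. The types `loggingConfig` and `kubeletConfig` are local stand-ins written for this example. They are not the real `KubeletConfiguration` or `logsapi.LoggingConfiguration` types, and only the `logging.flushFrequency` field is modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// flushFrequency mirrors the 5s value mentioned in the walkthrough.
const flushFrequency = 5 * time.Second

// loggingConfig is a hypothetical stand-in that models only the
// flushFrequency field, serialized as a duration string.
type loggingConfig struct {
	FlushFrequency string `json:"flushFrequency"`
}

// kubeletConfig is a hypothetical, heavily reduced stand-in for the
// kubelet configuration object.
type kubeletConfig struct {
	Kind       string        `json:"kind"`
	APIVersion string        `json:"apiVersion"`
	Logging    loggingConfig `json:"logging"`
}

// renderKubeletLogging returns the JSON form of the reduced config.
func renderKubeletLogging(flush time.Duration) (string, error) {
	cfg := kubeletConfig{
		Kind:       "KubeletConfiguration",
		APIVersion: "kubelet.config.k8s.io/v1beta1",
		Logging:    loggingConfig{FlushFrequency: flush.String()},
	}
	out, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := renderKubeletLogging(flushFrequency)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```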

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: enabling log rotation for kubelet and kubeproxy services with JIRA issue references. |
| Stable And Deterministic Test Names | ✅ Passed | All test names in the pull request are static and deterministic with no dynamic content like timestamps, UUIDs, or generated suffixes. |
| Test Structure And Quality | ✅ Passed | The test code meets all five quality requirements: single responsibility with focused test cases, proper setup/cleanup using t.Cleanup() and t.Setenv(), no timeouts needed for unit tests, meaningful assertion messages with context, and consistency with codebase conventions using testify and table-driven patterns. |

@jrvaldes
Contributor Author

@coderabbitai review

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from abd7af6 to 5abf765 Compare February 16, 2026 22:07
@jrvaldes
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In `@pkg/services/services_test.go`:
- Around line 158-189: The test TestGetLogRunnerForCmd relies on package-level
defaults (logFileSize, logFileAge, flushInterval) which are initialized from
environment variables; make the test deterministic by setting those env vars at
test start (use t.Setenv for SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE,
SERVICES_LOG_FLUSH_INTERVAL) to the expected default values ("100M", "168h",
"5s") before calling getLogRunnerForCmd so the output always matches the
hardcoded expected strings.
🧹 Nitpick comments (1)
pkg/services/services_test.go (1)

158-189: Test relies on package-level defaults initialized at startup.

The test hardcodes expected values (100M, 168h, 5s) that match the default constants. However, since logFileSize, logFileAge, and flushInterval are set in init() from environment variables, if any of these env vars (SERVICES_LOG_FILE_SIZE, SERVICES_LOG_FILE_AGE, SERVICES_LOG_FLUSH_INTERVAL) are set in the test environment, this test will fail unexpectedly.

Consider either:

  1. Explicitly setting the env vars in this test using t.Setenv to ensure predictable behavior, or
  2. Documenting that this test assumes default values
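
One way to get predictable values, sketched here under assumptions: if the settings are read from the environment only once at package initialization, changing an environment variable inside a test does not affect the already-initialized variables, so a test can instead override the package-level values directly and restore them afterwards. The helper `overrideLogSettings` and the string-typed variables below are hypothetical and simplified; they are not the PR's code.

```go
package main

import "fmt"

// Package-level settings standing in for the ones named in the review.
// Strings are used here only to keep the sketch small.
var (
	logFileSize   = "100M"
	logFileAge    = "168h"
	flushInterval = "5s"
)

// overrideLogSettings replaces the package-level values and returns a
// function that puts the previous values back.
func overrideLogSettings(size, age, flush string) (restore func()) {
	prevSize, prevAge, prevFlush := logFileSize, logFileAge, flushInterval
	logFileSize, logFileAge, flushInterval = size, age, flush
	return func() {
		logFileSize, logFileAge, flushInterval = prevSize, prevAge, prevFlush
	}
}

// describe renders the current settings so the effect is visible.
func describe() string {
	return fmt.Sprintf("size=%s age=%s flush=%s", logFileSize, logFileAge, flushInterval)
}

func main() {
	restore := overrideLogSettings("1M", "1h", "1s")
	fmt.Println(describe())
	restore()
	fmt.Println(describe())
}
```

In a real test the returned function could be registered with `t.Cleanup(overrideLogSettings("100M", "168h", "5s"))`, so the original state comes back even when the test fails.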

this commit explicitly configures kubelet's logging flush frequency to 5 seconds to ensure log entries are written to disk in near real-time.

ran: go mod tidy && go mod vendor
@jrvaldes
Contributor Author

> I'm just wondering if setting it through environment variables is the best choice. I think it might be nice to save them in a config somewhere that can get verified.

Makes sense, what do you mean by "can get verified"?

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch 2 times, most recently from 0465980 to 54c4ce8 Compare February 25, 2026 17:37
@wgahnagl
Contributor

wgahnagl commented Feb 25, 2026

> > I'm just wondering if setting it through environment variables is the best choice. I think it might be nice to save them in a config somewhere that can get verified.
>
> Makes sense, what do you mean by "can get verified"?

like, on parsing the config you can have a quick bit of code to verify that everything is correct.

@jrvaldes
Contributor Author

> > > I'm just wondering if setting it through environment variables is the best choice. I think it might be nice to save them in a config somewhere that can get verified.
> >
> > Makes sense, what do you mean by "can get verified"?
>
> like, on parsing the config you can have a quick bit of code to verify that everything is correct.

exactly, that is what the validations around the env var values are about: verifying that everything is syntactically correct
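
For readers following this thread, a rough sketch of what such syntactic validation could look like when all three values are checked together and every problem is reported at once. Everything here is illustrative: `parseSize` is a simplified stand-in that understands only a few size suffixes and does not guard against overflow, and `validateLogSettings` is a hypothetical name. The PR itself parses quantities with its own helpers.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sizeSuffixes is a simplified, hypothetical subset of size suffixes:
// binary (Ki, Mi, Gi) and decimal (k, M, G).
var sizeSuffixes = []struct {
	suffix string
	factor int64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30},
	{"k", 1000}, {"M", 1000 * 1000}, {"G", 1000 * 1000 * 1000},
}

// parseSize converts values such as "100M" or "1Mi" to a byte count.
func parseSize(raw string) (int64, error) {
	orig := raw
	raw = strings.TrimSpace(raw)
	factor := int64(1)
	for _, s := range sizeSuffixes {
		if strings.HasSuffix(raw, s.suffix) {
			raw = strings.TrimSuffix(raw, s.suffix)
			factor = s.factor
			break
		}
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid size %q", orig)
	}
	return n * factor, nil
}

// validateLogSettings checks all three values and joins every error,
// so a misconfiguration is reported in full rather than one at a time.
func validateLogSettings(size, age, flush string) error {
	var errs []error
	if _, err := parseSize(size); err != nil {
		errs = append(errs, fmt.Errorf("file size: %w", err))
	}
	if d, err := time.ParseDuration(strings.TrimSpace(age)); err != nil || d <= 0 {
		errs = append(errs, fmt.Errorf("file age: invalid duration %q", age))
	}
	if d, err := time.ParseDuration(strings.TrimSpace(flush)); err != nil || d <= 0 {
		errs = append(errs, fmt.Errorf("flush interval: invalid duration %q", flush))
	}
	return errors.Join(errs...)
}

func main() {
	fmt.Println(validateLogSettings("100M", "168h", "5s"))
	fmt.Println(validateLogSettings("lots", "168h", "soon"))
}
```

The same checks apply whether the values come from environment variables or from a config file, which is the point being made in the replies above.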

@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 54c4ce8 to 857920f Compare February 26, 2026 00:05
@jrvaldes
Contributor Author

/test ?

@jrvaldes
Contributor Author

/test azure-e2e-upgrade

@jrvaldes jrvaldes requested review from sebsoto and wgahnagl February 26, 2026 00:06
value := os.Getenv(key)
value = strings.TrimSpace(value)
if value == "" {
// not present
Member

nit: we can remove this kind of comment and other simple code comments.

Contributor Author

addressed

// logging to the given logfilePath. Log rotation parameters can be configured via environment variables.
func getLogRunnerForCmd(commandPath, logfilePath string) string {
cmdBuilder := strings.Builder{}
// log runner path must be first
Member

some comments here may be redundant

Contributor Author

addressed

jrvaldes added 4 commits March 3, 2026 09:12
this commit configures kubelet service to use kube-log-runner wrapper to
enable optional log rotation on Windows nodes. This prevents unbounded log
growth that could exhaust disk space.

Log rotation parameters are configurable via environment variables:
- SERVICES_LOG_FILE_SIZE: Size limit before rotation
- SERVICES_LOG_FILE_AGE: Retention period for rotated logs
- SERVICES_LOG_FLUSH_INTERVAL: Flush interval to disk

This approach is necessary because Windows services don't have a native
mechanism for output redirection, and Kubernetes has deprecated the
--log-file flag for components.

this commit configures kubeproxy service to use kube-log-runner wrapper to
enable automatic log rotation on Windows nodes. This prevents unbounded log
growth that could exhaust disk space.

Log rotation parameters are configurable via environment variables:
- SERVICES_LOG_FILE_SIZE: Size limit before rotation (default: 100M)
- SERVICES_LOG_FILE_AGE: Retention period for rotated logs (default: 168h)
- SERVICES_LOG_FLUSH_INTERVAL: Flush interval to disk (default: 5s)

this commit documents the automatic log rotation for managed
Windows services in the Enabled features section in the README.md
@jrvaldes jrvaldes force-pushed the log-rotation-kubelet branch from 857920f to 36f6377 Compare March 3, 2026 14:13
@wgahnagl
Contributor

wgahnagl commented Mar 3, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 3, 2026
@jrvaldes jrvaldes marked this pull request as ready for review March 3, 2026 21:31
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 3, 2026
@openshift-ci openshift-ci bot requested a review from mansikulkarni96 March 3, 2026 21:31
@openshift-ci
Contributor

openshift-ci bot commented Mar 4, 2026

@jrvaldes: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jrvaldes
Contributor Author

jrvaldes commented Mar 4, 2026

@openshift-ci
Contributor

openshift-ci bot commented Mar 4, 2026

@jrvaldes: jrvaldes unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers.

Details

In response to this:

/override ci/prow/gcp-e2e-operator

passed before

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_windows-machine-config-operator/3766/pull-ci-openshift-windows-machine-config-operator-master-gcp-e2e-operator/2028946291865686016


@jrvaldes
Contributor Author

jrvaldes commented Mar 4, 2026

/override ci/prow/gcp-e2e-operator

@openshift-ci
Contributor

openshift-ci bot commented Mar 4, 2026

@jrvaldes: Overrode contexts on behalf of jrvaldes: ci/prow/gcp-e2e-operator


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

6 participants