Skip to content

Conversation

@clobrano
Copy link
Contributor

@clobrano clobrano commented Oct 9, 2025

Add os.O_SYNC flag to OpenFile call in trySaveRevision to force synchronous writes to disk. This prevents the revision.json file from being empty if the process terminates unexpectedly before buffered data is flushed

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 9, 2025
@openshift-ci-robot
Copy link

@clobrano: This pull request references Jira Issue OCPBUGS-60273, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

…shutdown

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@clobrano clobrano changed the title [WIP] OCPBUGS-60273: add os.O_SYNC flag to prevent empty revision.json on ungraceful … [WIP] OCPBUGS-60273: add os.O_SYNC flag to prevent empty revision.json on ungraceful shutdown Oct 9, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 9, 2025

Walkthrough

Updated atomic temporary-file save in pkg/cmd/rev/rev.go: open temp file with O_SYNC, capture and log close errors, move the final rename into the deferred cleanup and perform it only if write and close succeeded, and remove the prior unconditional immediate rename.

Changes

Cohort / File(s) Change summary
Atomic save and cleanup
pkg/cmd/rev/rev.go
Open temp file with O_SYNC; capture and log errors returned by Close; perform final rename inside the deferred cleanup only when write/close succeeded; add guard and logging to skip rename on failure; remove previous unconditional post-write rename.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly summarizes the primary intent of the changeset by stating that revision.json will be persisted on ungraceful shutdown, which reflects the addition of O_SYNC and atomic write handling to prevent data loss.
Description Check ✅ Passed The PR description accurately describes the addition of the O_SYNC flag and its purpose to ensure revision.json is not empty after an ungraceful shutdown, directly matching the changes in trySaveRevision.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@clobrano clobrano force-pushed the OCPBUGS-60273-ensure-revision-json-is-written-on-disk branch from a3d111a to bbc1c24 Compare October 9, 2025 12:01
@clobrano clobrano changed the title [WIP] OCPBUGS-60273: add os.O_SYNC flag to prevent empty revision.json on ungraceful shutdown OCPBUGS-60273: Ensure revision.json persists on ungraceful shutdown Oct 9, 2025
@clobrano clobrano marked this pull request as ready for review October 9, 2025 12:06
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2025
@openshift-ci openshift-ci bot requested review from ironcladlou and tjungblu October 9, 2025 12:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/cmd/rev/rev.go (2)

192-192: Approve with suggestion: Consider explicit file.Sync() for clarity.

Adding os.O_SYNC correctly ensures that the Write operation on line 203 blocks until data is physically written to disk, preventing empty files on ungraceful shutdown. This addresses the issue described in the PR.

However, consider using an explicit file.Sync() call after the write instead, which is more conventional and makes the intent clearer:

-	file, err := os.OpenFile(tmpPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC|os.O_SYNC, 0644)
+	file, err := os.OpenFile(tmpPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
 	if err != nil {
 		klog.Errorf("error opening file: %v", err)
 		return
 	}
 	defer func() {
 		if err = file.Close(); err != nil {
 			klog.Errorf("error closing file: %v", err)
 		}
 	}()
 
 	_, err = file.Write(jsonOutput)
 	if err != nil {
 		klog.Errorf("error writing result to file: %v", err)
 		return
 	}
+
+	if err = file.Sync(); err != nil {
+		klog.Errorf("error syncing file to disk: %v", err)
+		return
+	}
 
 	if err = os.Rename(tmpPath, outputFile); err != nil {

Both approaches provide the same durability guarantee, but explicit Sync() is more idiomatic and slightly more efficient (one sync operation after all writes vs synchronous mode for each write). Given the 10-second polling interval, performance impact is minimal either way.


209-212: Optional: Consider syncing the parent directory for complete durability.

While the temp-file-then-rename pattern ensures atomic updates, for strict durability guarantees after ungraceful shutdown, you may also want to sync the parent directory after the rename to ensure the directory entry update is persisted:

if err = os.Rename(tmpPath, outputFile); err != nil {
	klog.Errorf("error during rename to destination file: %v", err)
	return
}

// Optional: sync parent directory to ensure rename is durable
if dir, err := os.Open(filepath.Dir(outputFile)); err == nil {
	dir.Sync()
	dir.Close()
}

However, this is often omitted in practice as most filesystems handle this reasonably well, and the risk is low for this use case (if the rename doesn't complete, the old file remains valid). This is mentioned for completeness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0357803 and bbc1c24.

📒 Files selected for processing (1)
  • pkg/cmd/rev/rev.go (1 hunks)

@clobrano clobrano force-pushed the OCPBUGS-60273-ensure-revision-json-is-written-on-disk branch from bbc1c24 to 0d67b4e Compare October 9, 2025 12:40
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between bbc1c24 and 0d67b4e.

📒 Files selected for processing (1)
  • pkg/cmd/rev/rev.go (1 hunks)
🔇 Additional comments (1)
pkg/cmd/rev/rev.go (1)

192-192: Good addition of O_SYNC for durability.

Adding os.O_SYNC ensures writes are flushed to disk immediately, which is exactly what's needed to prevent empty or corrupted files on ungraceful shutdown.

Add os.O_SYNC flag to OpenFile call in trySaveRevision to force
synchronous writes to disk. This prevents the revision.json file from
being empty if the process terminates unexpectedly before buffered data
is flushed.
@clobrano clobrano force-pushed the OCPBUGS-60273-ensure-revision-json-is-written-on-disk branch from 0d67b4e to 03de23b Compare October 9, 2025 12:50
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/cmd/rev/rev.go (1)

197-211: Critical issue resolved: close error now gates rename.

The defer function correctly prevents renaming if either write or close fails. The early-return pattern (lines 199-200) ensures that a failed close prevents the rename, addressing the critical issue from the previous review.

The current implementation achieves the same safety as the previously suggested combined check (if err != nil || closeErr != nil) but uses idiomatic early returns instead, which is a valid Go pattern.

Minor: The comment at line 202 could be more precise—consider "Only rename if write succeeded" since close is already verified above.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0d67b4e and 03de23b.

📒 Files selected for processing (1)
  • pkg/cmd/rev/rev.go (1 hunks)
🔇 Additional comments (1)
pkg/cmd/rev/rev.go (1)

192-192: LGTM: O_SYNC ensures durability on ungraceful shutdown.

The addition of os.O_SYNC forces synchronous writes, ensuring that data reaches the disk immediately rather than being buffered. This directly addresses the issue where revision.json could be empty after unexpected process termination.

@tjungblu
Copy link
Contributor

tjungblu commented Oct 9, 2025

/lgtm

thanks @clobrano!

@tjungblu
Copy link
Contributor

tjungblu commented Oct 9, 2025

/cherry-pick release-4.20 release-4.19 release-4.18

@tjungblu
Copy link
Contributor

tjungblu commented Oct 9, 2025

/jira refresh

@openshift-cherrypick-robot

@tjungblu: once the present PR merges, I will cherry-pick it on top of release-4.20 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.20 release-4.19 release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 9, 2025
@openshift-ci-robot
Copy link

@tjungblu: This pull request references Jira Issue OCPBUGS-60273, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2025
@clobrano
Copy link
Contributor Author

clobrano commented Oct 9, 2025

/retest-required

1 similar comment
@clobrano
Copy link
Contributor Author

/retest-required

@jaypoulz
Copy link
Contributor

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 13, 2025
@openshift-ci-robot
Copy link

@jaypoulz: This PR has been marked as verified by https://issues.redhat.com/browse/OCPBUGS-60273?focusedId=28236250&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28236250.

In response to this:

/verified by https://issues.redhat.com/browse/OCPBUGS-60273?focusedId=28236250&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28236250

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2025

@clobrano: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 03de23b link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 4fb2d22 and 2 for PR HEAD 03de23b in total

@openshift-merge-bot openshift-merge-bot bot merged commit c0663ed into openshift:main Oct 13, 2025
15 of 16 checks passed
@openshift-ci-robot
Copy link

@clobrano: Jira Issue Verification Checks: Jira Issue OCPBUGS-60273
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-60273 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

In response to this:

Add os.O_SYNC flag to OpenFile call in trySaveRevision to force synchronous writes to disk. This prevents the revision.json file from being empty if the process terminates unexpectedly before buffered data is flushed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot
Copy link
Contributor

Fix included in accepted release 4.21.0-0.nightly-2025-10-15-101012

@jaypoulz
Copy link
Contributor

/cherry-pick release-4.20 release-4.19 release-4.18

@openshift-cherrypick-robot

@jaypoulz: new pull request created: #1501

In response to this:

/cherry-pick release-4.20 release-4.19 release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants