OCPBUGS-60273: Ensure revision.json persists on ungraceful shutdown #1494
@clobrano: This pull request references Jira Issue OCPBUGS-60273, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.

Skipping CI for Draft Pull Request.

Walkthrough: Updated the atomic temporary-file save.
Estimated code review effort: 3 (Moderate), ~20 minutes.
Pre-merge checks: 1 failed (warning), 2 passed.

a3d111a to bbc1c24

Actionable comments posted: 0
🧹 Nitpick comments (2)
pkg/cmd/rev/rev.go (2)
192-192: Approve with suggestion: Consider explicit `file.Sync()` for clarity.

Adding `os.O_SYNC` correctly ensures that the `Write` operation on line 203 blocks until data is physically written to disk, preventing empty files on ungraceful shutdown. This addresses the issue described in the PR.

However, consider using an explicit `file.Sync()` call after the write instead, which is more conventional and makes the intent clearer:

    - file, err := os.OpenFile(tmpPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC|os.O_SYNC, 0644)
    + file, err := os.OpenFile(tmpPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
      if err != nil {
          klog.Errorf("error opening file: %v", err)
          return
      }
      defer func() {
          if err = file.Close(); err != nil {
              klog.Errorf("error closing file: %v", err)
          }
      }()
      _, err = file.Write(jsonOutput)
      if err != nil {
          klog.Errorf("error writing result to file: %v", err)
          return
      }
    +
    + if err = file.Sync(); err != nil {
    +     klog.Errorf("error syncing file to disk: %v", err)
    +     return
    + }
      if err = os.Rename(tmpPath, outputFile); err != nil {

Both approaches provide the same durability guarantee, but explicit `Sync()` is more idiomatic and slightly more efficient (one sync operation after all writes vs. synchronous mode for each write). Given the 10-second polling interval, the performance impact is minimal either way.

209-212: Optional: Consider syncing the parent directory for complete durability.

While the temp-file-then-rename pattern ensures atomic updates, for strict durability guarantees after ungraceful shutdown, you may also want to sync the parent directory after the rename to ensure the directory entry update is persisted:

    if err = os.Rename(tmpPath, outputFile); err != nil {
        klog.Errorf("error during rename to destination file: %v", err)
        return
    }
    // Optional: sync parent directory to ensure rename is durable
    if dir, err := os.Open(filepath.Dir(outputFile)); err == nil {
        dir.Sync()
        dir.Close()
    }

However, this is often omitted in practice as most filesystems handle this reasonably well, and the risk is low for this use case (if the rename doesn't complete, the old file remains valid). This is mentioned for completeness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
pkg/cmd/rev/rev.go(1 hunks)
bbc1c24 to 0d67b4e

Actionable comments posted: 1
🔇 Additional comments (1)
pkg/cmd/rev/rev.go (1)
192-192: Good addition of O_SYNC for durability.

Adding `os.O_SYNC` ensures writes are flushed to disk immediately, which is exactly what's needed to prevent empty or corrupted files on ungraceful shutdown.
Add os.O_SYNC flag to OpenFile call in trySaveRevision to force synchronous writes to disk. This prevents the revision.json file from being empty if the process terminates unexpectedly before buffered data is flushed.
0d67b4e to 03de23b

Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/cmd/rev/rev.go (1)
197-211: Critical issue resolved: close error now gates rename.

The defer function correctly prevents renaming if either write or close fails. The early-return pattern (lines 199-200) ensures that a failed close prevents the rename, addressing the critical issue from the previous review.

The current implementation achieves the same safety as the previously suggested combined check (`if err != nil || closeErr != nil`) but uses idiomatic early returns instead, which is a valid Go pattern.

Minor: The comment at line 202 could be more precise; consider "Only rename if write succeeded" since close is already verified above.
🔇 Additional comments (1)
pkg/cmd/rev/rev.go (1)
192-192: LGTM: O_SYNC ensures durability on ungraceful shutdown.

The addition of `os.O_SYNC` forces synchronous writes, ensuring that data reaches the disk immediately rather than being buffered. This directly addresses the issue where `revision.json` could be empty after unexpected process termination.
/lgtm thanks @clobrano!

/cherry-pick release-4.20 release-4.19 release-4.18

/jira refresh

@tjungblu: once the present PR merges, I will cherry-pick it on top of …

@tjungblu: This pull request references Jira Issue OCPBUGS-60273, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, tjungblu

/retest-required

/retest-required

@jaypoulz: This PR has been marked as verified by …

@clobrano: The following test failed, say …

c0663ed into openshift:main

@clobrano: Jira Issue Verification Checks: Jira Issue OCPBUGS-60273. Jira Issue OCPBUGS-60273 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload.

Fix included in accepted release 4.21.0-0.nightly-2025-10-15-101012

/cherry-pick release-4.20 release-4.19 release-4.18

@jaypoulz: new pull request created: #1501

Add `os.O_SYNC` flag to OpenFile call in `trySaveRevision` to force synchronous writes to disk. This prevents the `revision.json` file from being empty if the process terminates unexpectedly before buffered data is flushed.