Skip to content

Conversation

@arreyder
Copy link
Contributor

@arreyder arreyder commented Dec 17, 2025

Add file syncs to prevent ZFS ARC race condition by making sure that files are fully written before being read.

Add Sync() after io.Copy() in copyToTempFile() (s3 manager) - required for immediate reads
Add Sync() after io.Copy() in copyFileToTmp() (local manager) - required for immediate reads
Add Sync() after io.Copy() in SaveC1Z() (local manager) - defensive

On ZFS ARC, files read immediately after write can see incomplete data from write buffers. Explicit sync ensures data is flushed before reads.
Guessing this will adds ~1-5ms latency per file operation, negligible compared to network/compression overhead.

Summary by CodeRabbit

  • Bug Fixes
    • Improved data integrity by ensuring files are explicitly synced to disk after writes.
    • Added validation to verify that bytes written match on-disk file sizes, catching mismatches.
    • Enhanced error handling with clearer context for sync or size verification failures.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 17, 2025

Walkthrough

Adds explicit file synchronization (Sync) and defensive size verification after writing/copying c1z files in both local and S3 managers; errors are wrapped with context when sync/stat/size checks fail.

Changes

Cohort / File(s) Summary
Local manager updates
pkg/dotc1z/manager/local/local.go
After copying/writing to temp and destination files, call Sync() and Stat(); capture source size, verify bytes written match source and temp file sizes; return wrapped errors on sync or size mismatches.
S3 manager updates
pkg/dotc1z/manager/s3/s3.go
After writing to temp file, call Sync() and Stat(); assign io.Copy result to a variable, then verify written bytes against on-disk file size and return wrapped errors on failures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Reviewers should check error-wrapping consistency and correct handling/closing of files on error.
  • Pay attention to race conditions around Stat vs concurrent writers and platform-specific semantics of Sync/Stat.

Possibly related PRs

Suggested reviewers

  • ggreer

Poem

“I nibbled bytes beneath the moon’s soft glow,
I synced each file so safe they grow,
I counted bytes — one, two, three — all right,
No lost carrot-data in the night,
A rabbit’s hop keeps storage tight.” 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fsync where it matters' directly and clearly describes the main change: adding strategic file synchronization calls to critical locations in the codebase.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch crr/c1zCorruption

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/dotc1z/manager/s3/s3.go (1)

66-79: Consider removing redundant explicit Close() calls.

The explicit _ = f.Close() calls on error paths (lines 66, 73, 77) are redundant since defer f.Close() at line 51 will execute when the function returns. While Close() is idempotent and this doesn't cause bugs, it adds confusion about resource management strategy.

If the goal is to release resources immediately on error, the Go idiom is:

-		if err := f.Sync(); err != nil {
-			_ = f.Close()
-			return fmt.Errorf("failed to sync temp file: %w", err)
-		}
+		if err := f.Sync(); err != nil {
+			return fmt.Errorf("failed to sync temp file: %w", err)
+		}

Apply similar changes to lines 73 and 77. The defer will handle cleanup in all cases.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60ab7a9 and ace92f0.

📒 Files selected for processing (2)
  • pkg/dotc1z/manager/local/local.go (2 hunks)
  • pkg/dotc1z/manager/s3/s3.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-lint
🔇 Additional comments (3)
pkg/dotc1z/manager/local/local.go (2)

66-72: LGTM! Sync ensures data durability before use.

The Sync() call correctly ensures that buffered data is flushed to disk before the temporary file is read. This prevents the ZFS ARC race condition described in the PR objectives.


157-161: LGTM! Defensive Sync prevents incomplete writes.

The Sync() call ensures the destination file is fully written before SaveC1Z returns, preventing downstream code from reading incomplete data.

Note: Unlike s3.go (lines 70-79), this doesn't include a defensive size verification comparing the bytes written to the actual file size. While local file operations are typically more reliable than network transfers, consider whether adding similar verification here would be beneficial for consistency.

pkg/dotc1z/manager/s3/s3.go (1)

56-68: Excellent! Sync and size verification enhance data integrity.

The changes correctly:

  • Capture bytes written for verification
  • Sync to flush buffers before subsequent reads
  • Wrap errors with clear context

The defensive size check at lines 70-79 is particularly valuable for detecting buffered write issues in network transfers.

@arreyder arreyder requested a review from loganintech December 17, 2025 16:44

if r != nil {
_, err = io.Copy(f, r)
written, err := io.Copy(f, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call io.copy() in local manager too. Should we error if the bytes written in there don't match? I think we should either check the size vs bytes written and error in both cases, or error in neither case, not just the s3 case.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/dotc1z/manager/local/local.go (1)

176-180: Sync is correct; consider adding size verification for consistency.

The sync call properly addresses the ZFS ARC race condition. However, copyFileToTmp includes defensive size verification (comparing source, written, and destination sizes), while this function only syncs without verifying the bytes written match the source.

For consistency and equal protection, consider adding size verification here as well.

Apply this diff to add defensive size verification:

 	tmpFile, err := os.Open(l.tmpPath)
 	if err != nil {
 		return err
 	}
 	defer tmpFile.Close()
+
+	// Get source file size for verification
+	tmpStat, err := tmpFile.Stat()
+	if err != nil {
+		return fmt.Errorf("failed to stat temp file: %w", err)
+	}
+	expectedSize := tmpStat.Size()
 
 	dstFile, err := os.Create(l.filePath)
 	if err != nil {
 		return err
 	}
 	defer dstFile.Close()
 
 	size, err := io.Copy(dstFile, tmpFile)
 	if err != nil {
 		return err
 	}
 
 	// CRITICAL: Sync to ensure data is written before function returns.
 	// This is especially important on ZFS ARC where writes may be cached.
 	if err := dstFile.Sync(); err != nil {
 		return fmt.Errorf("failed to sync destination file: %w", err)
 	}
+
+	// Verify file size matches what we wrote (defensive check)
+	dstStat, err := dstFile.Stat()
+	if err != nil {
+		return fmt.Errorf("failed to stat destination file: %w", err)
+	}
+	if dstStat.Size() != size {
+		return fmt.Errorf("file size mismatch: wrote %d bytes but file is %d bytes", size, dstStat.Size())
+	}
+	if size != expectedSize {
+		return fmt.Errorf("copy size mismatch: expected %d bytes from temp but copied %d bytes", expectedSize, size)
+	}
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ace92f0 and 43368a4.

📒 Files selected for processing (1)
  • pkg/dotc1z/manager/local/local.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (1)
pkg/dotc1z/manager/local/local.go (1)

62-91: Excellent defensive I/O with three-way size verification.

The implementation correctly:

  • Captures source size before copy
  • Syncs before verification (ensuring write buffers are flushed)
  • Verifies temp file size matches both io.Copy's return value and the source size

This three-way check (expectedSize == written == stat.Size()) provides strong guarantees against data corruption or incomplete writes, which is critical for the ZFS ARC race condition described in the PR.

@ggreer ggreer merged commit 03aafcc into main Dec 18, 2025
6 checks passed
@ggreer ggreer deleted the crr/c1zCorruption branch December 18, 2025 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants