
fix leaked cache directories and socket files after driver restart #8074 #233

Merged
chrislusf merged 3 commits into master from fix-leaked-cache-dirs-8074 on Jan 22, 2026

@chrislusf (Contributor) commented Jan 22, 2026

This PR fixes the issue where cache directories and socket files were leaked on worker nodes after a CSI driver restart. Details: seaweedfs/seaweedfs#8074

Summary by CodeRabbit

  • Bug Fixes

    • Improved cleanup of per-volume artifacts (cache directories and local sockets) during volume unstaging and node unstage; cleanup is now always attempted, and a failure is logged as a warning instead of being treated as fatal.
  • Refactor

    • Consolidated cache and socket path handling into reusable helpers and added guards to avoid removing system temp directories during cleanup.
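
The guard mentioned in the summary can be illustrated with a minimal sketch. The helper name `isSafeToClean` and its two-argument shape are assumptions made for this example and are not taken from the PR; the sketch only shows the idea of refusing to wipe an empty path or the system temp directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isSafeToClean is a hypothetical helper (not the PR's code). It reports
// whether dir may be wiped at startup: an empty path and the system temp
// directory itself are both refused.
func isSafeToClean(dir, systemTemp string) bool {
	if dir == "" {
		return false
	}
	// Clean both sides so "/tmp" and "/tmp/" compare equal.
	return filepath.Clean(dir) != filepath.Clean(systemTemp)
}

func main() {
	tmp := os.TempDir()
	// "csi-cache" is an illustrative directory name only.
	for _, dir := range []string{"", tmp, filepath.Join(tmp, "csi-cache")} {
		fmt.Printf("%q safe to clean: %v\n", dir, isSafeToClean(dir, tmp))
	}
}
```

Passing the system temp path in as a parameter keeps the check easy to test; a real driver would more likely read it from `os.TempDir()` at the call site.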

@gemini-code-assist (Contributor)

Summary of Changes

Hello @chrislusf, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a significant issue where temporary cache directories and socket files were not being properly removed following a CSI driver restart, leading to resource accumulation on worker nodes. The changes introduce a more robust and centralized strategy for managing these ephemeral resources, ensuring their timely and accurate cleanup during the volume unstage and node unstage processes.

Highlights

  • Centralized Path Management: Introduced GetCacheDir and GetLocalSocket utility functions to standardize the generation of cache directory and local socket paths, improving consistency and maintainability across the driver.
  • Resource Leak Prevention: Implemented explicit cleanup mechanisms for cache directories and socket files within the NodeUnstageVolume and Volume.Unstage operations, preventing resource leakage after CSI driver restarts.
  • Refined Initialization Cleanup: Modified the NewNodeServer initialization to only clean up the driver's specific cache directory if it's explicitly configured and not the general system temporary directory, avoiding unintended data loss.
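
As a rough sketch of what centralized path helpers can look like: the functions below are hypothetical stand-ins (`cacheDirFor`, `localSocketFor`) and not the driver's actual `GetCacheDir` and `GetLocalSocket`. The two-argument shape follows the calls shown in the walkthrough diagram, but the directory layout, the socket file name, and the use of an FNV hash are assumptions for illustration. Hashing the volume ID keeps the socket path short, which matters because Unix socket paths are limited to roughly 108 bytes on Linux.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"path/filepath"
)

// cacheDirFor returns a per-volume cache directory under cacheBase.
// Hypothetical layout: one subdirectory per volume ID.
func cacheDirFor(cacheBase, volumeID string) string {
	return filepath.Join(cacheBase, volumeID)
}

// localSocketFor returns a per-volume socket path under socketDir.
// The name is derived from a hash of the volume ID so that long IDs
// cannot push the path past the Unix socket length limit.
// The naming scheme is an assumption, not the driver's real one.
func localSocketFor(socketDir, volumeID string) string {
	h := fnv.New32a()
	h.Write([]byte(volumeID)) // hash.Hash.Write never returns an error
	return filepath.Join(socketDir, fmt.Sprintf("vol-%08x.sock", h.Sum32()))
}

func main() {
	// Base directories and the volume ID are illustrative values only.
	fmt.Println(cacheDirFor("/var/cache/seaweedfs", "pvc-123"))
	fmt.Println(localSocketFor("/var/lib/seaweedfs-mount", "pvc-123"))
}
```

Because both the mount path and the cleanup path call the same helpers, the cleanup code can recompute where a volume's artifacts live from the volume ID alone, even when no in-memory state survived a restart.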

@coderabbitai (bot, Contributor) commented Jan 22, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Refactors per-volume cache/socket path construction into GetCacheDir and GetLocalSocket, integrates them into the mounter, and adds guarded per-volume resource cleanup (cache dir and local socket removal) during Unstage and NodeUnstageVolume with warning logging on failures.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Helpers & init: pkg/driver/utils.go | Add GetCacheDir, GetLocalSocket, and CleanupVolumeResources; guard startup cache cleanup to skip empty/system temp dir and log warnings on errors. |
| Mounter refactor: pkg/driver/mounter.go | Swap manual cache/socket path construction for GetCacheDir and GetLocalSocket; remove now-unused imports. |
| Unstage / NodeUnstage: pkg/driver/volume.go, pkg/driver/nodeserver.go | Invoke CleanupVolumeResources during Unstage and NodeUnstageVolume; adjust staging-path removal to follow unmounter behavior; log cleanup failures rather than returning errors. |
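
The "log cleanup failures rather than returning errors" behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's CleanupVolumeResources: the function name, its parameters, and the choice to return the collected warnings are all hypothetical.

```go
package main

import (
	"errors"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// cleanupVolumeResources is a hypothetical best-effort cleanup. It tries to
// remove the socket file and the cache directory, logs every failure as a
// warning, and returns the warnings so the caller can decide what to do.
// It never stops early: a failed socket removal does not skip the cache dir.
func cleanupVolumeResources(cacheDir, socketPath string) []error {
	var warnings []error
	// A socket that is already gone is not a problem worth reporting.
	if err := os.Remove(socketPath); err != nil && !errors.Is(err, fs.ErrNotExist) {
		warnings = append(warnings, err)
	}
	// os.RemoveAll returns nil when the path does not exist.
	if err := os.RemoveAll(cacheDir); err != nil {
		warnings = append(warnings, err)
	}
	for _, w := range warnings {
		log.Printf("warning: volume cleanup: %v", w)
	}
	return warnings
}

func main() {
	base, err := os.MkdirTemp("", "cleanup-demo-")
	if err != nil {
		log.Print(err)
		return
	}
	defer os.RemoveAll(base)

	// Illustrative paths: a real cache dir and a socket that was never created.
	cacheDir := filepath.Join(base, "cache", "pvc-123")
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		log.Print(err)
		return
	}
	socketPath := filepath.Join(base, "pvc-123.sock")

	warnings := cleanupVolumeResources(cacheDir, socketPath)
	log.Printf("cleanup finished with %d warning(s)", len(warnings))
}
```

Treating a missing file as success makes the cleanup idempotent, which suits an unstage call that an orchestrator may retry.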

Sequence Diagram(s)

sequenceDiagram
  autonumber
  rect rgba(220,220,255,0.5)
    participant Client
    participant Mounter
    participant MountManager
    participant Driver
    participant FS as Filesystem
  end

  Client->>Mounter: Request Mount(volumeID, options)
  Mounter->>Mounter: GetCacheDir(cacheBase, volumeID)
  Mounter->>MountManager: GetLocalSocket(volumeSocketDir, volumeID)
  Mounter->>Driver: Build MountRequest (cacheDir, socket)
  Driver->>MountManager: Perform mount (via socket)
  MountManager->>FS: Attach volume
  FS-->>MountManager: OK
  MountManager-->>Driver: Mount result
  Driver-->>Mounter: Mount success

sequenceDiagram
  autonumber
  rect rgba(220,255,220,0.5)
    participant API
    participant Volume
    participant Driver
    participant MountManager
    participant OS as Filesystem
  end

  API->>Volume: Unstage(volumeID)
  Volume->>Volume: If unmounter != nil -> Unmount()
  Volume->>OS: Remove staging path (if present)
  Volume->>Driver: CleanupVolumeResources(volumeID)
  Driver->>MountManager: Remove local socket (GetLocalSocket)
  Driver->>OS: Remove cache dir (GetCacheDir)
  MountManager-->>Driver: OK / warn on failure
  OS-->>Driver: OK / warn on failure
  Driver-->>API: Unstage complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Soft paws tidy paths at night,
Cache and sockets tucked out of sight,
Helpers hop in, neat and spry,
Unstage hums a gentle sigh,
A rabbit's cheer for code now light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: fixing the leak of cache directories and socket files that occurs after a CSI driver restart, directly addressing the issue referenced. |

gemini-code-assist[bot]

This comment was marked as resolved.

@chrislusf (Contributor, Author)

/gemini review

Copilot AI left a comment

Pull request overview

This PR fixes a resource leak issue where cache directories and socket files were not properly cleaned up after a CSI driver restart on worker nodes. The fix ensures that per-volume artifacts (cache directories and Unix socket files) are always removed during volume unstaging.

Changes:

  • Added CleanupVolumeResources function that removes both cache directories and socket files for a given volume
  • Refactored path construction logic into reusable helper functions (GetCacheDir, GetLocalSocket)
  • Updated volume unstaging logic to always call cleanup regardless of the unmount path taken

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/driver/volume.go | Restructured control flow in Unstage method to ensure cleanup is always called for both normal and forced unmount paths |
| pkg/driver/utils.go | Added helper functions for path construction and volume resource cleanup; added safety check in NewNodeServer to avoid cleaning system temp directory |
| pkg/driver/nodeserver.go | Added cleanup call in NodeUnstageVolume for the case when volume is not found in the volume map (post-restart scenario) |
| pkg/driver/mounter.go | Refactored to use new helper functions for consistent path construction |
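
The post-restart scenario described for pkg/driver/nodeserver.go can be sketched roughly as below. All names here (`nodeServer`, `volume`, `unstage`, the `cleanup` callback) are hypothetical and simplified; the sketch only illustrates the control flow in which cleanup runs whether or not the volume is still present in the in-memory map.

```go
package main

import (
	"fmt"
	"sync"
)

// volume is a placeholder for the driver's per-volume state.
type volume struct{ id string }

// nodeServer is a simplified, hypothetical stand-in for the CSI node server.
type nodeServer struct {
	mu      sync.Mutex
	volumes map[string]*volume
	// cleanup stands in for per-volume artifact removal (cache dir, socket).
	cleanup func(volumeID string)
}

// unstage forgets the volume if it is tracked and always runs cleanup.
// After a driver restart the map is empty, so tracked is false, yet the
// artifacts left on disk by the previous process are still removed.
func (ns *nodeServer) unstage(volumeID string) (tracked bool) {
	ns.mu.Lock()
	_, tracked = ns.volumes[volumeID]
	delete(ns.volumes, volumeID)
	ns.mu.Unlock()

	ns.cleanup(volumeID)
	return tracked
}

func main() {
	ns := &nodeServer{
		volumes: map[string]*volume{}, // empty, as right after a restart
		cleanup: func(id string) { fmt.Println("cleaning up artifacts for", id) },
	}
	tracked := ns.unstage("pvc-123") // illustrative volume ID
	fmt.Println("was tracked:", tracked)
}
```

The cleanup callback is invoked outside the lock so that slow filesystem operations do not block other volume requests; whether the real driver makes the same choice is not stated in this conversation.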

gemini-code-assist[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@chrislusf chrislusf merged commit 28fdd55 into master Jan 22, 2026
6 checks passed