fix leaked cache directories and socket files after driver restart #8074#233
fix leaked cache directories and socket files after driver restart #8074#233
Conversation
Summary of ChangesHello @chrislusf, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a significant issue where temporary cache directories and socket files were not being properly removed following a CSI driver restart, leading to resource accumulation on worker nodes. The changes introduce a more robust and centralized strategy for managing these ephemeral resources, ensuring their timely and accurate cleanup during the volume unstage and node unstage processes. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughRefactors per-volume cache/socket path construction into GetCacheDir and GetLocalSocket, integrates them into the mounter, and adds guarded per-volume resource cleanup (cache dir and local socket removal) during Unstage and NodeUnstageVolume with warning logging on failures. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
rect rgba(220,220,255,0.5)
participant Client
participant Mounter
participant MountManager
participant Driver
participant FS as Filesystem
end
Client->>Mounter: Request Mount(volumeID, options)
Mounter->>Mounter: GetCacheDir(cacheBase, volumeID)
Mounter->>MountManager: GetLocalSocket(volumeSocketDir, volumeID)
Mounter->>Driver: Build MountRequest (cacheDir, socket)
Driver->>MountManager: Perform mount (via socket)
MountManager->>FS: Attach volume
FS-->>MountManager: OK
MountManager-->>Driver: Mount result
Driver-->>Mounter: Mount success
sequenceDiagram
autonumber
rect rgba(220,255,220,0.5)
participant API
participant Volume
participant Driver
participant MountManager
participant OS as Filesystem
end
API->>Volume: Unstage(volumeID)
Volume->>Volume: If unmounter != nil -> Unmount()
Volume->>OS: Remove staging path (if present)
Volume->>Driver: CleanupVolumeResources(volumeID)
Driver->>MountManager: Remove local socket (GetLocalSocket)
Driver->>OS: Remove cache dir (GetCacheDir)
MountManager-->>Driver: OK / warn on failure
OS-->>Driver: OK / warn on failure
Driver-->>API: Unstage complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/gemini review |
There was a problem hiding this comment.
Pull request overview
This PR fixes a resource leak issue where cache directories and socket files were not properly cleaned up after a CSI driver restart on worker nodes. The fix ensures that per-volume artifacts (cache directories and Unix socket files) are always removed during volume unstaging.
Changes:
- Added
CleanupVolumeResourcesfunction that removes both cache directories and socket files for a given volume - Refactored path construction logic into reusable helper functions (
GetCacheDir,GetLocalSocket) - Updated volume unstaging logic to always call cleanup regardless of the unmount path taken
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/driver/volume.go | Restructured control flow in Unstage method to ensure cleanup is always called for both normal and forced unmount paths |
| pkg/driver/utils.go | Added helper functions for path construction and volume resource cleanup; added safety check in NewNodeServer to avoid cleaning system temp directory |
| pkg/driver/nodeserver.go | Added cleanup call in NodeUnstageVolume for the case when volume is not found in the volume map (post-restart scenario) |
| pkg/driver/mounter.go | Refactored to use new helper functions for consistent path construction |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This PR fixes the issue where cache directories and socket files were leaked on worker nodes after a CSI driver restart. Details: seaweedfs/seaweedfs#8074
Summary by CodeRabbit
Bug Fixes
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.