sambuc/feat merge restart pr 1 #77
base: build/csi-rclone-improvements
Conversation
* Ensure the mutex is not copied when the nodeServer is copied, by storing a pointer to the mutex instead of the mutex itself.
* Use Mutex instead of RWMutex: two goroutines holding the read lock at the same time would both go on to write the state at the same time, corrupting the state file on storage.
* Mutex / RWMutex are not recursive / re-entrant in Go, so in two cases we do not call `Unlock()` through `defer`, as `persistState()` also takes the lock.
* As a rule of thumb, locking a mutex should happen as close as possible to the resource requiring it, to minimize the size of the critical section / the time spent holding the lock.
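A minimal sketch of the locking pattern the description refers to; the type and field names here are illustrative, not the actual csi-rclone code:

```go
package main

import "sync"

// nodeServer stores a *pointer* to the mutex, so copying the struct
// copies the pointer rather than the lock state itself.
type nodeServer struct {
	mutex *sync.Mutex
	state map[string]string
}

// persistState acquires the lock itself; since Go mutexes are not
// re-entrant, callers must not already hold the lock when calling it.
func (ns *nodeServer) persistState() {
	ns.mutex.Lock()
	defer ns.mutex.Unlock()
	// ... write ns.state to disk ...
}

func (ns *nodeServer) track(key, value string) {
	ns.mutex.Lock()
	ns.state[key] = value
	// Explicit Unlock instead of defer: persistState re-acquires the
	// same non-recursive lock below, and a deferred Unlock would only
	// run after persistState had already deadlocked.
	ns.mutex.Unlock()

	ns.persistState()
}
```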
```go
return &nodeServer{
// Use kubelet plugin directory for state persistence
stateFile := "/var/lib/kubelet/plugins/csi-rclone/mounted_volumes.json"
```
not a big deal, but it does not hurt to make this a constant
```diff
- stateFile := "/var/lib/kubelet/plugins/csi-rclone/mounted_volumes.json"
+ const stateFile string = "/var/lib/kubelet/plugins/csi-rclone/mounted_volumes.json"
```
You will find this (and the comment) moved directly into the instantiation of nodeServer just below, as this is the only place where this variable is used, in a commit in #78.
```go
onDaemonReady := func() error {
	if d.ns != nil {
		return d.ns.remountTrackedVolumes(context.Background())
	}
	return nil
}
return d.ns.RcloneOps.Run(onDaemonReady)
```
suggestion: Here it is better to pass a context into the enclosing function and propagate it into remountTrackedVolumes. Even better, wrap the context passed into Run in a timeout, so that if remounting the volumes takes too long, the server can give up and start.
You can even decide whether to have a timeout for the total remount of all storages, or just make a new timeout context for each call to Mount for each mountpoint. You could even do both.
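A sketch of both options. The timeout names and values are assumptions for illustration, and this assumes Run's callback signature is changed to accept a context:

```go
import (
	"context"
	"time"
)

// Assumed budgets, for illustration only.
const (
	totalRemountTimeout = 5 * time.Minute  // whole remount pass
	perMountTimeout     = 30 * time.Second // one mountpoint
)

// The callback now receives the caller's context instead of creating
// context.Background() itself, and bounds the whole remount pass.
onDaemonReady := func(ctx context.Context) error {
	if d.ns == nil {
		return nil
	}
	ctx, cancel := context.WithTimeout(ctx, totalRemountTimeout)
	defer cancel()
	return d.ns.remountTrackedVolumes(ctx)
}

// Inside remountTrackedVolumes, each mountpoint could additionally get
// its own budget, bounded by the outer deadline:
//
//	mctx, cancel := context.WithTimeout(ctx, perMountTimeout)
//	err := ns.mountVolume(mctx, volumeId, mv) // hypothetical helper
//	cancel()
```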
```go
RcloneOps Operations

// Track mounted volumes for automatic remounting
mountedVolumes map[string]*MountedVolume
```
question: Why a pointer here and not just `map[string]MountedVolume`?
I assume it allows the map to stay small, making it simpler and faster to reallocate when needed, as we append to it often.
The MountedVolume struct is not that small; this also decouples it a bit more from the map.
You really should not overthink that in Go. This map will not see intensive use, so it should stay as simple as it can get. In Go, that means using values as much as possible.
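A sketch of the value-based alternative; the `TargetPath` field is assumed for illustration:

```go
// Values instead of pointers keep ownership simple.
mountedVolumes map[string]MountedVolume

// Inserting stores a copy:
ns.mountedVolumes[volumeId] = MountedVolume{TargetPath: targetPath}

// The one trade-off: updating a single field requires a
// read-modify-write, because map values are not addressable in Go.
mv := ns.mountedVolumes[volumeId]
mv.TargetPath = newPath
ns.mountedVolumes[volumeId] = mv
```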
```go
ns.mutex.Unlock()
klog.Infof("Tracked mounted volume %s at path %s", volumeId, targetPath)

if err := ns.persistState(); err != nil {
	klog.Errorf("Failed to persist volume state: %v", err)
}
```
suggestion: I think that updating the `mountedVolumes` map and writing to file should be done in one single lock/unlock cycle, not two. What could happen now is that as soon as you unlock the mutex, it can be picked up by another `trackMountedVolume` call. This can probably cause weird behaviour, or cause us to miss persisting the state to file.
What you could do is make persistState a fully standalone private function that just takes a map and a file name and writes that file, with all the locking happening in the methods that call it.
You could even go as far as calling that function unsafePersistState and/or making it clear in the docstring that it is not goroutine-safe.
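A sketch of the suggested split; the `MountedVolume` field, file permissions, and signatures are assumptions (imports of `encoding/json`, `fmt`, `os`, and klog are assumed):

```go
// unsafePersistState writes volumes to stateFile as JSON. It is NOT
// goroutine-safe: callers are responsible for holding the lock.
func unsafePersistState(volumes map[string]*MountedVolume, stateFile string) error {
	data, err := json.Marshal(volumes)
	if err != nil {
		return fmt.Errorf("failed to marshal volume state: %w", err)
	}
	return os.WriteFile(stateFile, data, 0o600)
}

func (ns *nodeServer) trackMountedVolume(volumeId, targetPath string) {
	ns.mutex.Lock()
	defer ns.mutex.Unlock() // one cycle covers both map update and persist

	ns.mountedVolumes[volumeId] = &MountedVolume{TargetPath: targetPath} // field assumed
	klog.Infof("Tracked mounted volume %s at path %s", volumeId, targetPath)
	if err := unsafePersistState(ns.mountedVolumes, ns.stateFile); err != nil {
		klog.Errorf("Failed to persist volume state: %v", err)
	}
}
```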
```go
// Load volume state from disk
func (ns *nodeServer) loadState() error {
	if ns.stateFile == "" {
		return nil
	}

	ns.mutex.Lock()
	defer ns.mutex.Unlock()

	data, err := os.ReadFile(ns.stateFile)
	if err != nil {
		if os.IsNotExist(err) {
			klog.Info("No persisted volume state found, starting fresh")
			return nil
		}
		return fmt.Errorf("failed to read state file: %v", err)
	}

	var volumes map[string]*MountedVolume
	if err := json.Unmarshal(data, &volumes); err != nil {
		return fmt.Errorf("failed to unmarshal volume state: %v", err)
	}

	ns.mountedVolumes = volumes
	klog.Infof("Loaded %d tracked volumes from %s", len(ns.mountedVolumes), ns.stateFile)
	return nil
}
```
suggestion(non-blocking): I am being incredibly picky here, but it is not a bad idea to do one of the following:
- Note in the docstring that this will overwrite the map of volumes in memory
- Fail if the map in memory is not empty when this is called (maybe this is a bit too extreme)
- Merge. This may require looping through the map from disk. You can also try to unmarshal directly into `ns.mountedVolumes`, but I am not sure whether that is the same as overwriting or whether it will just update the fields it can find.
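For the merge option, a sketch replacing the assignment at the end of loadState; the keep-in-memory-entries policy is one possible choice. (For what it's worth, encoding/json documents that unmarshaling into a non-nil map keeps existing entries and only adds or overwrites keys present in the JSON.)

```go
// Merge on-disk entries into the in-memory map instead of replacing
// it, keeping anything that was tracked before loadState ran.
if ns.mountedVolumes == nil {
	ns.mountedVolumes = make(map[string]*MountedVolume, len(volumes))
}
for id, mv := range volumes {
	if _, exists := ns.mountedVolumes[id]; !exists {
		ns.mountedVolumes[id] = mv
	}
}
```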
```go
func (ns *nodeServer) remountTrackedVolumes(ctx context.Context) error {
	ns.mutex.Lock()
	defer ns.mutex.Unlock()

	if len(ns.mountedVolumes) == 0 {
		klog.Info("No tracked volumes to remount")
		return nil
	}

	klog.Infof("Remounting %d tracked volumes", len(ns.mountedVolumes))

	for volumeId, mv := range ns.mountedVolumes {
```
suggestion: This is similar to the other cases. It is safer to load the state file and remount in one lock/unlock cycle.
This merges the code from #72 with a couple of fixes.
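A sketch of the single lock cycle the suggestion describes; `unsafeLoadState` and `remountVolume` are hypothetical helpers, not functions from this PR:

```go
func (ns *nodeServer) loadAndRemount(ctx context.Context) error {
	ns.mutex.Lock()
	defer ns.mutex.Unlock() // one critical section for load + remount

	// unsafeLoadState: a lock-free variant of loadState (hypothetical),
	// so the state cannot change between loading and remounting.
	if err := ns.unsafeLoadState(); err != nil {
		return err
	}
	for volumeId, mv := range ns.mountedVolumes {
		// remountVolume: hypothetical per-volume helper.
		if err := ns.remountVolume(ctx, volumeId, mv); err != nil {
			klog.Errorf("Failed to remount volume %s: %v", volumeId, err)
		}
	}
	return nil
}
```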