Description
I'm trying clustered nodes with ctdb for the first time, using version 0.5. I have been able to get it working nicely with a cephfs backend. It's very pleasing!
I've been experimenting with failover by randomly deleting the pods that the operator creates (to simulate evictions, node failures, and so on). What I'm seeing is that if I delete the second pod (in my case, cmu-fileshare-1), it comes back up as expected. However, if I delete the pods "out of order" -- that is, delete the first pod (in my case, cmu-fileshare-0) -- then that pod does not come back up successfully.
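For reference, these are roughly the steps I've been using to reproduce this (pod names are from my own deployment, so adjust as needed):

```sh
# Deleting the second pod: it comes back up fine.
kubectl delete pod cmu-fileshare-1
kubectl get pods -w

# Deleting the first pod: it ends up in CrashLoopBackOff.
kubectl delete pod cmu-fileshare-0
kubectl get pods -w
```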
What I see from `kubectl get pods` is this:

```
NAME              READY   STATUS             RESTARTS      AGE
cmu-fileshare-0   4/5     CrashLoopBackOff   2 (15s ago)   2m23s
cmu-fileshare-1   5/5     Running            0             6m40s
```
And what I see from `kubectl logs cmu-fileshare-0 -c wb` is this:

```
2024-06-05 02:41:05,535: INFO: Enabling ctdb in samba config file
winbindd version 4.19.6 started.
Copyright Andrew Tridgell and the Samba Team 1992-2023
Could not fetch our SID - did we join?
unable to initialize domain list
```
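In case it helps with debugging, this is roughly how I've been poking at the cluster state from the still-healthy pod. I'm assuming here that the ctdb daemon runs in a container named `ctdb`, which may not match the actual container name in the operator's pod spec:

```sh
# Ask ctdb for its view of the cluster from the healthy pod
kubectl exec cmu-fileshare-1 -c ctdb -- ctdb status

# List the ctdb node status (healthy / unhealthy / disconnected)
kubectl exec cmu-fileshare-1 -c ctdb -- ctdb nodestatus
```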
I'm wondering whether this might be related to #262, another issue that seems to involve the exact order in which nodes are brought up and whether certain initialization steps are performed or skipped.
I'll dive into this further if I have time -- just thought I'd jot down the experience in case it is helpful to anyone.