-
The driving motivation is to deal with situations where an existing NATS cluster needs to be migrated to new VMs/servers/hardware. The goal is to not require clients to change the address they are connecting to. If client connections are temporarily interrupted and auto-reconnect, that is fine (no different than a network hiccup).

To set up a scenario, let's assume I have a 3-node cluster with JetStream enabled on all nodes, and one or more streams, some of which are configured with 3 replicas. The problem is that I am running out of storage and need to add more storage for existing streams as well as new streams. I am aware that additional nodes can be added to a cluster with JetStream enabled and NATS will distribute stream replicas across the set of available nodes. My understanding is that replication/quorum happens on a per-stream basis.

Would it be possible to increase the size of the cluster, with the new nodes having the new storage capacity, and have the streams moved transparently to these new nodes? I know a mirror could be set up, but this would require clients to know to switch over to publishing and consuming on the new stream. Is there some other transparent way of handling this?
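For concreteness, a minimal sketch of the kind of stream I have in mind, using the nats.go client (the stream name, subjects, and URL are placeholders):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the existing 3-node cluster.
	nc, err := nats.Connect("nats://nats.example.com:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// A replicated, file-backed stream as described above.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:     "ORDERS",
		Subjects: []string{"orders.>"},
		Storage:  nats.FileStorage,
		Replicas: 3,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```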
-
Generally you'd want to keep your server names static, so in your 3-node cluster case you'd bring up replacement nodes with the same names - possibly behind a CNAME, of course - and with appropriate TLS certs etc. When you cleanly shut down an old node and bring up a new one, it will start syncing data for any streams assigned to it as soon as it comes online. You want to wait for this process to fully complete, with all streams in all accounts up to date, before moving to the next machine. Always use Lame Duck Mode to prepare a server for replacement.

On the other hand, if you increase the cluster size and want to redistribute streams, you'll have to use the CLI to remove one peer from a stream, and the server will select a new one at random.

Both processes are quite involved and manual, and require careful testing and planning.
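To illustrate the "wait until syncing has fully completed" step, here is a minimal sketch with the nats.go client that polls a stream's cluster info until every replica reports current with no lag (stream name and URL are placeholders; `nats stream info` shows the same information):

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

// streamCaughtUp reports whether every follower replica of the stream
// is online, current, and has no replication lag.
func streamCaughtUp(js nats.JetStreamContext, stream string) (bool, error) {
	info, err := js.StreamInfo(stream)
	if err != nil {
		return false, err
	}
	if info.Cluster == nil {
		return false, fmt.Errorf("stream %q is not clustered", stream)
	}
	for _, r := range info.Cluster.Replicas {
		if r.Offline || !r.Current || r.Lag > 0 {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	nc, err := nats.Connect("nats://nats.example.com:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Poll the (hypothetical) ORDERS stream until all replicas are up to date.
	for {
		ok, err := streamCaughtUp(js, "ORDERS")
		if err != nil {
			log.Fatal(err)
		}
		if ok {
			fmt.Println("all replicas current, safe to move to the next node")
			return
		}
		time.Sleep(2 * time.Second)
	}
}
```

In practice you'd run this check for every stream in every account before touching the next server.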
-
Depending on your needs, you can also back up and restore into a new cluster with different storage properties.
-
For no downtime, I would expand the cluster with additional nodes running JetStream (JetStream assets live within a cluster scope, so any no-downtime option needs to work inside that structure). I would then move the non-leader peers for each asset that needs to be moved onto the new servers. This may be tricky, since right now if you remove a peer the server will pick a new one, but it could pick one of the old ones. Will think more on that. Then move the leader by asking the asset's group to elect a new leader, and remove that peer once that operation has been successful.
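A rough sketch of those two admin operations using raw requests from the nats.go client, assuming the `$JS.API.STREAM.LEADER.STEPDOWN.<stream>` and `$JS.API.STREAM.PEER.REMOVE.<stream>` admin subjects and a peer-remove payload of `{"peer": "<server name>"}` (the stream and server names are made up; the nats CLI exposes the same operations under `nats stream cluster`):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://nats.example.com:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	stream := "ORDERS" // hypothetical stream
	oldPeer := "old-n3" // server name of the peer to retire

	// Ask the stream's RAFT group to elect a new leader.
	resp, err := nc.Request("$JS.API.STREAM.LEADER.STEPDOWN."+stream, nil, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("step-down response: %s\n", resp.Data)

	// Remove the old peer from the stream's RAFT group; the server
	// will select a replacement peer for that replica.
	req, _ := json.Marshal(map[string]string{"peer": oldPeer})
	resp, err = nc.Request("$JS.API.STREAM.PEER.REMOVE."+stream, req, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("peer-remove response: %s\n", resp.Data)
}
```

The connection needs permissions on the `$JS.API.>` admin subjects for this to work.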
-
LDM gracefully shuts a node down, moving its leaderships elsewhere etc. You should always shut down a JS node that way if possible. But the node might still come back - it might just be down for maintenance or something. It's not until you remove it from the RAFT group that a new member is picked in its place.
-
Created a repo with the progress thus far: https://github.com/bruth/nats-zero-down/tree/main/replace-cluster-nodes. Further work will be tracked there.
-
This question is pretty old, but in the interest of cleanup: you can now control the placement of streams within a cluster, or even between clusters, using placement tags. You can even move existing streams using those placement tags, without any interruption of service. In any case, the client applications do not need to know where a particular stream is located; it's all completely transparent to them.
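As an example with the nats.go client, the tags go into the stream's `Placement` config; a minimal sketch, assuming a hypothetical ORDERS stream and a "storage:large" tag that has been defined on the new servers:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://nats.example.com:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the current config of the (hypothetical) ORDERS stream.
	info, err := js.StreamInfo("ORDERS")
	if err != nil {
		log.Fatal(err)
	}

	// Re-pin the stream to servers carrying the (hypothetical) "storage:large" tag.
	// Updating the placement should cause the server to re-place the replicas,
	// while clients keep publishing and consuming against the same stream.
	cfg := info.Config
	cfg.Placement = &nats.Placement{Tags: []string{"storage:large"}}

	if _, err := js.UpdateStream(&cfg); err != nil {
		log.Fatal(err)
	}
}
```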