-
Hello! Let's assume I have two physical machines: machineA with IP 233.252.0.100 and machineB with IP 198.51.100.50. Both machines can communicate with each other over the Internet. Now, I create a cluster with k3d on machineA with, say, 1 server and 3 agents. Then I want to join another 2 servers and another 3 agents to this same cluster, but from machineB, making the cluster distributed across these two machines. Is there a way to achieve this?
-
After investigating some more, I found out that k3d uses the rancher/k3s image, which is basically a wrapper around k3s itself, so I guess I can use the k3s documentation and start it off manually.
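For reference, the k3s binary does support multi-server clusters over the network directly. A rough sketch of the manual approach, assuming k3s is installed on both machines, using the IPs from the question and k3s's default token path and port (flags as documented for k3s; this is an illustration, not a tested recipe):

```shell
# machineA: start the first control-plane node with embedded etcd
k3s server --cluster-init --tls-san 233.252.0.100

# machineA: read the join token (k3s's default location)
cat /var/lib/rancher/k3s/server/node-token

# machineB: join as an additional control-plane node
k3s server --server https://233.252.0.100:6443 --token <node-token>

# machineB: join as a worker node
k3s agent --server https://233.252.0.100:6443 --token <node-token>
```

Running several servers or agents on the same machine this way would need distinct data dirs and ports, which is essentially what k3d's containers provide.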
-
Hey @Evengard, thanks for starting this discussion! There are a few open issues in this repo talking about exactly this topic. What may help you right now is the official k3s docker-compose file: https://github.com/k3s-io/k3s/blob/master/docker-compose.yml. Related issues:
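For orientation, a minimal sketch modeled on that linked docker-compose file (values such as the token variable and port are placeholders, not copied verbatim from the repo):

```yaml
version: '3'
services:
  server:
    image: rancher/k3s:latest
    command: server
    tmpfs:
      - /run
      - /var/run
    privileged: true
    environment:
      - K3S_TOKEN=${K3S_TOKEN:?err}
      - K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
      - K3S_KUBECONFIG_MODE=666
    volumes:
      # so the kubeconfig ends up on the host
      - ./kubeconfig:/output
    ports:
      - 6443:6443
  agent:
    image: rancher/k3s:latest
    tmpfs:
      - /run
      - /var/run
    privileged: true
    environment:
      - K3S_URL=https://server:6443
      - K3S_TOKEN=${K3S_TOKEN:?err}
```

The same pattern extends across machines by pointing `K3S_URL` at a reachable address of the first server instead of the compose service name.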
-
The results are... kinda weird. No, I actually managed to set up the configuration I described above... The problem is that when I stop the "primary" host with its 2 control-plane nodes, leaving only the "fallback" one, the cluster seems to be dead: the control-plane node on the fallback host just becomes unresponsive for some reason... I've set it up as follows (two separate docker-compose files for two separate KVM hosts):

Primary:

version: "2.4"
services:
server0-0:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "server --cluster-init --tls-san 0.0.0.0 --tls-san 127.0.0.1 --tls-san 10.110.101.0 --node-name server0-0 --advertise-address <redacted external IP> --https-listen-port 42000 --kube-proxy-arg=conntrack-max-per-core=0 --disable traefik --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
- K3S_KUBECONFIG_MODE=666
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/server0-0/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/server0-0/log:/var/log
# This is just so that we get the kubeconfig file out
- /srv/k3s/config:/output
ports:
- 42000:42000
networks:
k3s:
ipv4_address: 10.110.101.0
server0-1:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "server --kube-proxy-arg=conntrack-max-per-core=0 --node-name=server0-1 --advertise-address <redacted external IP> --https-listen-port 42000 --tls-san 0.0.0.0 --tls-san 127.0.0.1 --tls-san 10.110.101.1 --disable traefik --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.101.0:42000
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/server0-1/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/server0-1/log:/var/log
- type: bind
source: /srv/k3s/data/server0-0/k3s/server/token
target: /token
read_only: true
networks:
k3s:
ipv4_address: 10.110.101.1
depends_on:
- server0-0
agent:
image: "rancher/k3s:latest"
scale: 3
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=agent --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.101.0:42000
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- type: bind
source: /srv/k3s/data/server0-0/k3s/server/token
target: /token
read_only: true
networks:
k3s:
depends_on:
- server0-0
- server0-1
- ingress0
ingress0:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=ingress0 --node-ip=10.110.100.0 --node-taint=ingress:NoSchedule --node-label=ingress=true --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.101.0:42000
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/ingress0/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/ingress0/log:/var/log
- type: bind
source: /srv/k3s/data/server0-0/k3s/server/token
target: /token
read_only: true
network_mode: "host"
depends_on:
- server0-0
- server0-1
networks:
k3s:
name: k3s
driver_opts:
com.docker.network.bridge.name: "k3sbr"
ipam:
config:
- subnet: 10.110.0.0/16
ip_range: 10.110.102.0/24
gateway: 10.110.100.0

Fallback:

version: "2.4"
services:
server1-0:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "server --kube-proxy-arg=conntrack-max-per-core=0 --node-name=server1-0 --advertise-address <redacted external ip> --https-listen-port 45000 --tls-san 0.0.0.0 --tls-san 127.0.0.1 --tls-san 10.110.111.0 --disable traefik --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.101.0:42000
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/server1-0/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/server1-0/log:/var/log
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
networks:
k3s:
ipv4_address: 10.110.111.0
ports:
- 45000:45000
agent:
image: "rancher/k3s:latest"
scale: 2
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=agent --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.111.0:45000
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
networks:
k3s:
depends_on:
- server1-0
- ingress1
ingress1:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=ingress1 --node-ip=10.110.110.0 --node-taint=ingress:NoSchedule --node-label=ingress=true --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.111.0:45000
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/ingress1/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/ingress1/log:/var/log
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
network_mode: "host"
depends_on:
- server1-0
networks:
k3s:
name: k3s
driver_opts:
com.docker.network.bridge.name: "k3sbr"
ipam:
config:
- subnet: 10.110.0.0/16
ip_range: 10.110.112.0/24
gateway: 10.110.110.0

The resulting cluster looks like:
The Docker bridges are interconnected with a manually set up VXLAN (that's why I set "com.docker.network.bridge.name" to a static name: so I can attach the VXLAN interface to the bridge). The "ingress" agents run with "network_mode: host" so that ingress services can be set up on them, making the cluster actually usable the "Kubernetes way"; that's basically the entry point. I marked them with a custom label and a taint to avoid scheduling regular workloads on them. The setup is MOSTLY fine, but there is still this problem of the fallback server (control-plane) node not taking over when the main ones are dead...
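For completeness, a workload that should land on those tainted ingress nodes needs both a matching node selector and a toleration. A hypothetical pod-template fragment (e.g. for an ingress controller DaemonSet), assuming the `ingress:NoSchedule` taint and `ingress=true` label from the compose files above:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        ingress: "true"
      tolerations:
        - key: "ingress"
          operator: "Exists"
          effect: "NoSchedule"
```

`operator: Exists` matches the taint regardless of value, which fits the value-less `--node-taint=ingress:NoSchedule` used here.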
-
Well, my problem was indeed an etcd majority problem, so I ended up with 3 KVM hosts with the following setups:

Management:

version: "2.4"
services:
server-manage:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "server --cluster-init --tls-san 192.168.100.106 --tls-san 127.0.0.1 --tls-san 10.110.101.0 --node-name server-manage --advertise-address 192.168.100.106 --https-listen-port 6443 --kube-proxy-arg=conntrack-max-per-core=0 --disable traefik --node-ip=10.110.101.0"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
- K3S_KUBECONFIG_MODE=666
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/server-manage/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/server-manage/log:/var/log
# This is just so that we get the kubeconfig file out
- /srv/k3s/config:/output
ports:
- 6443:6443
networks:
k3s:
ipv4_address: 10.110.101.0
networks:
k3s:
name: k3s
driver_opts:
com.docker.network.bridge.name: "k3sbr"
ipam:
config:
- subnet: 10.110.0.0/16
ip_range: 10.110.102.0/24
gateway: 10.110.100.0

Primary:

version: "2.4"
services:
server-main:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "server --kube-proxy-arg=conntrack-max-per-core=0 --node-name=server-main --https-listen-port 6443 --tls-san 127.0.0.1 --tls-san 10.110.111.0 --tls-san 192.168.100.102 --disable traefik --node-ip=10.110.111.0"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
- K3S_KUBECONFIG_MODE=666
- K3S_URL=https://10.110.101.0:6443
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/server-main/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/server-main/log:/var/log
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
- /srv/k3s/config:/output
networks:
k3s:
ipv4_address: 10.110.111.0
ports:
- 6443:6443
agent-main:
image: "rancher/k3s:latest"
scale: 3
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=agent-main --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.111.0:6443
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
networks:
k3s:
depends_on:
- server-main
- ingress-main
ingress-main:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=ingress-main --node-ip=10.110.110.0 --node-taint=ingress:NoSchedule --node-label=ingress=true"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.111.0:6443
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/ingress-main/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/ingress-main/log:/var/log
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
network_mode: "host"
depends_on:
- server-main
networks:
k3s:
name: k3s
driver_opts:
com.docker.network.bridge.name: "k3sbr"
ipam:
config:
- subnet: 10.110.0.0/16
ip_range: 10.110.112.0/24
gateway: 10.110.110.0

Fallback:

version: "2.4"
services:
server-fallback:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "server --kube-proxy-arg=conntrack-max-per-core=0 --node-name=server-fallback --https-listen-port 6443 --tls-san 127.0.0.1 --tls-san 10.110.121.0 --tls-san 192.168.100.105 --disable traefik --node-ip=10.110.121.0"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
- K3S_KUBECONFIG_MODE=666
- K3S_URL=https://10.110.101.0:6443
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/server-fallback/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/server-fallback/log:/var/log
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
- /srv/k3s/config:/output
networks:
k3s:
ipv4_address: 10.110.121.0
ports:
- 6443:6443
agent-fallback:
image: "rancher/k3s:latest"
scale: 2
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=agent-fallback --with-node-id"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.121.0:6443
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
networks:
k3s:
depends_on:
- server-fallback
- ingress-fallback
ingress-fallback:
image: "rancher/k3s:latest"
entrypoint: /entrypoint.sh
command: "agent --kube-proxy-arg=conntrack-max-per-core=0 --node-name=ingress-fallback --node-ip=10.110.120.0 --node-taint=ingress:NoSchedule --node-label=ingress=true"
tmpfs:
- /run
- /var/run
privileged: true
restart: always
environment:
- K3S_URL=https://10.110.121.0:6443
- K3S_TOKEN_FILE=/token
volumes:
- type: bind
source: /srv/k3s/scripts/entrypoint.sh
target: /entrypoint.sh
read_only: true
- /srv/k3s/data/ingress-fallback/k3s:/var/lib/rancher/k3s
- /srv/k3s/data/ingress-fallback/log:/var/log
- type: bind
source: /srv/k3s/config/token
target: /token
read_only: true
network_mode: "host"
depends_on:
- server-fallback
networks:
k3s:
name: k3s
driver_opts:
com.docker.network.bridge.name: "k3sbr"
ipam:
config:
- subnet: 10.110.0.0/16
ip_range: 10.110.122.0/24
gateway: 10.110.120.0

Now when I disable any of these 3 hosts separately, the cluster doesn't die. It still dies if 2 out of 3 hosts are down, but there's probably nothing that can be done about that (although I really expected the majority quorum to be recalculated dynamically based on available cluster members, not the total known). In most cases that makes sense, except when one of main/fallback plus management are dead: I would expect the cluster to still work in a degraded mode, but because of the majority quorum it can't. Again, the Docker networks are interconnected with help of a VXLAN (hence the "com.docker.network.bridge.name" driver opt), and could be connected via some kind of VPN solution to another host across the Internet if needed (to join it to the cluster, for example). Now that I have a handy management host, it could actually be the gateway for the VPN =)
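This is standard Raft/etcd behavior: quorum is computed over the total voting membership, not the currently reachable members, precisely to prevent split-brain after a network partition. The arithmetic behind the observed failures (plain shell, no k3s involved):

```shell
# etcd needs a majority of the *total* voting members, not of the live ones:
# quorum(n) = floor(n/2) + 1, so n members tolerate n - quorum(n) failures.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "$n members: quorum=$quorum, tolerates $(( n - quorum )) failure(s)"
done
# 3 members tolerate 1 failure; 2 of 3 down means quorum (2) is unreachable.
```

This also shows why adding a 4th voting member wouldn't help here: quorum rises to 3, and fault tolerance stays at 1.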