Complete Homelab Setup Guide

4-Node k3s Cluster with Full Observability Stack

Time to Complete: ~30-45 minutes
Difficulty: Intermediate
Cost: Free (all open-source tools)

Table of Contents

Prerequisites
Architecture Overview
Step 1: Install Required Software
Step 2: Create Project Directory
Step 3: Configure Vagrant
Step 4: Launch Virtual Machines
Step 5: Configure SSH Access
Step 6: Install k3s Cluster
Step 7: Install Observability Stack
Step 8: Verify Installation
Step 9: Explore Grafana Dashboards
Step 10: Deploy Sample Application
Troubleshooting
Daily Operations
Cleanup and Maintenance

Prerequisites

Hardware Requirements

CPU: Multi-core processor (4+ cores recommended)
RAM: Minimum 16GB (20GB+ recommended, because the four VMs are allocated 4GB each)
Disk: 50GB+ free space
Network: Active network connection for VM bridging
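
As a rough sanity check on these numbers, the sketch below totals what the four VMs in this guide are configured to request (2 CPUs and 4096 MB each, as set in the Vagrantfile in Step 3). The helper name vm_budget is purely illustrative and not part of Vagrant or VirtualBox.

```shell
#!/bin/sh
# Total vCPUs and RAM requested by a set of identically sized VMs.
# vm_budget is a hypothetical helper used only for this illustration.
vm_budget() {
    count=$1; cpus_each=$2; ram_mb_each=$3
    echo "$(( count * cpus_each )) vCPUs, $(( count * ram_mb_each )) MB RAM"
}

# Four nodes at 2 CPUs / 4096 MB each, matching this guide's cluster design
vm_budget 4 2 4096   # prints: 8 vCPUs, 16384 MB RAM
```

The VMs alone account for 16384 MB, so a host with exactly 16GB leaves no headroom for the host OS itself.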

Software Requirements

Host OS: macOS, Linux, or Windows
VirtualBox: 7.0 or later
Vagrant: 2.3.2 or later (earlier releases do not support VirtualBox 7.0)
SSH: OpenSSH client (pre-installed on macOS/Linux)

Network Requirements

Access to a /22 network (192.168.4.0/22 in this guide)
Your host machine should be on this network
No firewall blocking ports: 22, 80, 443, 6443, 30080, 30090, 30093

Knowledge Requirements

Basic command-line skills
Understanding of SSH
Familiarity with YAML (helpful but not required)
Basic Kubernetes concepts (helpful but not required)

Architecture Overview

Cluster Design

                    ┌─────────────────────────────────────┐
                    │     Your Host Machine (Mac/PC)      │
                    │  - kubectl configured               │
                    │  - SSH access to all nodes          │
                    │  - Browser access to dashboards     │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────┴──────────────┐
                    │  192.168.4.0/22 Network     │
                    │  (Bridged Networking)        │
                    └──────────────┬──────────────┘
                                   │
        ┌──────────────────────────┼──────────────────────────┐
        │                          │                          │
┌───────▼────────┐    ┌───────────▼─────┐    ┌──────────────▼──────┐
│  Master Node   │    │  Worker Node 1   │    │  Worker Nodes 2 & 3 │
│ 192.168.5.200  │    │ 192.168.5.201    │    │ 192.168.5.202-203   │
│                │    │                  │    │                     │
│ - k3s server   │    │ - k3s agent      │    │ - k3s agent         │
│ - SQLite DB    │    │ - Workloads      │    │ - Workloads         │
│ - API server   │    │                  │    │                     │
│ - Monitoring   │    │                  │    │                     │
│   Stack:       │    │                  │    │                     │
│   • Prometheus │    │                  │    │                     │
│   • Grafana    │    │                  │    │                     │
│   • Loki       │    │                  │    │                     │
│                │    │                  │    │                     │
│ 2 CPU / 4GB    │    │ 2 CPU / 4GB      │    │ 2 CPU / 4GB each    │
└────────────────┘    └──────────────────┘    └─────────────────────┘

Components

Component	Purpose	Port
k3s	Lightweight Kubernetes distribution	6443
Prometheus	Metrics collection and storage	30090
Grafana	Visualization and dashboards	30080
Loki	Log aggregation	3100
Alertmanager	Alert management	30093
Node Exporter	Host metrics	9100
Promtail	Log collection agent	9080

Network Configuration

Subnet: 192.168.4.0/22 (255.255.252.0)
Gateway: 192.168.4.1
DNS: 8.8.8.8, 8.8.4.4
IP Range: 192.168.4.0 - 192.168.7.255 (1024 addresses)
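
To see why the node addresses (192.168.5.200-203) belong to the 192.168.4.0/22 subnet even though their third octet is 5, the following sketch applies the netmask by hand. The function names ip_to_int and in_lab_subnet are made up for this illustration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Report whether an address falls inside 192.168.4.0/22.
in_lab_subnet() {
    net=$(ip_to_int 192.168.4.0)
    mask=4294966272    # 255.255.252.0 as an integer (the /22 netmask)
    addr=$(ip_to_int "$1")
    if [ $(( addr & mask )) -eq "$net" ]; then echo yes; else echo no; fi
}

in_lab_subnet 192.168.5.200   # master node: prints yes
in_lab_subnet 192.168.8.1     # just past 192.168.7.255: prints no
```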

Step 1: Install Required Software

macOS

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install VirtualBox
brew install --cask virtualbox

# Install Vagrant
brew install --cask vagrant

# Install vagrant-disksize plugin (optional, for larger disks)
vagrant plugin install vagrant-disksize

# Verify installations
VBoxManage --version
vagrant --version

Linux (Ubuntu/Debian)

# Update package list
sudo apt update

# Install VirtualBox
sudo apt install -y virtualbox virtualbox-ext-pack

# Install Vagrant
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y vagrant

# Install vagrant-disksize plugin
vagrant plugin install vagrant-disksize

# Verify installations
vboxmanage --version
vagrant --version

Windows

Download and install VirtualBox
Download and install Vagrant
Open PowerShell as Administrator and run:
```
vagrant plugin install vagrant-disksize
```

Step 2: Create Project Directory

# Create the project directory and the scripts/ subdirectory used in later steps
mkdir -p ~/homelabs/scripts
cd ~/homelabs

# Create a README placeholder
echo "# Homelabs k3s Cluster" > README.md

# Verify you're in the right directory
pwd

Expected Output: /Users/yourusername/homelabs (or similar)

Step 3: Configure Vagrant

3.1: Create Vagrantfile

Create a file named Vagrantfile in your project directory:

cd ~/homelabs
nano Vagrantfile  # or use your preferred editor

Copy and paste this content:

# Homelabs 4-node Debian cluster (bridged, /22)
# Requires: VirtualBox, Vagrant
# Optional: vagrant-disksize plugin for 32GB disks:
#   vagrant plugin install vagrant-disksize

NETMASK  = "255.255.252.0"  # /22 (192.168.4.0–192.168.7.255)
GATEWAY  = "192.168.4.1"
DNS      = ["8.8.8.8", "8.8.4.4"]

NODES = [
  {name: "master",  ip: "192.168.5.200", cpu: 2, ram: 4096},
  {name: "worker1", ip: "192.168.5.201", cpu: 2, ram: 4096},
  {name: "worker2", ip: "192.168.5.202", cpu: 2, ram: 4096},
  {name: "worker3", ip: "192.168.5.203", cpu: 2, ram: 4096},
]

Vagrant.configure("2") do |config|
  config.vm.box = "debian/bookworm64"
  config.vm.synced_folder ".", "/vagrant", disabled: true

  # Make sure vbguest (if present) doesn't interfere
  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
    config.vbguest.no_remote = true
  end

  NODES.each do |node|
    config.vm.define node[:name] do |vm|
      vm.vm.hostname = node[:name]

      # Bridged NIC with static IP on /22
      vm.vm.network(:public_network,
        ip: node[:ip],
        netmask: NETMASK,
        # Uncomment and modify if you want to specify a bridge:
        # bridge: "en0: Wi-Fi (AirPort)"
      )

      vm.vm.provider :virtualbox do |vb|
        vb.name   = "hlab-#{node[:name]}"
        vb.cpus   = node[:cpu]
        vb.memory = node[:ram]
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
      end

      # 32 GB disk via plugin (comment if you don't use the plugin)
      if Vagrant.has_plugin?("vagrant-disksize")
        vm.disksize.size = "32GB"
      end

      vm.vm.provision "shell", privileged: true, inline: <<-SHELL
        set -euo pipefail
        apt-get update -y
        apt-get install -y curl ca-certificates gnupg lsb-release jq net-tools iproute2

        # K8s-friendly sysctls (safe even if you don't install k3s yet)
        swapoff -a || true
        sed -i.bak '/ swap / s/^/#/' /etc/fstab || true
        modprobe br_netfilter || true
        sysctl -w net.ipv4.ip_forward=1
        echo 'net.ipv4.ip_forward=1' > /etc/sysctl.d/99-ipforward.conf
        echo 'net.bridge.bridge-nf-call-iptables=1' > /etc/sysctl.d/99-k8s.conf
        sysctl --system

        # Make bridged NIC the default route (optional but recommended here)
        IFACE=$(ip -o -4 addr show | awk '/192\\.168\\.(4|5|6|7)\\./ {print $2; exit}')
        if [ -n "$IFACE" ]; then
          ip route del default || true
          ip route add default via #{GATEWAY} dev "$IFACE"
        fi

        # DNS via systemd-resolved
        mkdir -p /etc/systemd/resolved.conf.d
        cat >/etc/systemd/resolved.conf.d/99-custom-dns.conf <<EOF
[Resolve]
DNS=#{DNS.join(" ")}
FallbackDNS=
Domains=
MulticastDNS=no
LLMNR=no
DNSSEC=no
EOF
        systemctl restart systemd-resolved || true
      SHELL

      # Copy your SSH public key for direct SSH access
      ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip rescue nil
      ssh_pub_key ||= File.readlines("#{Dir.home}/.ssh/id_ed25519.pub").first.strip rescue nil
      
      if ssh_pub_key
        vm.vm.provision "shell", privileged: false, inline: <<-SHELL
          echo "Adding your SSH public key to authorized_keys..."
          mkdir -p ~/.ssh
          chmod 700 ~/.ssh
          echo '#{ssh_pub_key}' >> ~/.ssh/authorized_keys
          chmod 600 ~/.ssh/authorized_keys
        SHELL
        
        vm.vm.provision "shell", privileged: true, inline: <<-SHELL
          echo "Adding your SSH public key to root authorized_keys..."
          mkdir -p /root/.ssh
          chmod 700 /root/.ssh
          echo '#{ssh_pub_key}' >> /root/.ssh/authorized_keys
          chmod 600 /root/.ssh/authorized_keys
        SHELL
      end
    end
  end
end

Important: After creating the file, you need to identify your network interface.

3.2: Identify Your Network Interface

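# On macOS/Linux (-B 4 also prints the preceding lines, where the interface name appears)
ifconfig | grep -B 4 "inet 192\.168\.[4-7]\."

# Or use the ip command on Linux (-o prints the interface name on the same line as the address)
ip -o -4 addr show | grep "192\.168\.[4-7]\."

Example output (ifconfig, trimmed):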

en2: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	inet 192.168.5.10 netmask 0xfffffc00 broadcast 192.168.7.255

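In this example, the interface is en2. Vagrant expects the adapter name exactly as VirtualBox reports it (for example, en2: Wi-Fi (AirPort)); run VBoxManage list bridgedifs to see the exact names on your host.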
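
On Linux, the lookup can be scripted. The sketch below is one possible approach: it reads output shaped like `ip -o -4 addr show` and prints the first interface with an address in the guide's range. The function name find_lab_iface and the sample addresses are illustrative only; macOS ifconfig output has a different layout and would need different parsing.

```shell
#!/bin/sh
# Print the first interface whose IPv4 address is in 192.168.4.0-192.168.7.255.
# Expects `ip -o -4 addr show` style input on stdin
# (fields: index, interface name, "inet", address/prefix, ...).
find_lab_iface() {
    awk '$3 == "inet" && $4 ~ /^192\.168\.[4-7]\./ { print $2; exit }'
}

# Sample input shaped like `ip -o -4 addr show` output (illustrative values)
sample='1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
3: eth1    inet 192.168.5.10/22 brd 192.168.7.255 scope global eth1'

printf '%s\n' "$sample" | find_lab_iface   # prints: eth1
```

On a real host the pipeline would be `ip -o -4 addr show | find_lab_iface`.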

3.3: Update Vagrantfile with Your Interface

Edit the Vagrantfile and uncomment the bridge: line in the public_network settings, replacing the value with your own interface:

# Before
        # bridge: "en0: Wi-Fi (AirPort)"

# After (replace with YOUR interface)
        bridge: "en2: Wi-Fi (AirPort)"

Save the file.

Step 4: Launch Virtual Machines

4.1: Start VMs

cd ~/homelabs

# Start all VMs (this will take 10-15 minutes)
vagrant up

What happens:

Downloads Debian Bookworm base box (~350MB, first time only)
Creates 4 VMs
Configures networking
Installs required packages
Configures kernel parameters for Kubernetes
Sets up SSH keys

4.2: Monitor Progress

You'll see output like:

Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'worker1' up with 'virtualbox' provider...
...
==> master: Box 'debian/bookworm64' could not be found...
==> master: Adding box 'debian/bookworm64'

Wait for completion. You should see:

==> master: Machine 'master' has a post `vagrant up` message...
==> worker1: Machine 'worker1' has a post `vagrant up` message...
==> worker2: Machine 'worker2' has a post `vagrant up` message...
==> worker3: Machine 'worker3' has a post `vagrant up` message...

4.3: Verify VMs

# Check VM status
vagrant status

Expected output:

Current machine states:

master                    running (virtualbox)
worker1                   running (virtualbox)
worker2                   running (virtualbox)
worker3                   running (virtualbox)
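
If you want to check this from a script rather than by eye, one option is to count the machines in the running state. The sketch below parses text shaped like the output above; count_running_vms is a hypothetical helper name, and the sample deliberately shows one stopped machine.

```shell
#!/bin/sh
# Count machines reported as "running" in `vagrant status` output read from stdin.
count_running_vms() {
    awk '$2 == "running" { n++ } END { print n + 0 }'
}

# Sample input (worker2 is powered off here to show a failing case)
sample='Current machine states:

master                    running (virtualbox)
worker1                   running (virtualbox)
worker2                   poweroff (virtualbox)
worker3                   running (virtualbox)'

printf '%s\n' "$sample" | count_running_vms   # prints: 3
```

With the real CLI this would be `vagrant status | count_running_vms`, and a result of 4 means all nodes are up.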

Step 5: Configure SSH Access

5.1: Test Vagrant SSH

# Test SSH through Vagrant
vagrant ssh master -c "hostname && whoami"

Expected output:

master
vagrant

5.2: Test Direct SSH

# Test direct SSH (should work without password)
ssh vagrant@192.168.5.200 "hostname && whoami"

Expected output:

master
vagrant

5.3: Create SSH Helper Script

Create scripts/ssh-nodes.sh:

cat > ~/homelabs/scripts/ssh-nodes.sh << 'EOF'
#!/bin/bash
# Quick SSH helper for homelabs cluster nodes

case "$1" in
  master|m)
    ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.200
    ;;
  worker1|w1)
    ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.201
    ;;
  worker2|w2)
    ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.202
    ;;
  worker3|w3)
    ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.203
    ;;
  *)
    echo "Usage: $0 {master|m|worker1|w1|worker2|w2|worker3|w3}"
    exit 1
    ;;
esac
EOF

chmod +x ~/homelabs/scripts/ssh-nodes.sh

Test it:

./scripts/ssh-nodes.sh master
# You should be logged into the master node
exit
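
The helper above repeats the ssh command for every node. An alternative sketch, assuming the same names and addresses, keeps the name-to-IP mapping in one function so other scripts can reuse it. The function name node_ip is an invention for this example.

```shell
#!/bin/sh
# Map a node name (or its short alias) to the static IP assigned in this guide.
node_ip() {
    case "$1" in
        master|m)    echo 192.168.5.200 ;;
        worker1|w1)  echo 192.168.5.201 ;;
        worker2|w2)  echo 192.168.5.202 ;;
        worker3|w3)  echo 192.168.5.203 ;;
        *)           return 1 ;;
    esac
}

node_ip w2                                   # prints: 192.168.5.202
node_ip db || echo "unknown node: db" >&2    # unknown names return non-zero
```

A connection then becomes `ssh "vagrant@$(node_ip master)"`.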

Step 6: Install k3s Cluster

6.1: Create k3s Master Installation Script

Create scripts/install-k3s-master.sh:

cat > ~/homelabs/scripts/install-k3s-master.sh << 'EOF'
#!/bin/bash
# Install the k3s server (control-plane) node

set -euo pipefail

echo "=== Installing k3s master node ==="

# Install k3s server
curl -sfL https://get.k3s.io | sh -s - server \
  --write-kubeconfig-mode 644 \
  --disable traefik \
  --node-ip 192.168.5.200 \
  --node-external-ip 192.168.5.200 \
  --bind-address 192.168.5.200 \
  --advertise-address 192.168.5.200 \
  --tls-san 192.168.5.200

# Wait for k3s to be ready
echo "Waiting for k3s to be ready..."
until kubectl get nodes &>/dev/null; do
  sleep 2
done

echo "=== k3s master installed successfully ==="
echo "Node token:"
sudo cat /var/lib/rancher/k3s/server/node-token
echo ""
echo "Kubeconfig is available at: /etc/rancher/k3s/k3s.yaml"
EOF

chmod +x ~/homelabs/scripts/install-k3s-master.sh

6.2: Create k3s Worker Installation Script

Create scripts/install-k3s-worker.sh:

cat > ~/homelabs/scripts/install-k3s-worker.sh << 'EOF'
#!/bin/bash
# Install k3s worker node

set -euo pipefail

if [ $# -ne 2 ]; then
  echo "Usage: $0 <MASTER_IP> <NODE_TOKEN>"
  exit 1
fi

MASTER_IP=$1
NODE_TOKEN=$2
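# Pick the bridged address on the 192.168.4.0/22 network rather than relying on interface order
CURRENT_IP=$(ip -o -4 addr show | awk '/192\.168\.[4-7]\./ {split($4, a, "/"); print a[1]; exit}')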

echo "=== Installing k3s worker on $HOSTNAME ($CURRENT_IP) ==="
echo "Joining cluster at: $MASTER_IP"

# Install k3s agent
curl -sfL https://get.k3s.io | K3S_URL="https://${MASTER_IP}:6443" \
  K3S_TOKEN="${NODE_TOKEN}" sh -s - agent \
  --node-ip "${CURRENT_IP}" \
  --node-external-ip "${CURRENT_IP}"

echo "=== k3s worker installed successfully ==="
EOF

chmod +x ~/homelabs/scripts/install-k3s-worker.sh

6.3: Install k3s on Master Node

cd ~/homelabs

# Copy script to master
scp scripts/install-k3s-master.sh root@192.168.5.200:/tmp/

# Execute installation
ssh root@192.168.5.200 'bash /tmp/install-k3s-master.sh'

This takes 1-2 minutes. You'll see:

=== Installing k3s master node ===
[INFO]  Finding release for channel stable
[INFO]  Using v1.33.5+k3s1 as release
...
=== k3s master installed successfully ===
Node token:
K10xxxxxxxxxxxx::server:xxxxxxxxxxxxx

Important: Copy the node token shown at the end!

6.4: Get Node Token

# Get the token (save this for the next step)
NODE_TOKEN=$(ssh root@192.168.5.200 'cat /var/lib/rancher/k3s/server/node-token')
echo "Node Token: $NODE_TOKEN"
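
An empty or truncated token makes every worker join fail, so it can be worth checking the value before moving on. The sketch below tests only for the shape shown in the sample output above (a K10 prefix followed by ::server: and a secret). That shape is an assumption drawn from this guide's output, and clusters started with a custom token may look different. The name looks_like_node_token is hypothetical.

```shell
#!/bin/sh
# Return success if the argument has the shape shown in this guide:
# a "K10" prefix, then "::server:", then a non-empty secret.
looks_like_node_token() {
    case "$1" in
        K10?*::server:?*) return 0 ;;
        *)                return 1 ;;
    esac
}

token='K10abc123::server:def456'   # made-up placeholder, not a real token
if looks_like_node_token "$token"; then
    echo "token looks valid"
else
    echo "token looks wrong"
fi
```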

6.5: Install k3s on Worker Nodes

# Install on all workers simultaneously
for ip in 192.168.5.201 192.168.5.202 192.168.5.203; do
  echo "Installing on $ip..."
  scp scripts/install-k3s-worker.sh root@$ip:/tmp/
  ssh root@$ip "bash /tmp/install-k3s-worker.sh 192.168.5.200 $NODE_TOKEN" &
done

# Wait for all to complete
wait
echo "All workers installed!"

This takes 1-2 minutes per worker.

6.6: Configure kubectl on Your Machine

# Create .kube directory
mkdir -p ~/.kube

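# Copy kubeconfig from master and point it at the master's bridged IP
# (this overwrites any existing ~/.kube/config, so back that file up first if you have one)
ssh root@192.168.5.200 'cat /etc/rancher/k3s/k3s.yaml' | \
  sed "s/127\.0\.0\.1/192.168.5.200/g" > ~/.kube/config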

# Set permissions
chmod 600 ~/.kube/config

# Set environment variable (add to ~/.bashrc or ~/.zshrc for persistence)
export KUBECONFIG=~/.kube/config
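
The sed step is needed because the kubeconfig written on the master points at the loopback address, which is only reachable from the master itself. The sketch below shows the rewrite on a one-line excerpt; point_kubeconfig_at is a hypothetical helper name.

```shell
#!/bin/sh
# Rewrite the loopback API server address in a kubeconfig read from stdin.
# $1 is the address that is reachable from the host machine.
point_kubeconfig_at() {
    sed "s#https://127\.0\.0\.1:6443#https://$1:6443#"
}

# One-line excerpt shaped like the server entry in k3s.yaml
printf 'server: https://127.0.0.1:6443\n' | point_kubeconfig_at 192.168.5.200
# prints: server: https://192.168.5.200:6443
```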

6.7: Verify Cluster

# Check nodes
kubectl get nodes -o wide

Expected output (wait 30 seconds if nodes show NotReady):

NAME      STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP
master    Ready    control-plane,master   2m    v1.33.5+k3s1   192.168.5.200   192.168.5.200
worker1   Ready    <none>                 1m    v1.33.5+k3s1   192.168.5.201   192.168.5.201
worker2   Ready    <none>                 1m    v1.33.5+k3s1   192.168.5.202   192.168.5.202
worker3   Ready    <none>                 1m    v1.33.5+k3s1   192.168.5.203   192.168.5.203

✅ Checkpoint: You now have a working 4-node k3s cluster!

Step 7: Install Observability Stack

7.1: Create Observability Installation Script

Create scripts/install-observability.sh:

cat > ~/homelabs/scripts/install-observability.sh << 'EOF'
#!/bin/bash
# Install observability stack: Prometheus, Grafana, Loki

set -euo pipefail

# Set kubeconfig for k3s
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

echo "=== Installing Observability Stack ==="

# Wait for cluster to be ready
echo "Checking cluster status..."
kubectl wait --for=condition=Ready nodes --all --timeout=300s

# Add Helm (k3s doesn't include it by default)
if ! command -v helm &> /dev/null; then
    echo "Installing Helm..."
    curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi

# Add Helm repositories
echo "Adding Helm repositories..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Create monitoring namespace
echo "Creating monitoring namespace..."
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -

# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
echo "Installing Prometheus + Grafana..."
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=7d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
  --set grafana.adminPassword=admin \
  --set grafana.service.type=NodePort \
  --set grafana.service.nodePort=30080 \
  --set prometheus.service.type=NodePort \
  --set prometheus.service.nodePort=30090 \
  --set alertmanager.service.type=NodePort \
  --set alertmanager.service.nodePort=30093 \
  --wait --timeout=10m

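# Install Loki for log aggregation
# (loki.isDefault=false stops the chart's bundled Loki datasource from claiming Grafana's
# default slot, which kube-prometheus-stack already gives to Prometheus)
echo "Installing Loki..."
helm upgrade --install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=10Gi \
  --set loki.isDefault=false \
  --set promtail.enabled=true \
  --set grafana.enabled=false \
  --wait --timeout=5m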

# Configure Loki datasource in Grafana
echo "Configuring Loki datasource..."
cat <<DATASOURCE | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasource-loki
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki-datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
      isDefault: false
      editable: true
DATASOURCE

# Restart Grafana to pick up Loki datasource
echo "Restarting Grafana..."
kubectl rollout restart deployment kube-prometheus-stack-grafana -n monitoring
kubectl rollout status deployment kube-prometheus-stack-grafana -n monitoring --timeout=5m

echo ""
echo "=== Observability Stack Installed Successfully ==="
echo ""
echo "Access URLs (from your host machine):"
echo "  Grafana:       http://192.168.5.200:30080"
echo "    Username: admin"
echo "    Password: admin"
echo ""
echo "  Prometheus:    http://192.168.5.200:30090"
echo "  Alertmanager:  http://192.168.5.200:30093"
echo ""
EOF

chmod +x ~/homelabs/scripts/install-observability.sh

7.2: Install Observability Stack

cd ~/homelabs

# Copy script to master
scp scripts/install-observability.sh root@192.168.5.200:/tmp/

# Execute installation (this takes 5-10 minutes)
ssh root@192.168.5.200 'bash /tmp/install-observability.sh'

What happens:

Installs Helm package manager
Adds Prometheus and Grafana Helm repositories
Installs kube-prometheus-stack (Prometheus, Grafana, Alertmanager)
Installs Loki stack (Loki, Promtail)
Configures Loki as a datasource in Grafana

This takes 5-10 minutes. You'll see:

=== Installing Observability Stack ===
Checking cluster status...
Installing Helm...
Adding Helm repositories...
Installing Prometheus + Grafana...
Installing Loki...
=== Observability Stack Installed Successfully ===

7.3: Verify Observability Pods

# Check all monitoring pods
kubectl get pods -n monitoring

Expected output (all pods should be Running):

NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          3m
kube-prometheus-stack-grafana-xxx                           3/3     Running   0          3m
kube-prometheus-stack-kube-state-metrics-xxx                1/1     Running   0          3m
kube-prometheus-stack-operator-xxx                          1/1     Running   0          3m
kube-prometheus-stack-prometheus-node-exporter-xxx          1/1     Running   0          3m
loki-0                                                      1/1     Running   0          2m
loki-promtail-xxx                                           1/1     Running   0          2m
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          3m

✅ Checkpoint: Observability stack is installed!
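
With this many pods, a filter that prints only the problem cases can be easier to read than the full listing. The sketch below is one way to do it, assuming input shaped like `kubectl get pods --no-headers`. The name unhealthy_pods and the sample pod names are illustrative. Note that it would also flag pods in the Completed state, which is normal for finished jobs.

```shell
#!/bin/sh
# List pods whose STATUS is not Running or whose READY count is incomplete.
# Expects `kubectl get pods --no-headers` style input on stdin.
unhealthy_pods() {
    awk '{
        split($2, r, "/")                      # READY column, e.g. "2/3"
        if ($3 != "Running" || r[1] != r[2]) print $1
    }'
}

# Sample input: one healthy pod, one crashing, one only partly ready
sample='loki-0                 1/1   Running            0   2m
loki-promtail-abcde    0/1   CrashLoopBackOff   4   2m
grafana-xyz            2/3   Running            0   3m'

# Prints the two unhealthy pod names, one per line
printf '%s\n' "$sample" | unhealthy_pods
```

Against the live cluster this would be `kubectl get pods -n monitoring --no-headers | unhealthy_pods`, where empty output means every pod is fully ready.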

Step 8: Verify Installation

8.1: Create Verification Script

Create scripts/verify-cluster.sh:

cat > ~/homelabs/scripts/verify-cluster.sh << 'EOF'
#!/bin/bash
# Verify k3s cluster and observability stack

set -euo pipefail

echo "╔════════════════════════════════════════════════════════════════╗"
echo "║  k3s Cluster Verification                                      ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""

export KUBECONFIG=~/.kube/config

# Colors
GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m'

check_mark="${GREEN}✓${NC}"
cross_mark="${RED}✗${NC}"

# Check kubectl
echo "🔍 Checking kubectl connectivity..."
if kubectl cluster-info &>/dev/null; then
    echo -e "  ${check_mark} kubectl is configured correctly"
else
    echo -e "  ${cross_mark} kubectl is NOT working"
    exit 1
fi

# Check nodes
echo ""
echo "🖥️  Checking cluster nodes..."
kubectl get nodes -o wide

NODE_COUNT=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
READY_COUNT=$(kubectl get nodes --no-headers | awk '$2 == "Ready" {n++} END {print n+0}')

if [ "$NODE_COUNT" -eq 4 ] && [ "$READY_COUNT" -eq 4 ]; then
    echo -e "  ${check_mark} All 4 nodes are Ready"
else
    echo -e "  ${cross_mark} Expected 4 Ready nodes, found ${READY_COUNT}/${NODE_COUNT}"
fi

# Check monitoring
echo ""
echo "📊 Checking observability stack..."
kubectl get pods -n monitoring

# Check endpoints
echo ""
echo "🌐 Testing service endpoints..."

test_url() {
    if curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 "$1" | grep -q "200\|302"; then
        echo -e "  ${check_mark} $2: $1"
    else
        echo -e "  ${cross_mark} $2: $1"
    fi
}

test_url "http://192.168.5.200:30080/login" "Grafana    "
test_url "http://192.168.5.200:30090/graph" "Prometheus "
test_url "http://192.168.5.200:30093" "Alertmanager"

echo ""
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║  ✅ Verification Complete!                                     ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
echo "📊 Grafana: http://192.168.5.200:30080"
echo "   Username: admin | Password: admin"
echo ""
EOF

chmod +x ~/homelabs/scripts/verify-cluster.sh

8.2: Run Verification

cd ~/homelabs
./scripts/verify-cluster.sh

Expected output: All checks should show ✓ (green checkmarks).
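
One detail worth knowing when counting Ready nodes: a plain substring match on Ready also matches NotReady. The sketch below compares an exact match on the STATUS column with a substring count, using sample text shaped like `kubectl get nodes --no-headers`. The name count_ready_nodes is hypothetical.

```shell
#!/bin/sh
# Count nodes whose STATUS column is exactly "Ready".
# Expects `kubectl get nodes --no-headers` style input on stdin.
count_ready_nodes() {
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample input with one node that has not become Ready
sample='master    Ready      control-plane,master   2m   v1.33.5+k3s1
worker1   Ready      <none>                 1m   v1.33.5+k3s1
worker2   NotReady   <none>                 1m   v1.33.5+k3s1
worker3   Ready      <none>                 1m   v1.33.5+k3s1'

printf '%s\n' "$sample" | count_ready_nodes   # prints: 3
printf '%s\n' "$sample" | grep -c Ready       # prints: 4, because "NotReady" also matches
```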

Step 9: Explore Grafana Dashboards

9.1: Open Grafana

# Open Grafana in your browser (macOS; on Linux use xdg-open instead of open)
open http://192.168.5.200:30080
# Or manually navigate to: http://192.168.5.200:30080

9.2: Login

Username: admin
Password: admin

(You may be prompted to change the password - you can skip this)

9.3: Explore Pre-configured Dashboards

Click the ☰ menu (top left)
Click Dashboards
You'll see folders like:
- General
- Kubernetes / Compute Resources
- Node Exporter

9.4: View Cluster Overview

Navigate to: Dashboards → Kubernetes / Compute Resources → Cluster
You'll see:
- CPU Usage
- Memory Usage
- Network I/O
- Disk I/O
- Pod count
- And more!

9.5: View Node Metrics

Navigate to: Dashboards → Node Exporter → Nodes
Select a node from the dropdown
You'll see detailed system metrics

9.6: Explore Logs with Loki

Click the compass icon (Explore) on the left sidebar
Select Loki as the datasource

Try these queries:

{namespace="kube-system"}
{namespace="monitoring"}
{job="systemd-journal"}

Step 10: Deploy Sample Application

10.1: Deploy Nginx

# Create a deployment
kubectl create deployment nginx --image=nginx:latest --replicas=3

# Expose it as a NodePort service
kubectl expose deployment nginx --port=80 --type=NodePort

# Get the assigned port
kubectl get svc nginx

Example output:

NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.43.123.45    <none>        80:31234/TCP   10s

10.2: Test the Application

# Access nginx (replace 31234 with your actual NodePort)
curl http://192.168.5.200:31234

Expected output: HTML content from nginx
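
Because the NodePort is assigned at random, scripts need to look it up rather than hard-code it. The sketch below pulls the port out of a PORT(S) value like the one in the example output. The name nodeport_of is made up for this illustration.

```shell
#!/bin/sh
# Extract the node port from a PORT(S) value such as "80:31234/TCP".
nodeport_of() {
    port=${1#*:}        # drop everything up to the colon  -> "31234/TCP"
    echo "${port%%/*}"  # drop the protocol suffix         -> "31234"
}

nodeport_of "80:31234/TCP"   # prints: 31234
```

With a live cluster, `kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}'` returns the same number directly, without any text parsing.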

10.3: View in Grafana

Go back to Grafana
Navigate to: Dashboards → Kubernetes / Compute Resources → Pod
Select namespace: default
Select pod: nginx-xxx
You'll see CPU, memory, and network metrics for your nginx pods!

10.4: View Nginx Logs

In Grafana, click Explore
Select Loki
Query: {app="nginx"}
You'll see nginx access logs

10.5: Cleanup Sample App (Optional)

kubectl delete deployment nginx
kubectl delete service nginx

Troubleshooting

Issue: VMs Don't Start

Symptoms: Vagrant fails to create VMs

Solutions:

# Check VirtualBox is running
vboxmanage list runningvms

# Check if VMs exist but are stopped
vboxmanage list vms

# Restart VirtualBox
# macOS: Open VirtualBox GUI and restart
# Linux: sudo systemctl restart virtualbox

# Try again
vagrant up

Issue: Network Bridge Not Found

Symptoms: Vagrant prompts for network interface selection

Solution:

Identify your interface: ifconfig or ip addr
Edit the bridge: line in the public_network settings of the Vagrantfile:
```
bridge: "YOUR_INTERFACE_NAME"
```
Reload VMs:
```
vagrant reload
```

Issue: kubectl Connection Refused

Symptoms: kubectl get nodes fails

Solutions:

# Re-copy kubeconfig
ssh root@192.168.5.200 'cat /etc/rancher/k3s/k3s.yaml' | \
  sed "s/127.0.0.1/192.168.5.200/g" > ~/.kube/config

# Set environment variable
export KUBECONFIG=~/.kube/config

# Test connection
kubectl get nodes

Issue: Grafana Not Accessible

Symptoms: Can't access http://192.168.5.200:30080

Solutions:

# Check if Grafana pod is running
kubectl get pods -n monitoring | grep grafana

# Check service
kubectl get svc -n monitoring | grep grafana

# Restart Grafana
kubectl rollout restart deployment kube-prometheus-stack-grafana -n monitoring

# Wait for it to be ready
kubectl rollout status deployment kube-prometheus-stack-grafana -n monitoring

# Test from command line
curl http://192.168.5.200:30080/login

Issue: Pods Stuck in Pending

Symptoms: kubectl get pods -A shows pods in Pending state

Solutions:

# Describe the pod to see why
kubectl describe pod <pod-name> -n <namespace>

# Common causes:
# 1. Insufficient resources - check node resources:
kubectl top nodes  # Requires metrics-server (deployed by k3s by default)

# 2. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# 3. Restart the pod
kubectl delete pod <pod-name> -n <namespace>

Issue: Node Shows NotReady

Symptoms: kubectl get nodes shows NotReady status

Solutions:

# Check node details
kubectl describe node <node-name>

# SSH to the node
ssh root@<node-ip>

# Check k3s service
systemctl status k3s  # on master
systemctl status k3s-agent  # on worker

# Restart k3s
systemctl restart k3s  # on master
systemctl restart k3s-agent  # on worker

# Check logs
journalctl -u k3s -f  # on master
journalctl -u k3s-agent -f  # on worker

Daily Operations

Starting the Cluster

cd ~/homelabs

# Start all VMs
vagrant up

# Wait 30 seconds for k3s to start
sleep 30

# Verify
kubectl get nodes
./scripts/verify-cluster.sh
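
A fixed 30-second sleep can be too short on a slow host and needlessly long on a fast one. One alternative, sketched below, is to poll a command until it succeeds or a timeout expires. The function wait_until is a hypothetical helper, and the timeout and interval values are arbitrary.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the timeout (seconds) is reached.
# Usage: wait_until <timeout> <interval> <command> [args...]
wait_until() {
    timeout=$1; interval=$2; shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1                     # gave up: command never succeeded
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 0
}

# Demonstration with a command that succeeds immediately
wait_until 3 1 true && echo "condition met"
```

For the cluster, a call such as `wait_until 120 5 kubectl get nodes` would replace the sleep.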

Stopping the Cluster

cd ~/homelabs

# Gracefully stop all VMs
vagrant halt

# Or stop individual nodes
vagrant halt master
vagrant halt worker1

Accessing Nodes

# Via helper script
./scripts/ssh-nodes.sh master

# Via direct SSH
ssh vagrant@192.168.5.200
ssh root@192.168.5.200

# Via Vagrant
vagrant ssh master

Viewing Logs

# Kubernetes events
kubectl get events -A --sort-by='.lastTimestamp'

# Pod logs
kubectl logs <pod-name> -n <namespace>

# Follow logs
kubectl logs <pod-name> -n <namespace> -f

# Previous container logs (if crashed)
kubectl logs <pod-name> -n <namespace> --previous

# All containers in a pod
kubectl logs <pod-name> -n <namespace> --all-containers

Monitoring Cluster Health

# Node status
kubectl get nodes -o wide

# Pod status across all namespaces
kubectl get pods -A

# System pods
kubectl get pods -n kube-system

# Monitoring pods
kubectl get pods -n monitoring

# Resource usage (requires metrics-server, which k3s deploys by default)
kubectl top nodes
kubectl top pods -A

Updating Observability Stack

# SSH to master
ssh root@192.168.5.200

# Update Helm repos
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm repo update

# Upgrade Prometheus/Grafana
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring

# Upgrade Loki
helm upgrade loki grafana/loki-stack --namespace monitoring

Cleanup and Maintenance

Removing Observability Stack

# SSH to master
ssh root@192.168.5.200

# Set kubeconfig
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# Remove Helm releases
helm uninstall kube-prometheus-stack -n monitoring
helm uninstall loki -n monitoring

# Delete namespace
kubectl delete namespace monitoring

Removing k3s

# On master
ssh root@192.168.5.200 '/usr/local/bin/k3s-uninstall.sh'

# On workers
ssh root@192.168.5.201 '/usr/local/bin/k3s-agent-uninstall.sh'
ssh root@192.168.5.202 '/usr/local/bin/k3s-agent-uninstall.sh'
ssh root@192.168.5.203 '/usr/local/bin/k3s-agent-uninstall.sh'

Destroying VMs

cd ~/homelabs

# Stop and delete all VMs
vagrant destroy -f

# This removes all VMs but keeps your Vagrantfile
# You can recreate them with: vagrant up

Recreating the Cluster

cd ~/homelabs

# Destroy old VMs
vagrant destroy -f

# Create fresh VMs
vagrant up

# Reinstall k3s and the observability stack by repeating Steps 6 and 7
# (or run your own wrapper script, e.g. ./scripts/setup-cluster.sh, if you wrote one; this guide does not create it)

Disk Space Management

# Check VM disk usage
for ip in 192.168.5.{200..203}; do
  echo "=== $ip ==="
  ssh root@$ip "df -h /"
done

# Remove unused container images on the nodes (k3s uses containerd, not Docker)
for ip in 192.168.5.{200..203}; do
  ssh root@$ip "k3s crictl rmi --prune"
done

# Clean up unused Kubernetes resources
kubectl delete pods --field-selector status.phase=Failed -A
kubectl delete pods --field-selector status.phase=Succeeded -A
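
To turn the disk check into a warning instead of a table to read, the usage percentage can be extracted and compared against a threshold. The sketch below assumes POSIX-format `df -P /` output. The name root_use_percent, the sample figures, and the 80% threshold are all arbitrary choices for illustration.

```shell
#!/bin/sh
# Print the usage percentage (without the % sign) for the root filesystem.
# Expects `df -P /` style input on stdin: a header line, then one data line.
root_use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Sample input shaped like `df -P /` output (illustrative numbers)
sample='Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/sda1         32895856 9868756  21335420      32% /'

pct=$(printf '%s\n' "$sample" | root_use_percent)
if [ "$pct" -ge 80 ]; then
    echo "disk nearly full (${pct}%)"
else
    echo "disk ok (${pct}%)"             # prints: disk ok (32%)
fi
```

Per node, the input would come from `ssh root@$ip "df -P /"`.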

Appendix: Reference Information

Port Reference

Service	Port	Type	Access
k3s API	6443	TCP	Master only
Grafana	30080	NodePort	Any node
Prometheus	30090	NodePort	Any node
Alertmanager	30093	NodePort	Any node
Loki	3100	ClusterIP	Internal only
SSH	22	TCP	All nodes

IP Allocation

Node	IP Address	Purpose
master	192.168.5.200	Control plane, monitoring
worker1	192.168.5.201	Workloads
worker2	192.168.5.202	Workloads
worker3	192.168.5.203	Workloads

Default Credentials

Service	Username	Password
Grafana	admin	admin
SSH (vagrant)	vagrant	vagrant
SSH (root)	root	(key-based)

Useful kubectl Commands

# Context and config
kubectl config view
kubectl config current-context
kubectl cluster-info

# Nodes
kubectl get nodes
kubectl describe node <node-name>
kubectl cordon <node-name>  # Mark unschedulable
kubectl uncordon <node-name>  # Mark schedulable
kubectl drain <node-name>  # Evict all pods

# Pods
kubectl get pods -A
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
kubectl exec -it <pod-name> -n <namespace> -- bash

# Deployments
kubectl get deployments -A
kubectl scale deployment <name> --replicas=3
kubectl rollout status deployment <name>
kubectl rollout restart deployment <name>

# Services
kubectl get svc -A
kubectl describe svc <service-name>

# Resources
kubectl api-resources
kubectl explain pod
kubectl explain deployment.spec

# Debugging
kubectl get events -A --sort-by='.lastTimestamp'
kubectl top nodes
kubectl top pods -A

Additional Resources

k3s Documentation: https://docs.k3s.io/
Vagrant Documentation: https://www.vagrantup.com/docs
kubectl Cheat Sheet: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
Prometheus Documentation: https://prometheus.io/docs/
Grafana Documentation: https://grafana.com/docs/
Loki Documentation: https://grafana.com/docs/loki/

Success Checklist

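Use this checklist to verify your setup:

vagrant status shows all 4 VMs running
ssh vagrant@192.168.5.200 works without a password
kubectl get nodes shows 4 nodes in Ready state
kubectl get pods -n monitoring shows all pods Running
Grafana loads at http://192.168.5.200:30080
Prometheus and Alertmanager respond on ports 30090 and 30093
./scripts/verify-cluster.sh reports all checks passed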

Congratulations! 🎉

You now have a fully functional 4-node k3s Kubernetes cluster with comprehensive observability!

What you've built:

Production-like Kubernetes cluster
Full monitoring with Prometheus
Beautiful dashboards with Grafana
Centralized logging with Loki
Alert management with Alertmanager
Complete infrastructure as code

What you can do next:

Deploy real applications
Experiment with Helm charts
Set up CI/CD pipelines
Test autoscaling
Implement GitOps with ArgoCD
Add a service mesh (Istio, Linkerd)
Deploy databases
Create custom Grafana dashboards
Configure alerts
Learn Kubernetes operators

Happy clustering! 🚀
