Time to Complete: ~30-45 minutes
Difficulty: Intermediate
Cost: Free (all open-source tools)
- Prerequisites
- Architecture Overview
- Step 1: Install Required Software
- Step 2: Create Project Directory
- Step 3: Configure Vagrant
- Step 4: Launch Virtual Machines
- Step 5: Configure SSH Access
- Step 6: Install k3s Cluster
- Step 7: Install Observability Stack
- Step 8: Verify Installation
- Step 9: Explore Grafana Dashboards
- Step 10: Deploy Sample Application
- Troubleshooting
- Daily Operations
- Cleanup and Maintenance
- CPU: Multi-core processor (4+ cores recommended)
- RAM: Minimum 16GB (20GB+ recommended)
- Disk: 50GB+ free space
- Network: Active network connection for VM bridging
- Host OS: macOS, Linux, or Windows
- VirtualBox: 7.0 or later
- Vagrant: 2.0 or later
- SSH: OpenSSH client (pre-installed on macOS/Linux)
- Access to a /22 network (192.168.4.0/22 in this guide)
- Your host machine should be on this network
- No firewall blocking ports: 22, 80, 443, 6443, 30080, 30090, 30093
- Basic command-line skills
- Understanding of SSH
- Familiarity with YAML (helpful but not required)
- Basic Kubernetes concepts (helpful but not required)
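A quick way to sanity-check the hardware prerequisites on your host (commands differ slightly between macOS and Linux):

df -h ~   # free disk space
# macOS:
sysctl -n hw.memsize | awk '{printf "%.0f GB RAM\n", $1/1073741824}'
sysctl -n hw.ncpu
# Linux:
free -g
nproc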
┌─────────────────────────────────────┐
│ Your Host Machine (Mac/PC) │
│ - kubectl configured │
│ - SSH access to all nodes │
│ - Browser access to dashboards │
└──────────────┬──────────────────────┘
│
┌──────────────┴──────────────┐
│ 192.168.4.0/22 Network │
│ (Bridged Networking) │
└──────────────┬──────────────┘
│
┌──────────────────────────┼──────────────────────────┐
│ │ │
┌───────▼────────┐ ┌───────────▼─────┐ ┌──────────────▼──────┐
│ Master Node │ │ Worker Node 1 │ │ Worker Nodes 2 & 3 │
│ 192.168.5.200 │ │ 192.168.5.201 │ │ 192.168.5.202-203 │
│ │ │ │ │ │
│ - k3s server │ │ - k3s agent │ │ - k3s agent │
│ - etcd │ │ - Workloads │ │ - Workloads │
│ - API server │ │ │ │ │
│ - Monitoring │ │ │ │ │
│ Stack: │ │ │ │ │
│ • Prometheus │ │ │ │ │
│ • Grafana │ │ │ │ │
│ • Loki │ │ │ │ │
│ │ │ │ │ │
│ 2 CPU / 4GB │ │ 2 CPU / 4GB │ │ 2 CPU / 4GB each │
└────────────────┘ └──────────────────┘ └─────────────────────┘
| Component | Purpose | Port |
|---|---|---|
| k3s | Lightweight Kubernetes distribution | 6443 |
| Prometheus | Metrics collection and storage | 30090 |
| Grafana | Visualization and dashboards | 30080 |
| Loki | Log aggregation | 3100 |
| Alertmanager | Alert management | 30093 |
| Node Exporter | Host metrics | 9100 |
| Promtail | Log collection agent | 9080 |
- Subnet: 192.168.4.0/22 (255.255.252.0)
- Gateway: 192.168.4.1
- DNS: 8.8.8.8, 8.8.4.4
- IP Range: 192.168.4.0 - 192.168.7.255 (1024 addresses)
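To confirm your host actually sits on this /22 and can reach the gateway:

# Should print an address between 192.168.4.0 and 192.168.7.255
ip -4 addr | grep -E "192\.168\.[4-7]\."   # macOS: ifconfig | grep "inet 192.168"
ping -c 2 192.168.4.1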
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install VirtualBox
brew install --cask virtualbox
# Install Vagrant
brew install --cask vagrant
# Install vagrant-disksize plugin (optional, for larger disks)
vagrant plugin install vagrant-disksize
# Verify installations
vboxmanage --version
vagrant --version

# Update package list
sudo apt update
# Install VirtualBox
sudo apt install -y virtualbox virtualbox-ext-pack
# Install Vagrant
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y vagrant
# Install vagrant-disksize plugin
vagrant plugin install vagrant-disksize
# Verify installations
vboxmanage --version
vagrant --version

- Download and install VirtualBox
- Download and install Vagrant
- Open PowerShell as Administrator and run:
vagrant plugin install vagrant-disksize
# Create project directory
mkdir -p ~/homelabs
cd ~/homelabs
# Create a README placeholder
echo "# Homelabs k3s Cluster" > README.md
# Verify you're in the right directory
pwd

Expected output: /Users/yourusername/homelabs (or similar)
Create a file named Vagrantfile in your project directory:
cd ~/homelabs
nano Vagrantfile  # or use your preferred editor

Copy and paste this content:
# Homelabs 4-node Debian cluster (bridged, /22)
# Requires: VirtualBox, Vagrant
# Optional: vagrant-disksize plugin for 32GB disks:
# vagrant plugin install vagrant-disksize
NETMASK = "255.255.252.0" # /22 (192.168.4.0–192.168.7.255)
GATEWAY = "192.168.4.1"
DNS = ["8.8.8.8", "8.8.4.4"]
NODES = [
{name: "master", ip: "192.168.5.200", cpu: 2, ram: 4096},
{name: "worker1", ip: "192.168.5.201", cpu: 2, ram: 4096},
{name: "worker2", ip: "192.168.5.202", cpu: 2, ram: 4096},
{name: "worker3", ip: "192.168.5.203", cpu: 2, ram: 4096},
]
Vagrant.configure("2") do |config|
config.vm.box = "debian/bookworm64"
config.vm.synced_folder ".", "/vagrant", disabled: true
# Make sure vbguest (if present) doesn't interfere
if Vagrant.has_plugin?("vagrant-vbguest")
config.vbguest.auto_update = false
config.vbguest.no_remote = true
end
NODES.each do |node|
config.vm.define node[:name] do |vm|
vm.vm.hostname = node[:name]
# Bridged NIC with static IP on /22
vm.vm.network :public_network,
ip: node[:ip],
netmask: NETMASK
# To pin a specific bridge, add a comma after NETMASK above,
# then uncomment and adjust the next line:
# bridge: "en0: Wi-Fi (AirPort)"
vm.vm.provider :virtualbox do |vb|
vb.name = "hlab-#{node[:name]}"
vb.cpus = node[:cpu]
vb.memory = node[:ram]
vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
end
# 32 GB disk via plugin (comment if you don't use the plugin)
if Vagrant.has_plugin?("vagrant-disksize")
vm.disksize.size = "32GB"
end
vm.vm.provision "shell", privileged: true, inline: <<-SHELL
set -euo pipefail
apt-get update -y
apt-get install -y curl ca-certificates gnupg lsb-release jq net-tools iproute2
# K8s-friendly sysctls (safe even if you don't install k3s yet)
swapoff -a || true
sed -i.bak '/ swap / s/^/#/' /etc/fstab || true
modprobe br_netfilter || true
sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward=1' > /etc/sysctl.d/99-ipforward.conf
echo 'net.bridge.bridge-nf-call-iptables=1' > /etc/sysctl.d/99-k8s.conf
sysctl --system
# Make bridged NIC the default route (optional but recommended here)
IFACE=$(ip -o -4 addr show | awk '/192\\.168\\.(4|5|6|7)\\./ {print $2; exit}')
if [ -n "$IFACE" ]; then
ip route del default || true
ip route add default via #{GATEWAY} dev "$IFACE"
fi
# DNS via systemd-resolved
mkdir -p /etc/systemd/resolved.conf.d
cat >/etc/systemd/resolved.conf.d/99-custom-dns.conf <<EOF
[Resolve]
DNS=#{DNS.join(" ")}
FallbackDNS=
Domains=
MulticastDNS=no
LLMNR=no
DNSSEC=no
EOF
systemctl restart systemd-resolved || true
SHELL
# Copy your SSH public key for direct SSH access
ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip rescue nil
ssh_pub_key ||= File.readlines("#{Dir.home}/.ssh/id_ed25519.pub").first.strip rescue nil
if ssh_pub_key
vm.vm.provision "shell", privileged: false, inline: <<-SHELL
echo "Adding your SSH public key to authorized_keys..."
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo '#{ssh_pub_key}' >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
SHELL
vm.vm.provision "shell", privileged: true, inline: <<-SHELL
echo "Adding your SSH public key to root authorized_keys..."
mkdir -p /root/.ssh
chmod 700 /root/.ssh
echo '#{ssh_pub_key}' >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
SHELL
end
end
end
end
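Optionally, sanity-check the file before bringing anything up (vagrant validate checks Vagrantfile syntax only, not your network settings):

cd ~/homelabs
vagrant validate

Important: After creating the file, you need to identify your network interface.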
# On macOS/Linux
ifconfig | grep -A 1 "192.168.[4-7]"
# Or use ip command
ip addr show | grep "192.168.[4-7]"

Example output:
en2: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.5.10 netmask 0xfffffc00 broadcast 192.168.7.255
In this example, the interface is en2 (VirtualBox lists it as "en2: Wi-Fi (AirPort)").
Edit the Vagrantfile: in the public_network block, add a comma after netmask: NETMASK, then uncomment the bridge line and set it to your interface:

# Before
netmask: NETMASK
# bridge: "en0: Wi-Fi (AirPort)"

# After (replace with YOUR interface)
netmask: NETMASK,
bridge: "en2: Wi-Fi (AirPort)"

Save the file.
cd ~/homelabs
# Start all VMs (this will take 10-15 minutes)
vagrant up

What happens:
- Downloads Debian Bookworm base box (~350MB, first time only)
- Creates 4 VMs
- Configures networking
- Installs required packages
- Configures kernel parameters for Kubernetes
- Sets up SSH keys
You'll see output like:
Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'worker1' up with 'virtualbox' provider...
...
==> master: Box 'debian/bookworm64' could not be found...
==> master: Adding box 'debian/bookworm64'
Wait for completion. You should see:
==> master: Machine 'master' has a post `vagrant up` message...
==> worker1: Machine 'worker1' has a post `vagrant up` message...
==> worker2: Machine 'worker2' has a post `vagrant up` message...
==> worker3: Machine 'worker3' has a post `vagrant up` message...
# Check VM status
vagrant status

Expected output:
Current machine states:
master running (virtualbox)
worker1 running (virtualbox)
worker2 running (virtualbox)
worker3 running (virtualbox)
# Test SSH through Vagrant
vagrant ssh master -c "hostname && whoami"

Expected output:
master
vagrant
# Test direct SSH (should work without password)
ssh vagrant@192.168.5.200 "hostname && whoami"

Expected output:
master
vagrant
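As an alternative to the helper script below, you can add host aliases to ~/.ssh/config (the hlab-* names are only a suggestion and aren't used elsewhere in this guide):

cat >> ~/.ssh/config << 'EOF'
Host hlab-master
  HostName 192.168.5.200
  User vagrant
Host hlab-worker1
  HostName 192.168.5.201
  User vagrant
Host hlab-worker2
  HostName 192.168.5.202
  User vagrant
Host hlab-worker3
  HostName 192.168.5.203
  User vagrant
EOF

# Then simply:
ssh hlab-master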
Create scripts/ssh-nodes.sh:

mkdir -p ~/homelabs/scripts
cat > ~/homelabs/scripts/ssh-nodes.sh << 'EOF'
#!/bin/bash
# Quick SSH helper for homelabs cluster nodes
case "$1" in
master|m)
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.200
;;
worker1|w1)
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.201
;;
worker2|w2)
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.202
;;
worker3|w3)
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null vagrant@192.168.5.203
;;
*)
echo "Usage: $0 {master|m|worker1|w1|worker2|w2|worker3|w3}"
exit 1
;;
esac
EOF
chmod +x ~/homelabs/scripts/ssh-nodes.sh

Test it:
./scripts/ssh-nodes.sh master
# You should be logged into the master node
exit

Create scripts/install-k3s-master.sh:
cat > ~/homelabs/scripts/install-k3s-master.sh << 'EOF'
#!/bin/bash
# Install k3s master node with observability stack
set -euo pipefail
echo "=== Installing k3s master node ==="
# Install k3s server
curl -sfL https://get.k3s.io | sh -s - server \
--write-kubeconfig-mode 644 \
--disable traefik \
--node-ip 192.168.5.200 \
--node-external-ip 192.168.5.200 \
--bind-address 192.168.5.200 \
--advertise-address 192.168.5.200 \
--tls-san 192.168.5.200
# Wait for k3s to be ready
echo "Waiting for k3s to be ready..."
until kubectl get nodes &>/dev/null; do
sleep 2
done
echo "=== k3s master installed successfully ==="
echo "Node token:"
sudo cat /var/lib/rancher/k3s/server/node-token
echo ""
echo "Kubeconfig is available at: /etc/rancher/k3s/k3s.yaml"
EOF
chmod +x ~/homelabs/scripts/install-k3s-master.sh

Create scripts/install-k3s-worker.sh:
cat > ~/homelabs/scripts/install-k3s-worker.sh << 'EOF'
#!/bin/bash
# Install k3s worker node
set -euo pipefail
if [ $# -ne 2 ]; then
echo "Usage: $0 <MASTER_IP> <NODE_TOKEN>"
exit 1
fi
MASTER_IP=$1
NODE_TOKEN=$2
# Second address from hostname -I is the bridged NIC (the first is Vagrant's NAT)
CURRENT_IP=$(hostname -I | awk '{print $2}')
echo "=== Installing k3s worker on $HOSTNAME ($CURRENT_IP) ==="
echo "Joining cluster at: $MASTER_IP"
# Install k3s agent
curl -sfL https://get.k3s.io | K3S_URL="https://${MASTER_IP}:6443" \
K3S_TOKEN="${NODE_TOKEN}" sh -s - agent \
--node-ip "${CURRENT_IP}" \
--node-external-ip "${CURRENT_IP}"
echo "=== k3s worker installed successfully ==="
EOF
chmod +x ~/homelabs/scripts/install-k3s-worker.sh

cd ~/homelabs
# Copy script to master
scp scripts/install-k3s-master.sh root@192.168.5.200:/tmp/
# Execute installation
ssh root@192.168.5.200 'bash /tmp/install-k3s-master.sh'

This takes 1-2 minutes. You'll see:
=== Installing k3s master node ===
[INFO] Finding release for channel stable
[INFO] Using v1.33.5+k3s1 as release
...
=== k3s master installed successfully ===
Node token:
K10xxxxxxxxxxxx::server:xxxxxxxxxxxxx
Important: Copy the node token shown at the end!
# Get the token (save this for the next step)
NODE_TOKEN=$(ssh root@192.168.5.200 'cat /var/lib/rancher/k3s/server/node-token')
echo "Node Token: $NODE_TOKEN"# Install on all workers simultaneously
for ip in 192.168.5.201 192.168.5.202 192.168.5.203; do
echo "Installing on $ip..."
scp scripts/install-k3s-worker.sh root@$ip:/tmp/
ssh root@$ip "bash /tmp/scripts/install-k3s-worker.sh 192.168.5.200 $NODE_TOKEN" &
done
# Wait for all to complete
wait
echo "All workers installed!"This takes 1-2 minutes per worker.
# Create .kube directory
mkdir -p ~/.kube
# Copy kubeconfig from master
ssh root@192.168.5.200 'cat /etc/rancher/k3s/k3s.yaml' | \
sed "s/127.0.0.1/192.168.5.200/g" > ~/.kube/config
# Set permissions
chmod 600 ~/.kube/config
# Set environment variable (add to ~/.bashrc or ~/.zshrc for persistence)
export KUBECONFIG=~/.kube/config

# Check nodes
kubectl get nodes -o wide

Expected output (wait 30 seconds if nodes show NotReady):
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP
master Ready control-plane,master 2m v1.33.5+k3s1 192.168.5.200 192.168.5.200
worker1 Ready <none> 1m v1.33.5+k3s1 192.168.5.201 192.168.5.201
worker2 Ready <none> 1m v1.33.5+k3s1 192.168.5.202 192.168.5.202
worker3 Ready <none> 1m v1.33.5+k3s1 192.168.5.203 192.168.5.203
✅ Checkpoint: You now have a working 4-node k3s cluster!
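Optionally, label the workers so the ROLES column shows worker instead of <none> (purely cosmetic; the label key follows the usual Kubernetes convention):

for node in worker1 worker2 worker3; do
  kubectl label node "$node" node-role.kubernetes.io/worker=true
done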
Create scripts/install-observability.sh:
cat > ~/homelabs/scripts/install-observability.sh << 'EOF'
#!/bin/bash
# Install observability stack: Prometheus, Grafana, Loki
set -euo pipefail
# Set kubeconfig for k3s
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
echo "=== Installing Observability Stack ==="
# Wait for cluster to be ready
echo "Checking cluster status..."
kubectl wait --for=condition=Ready nodes --all --timeout=300s
# Add Helm (k3s doesn't include it by default)
if ! command -v helm &> /dev/null; then
echo "Installing Helm..."
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi
# Add Helm repositories
echo "Adding Helm repositories..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Create monitoring namespace
echo "Creating monitoring namespace..."
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
echo "Installing Prometheus + Grafana..."
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.accessModes[0]=ReadWriteOnce \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
--set grafana.adminPassword=admin \
--set grafana.service.type=NodePort \
--set grafana.service.nodePort=30080 \
--set prometheus.service.type=NodePort \
--set prometheus.service.nodePort=30090 \
--set alertmanager.service.type=NodePort \
--set alertmanager.service.nodePort=30093 \
--wait --timeout=10m
# Install Loki for log aggregation
echo "Installing Loki..."
helm upgrade --install loki grafana/loki-stack \
--namespace monitoring \
--set loki.persistence.enabled=true \
--set loki.persistence.size=10Gi \
--set promtail.enabled=true \
--set grafana.enabled=false \
--wait --timeout=5m
# Configure Loki datasource in Grafana
echo "Configuring Loki datasource..."
cat <<DATASOURCE | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-datasource-loki
namespace: monitoring
labels:
grafana_datasource: "1"
data:
loki-datasource.yaml: |-
apiVersion: 1
datasources:
- name: Loki
type: loki
access: proxy
url: http://loki:3100
isDefault: false
editable: true
DATASOURCE
# Restart Grafana to pick up Loki datasource
echo "Restarting Grafana..."
kubectl rollout restart deployment kube-prometheus-stack-grafana -n monitoring
kubectl rollout status deployment kube-prometheus-stack-grafana -n monitoring --timeout=5m
echo ""
echo "=== Observability Stack Installed Successfully ==="
echo ""
echo "Access URLs (from your host machine):"
echo " Grafana: http://192.168.5.200:30080"
echo " Username: admin"
echo " Password: admin"
echo ""
echo " Prometheus: http://192.168.5.200:30090"
echo " Alertmanager: http://192.168.5.200:30093"
echo ""
EOF
chmod +x ~/homelabs/scripts/install-observability.sh

cd ~/homelabs
# Copy script to master
scp scripts/install-observability.sh root@192.168.5.200:/tmp/
# Execute installation (this takes 5-10 minutes)
ssh root@192.168.5.200 'bash /tmp/install-observability.sh'

What happens:
- Installs Helm package manager
- Adds Prometheus and Grafana Helm repositories
- Installs kube-prometheus-stack (Prometheus, Grafana, Alertmanager)
- Installs Loki stack (Loki, Promtail)
- Configures Loki as a datasource in Grafana
This takes 5-10 minutes. You'll see:
=== Installing Observability Stack ===
Checking cluster status...
Installing Helm...
Adding Helm repositories...
Installing Prometheus + Grafana...
Installing Loki...
=== Observability Stack Installed Successfully ===
# Check all monitoring pods
kubectl get pods -n monitoring

Expected output (all pods should be Running):
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 3m
kube-prometheus-stack-grafana-xxx 3/3 Running 0 3m
kube-prometheus-stack-kube-state-metrics-xxx 1/1 Running 0 3m
kube-prometheus-stack-operator-xxx 1/1 Running 0 3m
kube-prometheus-stack-prometheus-node-exporter-xxx 1/1 Running 0 3m
loki-0 1/1 Running 0 2m
loki-promtail-xxx 1/1 Running 0 2m
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 3m
✅ Checkpoint: Observability stack is installed!
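If the NodePorts turn out to be unreachable from your host, kubectl port-forward offers a fallback path to Grafana (service name and port as created by the Helm release above):

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Then browse to http://localhost:3000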
Create scripts/verify-cluster.sh:
cat > ~/homelabs/scripts/verify-cluster.sh << 'EOF'
#!/bin/bash
# Verify k3s cluster and observability stack
set -euo pipefail
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║ k3s Cluster Verification ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
export KUBECONFIG=~/.kube/config
# Colors
GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m'
check_mark="${GREEN}✓${NC}"
cross_mark="${RED}✗${NC}"
# Check kubectl
echo "🔍 Checking kubectl connectivity..."
if kubectl cluster-info &>/dev/null; then
echo -e " ${check_mark} kubectl is configured correctly"
else
echo -e " ${cross_mark} kubectl is NOT working"
exit 1
fi
# Check nodes
echo ""
echo "🖥️ Checking cluster nodes..."
kubectl get nodes -o wide
NODE_COUNT=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
READY_COUNT=$(kubectl get nodes --no-headers | awk '$2 == "Ready"' | wc -l | tr -d ' ')
if [ "$NODE_COUNT" -eq 4 ] && [ "$READY_COUNT" -eq 4 ]; then
echo -e " ${check_mark} All 4 nodes are Ready"
else
echo -e " ${cross_mark} Expected 4 Ready nodes, found ${READY_COUNT}/${NODE_COUNT}"
fi
# Check monitoring
echo ""
echo "📊 Checking observability stack..."
kubectl get pods -n monitoring
# Check endpoints
echo ""
echo "🌐 Testing service endpoints..."
test_url() {
if curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 "$1" | grep -q "200\|302"; then
echo -e " ${check_mark} $2: $1"
else
echo -e " ${cross_mark} $2: $1"
fi
}
test_url "http://192.168.5.200:30080/login" "Grafana "
test_url "http://192.168.5.200:30090/graph" "Prometheus "
test_url "http://192.168.5.200:30093" "Alertmanager"
echo ""
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║ ✅ Verification Complete! ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
echo "📊 Grafana: http://192.168.5.200:30080"
echo " Username: admin | Password: admin"
echo ""
EOF
chmod +x ~/homelabs/scripts/verify-cluster.sh

cd ~/homelabs
./scripts/verify-cluster.sh

Expected output: All checks should show ✓ (green checkmarks).
# Open Grafana in your browser
open http://192.168.5.200:30080   # macOS; on Linux use xdg-open
# Or manually navigate to: http://192.168.5.200:30080

Log in with:
- Username: admin
- Password: admin
(You may be prompted to change the password - you can skip this)
- Click the ☰ menu (top left)
- Click Dashboards
- You'll see folders like:
- General
- Kubernetes / Compute Resources
- Node Exporter
- Navigate to: Dashboards → Kubernetes / Compute Resources → Cluster
- You'll see:
- CPU Usage
- Memory Usage
- Network I/O
- Disk I/O
- Pod count
- And more!
- Navigate to: Dashboards → Node Exporter → Nodes
- Select a node from the dropdown
- You'll see detailed system metrics
- Click the compass icon (Explore) on the left sidebar
- Select Loki as the datasource
- Try these queries:
{namespace="kube-system"} {namespace="monitoring"} {job="systemd-journal"}
# Create a deployment
kubectl create deployment nginx --image=nginx:latest --replicas=3
# Expose it as a NodePort service
kubectl expose deployment nginx --port=80 --type=NodePort
# Get the assigned port
kubectl get svc nginx

Example output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx NodePort 10.43.123.45 <none> 80:31234/TCP 10s
# Access nginx (replace 31234 with your actual NodePort)
curl http://192.168.5.200:31234

Expected output: HTML content from nginx
- Go back to Grafana
- Navigate to: Dashboards → Kubernetes / Compute Resources → Pod
- Select namespace: default
- Select pod: nginx-xxx
- You'll see CPU, memory, and network metrics for your nginx pods!
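The same data can be queried directly in Prometheus (http://192.168.5.200:30090). For example, this PromQL over the standard cAdvisor metrics shows per-pod CPU usage for the nginx deployment:

sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace="default", pod=~"nginx-.*"}[5m])
)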
- In Grafana, click Explore
- Select Loki
- Query:
{app="nginx"} - You'll see nginx access logs
kubectl delete deployment nginx
kubectl delete service nginx

Symptoms: Vagrant fails to create VMs
Solutions:
# Check VirtualBox is running
vboxmanage list runningvms
# Check if VMs exist but are stopped
vboxmanage list vms
# Restart VirtualBox
# macOS: Open VirtualBox GUI and restart
# Linux: sudo systemctl restart virtualbox
# Try again
vagrant up

Symptoms: Vagrant prompts for network interface selection
Solution:
- Identify your interface:
ifconfig or ip addr
- Edit the commented bridge: line in the Vagrantfile (see Step 3):
bridge: "YOUR_INTERFACE_NAME"
- Reload VMs:
vagrant reload
Symptoms: kubectl get nodes fails
Solutions:
# Re-copy kubeconfig
ssh root@192.168.5.200 'cat /etc/rancher/k3s/k3s.yaml' | \
sed "s/127.0.0.1/192.168.5.200/g" > ~/.kube/config
# Set environment variable
export KUBECONFIG=~/.kube/config
# Test connection
kubectl get nodes

Symptoms: Can't access http://192.168.5.200:30080
Solutions:
# Check if Grafana pod is running
kubectl get pods -n monitoring | grep grafana
# Check service
kubectl get svc -n monitoring | grep grafana
# Restart Grafana
kubectl rollout restart deployment kube-prometheus-stack-grafana -n monitoring
# Wait for it to be ready
kubectl rollout status deployment kube-prometheus-stack-grafana -n monitoring
# Test from command line
curl http://192.168.5.200:30080/login

Symptoms: kubectl get pods -A shows pods in Pending state
Solutions:
# Describe the pod to see why
kubectl describe pod <pod-name> -n <namespace>
# Common causes:
# 1. Insufficient resources - check node resources:
kubectl top nodes  # requires metrics-server (bundled with k3s)
# 2. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# 3. Restart the pod
kubectl delete pod <pod-name> -n <namespace>

Symptoms: kubectl get nodes shows NotReady status
Solutions:
# Check node details
kubectl describe node <node-name>
# SSH to the node
ssh root@<node-ip>
# Check k3s service
systemctl status k3s # on master
systemctl status k3s-agent # on worker
# Restart k3s
systemctl restart k3s # on master
systemctl restart k3s-agent # on worker
# Check logs
journalctl -u k3s -f # on master
journalctl -u k3s-agent -f  # on worker

cd ~/homelabs
# Start all VMs
vagrant up
# Wait 30 seconds for k3s to start
sleep 30
# Verify
kubectl get nodes
./scripts/verify-cluster.sh

cd ~/homelabs
# Gracefully stop all VMs
vagrant halt
# Or stop individual nodes
vagrant halt master
vagrant halt worker1

# Via helper script
./scripts/ssh-nodes.sh master
# Via direct SSH
ssh vagrant@192.168.5.200
ssh root@192.168.5.200
# Via Vagrant
vagrant ssh master

# Kubernetes events
kubectl get events -A --sort-by='.lastTimestamp'
# Pod logs
kubectl logs <pod-name> -n <namespace>
# Follow logs
kubectl logs <pod-name> -n <namespace> -f
# Previous container logs (if crashed)
kubectl logs <pod-name> -n <namespace> --previous
# All containers in a pod
kubectl logs <pod-name> -n <namespace> --all-containers

# Node status
kubectl get nodes -o wide
# Pod status across all namespaces
kubectl get pods -A
# System pods
kubectl get pods -n kube-system
# Monitoring pods
kubectl get pods -n monitoring
# Resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods -A

# SSH to master
ssh root@192.168.5.200
# Update Helm repos
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm repo update
# Upgrade Prometheus/Grafana
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --reuse-values  # keep the values set at install time
# Upgrade Loki
helm upgrade loki grafana/loki-stack --namespace monitoring --reuse-values

# SSH to master
ssh root@192.168.5.200
# Set kubeconfig
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# Remove Helm releases
helm uninstall kube-prometheus-stack -n monitoring
helm uninstall loki -n monitoring
# Delete namespace
kubectl delete namespace monitoring

# On master
ssh root@192.168.5.200 '/usr/local/bin/k3s-uninstall.sh'
# On workers
ssh root@192.168.5.201 '/usr/local/bin/k3s-agent-uninstall.sh'
ssh root@192.168.5.202 '/usr/local/bin/k3s-agent-uninstall.sh'
ssh root@192.168.5.203 '/usr/local/bin/k3s-agent-uninstall.sh'

cd ~/homelabs
# Stop and delete all VMs
vagrant destroy -f
# This removes all VMs but keeps your Vagrantfile
# You can recreate them with: vagrant up

cd ~/homelabs
# Destroy old VMs
vagrant destroy -f
# Create fresh VMs
vagrant up
# Reinstall k3s
./scripts/setup-cluster.sh  # the automated script (a sketch is given below)
# Or follow Steps 6 and 7 manually
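The guide refers to scripts/setup-cluster.sh without defining it; here is a minimal sketch that simply chains the scripts from Steps 6 and 7 (same paths and IPs used throughout this guide):

cat > ~/homelabs/scripts/setup-cluster.sh << 'EOF'
#!/bin/bash
# Minimal end-to-end installer: k3s master, workers, then observability stack.
set -euo pipefail
cd ~/homelabs

scp scripts/install-k3s-master.sh root@192.168.5.200:/tmp/
ssh root@192.168.5.200 'bash /tmp/install-k3s-master.sh'

NODE_TOKEN=$(ssh root@192.168.5.200 'cat /var/lib/rancher/k3s/server/node-token')

for ip in 192.168.5.201 192.168.5.202 192.168.5.203; do
  scp scripts/install-k3s-worker.sh root@$ip:/tmp/
  ssh root@$ip "bash /tmp/install-k3s-worker.sh 192.168.5.200 $NODE_TOKEN" &
done
wait

scp scripts/install-observability.sh root@192.168.5.200:/tmp/
ssh root@192.168.5.200 'bash /tmp/install-observability.sh'
EOF
chmod +x ~/homelabs/scripts/setup-cluster.sh

# Check VM disk usage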
for ip in 192.168.5.{200..203}; do
echo "=== $ip ==="
ssh root@$ip "df -h /"
done
# Clean Docker/containerd images on nodes
for ip in 192.168.5.{200..203}; do
ssh root@$ip "k3s crictl rmi --prune"
done
# Clean up unused Kubernetes resources
kubectl delete pods --field-selector status.phase=Failed -A
kubectl delete pods --field-selector status.phase=Succeeded -A

| Service | Port | Type | Access |
|---|---|---|---|
| k3s API | 6443 | TCP | Master only |
| Grafana | 30080 | NodePort | Any node |
| Prometheus | 30090 | NodePort | Any node |
| Alertmanager | 30093 | NodePort | Any node |
| Loki | 3100 | ClusterIP | Internal only |
| SSH | 22 | TCP | All nodes |
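A quick reachability check for these ports from your host (nc ships with macOS and most Linux distributions):

for port in 22 6443 30080 30090 30093; do
  nc -z -w 2 192.168.5.200 "$port" && echo "port $port open" || echo "port $port closed"
done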
| Node | IP Address | Purpose |
|---|---|---|
| master | 192.168.5.200 | Control plane, monitoring |
| worker1 | 192.168.5.201 | Workloads |
| worker2 | 192.168.5.202 | Workloads |
| worker3 | 192.168.5.203 | Workloads |
| Service | Username | Password |
|---|---|---|
| Grafana | admin | admin |
| SSH (vagrant) | vagrant | vagrant |
| SSH (root) | root | (key-based) |
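To replace the default Grafana password after installation (deployment and container names as created by the Helm release in Step 7; --homepath points at the Grafana image's default install path):

kubectl exec -n monitoring deploy/kube-prometheus-stack-grafana -c grafana -- \
  grafana-cli --homepath /usr/share/grafana admin reset-admin-password 'YourNewPassword'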
# Context and config
kubectl config view
kubectl config current-context
kubectl cluster-info
# Nodes
kubectl get nodes
kubectl describe node <node-name>
kubectl cordon <node-name> # Mark unschedulable
kubectl uncordon <node-name> # Mark schedulable
kubectl drain <node-name> --ignore-daemonsets  # Evict all pods
# Pods
kubectl get pods -A
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
kubectl exec -it <pod-name> -n <namespace> -- bash
# Deployments
kubectl get deployments -A
kubectl scale deployment <name> --replicas=3
kubectl rollout status deployment <name>
kubectl rollout restart deployment <name>
# Services
kubectl get svc -A
kubectl describe svc <service-name>
# Resources
kubectl api-resources
kubectl explain pod
kubectl explain deployment.spec
# Debugging
kubectl get events -A --sort-by='.lastTimestamp'
kubectl top nodes
kubectl top pods -A

- k3s Documentation: https://docs.k3s.io/
- Vagrant Documentation: https://www.vagrantup.com/docs
- kubectl Cheat Sheet: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
- Prometheus Documentation: https://prometheus.io/docs/
- Grafana Documentation: https://grafana.com/docs/
- Loki Documentation: https://grafana.com/docs/loki/
Use this checklist to verify your setup:
- VirtualBox installed and working
- Vagrant installed and working
- All 4 VMs created and running
- SSH access working to all nodes
- k3s master node installed
- All 3 worker nodes joined cluster
- kubectl configured on host machine
- All 4 nodes show "Ready" status
- Monitoring namespace created
- All monitoring pods running
- Grafana accessible at http://192.168.5.200:30080
- Prometheus accessible at http://192.168.5.200:30090
- Can login to Grafana
- Pre-configured dashboards visible
- Loki datasource configured
- Sample application deployed successfully
- Metrics visible in Grafana for sample app
You now have a fully functional 4-node k3s Kubernetes cluster with comprehensive observability!
What you've built:
- Production-like Kubernetes cluster
- Full monitoring with Prometheus
- Beautiful dashboards with Grafana
- Centralized logging with Loki
- Alert management with Alertmanager
- Complete infrastructure as code
What you can do next:
- Deploy real applications
- Experiment with Helm charts
- Set up CI/CD pipelines
- Test autoscaling
- Implement GitOps with ArgoCD
- Add a service mesh (Istio, Linkerd)
- Deploy databases
- Create custom Grafana dashboards
- Configure alerts
- Learn Kubernetes operators
Happy clustering! 🚀