# Turing RK1 Ansible Cluster


Infrastructure-as-code for deploying K3s Kubernetes on Turing Pi RK1 hardware with NPU support.

Architecture Documentation - Visual diagrams for network topology, deployment pipeline, and component interactions.

## Overview

Deploy a 4-node K3s cluster on Turing Pi with:

- Armbian with Rockchip BSP kernel (NPU support)
- Terraform for BMC flashing via `terraform-provider-turingpi`
- Ansible for K3s installation and addon deployment
- Networking that matches the existing Talos cluster configuration

## Node Configuration

| Node  | IP          | Role          | Hardware            |
|-------|-------------|---------------|---------------------|
| node1 | 10.10.88.73 | Control Plane | RK1 (slot 1) + NVMe |
| node2 | 10.10.88.74 | Worker        | RK1 (slot 2) + NVMe |
| node3 | 10.10.88.75 | Worker        | RK1 (slot 3) + NVMe |
| node4 | 10.10.88.76 | Worker        | RK1 (slot 4) + NVMe |
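The addressing above follows a simple slot-to-IP pattern (slot N sits at 10.10.88.72+N), which ad-hoc scripts can exploit; a minimal sketch:

```shell
# Derive each node's IP from its slot number: slot N -> 10.10.88.(72 + N)
for slot in 1 2 3 4; do
  echo "node${slot} -> 10.10.88.$((72 + slot))"
done
```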

## Prerequisites

## Quick Start

### 1. Setup Secrets

```shell
cd ansible
./scripts/setup-secrets.sh
# Edit secrets/server.yml to set passwords
```
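The exact keys in `secrets/server.yml` depend on what `setup-secrets.sh` scaffolds, but for the password values themselves one common approach is to generate random strings rather than invent them (assumes `openssl` is installed):

```shell
# Generate a random 24-byte password, base64-encoded (yields 32 characters)
openssl rand -base64 24
```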

### 2. Flash Armbian via BMC

```shell
cd terraform/environments/server

# Set BMC credentials
export TURINGPI_USERNAME=root
export TURINGPI_PASSWORD=turing
export TURINGPI_ENDPOINT=https://10.10.88.70
# WARNING: Only use TURINGPI_INSECURE in trusted networks (disables TLS verification)
export TURINGPI_INSECURE=true

# Flash all nodes (WARNING: destructive!)
terraform init
terraform apply -var="flash_nodes=true" -var="firmware_path=/path/to/armbian.img"
```
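Because a forgotten export only surfaces once Terraform talks to the BMC, a quick preflight over the variables set above can save a failed run (plain POSIX shell; the `eval` performs indirect variable lookup):

```shell
# Confirm the BMC environment variables are exported before running terraform
for v in TURINGPI_USERNAME TURINGPI_PASSWORD TURINGPI_ENDPOINT; do
  eval "val=\"\${$v}\""     # indirect lookup: read the variable named by $v
  if [ -n "$val" ]; then
    echo "ok: $v"
  else
    echo "missing: $v"
  fi
done
```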

### 3. Deploy K3s Cluster

```shell
cd ansible
ansible-playbook -i inventories/server/hosts.yml playbooks/site.yml
```
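Once the playbook finishes, the cluster can be verified from a workstation. K3s writes its kubeconfig to `/etc/rancher/k3s/k3s.yaml` on the control plane (that path is the K3s default; the `root` login is an assumption and may differ in your image):

```shell
# Copy the kubeconfig off the control-plane node
scp root@10.10.88.73:/etc/rancher/k3s/k3s.yaml turing-rk1.kubeconfig

# The file points at 127.0.0.1; rewrite it to the control-plane IP
sed -i 's/127.0.0.1/10.10.88.73/' turing-rk1.kubeconfig

# All four nodes should report Ready
KUBECONFIG=$PWD/turing-rk1.kubeconfig kubectl get nodes -o wide
```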

### 4. Setup NPU (Optional)

```shell
ansible-playbook -i inventories/server/hosts.yml playbooks/npu-setup.yml
```

## Repository Structure

```
turing-ansible-cluster/
├── terraform/
│   ├── modules/bmc/              # BMC operations module
│   └── environments/server/      # RK1 cluster config
│
├── ansible/
│   ├── inventories/server/       # Node inventory
│   ├── playbooks/
│   │   ├── site.yml              # Full deployment
│   │   ├── bootstrap.yml         # OS preparation
│   │   ├── kubernetes.yml        # K3s installation
│   │   ├── addons.yml            # Helm addons
│   │   └── npu-setup.yml         # RKNN toolkit
│   ├── roles/                    # Ansible roles
│   ├── secrets/                  # Local secrets (gitignored)
│   └── scripts/setup-secrets.sh  # Initialize secrets
```

## Cluster Configuration

Networking matches the existing Talos cluster:

| Component     | Value          |
|---------------|----------------|
| Pod CIDR      | 10.244.0.0/16  |
| Service CIDR  | 10.96.0.0/12   |
| Cluster DNS   | 10.96.0.10     |
| MetalLB Range | 10.10.88.80-89 |
| Ingress IP    | 10.10.88.80    |
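As a sanity check on these values, the Cluster DNS address must fall inside the Service CIDR; the containment test is plain integer arithmetic over the /12 mask:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidr=10.96.0.0 bits=12 dns=10.96.0.10
mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
if [ $(( $(ip2int "$dns") & mask )) -eq $(( $(ip2int "$cidr") & mask )) ]; then
  echo "$dns is inside $cidr/$bits"
fi
```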

## Addons Deployed

- MetalLB - L2 LoadBalancer
- NGINX Ingress - Ingress controller
- Longhorn - Distributed storage (NVMe on workers)
- Prometheus + Grafana - Monitoring
- Portainer Agent - Container management

## Storage Optimization

All nodes with NVMe drives are automatically configured to use NVMe for both Longhorn and K3s container storage:

| Path                | Location         | Purpose                       |
|---------------------|------------------|-------------------------------|
| `/var/lib/longhorn` | NVMe partition 2 | Longhorn distributed storage  |
| `/var/lib/rancher`  | NVMe (symlink)   | K3s container images and data |
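To confirm the layout on a node, a couple of read-only checks suffice (paths from the table above; on any other machine the loop simply reports the paths as absent):

```shell
# Show which filesystem backs each storage path
for p in /var/lib/longhorn /var/lib/rancher; do
  if [ -e "$p" ]; then
    df -P "$p"            # the Filesystem column should be the NVMe device
    if [ -L "$p" ]; then
      readlink -f "$p"    # where the symlink actually points
    fi
  else
    echo "$p: not present on this machine"
  fi
done
```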

Note: Nodes are labeled with node.longhorn.io/create-default-disk=true during deployment to enable automatic Longhorn disk detection.

## NPU Support

Runtime-only RKNN installation for RK3588 NPU inference:

| Component     | Description                                  |
|---------------|----------------------------------------------|
| rknn-llm      | LLM inference runtime with `librknnrt.so`    |
| rkllama       | Flask-based LLM API server (systemd service) |
| DeepSeek 1.5B | Pre-installed RKLLM model (~1.9GB)           |
| NPU Device    | `/dev/dri/renderD129` (DRM subsystem)        |
| Driver        | rknpu v0.9.8+ (vendor kernel)                |
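A quick way to confirm the device node on a freshly flashed node (read-only; on machines without the rknpu driver it just reports absence):

```shell
# The RK3588 NPU is exposed through DRM as a render node
if [ -e /dev/dri/renderD129 ]; then
  ls -l /dev/dri/renderD129
else
  echo "/dev/dri/renderD129 not found (no rknpu driver loaded here)"
fi
```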

### LLM API Service

Each node runs rkllama as a systemd service on port 8080:

```shell
# Load model
curl -X POST http://10.10.88.73:8080/load_model \
  -H "Content-Type: application/json" \
  -d '{"model_name": "DeepSeek-R1-1.5B"}'

# Generate response
curl -X POST http://10.10.88.73:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
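The calls above hit a single node, but every node runs the same service, so one prompt can be fanned out to all four. A sketch (the curl lines are commented so the loop can be dry-run anywhere; uncomment them to actually send requests):

```shell
# Fan one prompt out to every node's rkllama endpoint (port 8080 on each node)
NODES="10.10.88.73 10.10.88.74 10.10.88.75 10.10.88.76"

# Build the /generate request body for a single user message
payload() {
  printf '{"messages": [{"role": "user", "content": "%s"}]}' "$1"
}

for ip in $NODES; do
  echo "POST http://${ip}:8080/generate  body=$(payload 'Hello!')"
  # curl -s -X POST "http://${ip}:8080/generate" \
  #   -H "Content-Type: application/json" -d "$(payload 'Hello!')"
done
```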

Performance: ~7-8 tokens/second per node

See docs/NPU-API.md for full API documentation.

### NPU Status

```shell
cat /sys/kernel/debug/rknpu/version  # Driver version
cat /sys/kernel/debug/rknpu/load     # Core utilization
systemctl status rkllama             # Service status
```

Requires Armbian with Rockchip vendor kernel (6.1.x).

Note: Dev tools (rknn-toolkit2, ezrknpu) are not installed by default. For model conversion, use a separate dev machine.

## Building Armbian

Build custom Armbian images with NPU support using the Armbian build framework:

```shell
# Clone build framework
git clone --depth=1 https://github.com/armbian/build ~/armbian-build
cd ~/armbian-build

# Build image with vendor kernel (required for NPU)
./compile.sh build \
  BOARD=turing-rk1 \
  BRANCH=vendor \
  RELEASE=bookworm \
  BUILD_MINIMAL=no \
  BUILD_DESKTOP=no
```

Output: `~/armbian-build/output/images/Armbian-*_Turing-rk1_bookworm_vendor_*.img`

For advanced options (custom packages, static IPs, SSH keys), see docs/ARMBIAN-BUILD.md.

## Image Distribution

Pre-built images are hosted on Cloudflare R2 at armbian-builds.techki.to:

```shell
# Download latest image
./scripts/download-armbian-image.sh --latest

# Prepare for specific node
./scripts/prepare-armbian-image.sh Armbian-*.img 1

# Flash to node
tpi flash --node 1 --image-path Armbian-*.img
```

See docs/ARMBIAN-BUILD.md#image-distribution for full usage.

## Development

### Local Testing

```shell
# Install dependencies
make install-deps

# Run all checks (lint + syntax)
make test

# Run individual checks
make lint
make syntax-check
```

### Releases

Uses semantic versioning with git tags:

```shell
# Create a release
make release VERSION=v1.0.0

# Push the tag (triggers GitHub release)
git push origin v1.0.0
```

### CI/CD

GitHub Actions runs on every push and PR:

- Ansible Lint - Code quality checks (production profile)
- Syntax Check - Validates all playbooks
- Terraform Validate - Checks Terraform configuration

## Related Repositories

## License

MIT