Stellar-K8s: Cloud-Native Stellar Infrastructure

Production-grade Stellar infrastructure in one command.

Stellar-K8s is a high-performance Kubernetes Operator written in strict Rust using kube-rs. It automates the deployment, management, and scaling of Stellar Core, Horizon, and Soroban RPC nodes, bringing the power of Cloud-Native patterns to the Stellar ecosystem.

Designed for high availability, type safety, and minimal footprint.

✨ Key Features

🦀 Rust-Native Performance: Built with kube-rs and Tokio for an ultra-lightweight footprint (~15MB binary) and complete memory safety.
🛡️ Enterprise Reliability: Type-safe error handling catches many classes of failure at compile time instead of at runtime. Built-in Finalizers ensure clean PVC and resource cleanup.
🏥 Auto-Sync Health Checks: Automatically monitors Horizon and Soroban RPC nodes, only marking them Ready when fully synced with the network.
GitOps Ready: Fully compatible with ArgoCD and Flux for declarative infrastructure management.
📈 Observable by Default: Native Prometheus metrics integration for monitoring node health, ledger sync status, and resource usage.
⚡ Soroban Ready: First-class support for Soroban RPC nodes with captive core configuration.
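
The Auto-Sync Health Checks feature above gates readiness on sync state. The following is a minimal sketch of that kind of check, not the operator's actual code: the `SyncStatus` struct, its field names, and the `max_lag` threshold are all assumptions made for illustration.

```rust
/// Hypothetical snapshot of a node's sync state (names are illustrative,
/// not taken from the operator's source).
#[derive(Debug, Clone, Copy)]
pub struct SyncStatus {
    /// Latest ledger sequence the node has ingested.
    pub node_ledger: u64,
    /// Latest ledger sequence observed on the network.
    pub network_ledger: u64,
}

/// Returns true only when the node is within `max_lag` ledgers of the network.
pub fn is_ready(status: SyncStatus, max_lag: u64) -> bool {
    // saturating_sub avoids underflow if the node briefly reports a ledger
    // ahead of our (possibly stale) view of the network.
    status.network_ledger.saturating_sub(status.node_ledger) <= max_lag
}
```

With a threshold of 2 ledgers, a node at ledger 100 against a network at 101 counts as ready, while the same node against a network at 150 does not.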

🏗️ Architecture Overview

Stellar-K8s follows the Operator Pattern, extending Kubernetes with a StellarNode Custom Resource Definition (CRD).

CRD Source of Truth: You define your node requirements (Network, Type, Resources) in a StellarNode manifest.
Reconciliation Loop: The Rust-based controller watches for changes and drives the cluster state to match your desired specification.
Stateful Management: Automatically handles complex lifecycle events for Validators (StatefulSets) and RPC nodes (Deployments), including persistent storage and configuration.
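
The three points above can be sketched as a pure decision function that compares desired and observed state. This is an illustration under assumptions, not the controller's implementation: every type and function name is hypothetical, and mapping Horizon to a Deployment is a guess based on the "RPC nodes (Deployments)" wording.

```rust
/// Workload kinds named in the overview.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Workload {
    StatefulSet,
    Deployment,
}

/// Hypothetical node types; only `Validator` appears in the example manifest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeType {
    Validator,
    Horizon,
    SorobanRpc,
}

/// Desired state, as declared in a StellarNode manifest.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Desired {
    pub node_type: NodeType,
    pub version: String,
}

/// Observed state of the workload currently in the cluster.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observed {
    pub version: String,
}

/// What one reconciliation pass decides to do.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Create(Workload),
    Update { from: String, to: String },
    NoOp,
}

/// Validators get StatefulSets; other node types get Deployments (assumed).
pub fn workload_for(node_type: NodeType) -> Workload {
    match node_type {
        NodeType::Validator => Workload::StatefulSet,
        NodeType::Horizon | NodeType::SorobanRpc => Workload::Deployment,
    }
}

/// Compare desired and observed state and pick the action that closes the gap.
pub fn reconcile(desired: &Desired, observed: Option<&Observed>) -> Action {
    match observed {
        // Nothing exists yet: create the right kind of workload.
        None => Action::Create(workload_for(desired.node_type)),
        // Version drift: roll the workload to the desired version.
        Some(o) if o.version != desired.version => Action::Update {
            from: o.version.clone(),
            to: desired.version.clone(),
        },
        // Already converged.
        Some(_) => Action::NoOp,
    }
}
```

Keeping the decision free of side effects makes it easy to unit test; a real controller would then translate the returned `Action` into Kubernetes API calls.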

📋 Prerequisites

Kubernetes cluster (1.28+)
kubectl configured
Helm 3.x (for operator installation)
Rust 1.88+ (for local development)
- CI/CD and Docker builds use Rust 1.93 for consistency
- Contributors can use any Rust 1.88+ version locally

🚀 Quick Start

Get a Testnet node running in under 5 minutes.

1. Install the Operator via Helm

# Add the helm repo (example)
helm repo add stellar-k8s https://stellar.github.io/stellar-k8s
helm repo update

# Install the operator
helm install stellar-operator stellar-k8s/stellar-operator \
  --namespace stellar-system \
  --create-namespace

Install the Operator via OLM

If you are installing on a cluster with the Operator Lifecycle Manager (e.g. OpenShift), refer to the OLM Deployment Guide.

2. Deploy a Testnet Validator

Apply the following manifest to your cluster:

# validator.yaml
apiVersion: stellar.org/v1alpha1
kind: StellarNode
metadata:
  name: my-validator
  namespace: stellar
spec:
  nodeType: Validator
  network: Testnet
  version: "v21.0.0"
  storage:
    storageClass: "standard"
    size: "100Gi"
    retentionPolicy: Retain
  validatorConfig:
    seedSecretRef: "my-validator-seed" # Pre-created K8s secret
    enableHistoryArchive: true

# Create the target namespace first if it does not already exist
kubectl create namespace stellar
kubectl apply -f validator.yaml
kubectl get stellarnodes -n stellar

3. Use the kubectl-stellar Plugin

The project includes a kubectl plugin for convenient interaction with StellarNode resources:

# Build the plugin
cargo build --release --bin kubectl-stellar
cp target/release/kubectl-stellar ~/.local/bin/kubectl-stellar

# List all StellarNode resources
kubectl stellar list

# Check sync status
kubectl stellar status

# View logs from a node
kubectl stellar logs my-validator -f

See kubectl-plugin.md for complete documentation.

4. Custom Validation Policies with WebAssembly

Stellar-K8s supports custom validation policies written in WebAssembly, allowing you to enforce organization-specific requirements without modifying the operator code.

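// Example: Enforce approved image registries
#[no_mangle]
pub extern "C" fn validate() -> i32 {
    // `?` cannot be used in a function that returns i32,
    // so a read failure is turned into an explicit denial.
    let input = match read_validation_input() {
        Ok(input) => input,
        Err(_) => return deny("Failed to read validation input"),
    };

    // Check if image is from approved registry
    if !is_approved_registry(&input.object.spec.version) {
        return deny("Image must be from approved registry");
    }

    allow()
}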

Features:

Sandboxed Execution: Plugins run in a secure, isolated Wasm environment
Dynamic Loading: Load plugins from ConfigMaps at runtime
Multi-Language Support: Write policies in Rust, Go, C++, or any language that compiles to Wasm
Fail-Open Support: Configure plugins to allow requests if they fail

See wasm-webhook.md for complete documentation and examples.
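
The Fail-Open Support feature listed above comes down to how a plugin failure is mapped to an admission decision. Here is a minimal sketch of that mapping, assuming hypothetical `PluginResult` and `Decision` types and a per-plugin `fail_open` flag; the operator's real types and configuration keys may differ.

```rust
/// Outcome of running a single policy plugin (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PluginResult {
    Allow,
    Deny(String),
    /// The plugin itself failed, e.g. a trap, a timeout, or malformed output.
    Error(String),
}

/// Final admission decision (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Allowed,
    Denied(String),
}

/// Map a plugin result to a decision, honoring the fail-open setting.
pub fn decide(result: PluginResult, fail_open: bool) -> Decision {
    match result {
        PluginResult::Allow => Decision::Allowed,
        // An explicit denial is never overridden by fail-open.
        PluginResult::Deny(reason) => Decision::Denied(reason),
        // Fail-open: a broken plugin does not block the request.
        PluginResult::Error(_) if fail_open => Decision::Allowed,
        // Fail-closed: a broken plugin blocks the request.
        PluginResult::Error(err) => Decision::Denied(format!("policy plugin failed: {err}")),
    }
}
```

Fail-open trades strictness for availability: it only changes what happens when the plugin errors, not when it returns a denial.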

📊 Monitoring & Observability

Stellar-K8s comes with built-in Prometheus metrics and a pre-configured Grafana dashboard that provides a comprehensive overview of both the operator's health and the managed Stellar nodes.

Importing the Grafana Dashboard

Open your Grafana instance.
Navigate to Dashboards -> Import.
Upload the monitoring/grafana-dashboard.json file provided in this repository.
Select your Prometheus data source when prompted.
The dashboard will now automatically visualize:
- Node availability, sync status, and peer connectivity
- Controller reconciliation rates and duration (p50, p95, p99)
- Error rates and operator resource usage (CPU/Memory)

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on our development process, coding standards, and how to submit pull requests.

Roadmap

Phase 1: Core Operator & Helm Charts (Current)

StellarNode CRD with Validator support
Basic Controller logic with kube-rs
Helm Chart for easy deployment
CI/CD Pipeline with GitHub Actions and Docker builds
Auto-Sync Health Checks for Horizon and Soroban RPC nodes
kubectl-stellar plugin for node management

Phase 2: Soroban & Observability (Month 2)

Full Soroban RPC node support with captive core
Comprehensive Prometheus metrics export (Ledger age, peer count)
Dedicated Grafana Dashboards
Automated history archive management

Phase 3: High Availability & DR (Month 3)

Automated failover for high-availability setups
Disaster Recovery automation (backup/restore from history)
Multi-region federation support

💾 High-Performance Local Storage (NVMe)

Standard cloud Persistent Volumes (like AWS EBS or GCP Persistent Disks) can sometimes bottleneck Stellar Core's highly demanding database I/O, leading to ledger sync lag. Stellar-K8s supports a specialized LocalStorage mode to take advantage of low-latency local NVMe drives directly attached to your Kubernetes nodes.

Standard PVCs vs Local NVMe (Testnet Workload Benchmark)

Storage Type	Peak IOPS	Read Latency	Write Latency	Avg Sync Lag
Cloud Standard (EBS)	~3,000	1.5 - 2.5ms	2.0 - 5.0ms	5 - 15s
Local NVMe	100,000+	< 0.1ms	< 0.1ms	< 1s

Enabling LocalStorage

Set spec.storage.mode to Local. Stellar-K8s will then attempt to use a local provisioner such as local-path (bundled with K3s and Kind; other distributions, including EKS, typically need it installed separately). You can also pin the volume to a specific node using nodeAffinity, or specify a dedicated storageClass.

spec:
  nodeType: Validator
  storage:
    mode: Local
    # storageClass is auto-detected ("local-path" or "local-storage") if omitted
    # Optionally pin to specific nodes:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values: ["my-nvme-node-1"]
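
The auto-detection described above can be sketched as a lookup over the storage classes present in the cluster. This is an assumption-laden illustration: the function name, the candidate order (local-path before local-storage), and the rule that an explicit storageClass always wins are not confirmed by the operator's source.

```rust
/// Storage class names the manifest comment mentions as auto-detected.
/// The priority order is an assumption.
const LOCAL_CLASS_CANDIDATES: [&str; 2] = ["local-path", "local-storage"];

/// Pick the storage class to use for Local mode.
///
/// `explicit` is the user-supplied storageClass, if any; `available` lists
/// the storage class names that exist in the cluster.
pub fn pick_local_storage_class<'a>(
    explicit: Option<&'a str>,
    available: &[&'a str],
) -> Option<&'a str> {
    // An explicitly configured class takes precedence over detection.
    if let Some(name) = explicit {
        return Some(name);
    }
    // Otherwise return the first known local class that the cluster offers.
    LOCAL_CLASS_CANDIDATES
        .iter()
        .copied()
        .find(|candidate| available.contains(candidate))
}
```

Returning `None` when no candidate is present lets the caller surface a clear error instead of creating a PVC that can never bind.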

📊 Soroban-Specific Observability

Stellar-K8s provides comprehensive monitoring for Soroban RPC nodes with specialized metrics for smart contract operations.

Grafana Dashboard

A dedicated Soroban monitoring dashboard is available at monitoring/grafana-soroban.json. This dashboard provides real-time visibility into:

Smart Contract Metrics

Wasm Execution Time: Histogram showing p50, p95, and p99 latencies for host function execution
Contract Storage Fees: Distribution of storage fees charged across contract operations
Host Function Calls: Breakdown of which host functions are being invoked most frequently

Resource Consumption

CPU per Invocation: CPU instructions consumed by each contract invocation
Memory per Invocation: Wasm VM memory usage and per-invocation memory consumption
Process Resources: Overall CPU and memory usage of the Soroban RPC process

Transaction Metrics

Success/Failure Rate: Real-time success and failure rates for Soroban transactions
Transaction Ingestion Rate: Rate of transactions being processed (10m sliding window)
Events Ingestion Rate: Rate of contract events being ingested

Performance Indicators

RPC Request Latency: p50, p95, p99 latencies for JSON RPC methods
Database Round Trip Time: Database query performance monitoring
Ledger Ingestion Lag: How far behind the network the RPC node is

Runtime Health

Active Goroutines: Number of concurrent goroutines in the Go runtime
Memory Allocations: Rate of memory allocations
GC Pause Time: Garbage collection pause duration

Importing the Dashboard

Access Grafana: Navigate to your Grafana instance
Import Dashboard: Go to Dashboards → Import
Upload JSON: Upload monitoring/grafana-soroban.json
Configure Datasource: Select your Prometheus datasource
Save: The dashboard will be available as "Soroban RPC - Smart Contract Monitoring"

Prometheus Metrics

The operator exports the following Soroban-specific metrics:

# Wasm execution metrics
soroban_rpc_wasm_execution_duration_microseconds{namespace, name, network, contract_id}

# Storage fee metrics
soroban_rpc_contract_storage_fee_stroops{namespace, name, network, contract_id}

# Resource consumption
soroban_rpc_wasm_vm_memory_bytes{namespace, name, network, contract_id}
soroban_rpc_contract_invocation_cpu_instructions{namespace, name, network, contract_id}
soroban_rpc_contract_invocation_memory_bytes{namespace, name, network, contract_id}

# Contract invocations
soroban_rpc_contract_invocations_total{namespace, name, network, contract_type}

# Transaction results
soroban_rpc_transaction_result_total{namespace, name, network, result}

# Host function calls
soroban_rpc_host_function_calls_total{namespace, name, network, contract_id}

Example Queries

Average Wasm execution time (last 5m):

rate(soroban_rpc_wasm_execution_duration_microseconds_sum[5m]) / 
rate(soroban_rpc_wasm_execution_duration_microseconds_count[5m])

Transaction success rate:

sum(rate(soroban_rpc_transaction_result_total{result="success"}[5m])) /
sum(rate(soroban_rpc_transaction_result_total[5m]))

Top 5 most invoked contract types:

topk(5, sum(rate(soroban_rpc_contract_invocations_total[5m])) by (contract_type))
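
The average-execution-time query above divides the rate of a histogram's `_sum` series by the rate of its `_count` series. Because both rates cover the same window, the window length cancels, leaving the increase in sum over the increase in count. A small sketch of that arithmetic follows; the `HistogramSample` type is hypothetical, and counter resets, which Prometheus handles internally, are ignored here.

```rust
/// One scrape of a histogram's `_sum` and `_count` series (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct HistogramSample {
    /// Cumulative sum of observed durations, in microseconds.
    pub sum_micros: f64,
    /// Cumulative number of observations.
    pub count: f64,
}

/// Mean observation between two scrapes, mirroring rate(sum) / rate(count).
/// Returns None when no new observations arrived, where PromQL would yield NaN.
pub fn mean_between(earlier: HistogramSample, later: HistogramSample) -> Option<f64> {
    let delta_count = later.count - earlier.count;
    if delta_count <= 0.0 {
        return None;
    }
    Some((later.sum_micros - earlier.sum_micros) / delta_count)
}
```

For example, if the sum grows from 1,000 to 4,000 microseconds while the count grows from 10 to 25, the mean over the window is 3,000 / 15 = 200 microseconds.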

Alerting Rules

Example Prometheus alerting rules for Soroban RPC:

groups:
  - name: soroban_rpc
    rules:
      - alert: HighWasmExecutionLatency
        expr: histogram_quantile(0.99, rate(soroban_rpc_wasm_execution_duration_microseconds_bucket[5m])) > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Wasm execution latency (p99 > 100ms)"
          
      - alert: HighTransactionFailureRate
        expr: |
          sum(rate(soroban_rpc_transaction_result_total{result="failed"}[5m])) /
          sum(rate(soroban_rpc_transaction_result_total[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Transaction failure rate above 10%"
          
      - alert: HighLedgerIngestionLag
        expr: soroban_rpc_ingest_ledger_lag > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ledger ingestion lagging behind network"

For more details on Soroban metrics, see the Stellar Soroban RPC documentation.

Development

Prerequisites

Rust 1.88+ (latest stable recommended)
Docker & Kubernetes cluster
Make

Quick Start

# Setup development environment
make dev-setup

# Quick pre-commit check
make quick

# Full CI validation
make ci-local

# Build and run
make build
make run

See CONTRIBUTING.md for detailed development guidelines.

👨‍💻 Maintainer

Otowo Samuel
DevOps Engineer & Protocol Developer

Bringing nearly 5 years of DevOps experience and a deep background in blockchain infrastructure tools (core contributor of starknetnode-kit). Passionate about building robust, type-safe tooling for the decentralized web.

📄 License

This project is licensed under the Apache 2.0 License.

📝 Changelog

See CHANGELOG.md for a detailed history of changes and releases.
