Production-grade Stellar infrastructure in one command.
Stellar-K8s is a high-performance Kubernetes Operator written in strict Rust using kube-rs. It automates the deployment, management, and scaling of Stellar Core, Horizon, and Soroban RPC nodes, bringing the power of Cloud-Native patterns to the Stellar ecosystem.
Designed for high availability, type safety, and minimal footprint.
- Rust-Native Performance: Built with `kube-rs` and `Tokio` for an ultra-lightweight footprint (~15 MB binary) and complete memory safety.
- Enterprise Reliability: Type-safe error handling prevents runtime failures, and built-in finalizers ensure clean PVC and resource cleanup.
- Auto-Sync Health Checks: Automatically monitors Horizon and Soroban RPC nodes, marking them Ready only when they are fully synced with the network.
- GitOps Ready: Fully compatible with ArgoCD and Flux for declarative infrastructure management.
- Observable by Default: Native Prometheus metrics integration for monitoring node health, ledger sync status, and resource usage.
- Soroban Ready: First-class support for Soroban RPC nodes with captive core configuration.
Stellar-K8s follows the Operator Pattern, extending Kubernetes with a StellarNode Custom Resource Definition (CRD).
- CRD Source of Truth: You define your node requirements (network, type, resources) in a `StellarNode` manifest.
- Reconciliation Loop: The Rust-based controller watches for changes and drives the cluster state to match your desired specification.
- Stateful Management: Automatically handles complex lifecycle events for Validators (StatefulSets) and RPC nodes (Deployments), including persistent storage and configuration.
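The Validator-vs-RPC split described above can be sketched as a pure mapping. This is illustrative only, not the operator's actual code; the `NodeType` enum and its variant names are assumptions based on the node types this README mentions:

```rust
// Illustrative sketch of how a reconciler might choose a workload kind
// for each node type (not the operator's actual implementation).
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeType {
    Validator,
    Horizon,
    SorobanRpc,
}

/// Validators need stable identity and persistent storage, so they map
/// to StatefulSets; RPC-style nodes map to Deployments.
fn desired_workload(node_type: NodeType) -> &'static str {
    match node_type {
        NodeType::Validator => "StatefulSet",
        NodeType::Horizon | NodeType::SorobanRpc => "Deployment",
    }
}

fn main() {
    assert_eq!(desired_workload(NodeType::Validator), "StatefulSet");
    assert_eq!(desired_workload(NodeType::Horizon), "Deployment");
    println!("ok");
}
```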
- Kubernetes cluster (1.28+)
- kubectl configured
- Helm 3.x (for operator installation)
- Rust 1.88+ (for local development)
- CI/CD and Docker builds use Rust 1.93 for consistency
- Contributors can use any Rust 1.88+ version locally
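If you want your local toolchain to match the stated minimum, one option (a sketch, not a project requirement) is a `rust-toolchain.toml` at the repository root:

```toml
# Pins the minimum supported toolchain noted above (CI uses 1.93).
[toolchain]
channel = "1.88"
components = ["rustfmt", "clippy"]
```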
Get a Testnet node running in under 5 minutes.
```bash
# Add the Helm repo (example)
helm repo add stellar-k8s https://stellar.github.io/stellar-k8s
helm repo update

# Install the operator
helm install stellar-operator stellar-k8s/stellar-operator \
  --namespace stellar-system \
  --create-namespace
```

If you are installing on a cluster with the Operator Lifecycle Manager (e.g. OpenShift), refer to the OLM Deployment Guide.
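The validator example below references a pre-created seed Secret. A minimal sketch of that Secret follows; the `seed` key name is an assumption, so check the operator's documentation for the expected key:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-validator-seed
  namespace: stellar
type: Opaque
stringData:
  seed: "SB...YOUR_VALIDATOR_SEED" # assumed key name
```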
Apply the following manifest to your cluster:
```yaml
# validator.yaml
apiVersion: stellar.org/v1alpha1
kind: StellarNode
metadata:
  name: my-validator
  namespace: stellar
spec:
  nodeType: Validator
  network: Testnet
  version: "v21.0.0"
  storage:
    storageClass: "standard"
    size: "100Gi"
    retentionPolicy: Retain
  validatorConfig:
    seedSecretRef: "my-validator-seed" # Pre-created K8s secret
    enableHistoryArchive: true
```

```bash
kubectl apply -f validator.yaml
kubectl get stellarnodes -n stellar
```

The project includes a kubectl plugin for convenient interaction with StellarNode resources:
```bash
# Build the plugin
cargo build --release --bin kubectl-stellar
cp target/release/kubectl-stellar ~/.local/bin/kubectl-stellar

# List all StellarNode resources
kubectl stellar list

# Check sync status
kubectl stellar status

# View logs from a node
kubectl stellar logs my-validator -f
```

See kubectl-plugin.md for complete documentation.
Stellar-K8s supports custom validation policies written in WebAssembly, allowing you to enforce organization-specific requirements without modifying the operator code.
```rust
// Example: Enforce approved image registries
#[no_mangle]
pub extern "C" fn validate() -> i32 {
    // Deny the request if the validation input cannot be read
    // (`?` is not usable here because the function returns a raw i32)
    let input = match read_validation_input() {
        Ok(input) => input,
        Err(_) => return deny("Failed to read validation input"),
    };
    // Check if the image is from an approved registry
    if !is_approved_registry(&input.object.spec.version) {
        return deny("Image must be from approved registry");
    }
    allow()
}
```

Features:
- Sandboxed Execution: Plugins run in a secure, isolated Wasm environment
- Dynamic Loading: Load plugins from ConfigMaps at runtime
- Multi-Language Support: Write policies in Rust, Go, C++, or any language that compiles to Wasm
- Fail-Open Support: Configure plugins to allow requests if they fail
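As an illustration of the ConfigMap-based loading, a compiled policy module could be packaged roughly like this. The key name and namespace are assumptions, not the operator's documented contract:

```yaml
# Hypothetical packaging of a compiled Wasm policy; the "policy.wasm"
# key name is an assumption.
apiVersion: v1
kind: ConfigMap
metadata:
  name: registry-policy
  namespace: stellar-system
binaryData:
  policy.wasm: "<base64-encoded Wasm module>"
```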
See wasm-webhook.md for complete documentation and examples.
Stellar-K8s comes with built-in Prometheus metrics and a pre-configured Grafana dashboard that provides a comprehensive overview of both the operator's health and the managed Stellar nodes.
- Open your Grafana instance.
- Navigate to Dashboards -> Import.
- Upload the `monitoring/grafana-dashboard.json` file provided in this repository.
- Select your Prometheus data source when prompted.
- The dashboard will now automatically visualize:
- Node availability, sync status, and peer connectivity
- Controller reconciliation rates and duration (p50, p95, p99)
- Error rates and operator resource usage (CPU/Memory)
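If you prefer to query the controller metrics directly, a p95 reconcile-duration query might look like the following. The metric name `stellar_operator_reconcile_duration_seconds_bucket` is hypothetical; substitute the histogram name the operator actually exports:

```promql
histogram_quantile(0.95,
  sum(rate(stellar_operator_reconcile_duration_seconds_bucket[5m])) by (le))
```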
We welcome contributions! Please see our Contributing Guide for details on our development process, coding standards, and how to submit pull requests.
- `StellarNode` CRD with Validator support
- Basic controller logic with `kube-rs`
- Helm Chart for easy deployment
- CI/CD Pipeline with GitHub Actions and Docker builds
- Auto-Sync Health Checks for Horizon and Soroban RPC nodes
- kubectl-stellar plugin for node management
- Full Soroban RPC node support with captive core
- Comprehensive Prometheus metrics export (Ledger age, peer count)
- Dedicated Grafana Dashboards
- Automated history archive management
- Automated failover for high-availability setups
- Disaster Recovery automation (backup/restore from history)
- Multi-region federation support
Standard cloud Persistent Volumes (like AWS EBS or GCP Persistent Disks) can sometimes bottleneck Stellar Core's highly demanding database I/O, leading to ledger sync lag. Stellar-K8s supports a specialized LocalStorage mode to take advantage of low-latency local NVMe drives directly attached to your Kubernetes nodes.
| Storage Type | Peak IOPS | Read Latency | Write Latency | Avg Sync Lag |
|---|---|---|---|---|
| Cloud Standard (EBS) | ~3,000 | 1.5 - 2.5ms | 2.0 - 5.0ms | 5 - 15s |
| Local NVMe | 100,000+ | < 0.1ms | < 0.1ms | < 1s |
Simply set `spec.storage.mode` to `Local`. Stellar-K8s will automatically attempt to use a provisioner like `local-path` (often bundled with K3s/Kind/EKS). You can also explicitly pin to a specific node using `nodeAffinity` or specify a dedicated `storageClass`.
```yaml
spec:
  nodeType: Validator
  storage:
    mode: Local
    # Automatically detects "local-path" or "local-storage" if omitted
    # Or explicitly pin to specific nodes:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values: ["my-nvme-node-1"]
```

Stellar-K8s provides comprehensive monitoring for Soroban RPC nodes with specialized metrics for smart contract operations.
A dedicated Soroban monitoring dashboard is available at monitoring/grafana-soroban.json. This dashboard provides real-time visibility into:
- Wasm Execution Time: Histogram showing p50, p95, and p99 latencies for host function execution
- Contract Storage Fees: Distribution of storage fees charged across contract operations
- Host Function Calls: Breakdown of which host functions are being invoked most frequently
- CPU per Invocation: CPU instructions consumed by each contract invocation
- Memory per Invocation: Wasm VM memory usage and per-invocation memory consumption
- Process Resources: Overall CPU and memory usage of the Soroban RPC process
- Success/Failure Rate: Real-time success and failure rates for Soroban transactions
- Transaction Ingestion Rate: Rate of transactions being processed (10m sliding window)
- Events Ingestion Rate: Rate of contract events being ingested
- RPC Request Latency: p50, p95, p99 latencies for JSON RPC methods
- Database Round Trip Time: Database query performance monitoring
- Ledger Ingestion Lag: How far behind the network the RPC node is
- Active Goroutines: Number of concurrent goroutines in the Go runtime
- Memory Allocations: Rate of memory allocations
- GC Pause Time: Garbage collection pause duration
- Access Grafana: Navigate to your Grafana instance
- Import Dashboard: Go to Dashboards -> Import
- Upload JSON: Upload `monitoring/grafana-soroban.json`
- Configure Datasource: Select your Prometheus datasource
- Save: The dashboard will be available as "Soroban RPC - Smart Contract Monitoring"
The operator exports the following Soroban-specific metrics:
```
# Wasm execution metrics
soroban_rpc_wasm_execution_duration_microseconds{namespace, name, network, contract_id}

# Storage fee metrics
soroban_rpc_contract_storage_fee_stroops{namespace, name, network, contract_id}

# Resource consumption
soroban_rpc_wasm_vm_memory_bytes{namespace, name, network, contract_id}
soroban_rpc_contract_invocation_cpu_instructions{namespace, name, network, contract_id}
soroban_rpc_contract_invocation_memory_bytes{namespace, name, network, contract_id}

# Contract invocations
soroban_rpc_contract_invocations_total{namespace, name, network, contract_type}

# Transaction results
soroban_rpc_transaction_result_total{namespace, name, network, result}

# Host function calls
soroban_rpc_host_function_calls_total{namespace, name, network, contract_id}
```
Average Wasm execution time (last 5m):

```promql
rate(soroban_rpc_wasm_execution_duration_microseconds_sum[5m]) /
rate(soroban_rpc_wasm_execution_duration_microseconds_count[5m])
```

Transaction success rate:

```promql
sum(rate(soroban_rpc_transaction_result_total{result="success"}[5m])) /
sum(rate(soroban_rpc_transaction_result_total[5m]))
```

Top 5 most invoked contracts:

```promql
topk(5, sum(rate(soroban_rpc_contract_invocations_total[5m])) by (contract_type))
```
Example Prometheus alerting rules for Soroban RPC:
```yaml
groups:
  - name: soroban_rpc
    rules:
      - alert: HighWasmExecutionLatency
        expr: histogram_quantile(0.99, rate(soroban_rpc_wasm_execution_duration_microseconds_bucket[5m])) > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Wasm execution latency (p99 > 100ms)"
      - alert: HighTransactionFailureRate
        expr: |
          sum(rate(soroban_rpc_transaction_result_total{result="failed"}[5m])) /
          sum(rate(soroban_rpc_transaction_result_total[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Transaction failure rate above 10%"
      - alert: HighLedgerIngestionLag
        expr: soroban_rpc_ingest_ledger_lag > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ledger ingestion lagging behind network"
```

For more details on Soroban metrics, see the Stellar Soroban RPC documentation.
- Rust (latest stable)
- Docker & Kubernetes cluster
- Make
```bash
# Set up the development environment
make dev-setup

# Quick pre-commit check
make quick

# Full CI validation
make ci-local

# Build and run
make build
make run
```

See CONTRIBUTING.md for detailed development guidelines.
Otowo Samuel
DevOps Engineer & Protocol Developer
Brings nearly 5 years of DevOps experience and a deep background in blockchain infrastructure tooling (core contributor to starknetnode-kit). Passionate about building robust, type-safe tooling for the decentralized web.
This project is licensed under the Apache 2.0 License.
See CHANGELOG.md for a detailed history of changes and releases.
