A complete guide to setting up a distributed LangGraph infrastructure on your own hardware - zero external API costs!

This setup uses your available hardware efficiently:
- jetson-node (Orin Nano 8GB): Primary LLM server (Ollama + small/fast models)
- cpu-node (32GB Intel): Coordinator + heavy LLM tasks (Ollama + large models) + HAProxy + Redis
- rp-node (8GB ARM): Embeddings server (efficient ARM processing)
- worker-node3 (6GB VM): Tools execution server (web search, scraping, commands)
- worker-node4 (6GB VM): Monitoring and health checks (optional)
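For reference, the node layout above can be captured as a small Python mapping. This is a sketch: the dictionary layout and helper are illustrative, not the project's actual `config.py` format, and IPs not listed in this guide are left as `None`.

```python
# Sketch of the node layout described above. Node names, roles, RAM sizes and
# the two known IPs come from this guide; the dictionary structure itself is an
# illustrative assumption, not the project's actual config.py format.

CLUSTER = {
    "jetson-node":  {"ip": "192.168.1.177", "role": "primary-llm", "ram_gb": 8},
    "cpu-node":     {"ip": "192.168.1.81",  "role": "coordinator", "ram_gb": 32},
    "rp-node":      {"ip": None,            "role": "embeddings",  "ram_gb": 8},
    "worker-node3": {"ip": None,            "role": "tools",       "ram_gb": 6},
    "worker-node4": {"ip": None,            "role": "monitoring",  "ram_gb": 6},
}

def nodes_with_role(role: str) -> list[str]:
    """Return the names of all nodes assigned the given role."""
    return [name for name, spec in CLUSTER.items() if spec["role"] == role]
```

With a mapping like this, orchestration code can look up targets by role instead of hard-coding IPs.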
Follow the setup guides in order; each guide is self-contained and kept in sync with the comprehensive source of truth (SOT):

```bash
# 1. Set up the Jetson Orin Nano (primary LLM server)
# Follow: setup_guides/01_jetson_setup.md
# Sets up Ollama + TensorRT optimizations on jetson-node (192.168.1.177)

# 2. Set up the CPU coordinator (heavy LLM + load balancer + cache)
# Follow: setup_guides/02_cpu_setup.md
# Sets up Ollama + HAProxy + Redis on cpu-node (192.168.1.81)

# 3. Set up the LangGraph integration (workflows + routing)
# Follow: setup_guides/03_langgraph_integration.md
# Creates intelligent routing and tool integration

# 4. Set up the worker nodes (embeddings + tools + monitoring)
# Follow: setup_guides/04_distributed_coordination.md
# Sets up rp-node, worker-node3, worker-node4 + orchestration
```

All IPs are pre-configured for your actual nodes!
```bash
cd ~/ai-infrastructure/langgraph-config
source ~/langgraph-env/bin/activate

# Start the entire cluster
python3 cluster_orchestrator.py start

# Check cluster status
python3 cluster_orchestrator.py status

# Test all services
python3 cluster_orchestrator.py test
```

Test the LangGraph workflows:

```bash
cd ~/ai-infrastructure/langgraph-config
python3 main_app.py

# Run the example workflows
cd /home/sanzad/git/langgraph/examples/
python3 example_workflows.py
```

- Complete Guide: 00_complete_deployment_guide.md - Full walkthrough with all commands
- Modular Guides: 01-04 are extracted from and synchronized with the complete guide
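Once the cluster is up, you can also talk to Ollama through the load balancer directly. A minimal standard-library sketch follows; it assumes HAProxy on cpu-node exposes Ollama's default port 11434 (adjust to your actual front-end), and the helper name is illustrative.

```python
import json
from urllib import request

# Assumed front-end: HAProxy on cpu-node forwarding Ollama's default port.
OLLAMA_URL = "http://192.168.1.81:11434"

def build_generate_request(model: str, prompt: str) -> request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# With the cluster running, send it and read the reply:
# with request.urlopen(build_generate_request("llama3.2:3b", "Hello!")) as resp:
#     print(json.loads(resp.read())["response"])
```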
| Machine | Model | RAM Usage | Speed | Use Case |
|---|---|---|---|---|
| jetson-node | Llama 3.2 3B | ~3GB | 15-25 tok/s | General chat |
| jetson-node | Llama 3.2 1B | ~1.5GB | 30-50 tok/s | Quick responses |
| cpu-node | Mistral 7B | ~4.4GB | 8-15 tok/s | Complex analysis |
| rp-node | all-MiniLM-L6-v2 | ~200MB | 1000+ emb/s | Semantic intelligence (ARM) |
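The table above is effectively a routing policy: match each task to the cheapest model that can handle it. A pure-Python sketch of that idea (the task categories and Ollama model tags are illustrative assumptions, not the actual routing logic):

```python
# Illustrative routing table distilled from the model table above.
# The task categories and Ollama model tags are assumptions for demonstration.

MODEL_FOR_TASK = {
    "quick":    ("jetson-node", "llama3.2:1b"),  # 30-50 tok/s, quick responses
    "chat":     ("jetson-node", "llama3.2:3b"),  # 15-25 tok/s, general chat
    "analysis": ("cpu-node",    "mistral:7b"),   # 8-15 tok/s, complex analysis
}

def route(task_type: str) -> tuple[str, str]:
    """Return (node, model) for a task type, defaulting to general chat."""
    return MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["chat"])
```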
- LangGraph Concepts & Workflows - How LangGraph orchestrates your distributed setup
- Embedding Server Deep Dive - Complete guide to the semantic intelligence layer 🧠
- Architecture Diagrams - System diagrams and visual flow charts
- Example Workflows - Complete LangGraph workflow implementations
```python
# Automatically searches the web and synthesizes findings
result = await research_workflow.invoke("What are the latest AI trends?")

# Routes simple/complex coding questions appropriately
result = await coding_workflow.invoke("Build a FastAPI app with auth")

# Scrapes data and provides analysis
result = await data_workflow.invoke("Analyze this dataset: https://...")
```

- 💰 Zero Cost: No LLM API costs ever (local inference only)
- ⚡ Smart Routing: Auto-routes tasks to optimal hardware
- 🔄 Load Balancing: HAProxy distributes load automatically
- 📊 Monitoring: Real-time health checks + optional Langfuse/Helicone
- 🛡️ Fault Tolerance: Automatic failover and restart
- 📈 Auto-scaling: Dynamic model loading based on load
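The fault-tolerance bullet boils down to "try nodes in order of preference and use the first healthy one." A minimal sketch of that policy (the node list and the stubbed health predicate are illustrative; a real check would hit each node's health endpoint):

```python
# Failover as a first-healthy scan over an ordered preference list.

def first_healthy(nodes, is_healthy):
    """Return the first node passing the health predicate, or None."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None

# Stubbed example: pretend the Jetson is down, so traffic falls back to cpu-node.
healthy = first_healthy(
    ["jetson-node", "cpu-node"],
    is_healthy=lambda n: n != "jetson-node",
)
```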
For your hardware specifically:
- Jetson advantages: ARM efficiency, unified memory, low power
- Skip RTX cards: They're overkill for learning and consume 175W+ each
- Distributed approach: Maximizes utilization of all machines
- Local-first: Complete privacy and control
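For programmatic checks, the cluster-health endpoint used in the commands below can also be polled from Python. A sketch; the timeout and the return shape are assumptions:

```python
import json
from urllib import error, request

# Monitoring endpoint from this guide; timeout and result shape are assumptions.
HEALTH_URL = "http://192.168.1.191:8083/cluster_health"

def fetch_health(url: str = HEALTH_URL, timeout: float = 3.0) -> dict:
    """Fetch the cluster-health JSON, or report the failure instead of raising."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return {"ok": True, "data": json.loads(resp.read())}
    except (error.URLError, TimeoutError, json.JSONDecodeError) as exc:
        return {"ok": False, "error": str(exc)}
```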
```bash
# Check cluster health
curl http://192.168.1.191:8083/cluster_health

# View load balancer stats
open http://192.168.1.81:9000/haproxy_stats

# Monitor real-time performance
htop  # on each machine
```

Add more models:

```bash
# On Jetson
ollama pull codellama:7b

# On CPU
wget https://huggingface.co/.../model.bin
```

Scale the cluster:

```bash
# Add more workers by modifying config.py
# Adjust model sizes based on available RAM
```

Create custom workflows:

```bash
# Create new workflows in examples/
# Follow the existing patterns for routing and tools
```

Optional monitoring:

```bash
# Option 1: Langfuse (LangSmith alternative) - advanced tracing & analytics
# Follow: setup_guides/05_langfuse_setup.md

# Option 2: Helicone (alternative monitoring) - real-time monitoring & debugging
# Follow: setup_guides/06_helicone_setup.md
```

Both are completely free and self-hosted!

| Guide | Purpose | Machine(s) | Status |
|---|---|---|---|
| 00_complete_deployment_guide.md | 🎯 Master SOT - Complete walkthrough | All machines | ✅ Production ready |
| 01_jetson_setup.md | Jetson Orin Nano setup | jetson-node | ✅ Synced from SOT |
| 02_cpu_setup.md | CPU coordinator setup | cpu-node | ✅ Synced from SOT |
| 03_langgraph_integration.md | LangGraph workflows | cpu-node | ✅ Synced from SOT |
| 04_distributed_coordination.md | Worker nodes + orchestration | All workers | ✅ Synced from SOT |
| 05_langfuse_setup.md | Optional: Advanced monitoring | cpu-node | ✅ Optional feature |
| 06_helicone_setup.md | Optional: Alternative monitoring | cpu-node | ✅ Optional feature |
Benefits of this structure:
- ✅ Modular: Focus on one machine/service at a time
- ✅ Updated: All guides synced from the comprehensive SOT
- ✅ Flexible: Use individual guides or the complete guide
- ✅ Maintained: Single source of truth prevents sync issues
Out of Memory

```bash
# Switch to smaller models
ollama pull tinyllama:1.1b
```

Service Not Starting

```bash
# Check logs
sudo journalctl -u ollama -f
sudo systemctl restart ollama
```

Network Issues
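Alongside curl, the same reachability check can be scripted from Python (a sketch; the helper name is illustrative):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect("192.168.1.81", 9000) probes the HAProxy stats port.
```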
```bash
# Test connectivity
curl http://MACHINE_IP:PORT/health
```

This setup is perfect for learning:
- LangGraph workflow patterns
- Distributed AI systems
- Local model deployment
- Resource optimization
- MLOps practices
What you get:

- Production-ready local AI infrastructure
- Cost-effective learning environment
- Scalable architecture that grows with you
- Privacy-focused - your data never leaves your network
Next steps:

- Experiment with the example workflows
- Create your own domain-specific flows
- Scale up by adding more models or machines
- Optimize based on your specific use cases
- Share your workflows with the community!
Happy Learning! 🎉 You now have a professional-grade local AI infrastructure that rivals cloud solutions - for free!