A fully local, multi-agent AI system that gives you ChatGPT-level intelligence with complete privacy and control over your data.
Now with Docker! One command starts the entire stack. Zero manual setup.
Run a production-grade AI assistant stack entirely on your hardware:
- 🔒 100% private – No data leaves your machine. No API keys. No subscriptions.
- 🧠 Advanced reasoning – Qwen3-30B MoE (30.5B params, 3.3B active) with hybrid thinking mode
- 🔍 Hybrid RAG search – Finds YOUR information better than pure vector search (dense + sparse retrieval)
- 🤖 Multi-agent swarm – DyTopo coordination routes complex tasks to specialized agents that collaborate, with optional RAG context pre-fetch for domain grounding
- 🛠️ 10 MCP servers – Memory knowledge graph, web search, browser automation, file operations, code execution, RAG search, multi-agent swarm
- 🐳 Docker-first architecture – One command to start/stop everything. Auto-restart. Zero networking hassles.
- 💬 AnythingLLM UI – Clean interface for chat, document Q&A, and workspace management
Ideal for:
- Engineers who need AI assistance with proprietary codebases
- Researchers handling sensitive documents (legal, medical, financial)
- Privacy-conscious users who want ChatGPT-level capability without cloud dependency
- Developers building custom AI workflows with persistent memory and multi-agent collaboration
| Feature | AnyLoom | Cloud AI (ChatGPT, Claude) | Single Local LLM |
|---|---|---|---|
| Privacy | ✅ 100% local, zero telemetry | ❌ Your data trains their models | ✅ Local |
| Cost | ✅ One-time hardware investment | ❌ $20-200/month subscription | ✅ Free after setup |
| Retrieval Quality | ✅ Hybrid dense+sparse RAG | | |
| Multi-Agent Swarm | ✅ DyTopo routing, 3-5 agents | ❌ Single model per request | ❌ Single model |
| Persistent Memory | ✅ MCP knowledge graph across sessions | ❌ No cross-session memory | |
| Tool Ecosystem | ✅ 10 MCP servers (RAG, swarm, web, code, files, browser) | ❌ Manual integration | |
| Context Window | ✅ 131K tokens (configurable) | | |
| Offline Use | ✅ Fully functional | ❌ Requires internet | ✅ Fully functional |
The bottom line: if you need ChatGPT-level capability for sensitive work, AnyLoom delivers comparable intelligence without the privacy trade-offs or subscription costs.
AnyLoom runs as a Docker Compose stack with these services:
- Qdrant (port 6333) – Vector database for hybrid dense+sparse RAG
- llama.cpp LLM (port 8008) – GPU-accelerated inference with 131K context (Qwen3-30B-A3B)
- llama.cpp Embedding (port 8009) – BGE-M3 embedding server for AnythingLLM (1024-dim dense vectors)
- AnythingLLM (port 3001) – Web UI for chat and document management
- DyTopo swarm (Python, runs natively) – Multi-agent orchestration for complex tasks
- 10 MCP servers – RAG search, DyTopo swarm, memory graph, web search, browser automation, file ops, and more
Everything starts with one command. Docker handles networking, GPU access, auto-restart, and data persistence.
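If you want to verify the running stack by hand, a quick HTTP probe of each service works. This is an illustrative sketch, not part of AnyLoom: the ports match the service list above, but the probe paths are assumptions based on each service's standard API (llama.cpp's `/health`, Qdrant's `/collections`).

```python
import requests  # third-party: pip install requests

# Ports from the service list above; probe paths are assumptions based on
# each service's standard HTTP API, not AnyLoom-specific endpoints.
SERVICES = {
    "Qdrant": "http://localhost:6333/collections",
    "llama.cpp LLM": "http://localhost:8008/health",
    "llama.cpp Embedding": "http://localhost:8009/health",
    "AnythingLLM": "http://localhost:3001/",
}

for name, url in SERVICES.items():
    try:
        code = requests.get(url, timeout=5).status_code
        print(f"{name:<20} HTTP {code}")
    except requests.ConnectionError:
        print(f"{name:<20} DOWN ({url})")
```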
| Component | Tokens |
|---|---|
| Total Token Budget | 131K |
| System prompt | ~2K |
| MCP tool definitions (9 Docker + 1 qdrant-rag) | ~3K |
| RAG snippets (16 Γ ~500 tokens) | ~8K |
| Chat history (30 messages) | ~12K |
| Overhead subtotal | ~25K |
| Remaining for chat | ~106K |
The entire RAG-prompt set fits comfortably inside the token limit. Context length is configurable (default 131K). Q4_K_M model weights are ~18.6 GiB, leaving ample room for KV cache on 32GB GPUs. See `docs/llm-engine.md` for VRAM budget details.
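The budget above is plain arithmetic; this sketch mirrors the table's figures (all values are the approximate estimates shown there, not measured token counts):

```python
# Context budgeting, mirroring the table above. All figures are the
# approximate estimates listed there, not measured token counts.
CONTEXT_WINDOW = 131_000  # default; configurable

overhead = {
    "system prompt": 2_000,
    "MCP tool definitions": 3_000,
    "RAG snippets": 16 * 500,   # 16 snippets x ~500 tokens = ~8K
    "chat history": 12_000,     # ~30 messages
}

used = sum(overhead.values())
print(f"overhead ~{used:,} tokens; ~{CONTEXT_WINDOW - used:,} left for chat")
# -> overhead ~25,000 tokens; ~106,000 left for chat
```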
✅ Runs on a single GPU (requires 32GB+ VRAM; optimized for RTX 5090)
All you need:
| Component | Requirement |
|---|---|
| Docker Desktop | v24.0+ with WSL2 integration and GPU support enabled |
| NVIDIA GPU | RTX 4090/5090 or similar (32GB VRAM recommended for full 131K context. 24GB GPUs can run with reduced context.) |
| NVIDIA Driver | 535+ (for CUDA 12 support) |
| Python | 3.10+ (for benchmarks and DyTopo scripts) |
| Disk Space | ~100GB for models and data |
Docker handles everything: Qdrant, llama.cpp (LLM + Embedding), and AnythingLLM run as containers. No manual WSL setup or service management!
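Before installing, it can help to confirm the GPU and driver meet the table above. A minimal pre-flight check using `nvidia-smi` (which ships with the NVIDIA driver); this is a convenience sketch, not an AnyLoom script:

```python
import shutil
import subprocess

# nvidia-smi ships with the NVIDIA driver; the query flags below are
# standard. This only confirms the GPU is visible to the host.
if shutil.which("nvidia-smi") is None:
    raise SystemExit("nvidia-smi not found - install NVIDIA driver 535+")

info = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=name,memory.total,driver_version",
     "--format=csv,noheader"],
    text=True,
)
print(info.strip())  # e.g. "NVIDIA GeForce RTX 5090, 32607 MiB, 570.xx"
```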
```bash
git clone <repo-url>
cd AnyLoom

# Download models
mkdir -p models
pip install huggingface_hub

# LLM model – Qwen3-30B-A3B Q4_K_M (~18.6 GB, GPU)
huggingface-cli download Qwen/Qwen3-30B-A3B-Instruct-2507-GGUF \
  Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf \
  --local-dir models

# Embedding model – BGE-M3 Q8_0 (~605 MB, GPU)
huggingface-cli download ggml-org/bge-m3-Q8_0-GGUF \
  bge-m3-q8_0.gguf \
  --local-dir models
```

Already have the LLM GGUF? Symlink instead of re-downloading:

```bash
ln -s ~/.lmstudio/models/lmstudio-community/Qwen3-30B-A3B-Instruct-2507-GGUF/Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf models/
```
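If you prefer Python over the CLI, `huggingface_hub` exposes the same downloads programmatically; a minimal equivalent of the commands above (repo IDs and filenames copied from them):

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Repo IDs and filenames match the CLI commands above.
hf_hub_download(
    repo_id="Qwen/Qwen3-30B-A3B-Instruct-2507-GGUF",
    filename="Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf",
    local_dir="models",
)
hf_hub_download(
    repo_id="ggml-org/bge-m3-Q8_0-GGUF",
    filename="bge-m3-q8_0.gguf",
    local_dir="models",
)
```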
```bash
# One command starts everything (creates volumes, checks model, waits for health)
bash scripts/docker_start.sh

# Or manually (must create volumes first)
docker volume create anyloom_qdrant_storage
docker volume create anyloom_anythingllm_storage
docker volume create anyloom_anythingllm_hotdir
docker compose up -d
```

Startup takes ~2 minutes while llama.cpp loads the model into GPU VRAM. First query may take an additional 1-2 minutes as the prompt cache warms up.
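If you start the stack manually with `docker compose up -d`, you can reproduce the health wait yourself. A minimal sketch polling llama.cpp's built-in `/health` endpoint; the 5-minute deadline is an arbitrary choice, not a value from `docker_start.sh`:

```python
import time
import requests

# llama.cpp's server exposes a built-in /health endpoint; poll it until
# the model finishes loading. The 5-minute deadline is arbitrary.
deadline = time.time() + 300
while time.time() < deadline:
    try:
        if requests.get("http://localhost:8008/health", timeout=2).ok:
            print("LLM ready")
            break
    except requests.ConnectionError:
        pass
    time.sleep(5)
else:
    raise SystemExit("LLM did not become healthy within 5 minutes")
```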
- Open http://localhost:3001 and complete the initial setup wizard (password, preferences). The API is locked until this is done.
- Then run the automated configuration:
```bash
python scripts/configure_anythingllm.py
```

This configures AnythingLLM system defaults (LLM provider, max tokens, BGE-M3 embedding, vector DB, chunk size/overlap, default system prompt), creates an AnyLoom workspace, uploads and embeds the RAG reference documents from `rag-docs/anythingllm/` into the workspace's vector store, pushes tuned workspace settings, and runs a smoke test. Re-running the script is safe – it skips documents that are already uploaded and embedded.
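To spot-check the workspace yourself, you can also call the AnythingLLM developer API directly. A minimal sketch, assuming the workspace slug is `anyloom` (the workspace the script creates) and that `ANYTHINGLLM_API_KEY` holds a key generated in the UI; the endpoint and response field follow AnythingLLM's developer API:

```python
import os
import requests

# Assumes workspace slug "anyloom" and an API key generated in the UI.
resp = requests.post(
    "http://localhost:3001/api/v1/workspace/anyloom/chat",
    headers={"Authorization": f"Bearer {os.environ['ANYTHINGLLM_API_KEY']}"},
    json={"message": "Which documents are embedded in this workspace?",
          "mode": "chat"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json().get("textResponse"))
```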
- AnythingLLM UI: http://localhost:3001
- llama.cpp LLM API: http://localhost:8008/v1/models
- llama.cpp Embedding API: http://localhost:8009/v1/embeddings
- Qdrant Dashboard: http://localhost:6333/dashboard
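Both llama.cpp servers speak the OpenAI-compatible API, so you can exercise them with nothing but `requests`. A short verification sketch; the `model` values are placeholders, since llama.cpp serves whatever model it loaded (see `/v1/models`):

```python
import requests

# Chat completion against the OpenAI-compatible LLM endpoint.
chat = requests.post(
    "http://localhost:8008/v1/chat/completions",
    json={
        "model": "local",  # placeholder; llama.cpp serves its loaded model
        "messages": [{"role": "user", "content": "Reply with one word: ready?"}],
        "max_tokens": 16,
    },
    timeout=120,
).json()
print(chat["choices"][0]["message"]["content"])

# Embedding request against the BGE-M3 server; expect a 1024-dim vector.
emb = requests.post(
    "http://localhost:8009/v1/embeddings",
    json={"model": "local", "input": "hybrid dense+sparse retrieval"},
    timeout=30,
).json()
print(len(emb["data"][0]["embedding"]))  # 1024
```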
```bash
# Install Python dependencies first
pip install -r requirements-dytopo.txt

# Test the full stack (all 6 phases)
ANYTHINGLLM_API_KEY=your-key python scripts/benchmarks/bench_run_all.py

# Or test just llama.cpp directly (no AnythingLLM needed)
ANYTHINGLLM_API_KEY=your-key python scripts/benchmarks/bench_phase5_llm.py
```

Phase 5 validates llama.cpp directly – fabrication guards, tool boundary awareness, and depth calibration. Current score: 15/20 (75%), with perfect marks on fabrication guards, adversarial resistance, cross-workspace parity, depth stability, and LLM direct validation. See the benchmark results for full scores.
```bash
# View logs
bash scripts/docker_logs.sh llm          # llama.cpp only
bash scripts/docker_logs.sh anythingllm  # AnythingLLM only
docker compose logs -f                   # All services

# Stop services
bash scripts/docker_stop.sh
# Or: docker compose down

# Restart a specific service
docker compose restart llm

# Check status
docker compose ps

# Remove everything including data (⚠️ DESTRUCTIVE)
docker compose down -v
```

Start here:
- `INSTALL.md` – Docker-based installation guide (repo root)

Reference documentation in `docs/`:
| Document | Contents |
|---|---|
| `architecture.md` | System topology, VRAM budget, port assignments |
| `llm-engine.md` | llama.cpp Docker container config, GPU settings, troubleshooting |
| `qwen3-model.md` | Qwen3-30B-A3B MoE architecture, quantization, sampling |
| `bge-m3-embedding.md` | BGE-M3 embedding architecture (ONNX INT8 CPU for MCP RAG + llama.cpp GGUF for AnythingLLM, 1024-dim dense vectors) |
| `qdrant-topology.md` | Qdrant Docker container, collection schema, sync |
| `qdrant-servers.md` | MCP server inventory, tool definitions, token budget |
| `dytopo-swarm.md` | DyTopo multi-agent routing, package architecture, domains, lifecycle |
| `anythingllm-settings.md` | AnythingLLM Docker container, provider config, workspace setup |
| `benchmark-results-showcase.md` | Benchmark results across all rounds |
| Module | Purpose |
|---|---|
| `models.py` | Pydantic v2 data models (AgentState, SwarmTask with RAG context field, SwarmMetrics, etc.) |
| `config.py` | YAML configuration loader with defaults (`dytopo_config.yaml`) |
| `agents.py` | System prompts, JSON schemas, domain rosters |
| `router.py` | MiniLM-L6-v2 embedding, cosine similarity, threshold, degree cap |
| `graph.py` | NetworkX DAG construction, cycle breaking, topological sort |
| `orchestrator.py` | Main swarm loop with singleton inference client, Aegean termination, memory persistence |
| `governance.py` | Convergence detection, stalling detection, re-delegation, Aegean consensus voting |
| `audit.py` | JSONL audit logging to `~/dytopo-logs/{task_id}/` |
| `health/checker.py` | Pre-run health probes for LLM, Qdrant, AnythingLLM, GPU |
| `memory/writer.py` | Post-run swarm result persistence to structured storage |
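For intuition about the routing step in `router.py`, here is an illustrative sketch (not AnyLoom's actual code) of similarity-threshold routing with MiniLM-L6-v2: embed the task and each agent description, keep agents whose cosine similarity clears a threshold, capped at a maximum degree. The agent names, threshold, and degree cap are hypothetical values:

```python
from sentence_transformers import SentenceTransformer, util

# Agent descriptions, the threshold, and the degree cap below are
# hypothetical; see router.py for AnyLoom's actual values.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

task = "Profile a slow SQL query and propose an index"
agents = {
    "coder": "writes and refactors application code",
    "db_expert": "optimizes databases, queries, and indexes",
    "writer": "drafts prose and documentation",
}

task_vec = model.encode(task, convert_to_tensor=True)
scores = {
    name: util.cos_sim(task_vec, model.encode(desc, convert_to_tensor=True)).item()
    for name, desc in agents.items()
}

THRESHOLD, DEGREE_CAP = 0.25, 2  # hypothetical tuning values
selected = sorted(
    (name for name, s in scores.items() if s >= THRESHOLD),
    key=scores.get,
    reverse=True,
)[:DEGREE_CAP]
print(scores, "->", selected)
```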
- Docker Volumes (persist across restarts):
  - `anyloom_qdrant_storage` – Vector database
  - `anyloom_anythingllm_storage` – AnythingLLM workspaces
  - `anyloom_anythingllm_hotdir` – AnythingLLM document collector
- Host Bind Mount:
  - `./models/` – GGUF model files (~19.2 GB total): LLM model (~18.6 GB) + embedding model (~605 MB). Place both files here before starting.
- Filesystem Access: All configuration files and Python scripts are local.
- Model Updates: Replace the GGUF file in `./models/` and restart: `docker compose restart llm`
- RAG Re-indexing: Re-run `python scripts/configure_anythingllm.py` (idempotent) or re-embed documents via the AnythingLLM UI.
```bash
# View volumes
docker volume ls | grep anyloom

# Backup a volume
docker run --rm -v anyloom_qdrant_storage:/data -v $(pwd):/backup ubuntu tar czf /backup/qdrant_backup.tar.gz /data

# Remove all data (⚠️ DESTRUCTIVE)
docker compose down -v
```

✅ You're now running a next-gen, fully local AI agentic stack. Start creating, querying, and orchestrating with AnyLoom today.