Advanced reasoning models for Open WebUI using Adaptive Branching Monte Carlo Tree Search (AB-MCTS) and Multi-Model collaboration.
This project implements Sakana AI's AB-MCTS algorithm and a Multi-Model collaboration system, both integrated with Open WebUI as selectable models for advanced reasoning and decision-making.
- AB-MCTS Pipeline: Advanced tree search with LLM-as-judge quality evaluation
  - Multi-criterion evaluation (accuracy, completeness, clarity, relevance)
  - Configurable criterion weights
  - Support for 1-2 judge models for consensus
  - Real-time tree visualization
- Multi-Model Pipeline: Multi-model collaboration for comprehensive answers
- OpenAI-Compatible API: Native integration with Open WebUI's model system
- Real-time Monitoring: Prometheus metrics and Grafana dashboards
- Experiment Logging: SQLite + JSONL run tracking for research and analysis
- Interactive Dashboard: Configure models, judges, and visualize search trees
```
┌──────────────────────────────────────────────────────────────┐
│                     Open WebUI Interface                     │
├──────────────────────────────────────────────────────────────┤
│  Model Selection:                                            │
│   ┌─────────────────────┐      ┌─────────────────────┐       │
│   │       ab-mcts       │      │     multi-model     │       │
│   │                     │      │                     │       │
│   │ • Tree Search       │      │ • Collaboration     │       │
│   │ • Deep Analysis     │      │ • Multi-perspective │       │
│   │ • Best Quality      │      │ • Comprehensive     │       │
│   └─────────────────────┘      └─────────────────────┘       │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│               Model Integration Service (8098)               │
│                    OpenAI-Compatible API                     │
└──────────────────────────────┬───────────────────────────────┘
                               │
               ┌───────────────┴───────────────┐
               ▼                               ▼
 ┌───────────────────────────┐   ┌───────────────────────────┐
 │      AB-MCTS Service      │   │    Multi-Model Service    │
 │        (port 8094)        │   │        (port 8090)        │
 │                           │   │                           │
 │ • TreeQuest Algorithm     │   │ • Direct Collaboration    │
 │ • Thompson Sampling       │   │ • Model Voting            │
 │ • Anti-Hallucination      │   │ • Synthesis               │
 └─────────────┬─────────────┘   └─────────────┬─────────────┘
               │                               │
               └───────────────┬───────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                            Ollama                            │
│                  Local LLM Inference Engine                  │
└──────────────────────────────────────────────────────────────┘
```
```
openwebui-setup/
├── README.md                    # This file
├── docker-compose.yml           # Docker orchestration
├── Dockerfile                   # Container definition
├── requirements.txt             # Python dependencies
├── backend/
│   ├── api/
│   │   └── main.py              # Management API (port 8095)
│   ├── services/
│   │   ├── proper_treequest_ab_mcts_service.py  # AB-MCTS (port 8094)
│   │   ├── proper_multi_model_service.py        # Multi-Model (port 8090)
│   │   ├── experiment_logger.py                 # Run logging
│   │   └── config_manager.py                    # Configuration
│   ├── model_integration.py     # OpenAI-compatible model API (8098)
│   └── openwebui_integration.py # Tool endpoints (8097)
├── interfaces/
│   ├── dashboard.html           # Management dashboard
│   └── idiots_guide.html        # Setup guide
└── logs/                        # Experiment logs and runs
```
- Docker and Docker Compose
- Ollama running locally (port 11434)
- Recommended models: `llama3.2:latest`, `qwen2.5:latest`, `deepseek-r1:1.5b`

1. Clone the repository:

```bash
git clone https://github.com/yourusername/openwebui-setup.git
cd openwebui-setup
```

2. Pull Ollama models:

```bash
ollama pull llama3.2:latest
ollama pull qwen2.5:latest
ollama pull deepseek-r1:1.5b
```

3. Start all services:

```bash
docker-compose up -d
```

4. Verify the services:

```bash
docker-compose ps
```

All services should show "Up" status.
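Beyond `docker-compose ps`, the services can be probed directly over HTTP. A minimal stdlib-only Python sketch that checks the `/health` endpoints of the three core services; the port map follows the services table in this README:

```python
import urllib.request

# Host ports for the three core services (see the services table)
SERVICES = {
    "model-integration": 8098,
    "ab-mcts": 8094,
    "multi-model": 8090,
}

def health_url(service: str) -> str:
    """Build the /health URL for a named service."""
    return f"http://localhost:{SERVICES[service]}/health"

def check_all(timeout: float = 5.0) -> dict:
    """Probe each service's /health endpoint; True means it answered 200."""
    status = {}
    for name in SERVICES:
        try:
            with urllib.request.urlopen(health_url(name), timeout=timeout) as r:
                status[name] = r.status == 200
        except OSError:  # connection refused, timeout, HTTP error, ...
            status[name] = False
    return status
```

`check_all()` returns a dict like `{"ab-mcts": True, ...}`, which makes it easy to spot which container failed to come up.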
1. Open Open WebUI at http://localhost:3000
2. Add the model provider:
   - Click your profile → Settings
   - Go to Connections
   - Click + Add Connection
   - Select OpenAI
   - API Base URL: `http://model-integration:8098`
   - API Key: `dummy-key` (any value works)
   - Click Verify Connection → it should show "✓ Connected"
   - Click Save
3. Select a model:
   - Start a new chat
   - Click the model dropdown
   - Select either:
     - `ab-mcts` - Advanced tree search reasoning
     - `multi-model` - Collaborative AI
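The same models can also be called outside Open WebUI through the OpenAI-compatible endpoint. A hedged stdlib-only sketch: from the host, use `localhost:8098` (the `model-integration` hostname only resolves inside the Docker network), and the response is assumed to follow the standard OpenAI `choices` shape since the API is OpenAI-compatible. The helper names here are illustrative, not part of the project:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for one of the pipelines."""
    return {
        "model": model,  # "ab-mcts" or "multi-model"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str, base: str = "http://localhost:8098") -> str:
    """POST a chat completion and return the assistant's text (network call)."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy-key",  # any key works
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `ask("ab-mcts", "Prove that the square root of 2 is irrational")` runs a full tree search, so expect the 30-120 s latency noted below.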
`ab-mcts` is best for:
- Complex problem solving
- Multi-step reasoning
- Strategic planning
- Mathematical proofs
- Decision trees
Example queries:
- "Design a distributed caching system for a social media platform"
- "Prove that the square root of 2 is irrational"
- "What's the optimal strategy for a two-player game where..."
Note: Responses may take 30-120 seconds due to tree search exploration.
`multi-model` is best for:
- Comprehensive analysis
- Multiple perspectives
- Research questions
- Balanced viewpoints
- Faster responses
Example queries:
- "Compare microservices vs monolithic architectures"
- "Analyze the pros and cons of remote work"
- "Explain quantum computing to different audiences"
| Service | Port | Description |
|---|---|---|
| Open WebUI | 3000 | Main chat interface |
| Model Integration | 8098 | OpenAI-compatible model API |
| AB-MCTS Service | 8094 | TreeQuest AB-MCTS implementation |
| Multi-Model Service | 8090 | Multi-model collaboration |
| Backend API | 8095 | Management dashboard API |
| MCP Server | 8096 | Model Context Protocol bridge |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3001 | Dashboards and visualization |
| HTTP Server | 8081 | Static interfaces |
Access the interactive dashboard at http://localhost:8081/dashboard.html
Features:
- Model Selection: Choose which Ollama models power each service
- Judge Configuration: Select 1-2 LLMs to evaluate solution quality
- Criterion Weights: Adjust importance of accuracy, completeness, clarity, and relevance
- Search Parameters: Configure iterations and max depth
- Tree Visualization: View AB-MCTS search trees (Sakana AI style)
- Run History: Browse past queries and their exploration trees
Configure via the dashboard or the API:

```bash
curl -X POST http://localhost:8094/params/update \
  -H "Content-Type: application/json" \
  -d '{
    "iterations": 20,
    "max_depth": 5
  }'
```

Parameters:
- `iterations`: Number of search iterations (1-100, default: 20)
  - Higher = better quality, slower response
  - Recommended: 10-20 for most queries
- `max_depth`: Maximum tree depth (1-20, default: 5)
  - Higher = deeper reasoning, slower response
  - Recommended: 3-5 for most queries
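As a client-side sanity check before POSTing, the documented ranges can be enforced locally. A small illustrative helper (not part of the service itself, which may reject or clamp out-of-range values differently):

```python
def clamp_params(iterations: int = 20, max_depth: int = 5) -> dict:
    """Clamp AB-MCTS search parameters to their documented ranges."""
    return {
        "iterations": max(1, min(100, iterations)),  # 1-100, default 20
        "max_depth": max(1, min(20, max_depth)),     # 1-20, default 5
    }
```

For example, `clamp_params(250, 0)` yields `{"iterations": 100, "max_depth": 1}`, keeping the request within bounds before it hits `/params/update`.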
AB-MCTS uses LLM judges to evaluate solution quality on 4 criteria:
Criteria:
- Accuracy: Is it factually correct?
- Completeness: Does it fully answer the question?
- Clarity: Is it well-explained and understandable?
- Relevance: Is it on-topic and addresses the query?
Configuration:

```bash
# Set judge models (1-2 recommended for consensus)
curl -X POST http://localhost:8094/judges/update \
  -H "Content-Type: application/json" \
  -d '{"judge_models": ["qwen3:0.6b"]}'

# Adjust criterion weights (auto-normalizes to 100%)
curl -X POST http://localhost:8094/weights/update \
  -H "Content-Type: application/json" \
  -d '{
    "weights": {
      "accuracy": 0.4,
      "completeness": 0.3,
      "clarity": 0.2,
      "relevance": 0.1
    }
  }'
```

Notes:
- Using 2 judges provides consensus and reduces bias
- Weights persist across restarts
- All settings are managed in the dashboard UI
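The aggregation described above can be sketched as follows. The service's exact scoring code may differ; treat this as an illustration of weight auto-normalization and multi-judge consensus, with made-up scores:

```python
def quality_score(judge_scores, weights):
    """Combine per-judge, per-criterion scores (0-1) into one quality value.

    Weights are normalized to sum to 1 (mirroring the dashboard's
    auto-normalization), and multiple judges are averaged for consensus.
    """
    total = sum(weights.values())
    norm = {c: w / total for c, w in weights.items()}
    per_judge = [
        sum(norm[c] * scores[c] for c in norm)
        for scores in judge_scores
    ]
    return sum(per_judge) / len(per_judge)

# Illustrative data: the default weights and two judges' scores
weights = {"accuracy": 0.4, "completeness": 0.3, "clarity": 0.2, "relevance": 0.1}
judges = [
    {"accuracy": 0.9, "completeness": 0.8, "clarity": 0.7, "relevance": 1.0},
    {"accuracy": 0.8, "completeness": 0.9, "clarity": 0.8, "relevance": 0.9},
]
```

Because of the normalization step, scaling every weight by the same factor leaves the score unchanged, which is why the dashboard can accept weights that do not sum to exactly 100%.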
Update which Ollama models each service uses:
```bash
# AB-MCTS models
curl -X POST http://localhost:8094/models/update \
  -H "Content-Type: application/json" \
  -d '{"models": ["llama3.2:latest", "qwen2.5:latest"]}'

# Multi-Model models
curl -X POST http://localhost:8090/models/update \
  -H "Content-Type: application/json" \
  -d '{"models": ["llama3.2:latest", "qwen2.5:latest", "deepseek-r1:1.5b"]}'
```

Access Prometheus at http://localhost:9090
Key metrics:
- `model_integration_requests_total` - Total requests by model
- `model_integration_success_total` - Successful responses
- `model_integration_failures_total` - Failed responses
- `model_integration_latency_seconds` - Response time histogram
- `model_integration_active_queries` - Current active queries
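From these counters a headline success rate can be derived; in PromQL that is roughly `rate(model_integration_success_total[5m]) / rate(model_integration_requests_total[5m])`. The same arithmetic as an illustrative Python parser over the Prometheus text exposition format, using a made-up `/metrics` sample:

```python
def success_rate(metrics_text: str) -> float:
    """Overall success rate from the service's /metrics text output.

    Sums all label combinations of the success and request counters;
    returns 0.0 when no requests have been recorded yet.
    """
    totals = {
        "model_integration_success_total": 0.0,
        "model_integration_requests_total": 0.0,
    }
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        for name in totals:
            if line.startswith(name):
                totals[name] += float(line.rsplit(" ", 1)[-1])
    requests = totals["model_integration_requests_total"]
    return totals["model_integration_success_total"] / requests if requests else 0.0

# Hypothetical scrape output, for illustration only
sample = """\
model_integration_requests_total{model="ab-mcts"} 40
model_integration_requests_total{model="multi-model"} 60
model_integration_success_total{model="ab-mcts"} 38
model_integration_success_total{model="multi-model"} 57
"""
```

On the sample above this yields 95 successes out of 100 requests, i.e. a 0.95 success rate.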
Access Grafana at http://localhost:3001 (credentials: admin/admin)
Pre-configured dashboards:
- Request rates and success rates
- Latency percentiles (p50, p95, p99)
- Active query monitoring
- Error rates by type
- Service health status
All AB-MCTS runs are logged with complete search tree data:
- `logs/runs.db` - SQLite index
- `logs/runs/YYYYMMDD/run_<id>.jsonl` - Event stream per run
- `logs/selected_models_abmcts.json` - Persisted configuration
View in the dashboard:
1. Go to http://localhost:8081/dashboard.html
2. Click the "Research Explorer" tab
3. Click any run to view:
   - Full hierarchical search tree visualization (Sakana AI style)
   - Per-node quality scores and judge evaluations
   - Model performance across iterations
   - Complete response text for each node
Tree Visualization Features:
- D3.js interactive tree graph
- Color-coded by model and quality
- Zoom and pan navigation
- Click nodes to see full details
- Shows parent-child relationships
- Identifies best solution path
API access:
- List runs: `GET http://localhost:8094/runs?limit=50`
- Run details: `GET http://localhost:8094/runs/{run_id}`
- Tree data: `GET http://localhost:8094/runs/{run_id}/tree`
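The per-run JSONL files can also be mined offline. The event schema isn't documented here, so the field names below (`node_id`, `parent`, `model`, `score`) are assumptions made purely for illustration; a scan for the highest-scoring node might look like:

```python
import json

def best_node(jsonl_lines):
    """Scan a run's JSONL event stream and return the highest-scoring node.

    Field names (node_id, parent, model, score) are assumed for
    illustration; adapt them to the actual log schema.
    """
    best = None
    for line in jsonl_lines:
        event = json.loads(line)
        if "score" not in event:
            continue  # skip non-node events
        if best is None or event["score"] > best["score"]:
            best = event
    return best

# Made-up events in the assumed schema
sample = [
    '{"node_id": 1, "parent": null, "model": "llama3.2:latest", "score": 0.71}',
    '{"node_id": 2, "parent": 1, "model": "qwen2.5:latest", "score": 0.84}',
    '{"node_id": 3, "parent": 1, "model": "llama3.2:latest", "score": 0.78}',
]
```

In practice you would pass `open("logs/runs/<date>/run_<id>.jsonl")` instead of the inline sample, after confirming the real field names in a log file.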
Check the model integration service:

```bash
curl http://localhost:8098/health
curl http://localhost:8098/v1/models
```

Verify the Open WebUI connection:
- Settings → Connections → confirm the connection shows "✓ Connected"
- Try refreshing the page
- Check the browser console for errors
Reduce AB-MCTS iterations:

```bash
curl -X POST http://localhost:8095/api/config \
  -d '{"ab_mcts_iterations": 10, "ab_mcts_max_depth": 3}'
```

Use a faster Ollama model:

```bash
ollama pull llama3.2:1b  # Smaller, faster model
```

Check Ollama performance:

```bash
time curl http://localhost:11434/api/generate \
  -d '{"model":"llama3.2:latest","prompt":"test","stream":false}'
```

Check that all services are running:

```bash
docker-compose ps
```

View service logs:

```bash
docker logs model-integration
docker logs ab-mcts-service
docker logs multi-model-service
```

Restart services:

```bash
docker-compose restart
```

Known issues:
- Timeouts: AB-MCTS can take 30-120 s on complex queries (streaming keeps the UI responsive)
- Verbosity: AB-MCTS responses can be lengthy (length controls are in progress)
- Quality drift: occasional hallucinations (stricter validation is planned)
OpenAI-compatible endpoints:
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions

Management endpoints:
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics
- `GET /performance` - Performance statistics
- `GET /config` - Current configuration
- `POST /config` - Update configuration

AB-MCTS service (port 8094):
- `POST /query` - Run an AB-MCTS query
  - Body: `{"query": "...", "iterations": 20, "max_depth": 5}`
- `GET /models` - List available models
- `POST /models/update` - Update model selection
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics

Multi-Model service (port 8090):
- `POST /query` - Run a multi-model query
  - Body: `{"query": "..."}`
- `GET /models` - List available models
- `POST /models/update` - Update model selection
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics
- Scientific Data Enrichment Tool - Chemistry and materials science enrichment for Open WebUI (separate tool)
MIT License - See LICENSE file for details.
- Sakana AI for AB-MCTS research and TreeQuest
- Open WebUI for the chat interface
- Ollama for local LLM inference
- Prometheus & Grafana for observability
- `ARCHITECTURE.md` - Detailed architecture and design
- `API_REFERENCE.md` - Complete API documentation
- `DEPLOYMENT.md` - Production deployment guide
- `docs/research/RESEARCH_GUIDE.md` - Research and analysis guide