Neutree is an open-source Large Language Model (LLM) infrastructure management platform.
- Multi-cluster Management: Deploy and manage inference workloads across Kubernetes clusters and static node clusters (Ray + Docker)
- OpenAI-compatible API: Unified inference gateway with API key authentication and usage tracking
- Multi-tenancy: Workspace-based resource isolation with fine-grained RBAC
- Production-ready Observability: Integrated metrics collection and Grafana dashboards
- Flexible Model Storage: Support for HuggingFace Hub and file-based model registries
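Because the gateway exposes an OpenAI-compatible API, existing OpenAI clients and SDKs should work by pointing them at the gateway and passing a Neutree API key. A minimal sketch of the request shape — the host, model name, and key below are placeholders, not values defined by Neutree:

```python
import json
import urllib.request

# Hypothetical gateway host and API key; the endpoint path and body
# follow the standard OpenAI chat-completions format.
url = "http://<gateway-host>/v1/chat/completions"
headers = {
    "Authorization": "Bearer <your-api-key>",  # gateway API key auth
    "Content-Type": "application/json",
}
payload = {
    "model": "llama-3-8b-instruct",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)

# Build the request; urlopen(req) would send it, left out here
# because the host above is a placeholder.
req = urllib.request.Request(
    url, data=body.encode(), headers=headers, method="POST"
)
```

Requests like this are authenticated per API key, which is also the unit the gateway's usage tracking accounts against.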
Visit docs.neutree.ai for installation guides, tutorials, and API references.
Technical design documents for contributors are available in the docs/ directory:
- Architecture Overview
- Cluster Management
- Online Inference
- Model Registry
- User Management
- RBAC and Workspace
- Cluster Monitoring
Prerequisites
- Go 1.23+
- Docker
- Make
Common workflows
# Build all binaries
make build
# Run unit tests
make test
# Run linter
make lint
# Run database tests
make db-test
# Quick iteration: rebuild and restart local containers
make docker-test-api
make docker-test-core

Roadmap
- More accelerator support (e.g., Intel XPU)
- Inference endpoint auto-scaling
- External KV cache integration
- Quota and usage limits
- GPU memory hard isolation
- More inference engine adapters
- External endpoint support for unified management of local and external model services
Community
- GitHub Issues: Bug reports and feature requests
- Discussions: Questions and community support
Neutree is licensed under the Apache License 2.0.
