---
title: llms.py - Lightweight OpenAI compatible CLI and server gateway for multiple LLMs
summary: Support for Text, Image and Audio generation. Seamlessly mix and match local models with premium cloud LLMs
tags: [llms,ai,python]
author: Demis Bellot
image: ./img/posts/llms-py/bg.webp
---

# Introducing llms.py 🚀

We're excited to announce the release of **[llms.py](https://github.com/ServiceStack/llms)** - a super lightweight CLI tool and OpenAI-compatible server
that acts as a **configurable gateway** over multiple Large Language Model (LLM) providers.

As part of our work on a new OSS AI generation platform we needed a lightweight
LLM gateway for use within [ComfyUI](https://www.comfy.org). Unfortunately the popular Python option, **litellm**,
is anything but lightweight - it requires [60 deps!](https://github.com/BerriAI/litellm/blob/main/requirements.txt)
and its VC-funded development sees its scope creep into Enterprise features.

That's a deal breaker in an open Python plugin ecosystem like ComfyUI, where every extra dependency is a chance to break
a Python environment. It's also unnecessary for a simple CLI tool and server gateway, since the hard work
is already done by the LLM providers in their OpenAI-compatible APIs - a simplicity `llms.py` capitalizes on.

## 🎯 OpenRouter, but Local

**llms.py** is designed as a **unified gateway** that seamlessly connects you to multiple LLM providers
through a single, consistent interface. Whether you're using cloud APIs or local models, `llms` provides
intelligent routing and automatic failover so your AI workflows connect to your chosen providers in your
preferred priority order - whether you're optimizing for cost, performance or availability.

### ⚡ Ultra-Lightweight Architecture

- **Single File**: Just one [llms.py](https://github.com/ServiceStack/llms/blob/main/llms.py) file (easily customizable)
- **Single Dependency**: `aiohttp` is its only dependency
- **Zero Dependencies for ComfyUI**: Ideal for use in ComfyUI Custom Nodes
- **No Setup**: Just download and use, then configure your preferred LLMs in [llms.json](https://github.com/ServiceStack/llms/blob/main/llms.json)

## 📦 Installation Options

#### Option 1: PyPI Package

:::sh
pip install lapi
:::

#### Option 2: Direct Download

For standalone use, download [llms.py](https://github.com/ServiceStack/llms/blob/main/llms.py) and make it executable:

```bash
curl -O https://raw.githubusercontent.com/ServiceStack/llms/main/llms.py
chmod +x llms.py
mv llms.py ~/.local/bin/llms
```

Then install its only dependency:

:::sh
pip install aiohttp
:::

#### Ideal for use with ComfyUI Custom Nodes

Simply drop [llms.py](https://github.com/ServiceStack/llms/blob/main/llms.py) into your ComfyUI custom nodes directory - no additional dependencies required!

## 🔧 Quick Start

```bash
# Initialize configuration (saved to ~/.llms/llms.json)
llms --init

# Enable providers
llms --enable openrouter_free google_free groq openai anthropic grok

# List available providers and models
llms ls
llms ls openrouter_free groq

# Start chatting
llms "Explain quantum computing in simple terms"

# With a preferred model
llms -m grok-4-fast "jq command to sort openai models by created"

# With a system prompt
llms -s "You are a quantum computing expert" "Explain quantum computing"

# Use with images
llms --image photo.jpg "What's in this image?"

# Use with audio
llms --audio talk.mp3 "Transcribe this audio file"

# With a custom request template
llms --chat request.json "Explain quantum computing in simple terms"
llms --chat image-request.json --image photo.jpg "What's in this image?"
llms --chat audio-request.json --audio talk.mp3 "Transcribe this audio file"

# Set the default model
llms --default grok-4-fast

# Run an OpenAI chat compatible server
llms --server --port 8080
```
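
Once the server is running, any plain HTTP client can talk to it. Here's a minimal sketch using only Python's standard library that posts a standard OpenAI-style chat completions request to the local gateway - it assumes the server was started with `llms --server --port 8080` and that `grok-4-fast` is one of your enabled models:

```python
# query_gateway.py - minimal sketch, stdlib only.
# Assumes the gateway was started with: llms --server --port 8080
import json
import urllib.request

# Standard OpenAI chat completions request shape
payload = {
    "model": "grok-4-fast",  # any model enabled in your llms.json
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"},
    ],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# OpenAI-compatible responses return the reply in choices[0].message.content
print(body["choices"][0]["message"]["content"])
```
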
### 🌐 Configurable Multi-Provider Gateway

Acts as an intelligent gateway that can route requests for 160+ models across:

#### Cloud Providers with Free Tiers

- OpenRouter
- Groq
- Codestral
- Google

#### Premium Cloud Providers

- OpenAI
- Anthropic
- Google
- Grok
- Qwen
- Mistral

#### Local Providers

- Ollama
  - Restrict access to specific custom models
  - Or auto-discover all installed models

#### Custom Providers

Use the JSON config to add any other OpenAI-compatible API endpoint and its models.

### 🔄 Intelligent Request Routing

- **Automatic Failover**: If one provider fails, the request is automatically retried with the next available provider (sketched below)
- **Cost Optimization**: Define free/cheap/local providers first to minimize costs
- **Model Mapping**: Use unified model names that map to different provider-specific names
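
This failover behaviour is easy to picture in code. The sketch below is purely illustrative - it is not llms.py's actual implementation - but it shows the general pattern of walking an ordered provider list until one succeeds:

```python
# failover_sketch.py - illustrative only, not llms.py's actual implementation.
# Walk providers in priority order until one returns a response.
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    models: dict[str, str]  # unified model name -> provider-specific name

    def chat(self, model: str, prompt: str) -> str:
        ...  # issue the OpenAI-compatible HTTP request here


def route(providers: list[Provider], model: str, prompt: str) -> str:
    """Try each provider that supports `model`, in configured order."""
    last_error: Exception | None = None
    for provider in providers:
        if model not in provider.models:
            continue  # this provider doesn't offer the requested model
        try:
            # Map the unified name to this provider's specific model id
            return provider.chat(provider.models[model], prompt)
        except Exception as e:  # failed: fall through to the next provider
            last_error = e
    raise RuntimeError(f"All providers failed for {model!r}") from last_error
```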

## 🌟 Key Features

### Multi-Modal Support
- **Text Generation**: Chat completions with any supported model
- **Vision Models**: Process images through vision-capable models (GPT-4V, Gemini Vision, etc.) - see the message sketch after this list
- **Audio Processing**: Handle audio inputs through audio-capable models
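
When you pass `--image photo.jpg`, the prompt and image travel together in an OpenAI-style multi-modal message. A hedged sketch of what such a request message looks like - the content-part shape follows the OpenAI chat completions spec, though llms.py may assemble it slightly differently internally:

```python
# vision_message.py - builds an OpenAI-style multi-modal chat message.
# Content-part shape follows the OpenAI chat completions spec; how
# llms.py assembles it internally may differ.
import base64


def image_message(prompt: str, image_path: str) -> dict:
    """Return a user message pairing text with a base64 data-URL image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }


# e.g. payload["messages"] = [image_message("What's in this image?", "photo.jpg")]
```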

### Flexible Deployment Options
- **CLI Tool**: Interactive command-line interface for quick queries
- **HTTP Server**: OpenAI-compatible server at `http://localhost:{PORT}/v1/chat/completions` - works with existing OpenAI clients, as shown after this list
- **Python Module**: Import and use programmatically in your applications
- **ComfyUI Node**: Embed directly in ComfyUI workflows
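
Because the server speaks the OpenAI protocol, existing OpenAI client libraries can point at it unchanged. For example, with the official `openai` Python package (assuming the gateway is running on port 8080; the placeholder API key is an assumption - the gateway holds the real provider keys):

```python
# openai_client.py - use the official openai package against the gateway.
# Assumes: pip install openai, and the gateway running on localhost:8080.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # point the client at llms.py
    api_key="not-needed",  # placeholder; provider keys live in the gateway
)

response = client.chat.completions.create(
    model="grok-4-fast",  # any model enabled in your llms.json
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
)
print(response.choices[0].message.content)
```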

### Simple and Customizable
- **Environment Variables**: Secure API key management
- **Provider Management**: Easy enable/disable of providers
- **Custom Models**: Define your own model aliases and mappings
- **Unified Configuration**: Single [llms.json](https://github.com/ServiceStack/llms/blob/main/llms.json) to configure all providers and models

## 🎯 Use Cases

### For Developers
- **API Gateway**: Centralize all LLM provider access through one endpoint
- **Cost Management**: Automatically route to the cheapest available providers
- **Reliability**: Built-in failover ensures high availability
- **Testing**: Easily switch between models and providers for comparison

### For ComfyUI Users
- **Hybrid Workflows**: Combine local Ollama models with cloud APIs
- **Zero Setup**: No dependency management headaches
- **Provider Flexibility**: Switch providers without changing your workflow
- **Cost Control**: Use free tiers and local models when possible

### For Enterprises
- **Vendor Independence**: Avoid lock-in to any single LLM provider
- **Scalability**: Distribute load across multiple providers
- **Compliance**: Keep sensitive data local while using the cloud for general tasks
- **Budget Control**: Intelligent routing to optimize costs

## 🚀 Why llms.py?

1. **Simplicity**: One file, one dependency, infinite possibilities
2. **Flexibility**: Works with any OpenAI-compatible client or framework
3. **Reliability**: Automatic failover keeps your workflows running when a provider goes down
4. **Economy**: Intelligent routing minimizes API costs
5. **Privacy**: Mix local and cloud models based on your data's sensitivity
6. **Future-Proof**: Easily add new providers as they emerge

**llms.py** transforms the complexity of managing multiple LLM providers into a simple, unified experience.
Whether you're researching the capabilities of new models, building the next breakthrough AI application or just want
reliable access to the best models available, llms.py has you covered.

Get started today and avoid expensive cloud lock-in with the freedom of provider-agnostic AI development! 🚀

---

**Links:**
- 📖 [Documentation & Examples](https://github.com/ServiceStack/llms)
- 📦 [PyPI Package](https://pypi.org/project/lapi/)
- 🔧 [Source Code](https://github.com/ServiceStack/llms)