Open Notebook supports OpenAI-compatible API endpoints across all AI modalities (language models, embeddings, speech-to-text, and text-to-speech), giving you the flexibility to use popular tools like LM Studio, Text Generation WebUI, vLLM, and custom inference servers.
- 🆓 Cost Flexibility: Use free local inference or choose cost-effective cloud providers
- 🔒 Privacy Control: Run models locally or choose privacy-focused hosted services
- 🎯 Model Selection: Access to thousands of open-source models
- ⚡ Performance Tuning: Optimize inference for your specific hardware
- 🔧 Full Control: Deploy on your infrastructure with your configurations
- 🌐 Universal Standard: Works with any service that implements the OpenAI API specification
For LM Studio (simplest):
# Start LM Studio and enable server mode on port 1234
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
# Most LM Studio endpoints don't require an API key
# export OPENAI_COMPATIBLE_API_KEY=not_needed

For Text Generation WebUI:
# Start with --api flag
# python server.py --api --listen
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:5000/v1

For vLLM:
# Start vLLM server
# vllm serve MODEL_NAME --port 8000
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000/v1

Use different endpoints for different capabilities:
# Language models on LM Studio
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1
# Embeddings on a dedicated embedding server
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1
# Speech services on a different server
export OPENAI_COMPATIBLE_BASE_URL_STT=http://localhost:9000/v1
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1

🎙️ Want free, local text-to-speech? Check our Local TTS Setup Guide for completely private, zero-cost podcast generation!
Use these when you want the same endpoint for all modalities:
| Variable | Purpose | Required |
|---|---|---|
| `OPENAI_COMPATIBLE_BASE_URL` | Base URL for all AI services | Yes (unless using mode-specific) |
| `OPENAI_COMPATIBLE_API_KEY` | API key if the endpoint requires auth | Optional |

Example:

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
export OPENAI_COMPATIBLE_API_KEY=your_key_here  # If needed

Use these when you want different endpoints for different capabilities:
| Variable | Purpose | Modality |
|---|---|---|
| `OPENAI_COMPATIBLE_BASE_URL_LLM` | Language model endpoint | Language models |
| `OPENAI_COMPATIBLE_API_KEY_LLM` | API key for LLM endpoint | Language models |
| `OPENAI_COMPATIBLE_BASE_URL_EMBEDDING` | Embedding model endpoint | Embeddings |
| `OPENAI_COMPATIBLE_API_KEY_EMBEDDING` | API key for embedding endpoint | Embeddings |
| `OPENAI_COMPATIBLE_BASE_URL_STT` | Speech-to-text endpoint | Speech-to-Text |
| `OPENAI_COMPATIBLE_API_KEY_STT` | API key for STT endpoint | Speech-to-Text |
| `OPENAI_COMPATIBLE_BASE_URL_TTS` | Text-to-speech endpoint | Text-to-Speech |
| `OPENAI_COMPATIBLE_API_KEY_TTS` | API key for TTS endpoint | Text-to-Speech |
Precedence: Mode-specific variables override the generic `OPENAI_COMPATIBLE_BASE_URL`.
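The precedence rule can be illustrated with plain shell default-expansion. This is a sketch of the documented behavior, not Open Notebook's internal code; the URLs are placeholders:

```shell
# Generic URL plus one mode-specific override (placeholder URLs)
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:5000/v1

# Mode-specific wins where set; everything else falls back to the generic URL
llm_url="${OPENAI_COMPATIBLE_BASE_URL_LLM:-$OPENAI_COMPATIBLE_BASE_URL}"
tts_url="${OPENAI_COMPATIBLE_BASE_URL_TTS:-$OPENAI_COMPATIBLE_BASE_URL}"
echo "LLM: $llm_url"   # http://localhost:5000/v1
echo "TTS: $tts_url"   # http://localhost:1234/v1
```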
Example:
# LLM on LM Studio
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1
# Embeddings on dedicated server
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1
export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=secret_key_here

What is LM Studio? LM Studio is a desktop application for running large language models locally with a user-friendly interface.
Setup Steps:

- Download and install LM Studio from lmstudio.ai
- Download a model (e.g., Llama 3, Qwen, Mistral)
- Start the local server:
  - Go to the "Local Server" tab
  - Click "Start Server"
  - Note the port (default: 1234)
- Configure Open Notebook:

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1

What works:
- ✅ Language models (chat, completions)
- ✅ Embeddings (with embedding models)
- ❌ Speech-to-text (not supported)
- ❌ Text-to-speech (not supported)
Tips:
- LM Studio doesn't require an API key
- Choose quantized models (Q4, Q5) for better performance
- Monitor RAM usage - larger models need more memory
What is Text Generation WebUI? A powerful Gradio-based web interface for running Large Language Models.
Setup Steps:
- Install following official instructions
- Download a model using the UI or manually
- Start with API mode:
python server.py --api --listen

- Configure Open Notebook:

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:5000/v1

What works:
- ✅ Language models (excellent support)
- ✅ Embeddings (with compatible models)
- ❌ Speech services (not supported)
Tips:
- Use `--listen` to accept connections from Docker
- Supports more model formats than LM Studio
- Great for fine-tuned models
What is vLLM? High-performance inference server optimized for serving large language models at scale.
Setup Steps:
- Install vLLM:
pip install vllm

- Start the server:

vllm serve meta-llama/Llama-3-8B-Instruct --port 8000

- Configure Open Notebook:

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000/v1

What works:
- ✅ Language models (optimized inference)
- ✅ Embeddings (with embedding models)
- ❌ Speech services (not supported)
Tips:
- Best performance for production deployments
- Supports tensor parallelism for large models
- Excellent for high-throughput scenarios
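The tensor-parallelism tip above maps to vLLM's `--tensor-parallel-size` flag. The model name and GPU count below are examples, not recommendations:

```shell
# Shard one large model across 2 GPUs (assumes a 2-GPU machine)
vllm serve meta-llama/Llama-3-70B-Instruct --port 8000 --tensor-parallel-size 2
```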
Many services implement the OpenAI API specification:
Examples:
- Together AI: Cloud-hosted models
- Anyscale Endpoints: Ray-based inference
- Replicate: Cloud model hosting
- LocalAI: Self-hosted alternative to OpenAI
- FastChat: Multi-model serving
Configuration:
# Generic setup
export OPENAI_COMPATIBLE_BASE_URL=https://api.your-service.com/v1
export OPENAI_COMPATIBLE_API_KEY=your_api_key_here

Use Case: Running LM Studio for language models only

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1

Result:
- ✅ Language models available
- ✅ Embeddings available (if model supports)
- ✅ Speech services available (if endpoint supports)
- All use the same endpoint
Use Case: Language models on LM Studio, embeddings on dedicated server
# Language models on LM Studio
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1
# Embeddings on specialized server
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1
export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=embedding_key_here

Result:
- ✅ Language models use LM Studio (port 1234)
- ✅ Embeddings use specialized server (port 8080)
- ❌ Speech services not available (not configured)
Use Case: Local models for privacy, cloud for specialized tasks
# Local LLM (privacy-sensitive work)
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1
# Cloud embeddings (better quality)
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=https://api.cloud-provider.com/v1
export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=cloud_key_here
# Cloud speech services
export OPENAI_COMPATIBLE_BASE_URL_TTS=https://api.cloud-provider.com/v1
export OPENAI_COMPATIBLE_API_KEY_TTS=cloud_key_here

Result:
- ✅ Sensitive chat stays local
- ✅ High-quality embeddings from cloud
- ✅ Professional TTS from cloud
- 🔒 Privacy for conversations, cloud for non-sensitive features
Use Case: Open Notebook in Docker, LM Studio on host machine
On macOS/Windows:
export OPENAI_COMPATIBLE_BASE_URL=http://host.docker.internal:1234/v1

On Linux:
# Use host networking or find host IP
export OPENAI_COMPATIBLE_BASE_URL=http://172.17.0.1:1234/v1
# or use --network host in docker run

Important:
- LM Studio must be set to listen on `0.0.0.0`, not just `localhost`
- In LM Studio settings, enable "Allow network connections"
Problem: Docker containers can't reach localhost on the host
Solutions:
Option 1: Use host.docker.internal (Mac/Windows)
export OPENAI_COMPATIBLE_BASE_URL=http://host.docker.internal:1234/v1

Option 2: Use host IP address (Linux)
# Find host IP
ip addr show docker0 | grep inet
# Use in environment
export OPENAI_COMPATIBLE_BASE_URL=http://172.17.0.1:1234/v1

Option 3: Host networking (Linux only)
docker run --network host \
-v ./notebook_data:/app/data \
-e OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1 \
lfnovo/open_notebook:v1-latest-single

Use Case: OpenAI-compatible service on a different machine
# Replace with your server's IP or hostname
export OPENAI_COMPATIBLE_BASE_URL=http://192.168.1.100:1234/v1

Security Notes:
- Only use on trusted networks
- Consider using HTTPS for production
- Implement API key authentication if possible
- Use firewall rules to restrict access
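One low-effort way to follow these notes on an untrusted network is to avoid exposing the port at all and tunnel it over SSH instead. The user and host below are placeholders:

```shell
# Forward the remote service's port 1234 to this machine over SSH
ssh -N -L 1234:localhost:1234 user@192.168.1.100 &

# Open Notebook then connects as if the service were local
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
```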
If you're running your OpenAI-compatible service behind a reverse proxy with self-signed SSL certificates (e.g., Caddy, nginx with custom certs), you may encounter SSL verification errors:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate
Connection error.
Solutions:
Option 1: Use a custom CA bundle (recommended)
# Point to your CA certificate file
export ESPERANTO_SSL_CA_BUNDLE=/path/to/your/ca-bundle.pem

Option 2: Disable SSL verification (development only)
# WARNING: Only use in trusted development environments
export ESPERANTO_SSL_VERIFY=false

Docker Compose example with SSL configuration:
services:
  open-notebook:
    image: lfnovo/open_notebook:v1-latest-single
    environment:
      - OPENAI_COMPATIBLE_BASE_URL=https://lmstudio.local:1234/v1
      # Option 1: Custom CA bundle
      - ESPERANTO_SSL_CA_BUNDLE=/certs/ca-bundle.pem
      # Option 2: Disable verification (dev only)
      # - ESPERANTO_SSL_VERIFY=false
    volumes:
      - /path/to/your/ca-bundle.pem:/certs/ca-bundle.pem:ro

Security Note: Disabling SSL verification exposes you to man-in-the-middle attacks. Always prefer using a custom CA bundle in production environments.
Problem: Default port (1234) is already in use
Solution: Change the port in your inference server
LM Studio:
- Settings → Local Server → Port → Change to different port
Then update environment:
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:8888/v1

Symptom: "Connection refused" or "Could not connect to endpoint"
Solutions:
- Verify server is running:
  curl http://localhost:1234/v1/models
- Check firewall settings: Ensure the port is not blocked
- For Docker: Use `host.docker.internal` instead of `localhost`
- Check server binding: The server must listen on `0.0.0.0`, not just `127.0.0.1`
Symptom: "Model not found" or "No models available"
Solutions:
- Verify model is loaded in your inference server
- Check model name matches what Open Notebook expects
- For LM Studio: Ensure model is loaded in the local server tab
- Test endpoint:
curl http://localhost:1234/v1/models
Symptom: Responses take a long time
Solutions:
- Use quantized models (Q4, Q5 instead of full precision)
- Check RAM usage: Model might be swapping to disk
- Reduce context length: Smaller context = faster inference
- Enable GPU acceleration: If available
- For vLLM: Enable tensor parallelism for large models
Symptom: "Unauthorized" or "Invalid API key"
Solutions:
- Set API key if your endpoint requires it:
  export OPENAI_COMPATIBLE_API_KEY=your_key_here
- Check key validity: Test with curl:
  curl -H "Authorization: Bearer YOUR_KEY" \
    http://localhost:1234/v1/models
- For mode-specific setups: Use the correct key variable:
  export OPENAI_COMPATIBLE_API_KEY_LLM=llm_key
  export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=embedding_key
Symptom: Connection works locally but not from Docker
Solutions:
- Use `host.docker.internal` (Mac/Windows):
  export OPENAI_COMPATIBLE_BASE_URL=http://host.docker.internal:1234/v1
- On Linux: Use the host IP or `--network host`
- Check server listening: The server must listen on `0.0.0.0:1234`, not `127.0.0.1:1234`
- Test from inside the container:
  docker exec -it open-notebook curl http://host.docker.internal:1234/v1/models
Symptom: Search or embeddings fail
Solutions:
- Verify embedding model is loaded: Many inference servers need explicit embedding model setup
- Use dedicated embedding endpoint: If available
- Check model compatibility: Not all models support embeddings
- For LM Studio: Load an embedding model separately
Symptom: Language models work, but embeddings or speech don't
Solution: Use mode-specific configuration:
# What works
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1
# For embeddings, use a different provider
export OPENAI_API_KEY=your_openai_key  # Fallback to OpenAI for embeddings

- API Keys:
  - Use environment variables, never hardcode
  - Rotate keys regularly for cloud services
  - Use different keys for different services
- Network:
  - Only expose on trusted networks
  - Use HTTPS in production
  - Implement firewall rules
- Data Privacy:
  - Use local models for sensitive data
  - Check service privacy policies
  - Understand data retention policies
- Model Selection:
  - Quantized models (Q4, Q5) for a better speed/memory trade-off
  - Smaller models for simple tasks
  - Larger models only when needed
- Resource Management:
  - Monitor RAM and GPU usage
  - Use appropriate batch sizes
  - Consider model caching strategies
- Network:
  - Use local endpoints when possible for lower latency
  - For cloud: Choose geographically close servers
- Fallback Strategy:

  # Primary: Local LLM
  export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1
  # Fallback: Use OpenAI if local is unavailable
  export OPENAI_API_KEY=your_backup_key
- Health Checks:
  - Periodically test endpoints
  - Monitor server status
  - Set up alerts for downtime
- Testing:
  - Test configuration before production
  - Validate all required modalities work
  - Check error handling
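The health-check idea above can be sketched as a small polling loop. The URL, timeout, and interval below are assumptions to adjust for your setup:

```shell
#!/bin/sh
# Poll the configured endpoint once a minute and log failures to stderr
URL="${OPENAI_COMPATIBLE_BASE_URL:-http://localhost:1234/v1}"
while true; do
  curl -sf --max-time 5 "$URL/models" > /dev/null \
    || echo "$(date): $URL unreachable" >&2
  sleep 60
done
```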
OpenAI-Compatible Setups:
- Local TTS Setup - Free, private text-to-speech for podcasts
- Ollama Setup - Local language models and embeddings
- AI Models Guide - Complete model configuration overview
Community Resources:
- Open Notebook Discord - Get help with Open Notebook integration
- LM Studio Discord - LM Studio-specific support
- Text Generation WebUI GitHub - Issues and discussions
Debugging Steps:
- Test endpoint directly with curl before configuring Open Notebook
- Check Open Notebook logs for detailed error messages
- Verify environment variables are set correctly
- Test with simple requests first (list models, simple completion)
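To verify environment variables (the third debugging step), a one-liner shows exactly which variables the shell or container actually sees:

```shell
# List every OPENAI_COMPATIBLE_* variable currently exported
env | grep '^OPENAI_COMPATIBLE' || echo "no OPENAI_COMPATIBLE_* variables set"
```

Run the same command inside the container (`docker exec -it open-notebook sh -c "env | grep OPENAI"`) to confirm the values survived Docker's environment handling.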
Common curl tests:
# List models
curl http://localhost:1234/v1/models
# Test completion
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "your-model",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Test embeddings
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "embedding-model",
"input": "Test text"
}'

This guide should help you successfully configure OpenAI-compatible providers with Open Notebook. For general AI model configuration, see the AI Models Guide.