
Commit 11263dc

Fix compat models push to LiteLLM and update docs
- Fix compat models not registering in LiteLLM by resolving mapped provider/model
- Make _build_litellm_params async to query database for compat model mappings
- Add session parameter to push_model_to_litellm and update_model_in_litellm
- Compat models now correctly resolve to their mapped provider's api_base and model string
- Set docker-compose.yml to use frontend.api:create_app instead of legacy web
- Clean up legacy documentation from CLAUDE.md
- Remove obsolete planning documents

Fixes issue where code-davinci-002, gpt-4-code, gpt-4-turbo-vision, and gpt-4o-vision compat models were not appearing in LiteLLM after push.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 94f285b commit 11263dc
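The central change in this commit is that `_build_litellm_params` became async so it can look up a compat model's mapping in the database and use the mapped provider's `api_base` and model string. The following is a minimal sketch of that resolution step under stated assumptions: `InMemorySession` is a hypothetical stand-in for the real async SQLAlchemy session, `build_compat_litellm_params` is an illustrative name rather than the project's function, and the `ollama/` vs `openai/` model-string prefix rule is assumed.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    id: int
    name: str
    base_url: str
    type: str  # 'ollama', 'litellm', or 'compat'
    api_key: Optional[str] = None


@dataclass
class CompatModel:
    model_name: str          # public alias exposed in LiteLLM, e.g. 'gpt-4'
    mapped_provider_id: int  # providers.id of the backend that really serves it
    mapped_model_id: str     # models.model_id on that provider


class InMemorySession:
    """Hypothetical stand-in for an async DB session."""

    def __init__(self, providers):
        self._providers = {p.id: p for p in providers}

    async def get_provider(self, provider_id: int) -> Optional[Provider]:
        await asyncio.sleep(0)  # simulate an awaited database round-trip
        return self._providers.get(provider_id)


async def build_compat_litellm_params(session, compat: CompatModel) -> dict:
    """Resolve a compat alias to the mapped provider's api_base and model string."""
    provider = await session.get_provider(compat.mapped_provider_id)
    if provider is None:
        raise LookupError(
            f"compat model {compat.model_name!r} points at missing provider "
            f"{compat.mapped_provider_id}"
        )
    # Assumed routing rule: Ollama backends get LiteLLM's 'ollama/' prefix,
    # anything else is treated as an OpenAI-compatible endpoint.
    prefix = "ollama" if provider.type == "ollama" else "openai"
    params = {
        "model": f"{prefix}/{compat.mapped_model_id}",
        "api_base": provider.base_url.rstrip("/"),
    }
    if provider.api_key:
        params["api_key"] = provider.api_key
    return params


if __name__ == "__main__":
    session = InMemorySession([Provider(1, "mks-ollama", "http://ollama:11434/", "ollama")])
    alias = CompatModel("gpt-4", mapped_provider_id=1, mapped_model_id="llama3:8b")
    print(asyncio.run(build_compat_litellm_params(session, alias)))
    # {'model': 'ollama/llama3:8b', 'api_base': 'http://ollama:11434'}
```

Making the builder a coroutine is what lets it await the lookup, which is also why callers such as `push_model_to_litellm` now take a session argument.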


6 files changed, +163 -1588 lines changed


CLAUDE.md

Lines changed: 106 additions & 76 deletions
@@ -4,22 +4,21 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-LiteLLM Updater now runs as two FastAPI services built from shared code:
+LiteLLM Updater runs as two FastAPI services built from shared code:
 - `backend/` → headless sync worker (`backend/sync_worker.py`) that fetches provider models and can push them into LiteLLM on a schedule.
 - `frontend/` → UI + API (`frontend/api.py`) for manual fetch/push/sync and CRUD over providers/models.
 
 Data lives in `./data/models.db` (SQLite) mounted into both services. Docker Compose also brings up LiteLLM (`http://localhost:4000`, API key `sk-1234` by default) and the UI on `http://localhost:4001`.
 
 ## Terminology (keep consistent in UI + API)
-- Fetch: only pull models from providers into the local database.
-- Push: send database models to LiteLLM (deduped, no new fetch).
-- Sync: fetch + push in one operation.
+- **Fetch**: only pull models from providers into the local database.
+- **Push**: send database models to LiteLLM (deduped, no new fetch).
+- **Sync**: fetch + push in one operation.
 
-## Current layout
+## Current Layout
 - `shared/`: database models/CRUD, source fetchers, normalization, tags, and config helpers shared by both services.
 - `backend/`: provider sync pipeline, LiteLLM client, and the scheduler entrypoint (`python -m backend.sync_worker`).
 - `frontend/`: FastAPI UI, templates, and the provider/model routes that call into `backend.provider_sync`.
-- `litellm_updater/`: legacy entrypoint kept for compatibility; most logic now lives under `shared/` + `backend/`.
 
 ## Development Commands
 
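CLAUDE.md distinguishes Fetch, Push, and Sync. A minimal sketch of how the three relate, using a hypothetical in-memory `Store` in place of the SQLite database and the LiteLLM proxy; the function names mirror the terms and are not the project's API:

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the SQLite database and the LiteLLM proxy."""
    db: dict = field(default_factory=dict)      # provider name -> set of model ids
    litellm: set = field(default_factory=set)   # model ids registered in LiteLLM


def fetch(store: Store, provider: str, upstream: list) -> None:
    """Fetch: pull the provider's models into the local database only."""
    store.db[provider] = set(upstream)


def push(store: Store, provider: str) -> list:
    """Push: register DB models in LiteLLM, skipping ones already there; no fetch."""
    added = sorted(store.db.get(provider, set()) - store.litellm)
    store.litellm.update(added)
    return added


def sync(store: Store, provider: str, upstream: list) -> list:
    """Sync: fetch followed by push in one operation."""
    fetch(store, provider, upstream)
    return push(store, provider)


if __name__ == "__main__":
    store = Store()
    fetch(store, "mks-ollama", ["llama3:8b", "qwen2:7b"])
    print(store.litellm)              # set(): fetch never touches LiteLLM
    print(push(store, "mks-ollama"))  # ['llama3:8b', 'qwen2:7b']
    print(push(store, "mks-ollama"))  # []: already registered, nothing re-added
```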
@@ -32,13 +31,30 @@ pip install -e .
 pip install -e ".[dev]"
 ```
 
-### Running the service
+### Running Services Locally
+
+**Frontend (development):**
 ```bash
-# Using the CLI entrypoint
-PORT=8000 litellm-updater
+PORT=8000 uvicorn frontend.api:create_app --factory --host 0.0.0.0 --port 8000 --reload
+```
 
-# Or using uvicorn directly
-PORT=8000 uvicorn litellm_updater.web:create_app --host 0.0.0.0 --port $PORT
+**Backend worker (development):**
+```bash
+python -m backend.sync_worker
+```
+
+### Docker Deployment
+
+```bash
+# Build images
+docker compose build --no-cache model-updater-backend model-updater-web
+
+# Start all services
+docker compose up -d
+
+# View logs
+docker compose logs -f model-updater-web
+docker compose logs -f model-updater-backend
 ```
 
 ### Testing
@@ -54,15 +70,6 @@ cp tests/example.env tests/.env
 # Edit tests/.env with TEST_OLLAMA_URL, TEST_OPENAI_URL and optional API keys
 ```
 
-### Docker
-```bash
-
-# Using docker-compose
-cp example.env .env
-docker compose --env-file .env build --no-cache model-updater-backend model-updater-web
-docker compose --env-file .env up -d
-```
-
 ### Linting
 ```bash
 # Run ruff for linting/formatting
@@ -73,12 +80,14 @@ ruff format .
 ## Deployment & Live Testing
 
 Compose brings up:
-- `model-updater-backend`: sync worker (no HTTP).
-- `model-updater-web`: UI/API on `http://localhost:4001`.
+- `model-updater-backend`: sync worker (no HTTP). Runs `python -m backend.sync_worker`.
+- `model-updater-web`: UI/API on `http://localhost:4001`. Runs `frontend.api:create_app` via uvicorn.
 - `litellm`: target proxy on `http://localhost:4000` (`Authorization: Bearer sk-1234`).
 - `db`: Postgres backing LiteLLM.
 - `watchtower`: optional image updater (labelled).
 
+**IMPORTANT:** The `model-updater-web` service MUST use `command: uvicorn frontend.api:create_app --factory --host 0.0.0.0 --port 8000` in docker-compose.yml to run the correct application.
+
 Rebuild + relaunch after code changes:
 ```bash
 docker compose build --no-cache model-updater-backend model-updater-web
@@ -88,14 +97,15 @@ docker compose up -d
 Quick checks:
 ```bash
 docker compose ps
-curl -s http://localhost:4001/health
+curl -s http://localhost:4001/sources # Check if UI is accessible
 curl -s -H "Authorization: Bearer sk-1234" http://localhost:4000/health/liveliness
 docker compose logs --tail=50 model-updater-web
 docker compose logs --tail=50 model-updater-backend
 ```
 
-## Operational notes
-- Fetch = load models from providers into the database, Push = register existing DB models into LiteLLM, Sync = fetch + push. UI buttons and routes follow this naming (`/api/providers/fetch-all`, `/api/providers/sync-all`, per-provider Fetch/Sync/Push).
+## Operational Notes
+- Fetch = load models from providers into the database, Push = register existing DB models into LiteLLM, Sync = fetch + push.
+- UI buttons and routes follow this naming: `/api/providers/fetch-all`, `/api/providers/sync-all`, per-provider Fetch/Sync/Push.
 - LiteLLM pushes dedupe by lowercasing `unique_id` and pruning duplicates before registration; per-provider Push and Push All avoid re-adding existing models.
 - Ollama details: by default only `/api/tags` is fetched. Set `FETCH_OLLAMA_DETAILS=true` to pull `/api/show` per model; heavy fields (tensors/modelfile/license/etc.) are stripped before storing to keep memory usage low.
 
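The operational notes say pushes dedupe by lowercasing `unique_id`, and the Recent Changes list adds that tags are read from both `litellm_params` and `model_info`. One way those two rules could combine is sketched here with explicit assumptions: `unique_id` is taken from `model_info`, and only existing entries tagged `lupdater` count as already registered. Both choices are hypothetical, as are the helper names.

```python
def collect_tags(entry: dict) -> set:
    """Union of tags stored in litellm_params and in model_info."""
    tags = set((entry.get("litellm_params") or {}).get("tags") or [])
    tags |= set((entry.get("model_info") or {}).get("tags") or [])
    return tags


def unique_key(entry: dict) -> str:
    """Lowercased unique_id (assumed to live in model_info)."""
    return str((entry.get("model_info") or {}).get("unique_id", "")).lower()


def dedupe_for_push(candidates: list, existing: list) -> list:
    """Keep candidates not already registered and not repeated within the batch."""
    # Assumption: only entries this tool manages (tagged 'lupdater') block a push.
    registered = {unique_key(e) for e in existing if "lupdater" in collect_tags(e)}
    kept, seen = [], set()
    for cand in candidates:
        key = unique_key(cand)
        if key in registered or key in seen:
            continue  # already in LiteLLM, or a case-insensitive repeat in this batch
        seen.add(key)
        kept.append(cand)
    return kept
```

Reading tags from both places matters for older entries: an entry whose `lupdater` tag sits only in `model_info` is still recognized as registered.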
@@ -139,7 +149,7 @@ docker compose logs --tail=50 model-updater-backend
 **Synchronization** (`backend/provider_sync.py`, `backend/sync_worker.py`)
 - `sync_provider()` handles fetch + DB upsert + optional LiteLLM push. Uses `_clean_ollama_payload` for heavy models and honors `push_to_litellm` flag.
 - `sync_worker.py` schedules periodic syncs using provider defaults (`sync_enabled` flag + interval from config).
-- Manual UI endpoints call into the same `sync_provider` with explicit fetch/push/sync semantics.
+- Manual UI endpoints in `frontend/api.py` call into `backend.provider_sync` with explicit fetch/push/sync semantics.
 
 **LiteLLM Integration** (`backend/litellm_client.py`)
 - Model registration: `POST /model/new` with `{model_name, litellm_params, model_info}`
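LiteLLM registration is a single `POST /model/new` carrying `model_name`, `litellm_params`, and `model_info`. The following builds that request with the standard library without sending it; the `<prefix>/<model>` naming and placing tags in `model_info` are assumptions for illustration, and `build_model_new_request` is a hypothetical helper:

```python
import json
import urllib.request


def build_model_new_request(base_url: str, api_key: str, model_name: str,
                            litellm_params: dict, model_info: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /model/new call that registers one model."""
    body = {"model_name": model_name, "litellm_params": litellm_params, "model_info": model_info}
    return urllib.request.Request(
        url=f"{base_url}/model/new",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_model_new_request(
    "http://localhost:4000", "sk-1234", "mks-ollama/llama3:8b",  # assumed naming scheme
    {"model": "ollama/llama3:8b", "api_base": "http://ollama:11434"},
    {"tags": ["lupdater", "provider:mks-ollama", "type:ollama"]},
)
# urllib.request.urlopen(req) would send it to a running LiteLLM proxy.
```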
@@ -149,10 +159,10 @@ docker compose logs --tail=50 model-updater-backend
 - Model listing: `GET /model/info` returns complete model data including database UUIDs
 
 **Web Layer** (`frontend/api.py` + `frontend/templates/`)
-- FastAPI UI surfaced on `:4001` via Docker.
+- FastAPI application served on `:4001` via Docker.
 - Database initialization in lifespan context manager (uses `shared/database.init_session_maker` + migrations).
 - Provider/model routes wrap `backend.provider_sync` for fetch/push/sync actions and expose per-provider + global buttons in `/sources`.
-- Admin page uses modal dialogs for add/edit provider.
+- Admin page at `/admin` uses modal dialogs for add/edit provider.
 
 **Provider Management API:**
 - `GET /api/providers` - List all providers from database
@@ -167,53 +177,60 @@ docker compose logs --tail=50 model-updater-backend
 
 **Model Management API:**
 - `GET /api/providers/{id}/models` - Get models for provider (with orphan filtering)
-- `GET /api/models/db/{id}` - Get specific model by database ID
-- `POST /api/models/db/{id}/params` - Update model user parameters
-- `DELETE /api/models/db/{id}/params` - Reset to provider defaults
-- `POST /api/models/db/{id}/refresh` - Refresh single model from provider
-- `POST /api/models/db/{id}/push` - Push single model to LiteLLM with effective params
+- `GET /api/models/{id}` - Get specific model by database ID
+- `PATCH /api/models/{id}/params` - Update model user parameters
+- `DELETE /api/models/{id}/params` - Reset to provider defaults
+- `POST /api/models/{id}/refresh` - Refresh single model from provider
+- `POST /api/models/{id}/push` - Push single model to LiteLLM with effective params
 - `POST /api/models/push-all` - Push all non-orphaned models to LiteLLM
+- `POST /api/models/db/reset-all` - Delete all models from database
 
-**Legacy/Compatibility API:**
-- `POST /sync` - Manual sync trigger (uses database session)
-- `/models/show?source=X&model=Y` - Fetch Ollama model details on demand
-- `/api/sources`, `/api/models` - JSON APIs (SyncState-based)
-
-> The flows below describe the legacy `litellm_updater` entrypoint. The Docker services now route through `backend/provider_sync.py` and `frontend/api.py`, but the database behaviors (upserts, orphan handling, effective params) remain the same.
+**Compatibility Models API:**
+- `GET /api/compat/models` - List all compat models
+- `POST /api/compat/models` - Create new compat model mapping
+- `PUT /api/compat/models/{id}` - Update compat model
+- `DELETE /api/compat/models/{id}` - Delete compat model
+- `POST /api/compat/register-defaults` - Register default OpenAI model mappings
 
 ### Key Data Flow
 
 **Initial Setup:**
 1. User adds providers in `/admin` (stored in database)
 
 **Synchronization Flow:**
-1. Scheduler (or manual `/sync` trigger) calls `sync_once()` with database session
+1. Backend worker or manual trigger calls `sync_provider()` from `backend/provider_sync.py`
 2. For each provider:
 - `fetch_source_models()` retrieves raw model list from provider
 - Each raw model is normalized via `ModelMetadata.from_raw()`
 - `upsert_model()` creates or updates model in database
 - User-edited parameters (`user_params`) are preserved during update
 - Models not in fetch are marked as `is_orphaned = True`
-- If LiteLLM configured, models are POSTed to `/model/new`
-3. Results also stored in `SyncState` for backward compatibility
+- If LiteLLM configured and `push_to_litellm=True`, models are POSTed to `/model/new`
 
 **Model Management Flow:**
 1. User views providers/models at `/sources` (loads from database via API)
 2. Orphaned models displayed in RED, modified models in BLUE
-3. Per-model actions:
-- **Refresh**: Fetches latest data from provider, updates database with `full_update=True`
-- **Edit Params**: Updates `user_params` (preserved across syncs), sets `user_modified=True`
-- **Push to LiteLLM**: Sends single model with `effective_params` and proper tags
-4. Bulk actions:
+3. Per-provider actions:
+- **Fetch**: Fetches models from provider into database (no LiteLLM push)
+- **Sync**: Fetches models from provider + pushes to LiteLLM
+- **Push**: Pushes existing database models to LiteLLM (no fetch)
+4. Per-model actions:
+- **Configure**: Opens modal to edit parameters, tags, pricing, sync settings
+- **Refresh from Provider**: Fetches latest data from provider, updates database with `full_update=True`
+- **Save & Push to LiteLLM**: Saves config and immediately pushes to LiteLLM
+- **Delete**: Removes model from database
+5. Global actions:
+- **Fetch All Providers**: Fetches all enabled providers into database
+- **Sync All Providers**: Fetches + pushes all enabled providers
 - **Push All to LiteLLM**: Pushes all non-orphaned models with tags (`lupdater`, `provider:*`, `type:*`)
-- **Sync All Providers**: Fetches models from all providers, updates database with `full_update=False`
-5. LiteLLM page at `/litellm` shows models with tag filtering:
+- **Reset Model Database**: Deletes all models from database
+6. LiteLLM page at `/litellm` shows models with tag filtering:
 - Click tag buttons to filter models by tags (OR logic for multiple tags)
 - Tags include: `lupdater`, `provider:<name>`, `type:<ollama|litellm>`
 
 **Database Schema:**
-- **Providers**: id, name, base_url, type, prefix, default_ollama_mode, api_key
-- **Models**: id, provider_id, model_id, litellm_params, user_params, is_orphaned, user_modified, first_seen, last_seen
+- **Providers**: id, name, base_url, type, prefix, default_ollama_mode, api_key, sync_enabled
+- **Models**: id, provider_id, model_id, litellm_params, user_params, is_orphaned, user_modified, sync_enabled, tags, pricing, first_seen, last_seen
 
 ### Important Patterns
 
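The synchronization flow keeps provider defaults and user edits apart: a re-fetch refreshes `litellm_params`, leaves `user_params` alone, and orphans anything the provider no longer reports. A dictionary-based sketch of those rules; the real code uses `upsert_model()` against SQLite, and the merge order in the hypothetical `effective_params` helper (user edits win) is an assumption:

```python
def upsert_models(db: dict, provider_id: int, fetched: dict) -> None:
    """db: (provider_id, model_id) -> row; fetched: model_id -> provider litellm_params."""
    for model_id, params in fetched.items():
        row = db.get((provider_id, model_id))
        if row is None:
            db[(provider_id, model_id)] = {
                "litellm_params": dict(params),
                "user_params": None,
                "is_orphaned": False,
            }
        else:
            row["litellm_params"] = dict(params)  # refresh provider defaults
            row["is_orphaned"] = False            # user_params deliberately untouched
    # Anything this provider no longer reports is kept but flagged as orphaned.
    for (pid, model_id), row in db.items():
        if pid == provider_id and model_id not in fetched:
            row["is_orphaned"] = True


def effective_params(row: dict) -> dict:
    """Provider defaults overlaid with user edits (assumed: user edits win)."""
    return {**row["litellm_params"], **(row["user_params"] or {})}
```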
@@ -312,34 +329,32 @@ model_info = {
 **Ollama Payload Cleaning**
 - The `/api/show` endpoint returns very large responses (tensors, full modelfile)
 - Always use `_clean_ollama_payload()` before storing/caching Ollama responses
-- Cleaned payload is used in `ModelDetailsCache` and returned by `/models/show`
+- Cleaned payload is used in `ModelDetailsCache` and returned by API
 
 **URL Normalization**
 - All URLs stored as Pydantic `HttpUrl` type
 - Use `normalized_base_url` property to get string without trailing slash for path joining
 - Don't manually strip slashes; use the property
 
 **Thread Safety**
-- `SyncState` and `ModelDetailsCache` use asyncio locks (`asyncio.Lock()`)
-- Always use `async with self._lock` pattern when accessing/modifying shared state
 - Database sessions are async-safe via SQLAlchemy async engine
+- Use proper async/await patterns throughout
 
 ## Configuration Notes
 
-**NEW: Providers are now in the database!**
+**Providers are managed in the database**
 
-The `data/config.json` schema (reduced):
+The `data/config.json` schema (minimal):
 ```json
 {
 "litellm": {"base_url": "http://localhost:4000", "api_key": null},
-"sources": [],
 "sync_interval_seconds": 300
 }
 ```
 
 - `sync_interval_seconds`: 0 = disabled, minimum 30 when enabled
 - `litellm.base_url`: Can be null to disable LiteLLM registration (still fetches models)
-- **`sources` array is legacy** - providers are now managed in database
+- Providers are managed in database, not config file
 
 **Database Schema (`data/models.db`):**
 
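The documented `data/config.json` rules are small enough to express directly. A sketch of parsing the minimal file, assuming an out-of-range interval is rejected rather than clamped (the notes only state the minimum) and that a missing interval means 0; `load_config` is a hypothetical name:

```python
import json


def load_config(text: str) -> dict:
    """Apply the documented data/config.json rules to a JSON string."""
    cfg = json.loads(text)
    interval = int(cfg.get("sync_interval_seconds", 0))  # assumed default when absent
    if interval < 0 or 0 < interval < 30:
        # Assumption: invalid values are rejected; the notes only say "minimum 30".
        raise ValueError("sync_interval_seconds must be 0 (disabled) or at least 30")
    base_url = (cfg.get("litellm") or {}).get("base_url")
    return {
        "sync_interval_seconds": interval,
        "scheduler_enabled": interval > 0,  # 0 disables scheduled syncs
        "push_enabled": bool(base_url),     # null base_url: models still fetched, never registered
        "litellm_base_url": base_url,
    }
```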
@@ -349,10 +364,11 @@ CREATE TABLE providers (
 id INTEGER PRIMARY KEY,
 name VARCHAR UNIQUE NOT NULL,
 base_url VARCHAR NOT NULL,
-type VARCHAR NOT NULL, -- 'ollama' or 'litellm'
+type VARCHAR NOT NULL, -- 'ollama', 'litellm', or 'compat'
 api_key VARCHAR,
 prefix VARCHAR, -- e.g., 'mks-ollama'
 default_ollama_mode VARCHAR, -- 'ollama' or 'openai'
+sync_enabled BOOLEAN NOT NULL DEFAULT TRUE,
 created_at DATETIME NOT NULL,
 updated_at DATETIME NOT NULL
 );
@@ -373,7 +389,12 @@ CREATE TABLE models (
 litellm_params TEXT NOT NULL, -- JSON object (provider defaults)
 raw_metadata TEXT NOT NULL, -- JSON object (full raw response)
 user_params TEXT, -- JSON object (user edits)
+user_tags TEXT, -- JSON array (user-defined tags)
 ollama_mode VARCHAR, -- Per-model override
+sync_enabled BOOLEAN NOT NULL DEFAULT TRUE,
+pricing_profile VARCHAR, -- e.g., 'gpt-4o', 'whisper-1'
+pricing_override TEXT, -- JSON object {input_cost_per_token, output_cost_per_token}
+access_groups TEXT, -- JSON array (for LiteLLM access control)
 first_seen DATETIME NOT NULL,
 last_seen DATETIME NOT NULL,
 is_orphaned BOOLEAN NOT NULL DEFAULT FALSE,
@@ -385,6 +406,19 @@ CREATE TABLE models (
 );
 ```
 
+Compat Models table:
+```sql
+CREATE TABLE compat_models (
+id INTEGER PRIMARY KEY,
+model_name VARCHAR UNIQUE NOT NULL, -- e.g., 'gpt-4', 'gpt-3.5-turbo'
+mapped_provider_id INTEGER REFERENCES providers(id) ON DELETE CASCADE,
+mapped_model_id VARCHAR, -- model_id in the models table
+access_groups TEXT, -- JSON array
+created_at DATETIME NOT NULL,
+updated_at DATETIME NOT NULL
+);
+```
+
 ## Provider Management
 
 ### Adding New Providers
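The `compat_models` table maps a public alias to a model on a real provider. An in-memory SQLite sketch of the lookup that resolution needs, using a trimmed copy of the DDL (timestamps, `access_groups`, and most provider columns omitted) and a hypothetical `resolve_compat` helper:

```python
import sqlite3

SCHEMA = """
CREATE TABLE providers (
    id INTEGER PRIMARY KEY,
    name VARCHAR UNIQUE NOT NULL,
    base_url VARCHAR NOT NULL,
    type VARCHAR NOT NULL
);
CREATE TABLE compat_models (
    id INTEGER PRIMARY KEY,
    model_name VARCHAR UNIQUE NOT NULL,
    mapped_provider_id INTEGER REFERENCES providers(id) ON DELETE CASCADE,
    mapped_model_id VARCHAR
);
"""


def resolve_compat(conn: sqlite3.Connection, alias: str):
    """Return (provider type, base_url, mapped_model_id) for a compat alias, or None."""
    return conn.execute(
        """
        SELECT p.type, p.base_url, c.mapped_model_id
        FROM compat_models AS c
        JOIN providers AS p ON p.id = c.mapped_provider_id
        WHERE c.model_name = ?
        """,
        (alias,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores ON DELETE CASCADE without this
conn.executescript(SCHEMA)
conn.execute("INSERT INTO providers VALUES (1, 'mks-ollama', 'http://ollama:11434', 'ollama')")
conn.execute("INSERT INTO compat_models VALUES (1, 'gpt-4', 1, 'llama3:8b')")
print(resolve_compat(conn, "gpt-4"))  # ('ollama', 'http://ollama:11434', 'llama3:8b')
```

SQLite only enforces the `ON DELETE CASCADE` clause when `PRAGMA foreign_keys = ON` is set on the connection, so deleting a provider would otherwise leave dangling compat rows.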
@@ -397,7 +431,7 @@ CREATE TABLE models (
 
 **Via API:**
 ```bash
-curl -X POST http://localhost:8000/admin/providers \
+curl -X POST http://localhost:4001/admin/providers \
 -F "name=my-ollama" \
 -F "base_url=http://localhost:11434" \
 -F "type=ollama" \
@@ -409,24 +443,24 @@ curl -X POST http://localhost:8000/admin/providers \
 
 **Refresh Single Model:**
 ```bash
-curl -X POST http://localhost:8000/api/models/db/123/refresh
+curl -X POST http://localhost:4001/api/models/123/refresh
 ```
 
 **Edit Model Parameters:**
 ```bash
-curl -X POST http://localhost:8000/api/models/db/123/params \
+curl -X PATCH http://localhost:4001/api/models/123/params \
 -H "Content-Type: application/json" \
--d '{"max_tokens": 4096, "temperature": 0.7}'
+-d '{"params": {"max_tokens": 4096}, "tags": ["production", "gpu"]}'
 ```
 
 **Push to LiteLLM:**
 ```bash
-curl -X POST http://localhost:8000/api/models/db/123/push
+curl -X POST http://localhost:4001/api/models/123/push
 ```
 
 **Reset to Defaults:**
 ```bash
-curl -X DELETE http://localhost:8000/api/models/db/123/params
+curl -X DELETE http://localhost:4001/api/models/123/params
 ```
 
 ## Testing Strategy
@@ -440,11 +474,6 @@ curl -X DELETE http://localhost:8000/api/models/db/123/params
 - Uses `pytest-asyncio` for async test support
 - Tests skip when endpoints not configured (graceful degradation)
 
-**Database Testing:**
-- All new database functionality has been manually tested
-- Tested: Provider CRUD, model persistence, orphan detection
-- See commit history for test results
-
 **Manual Testing Workflow:**
 ```bash
 # 1. Install dependencies
@@ -453,17 +482,18 @@ pip install -e .
 # 2. Run unit tests
 pytest tests/test_model_details_cache.py tests/test_ollama_payload_cleaning.py -v
 
-# 3. Test API endpoints
-curl http://localhost:8000/api/providers
-curl http://localhost:8000/api/providers/1/models
+# 3. Start services
+docker compose up -d
 
-# 4. Test model management
-# Use UI at /sources to refresh, edit, and push models
+# 4. Test via UI
+open http://localhost:4001/sources
 ```
 
 ## Recent Changes & Gotchas
+- The **frontend service must run `frontend.api:create_app`** (not the legacy `litellm_updater.web`). This is configured via `command:` in docker-compose.yml.
 - Ollama `/api/tags` responses that return a bare list (instead of `{ "models": [...] }`) are now parsed correctly; this fixes empty syncs from some servers.
 - `mode:*` tags are only generated for Ollama providers. OpenAI/compat providers should no longer get `mode:ollama` attached to their models.
 - Duplicate detection when pushing to LiteLLM now reads tags from both `litellm_params` and `model_info`, so older LiteLLM entries without top-level tags are still de-duped.
-- The Providers page uses a **Fetch** button that runs even if sync is disabled for that provider; the sync flag only controls scheduled syncs.
+- The Providers page shows **Fetch**, **Sync**, and **Push** buttons for each provider. The sync flag only controls scheduled syncs from the backend worker.
 - On the Admin page, adding and editing providers happen in modals; the inline add form is gone.
+- Compat models page loads models from all available providers (filtered to exclude type='compat'), not just a hardcoded provider.
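Two of the listed gotchas concern Ollama handling: `/api/tags` may return either `{"models": [...]}` or a bare list, and `mode:*` tags apply only to Ollama providers. A sketch of both rules; `parse_tags_response` and `build_tags` are hypothetical names, and preferring `name` over `model` in each entry is an assumption:

```python
from typing import Optional


def parse_tags_response(payload) -> list:
    """Return model names from /api/tags whether it is {"models": [...]} or a bare list."""
    entries = payload.get("models") if isinstance(payload, dict) else payload
    names = []
    for entry in entries or []:
        if isinstance(entry, dict):
            name = entry.get("name") or entry.get("model")
        else:
            name = entry  # tolerate plain strings in a bare list
        if name:
            names.append(name)
    return names


def build_tags(provider_name: str, provider_type: str, ollama_mode: Optional[str] = None) -> list:
    """Filtering tags for LiteLLM; mode:* is attached only for Ollama providers."""
    tags = ["lupdater", f"provider:{provider_name}", f"type:{provider_type}"]
    if provider_type == "ollama" and ollama_mode:
        tags.append(f"mode:{ollama_mode}")
    return tags
```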
