PRD: Model Registry

1. Overview

Purpose: Model Registry provides a centralized catalog of AI models with tenant-level availability and approval workflows.

Model Registry is the authoritative source for model metadata, capabilities, provider cost data, and tenant access control. It tracks which models are available from which providers and manages approval workflows. LLM Gateway queries the registry to resolve model identifiers to provider endpoints and verify tenant access.

Key Concepts:

Canonical Model ID: Deterministic identifier in format {provider_slug}::{provider_model_id} (e.g., openai-prod::gpt-4o, ollama-us-west::mistral). Parsing rule: split on first :: occurrence.
Provider Slug: Human-readable identifier for a specific provider configuration (instance), unique within a tenant. Different instances of the same provider type have different slugs (e.g., azure-corp-global, azure-rnd-team, ollama-us-west, ollama-us-east). Each slug represents a separate provider with its own credentials, base URL, and configuration.
Tenant Hierarchy: Tree structure with root tenant at top; providers and approvals inherit down the tree (additive only)
Provider Plugins: Each provider type has its own plugin; all requests route through Outbound API Gateway
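
The canonical ID parsing rule can be sketched as follows. This is a minimal illustration, not the Registry's implementation; the function name parse_canonical_id and the choice to raise ValueError on malformed input are assumptions.

```python
def parse_canonical_id(canonical_id: str) -> tuple[str, str]:
    """Split '{provider_slug}::{provider_model_id}' on the first '::'."""
    # partition() splits on the first occurrence only, so any later '::'
    # stays inside the provider model ID.
    slug, sep, model_id = canonical_id.partition("::")
    if not sep or not slug or not model_id:
        raise ValueError(f"not a canonical model ID: {canonical_id!r}")
    return slug, model_id
```

Splitting on the first occurrence keeps the rule deterministic even when a provider's own model ID contains "::".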

Provider Slug Examples:

Provider Type	Slug	Tenant	Description
`azure`	`azure-prod`	root	Platform-wide Azure production
`azure`	`azure-prod`	tenant-A	Tenant A's own Azure (shadows root)
`openai`	`openai`	root	Platform OpenAI account
`ollama`	`ollama-local`	tenant-B	Tenant B's self-hosted Ollama

Provider Slug Resolution: When resolving {provider_slug}::{model_id}, the system searches tenant → parent → ... → root (same as alias resolution). Child tenant's provider with same slug shadows parent's provider.

Shadowing Example:

Root tenant configures azure-prod pointing to platform Azure subscription
Tenant A configures own azure-prod pointing to their corporate Azure subscription
When Tenant A requests azure-prod::gpt-4o, it resolves to Tenant A's Azure
When Tenant B (no override) requests azure-prod::gpt-4o, it resolves to root's Azure

Implication: The same canonical ID can resolve to different provider instances depending on tenant context. Approvals are per (canonical_id, tenant) — approving azure-prod::gpt-4o in Tenant A approves their instance, not root's.
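
The resolution walk described above might look like the sketch below. The in-memory PARENT and PROVIDERS tables and the function name resolve_provider are hypothetical stand-ins for Tenant Resolver data and the provider store.

```python
from typing import Optional

# Hypothetical data mirroring the shadowing example above.
PARENT = {"tenant-A": "root", "tenant-B": "root", "root": None}
PROVIDERS = {
    ("root", "azure-prod"): "platform Azure subscription",
    ("tenant-A", "azure-prod"): "Tenant A corporate Azure subscription",
}

def resolve_provider(tenant_id: str, slug: str) -> Optional[tuple[str, str]]:
    """Walk tenant -> parent -> ... -> root; the first match wins."""
    current: Optional[str] = tenant_id
    while current is not None:
        target = PROVIDERS.get((current, slug))
        if target is not None:
            return current, target  # owner tenant and its provider instance
        current = PARENT[current]
    return None  # no provider with this slug anywhere in the chain
```

With this data, tenant-A resolves azure-prod to its own instance, while tenant-B falls through to root's.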

Target Users:

LLM Gateway — Primary consumer for model resolution and availability checks
Tenant Administrators — Approve/reject models, manage tenant-specific providers
Platform Administrators — Configure root tenant providers

Key Problems Solved:

Model discovery: Discovery of available models from provider APIs, triggered manually or by an external scheduler
Unified identification: Canonical IDs abstract provider-specific naming
Access control: Tenant-level approval workflow with hierarchical inheritance
Provider cost normalization: AICredits-based provider cost data with tier support (sync/batch/cached) — used as input for billing calculations, not user-facing pricing

Success Criteria:

Model resolution latency < 10ms P99
99.9% availability

1.1 Background

LLM Gateway requires a centralized source of truth for model availability, capabilities, and provider cost. Without Model Registry, each consumer would need to maintain its own model catalog, leading to inconsistency and duplicated approval workflows.

1.2 Goals

Single source of truth for AI model metadata across the platform — including both unmanaged models (cloud, frontier) and managed models (local, self-hosted)
Tenant-controlled model availability with inheritance from parent tenants
Streamlined approval process

1.3 Scope

In Scope

Model catalog CRUD (models, providers)
Tenant-level model availability configuration
Approval workflows (request → approve/reject)
Provider cost metadata (AICredits per tier) — raw cost from providers, not user-facing pricing
Model capabilities metadata
Cache management with TTL-based invalidation

Out of Scope

Item	Reason / Owner
LLM inference execution	LLM Gateway
Provider credential management	OAGW
User-facing pricing (promos, discounts, tiered, regional)	License Manager
Usage metering & billing	License Manager
Tenant hierarchy management	Tenant Resolver
Actual rate limiting enforcement	Infrastructure / OAGW
Inference/routing health monitoring	OAGW (per-route, per-tenant-key availability)
Approval workflow engine	Generic Approval Service (Model Registry integrates with it)
Audit log storage & retention	Core platform
Model fine-tuning / training	Not in scope for v1
Provider API contracts	Each provider plugin
Provider plugin architecture	DESIGN.md
Notifications	Separate notification system

1.4 Assumptions

Tenant Resolver provides tenant hierarchy data reliably and is highly available
OAGW handles all provider authentication
OAGW enforces outbound URL policy (blocks internal networks, requires HTTPS)
Each provider plugin exposes an endpoint returning available models (implementation is plugin responsibility)
Distributed cache is available (default: Redis); cache backend is pluggable for vendor customization. If the cache is unavailable, the system falls back to direct DB queries
Platform authenticates requests and provides verified tenant context
Platform provides audit logging for all operations
Platform provides distributed tracing, structured logging, metrics, and health endpoints

1.5 Risks

Risk	Impact	Mitigation
Cache invalidation delay	Stale model data served (up to TTL)	TTL-based expiry (own data 30 min, inherited 5 min)
Tenant hierarchy changes	Inherited approvals may become invalid	Invalidate tenant cache on re-parenting event
Provider removes model without notice	Requests fail until catalog synced	Periodic sync detection

1.6 Glossary

Term	Definition
AICredits	Internal platform currency for model usage cost/pricing
Provider Cost	Raw cost data from providers in AICredits; NOT user-facing pricing
OAGW	Outbound API Gateway - handles provider authentication and circuit breaking
GTS	Global Type System - platform-wide type definitions and contracts
GTS Type (Provider)	Versioned provider type identifier (e.g., `gts.x.genai.model.provider.v1~msft.azure._.ai_studio.v1~`)
Root Tenant	Top-level tenant from which all other tenants inherit
Canonical ID	Unique model identifier in format `{provider_slug}::{provider_model_id}`
Provider Slug	Human-readable identifier for a provider instance, unique within a tenant (e.g., `azure-corp-global`)
Provider Plugin	Module responsible for communication with specific LLM provider

2. Actors

2.1 Human Actors

Tenant Administrator

ID: cpt-cf-model-registry-actor-tenant-admin

Role: Approves or rejects models for tenant access. Manages tenant-specific providers. Can only restrict access compared to parent tenant, not expand.

Platform Administrator

ID: cpt-cf-model-registry-actor-platform-admin

Role: Manages root tenant configuration. Configures global providers. Sets baseline that all tenants inherit.

2.2 System Actors

LLM Gateway

ID: cpt-cf-model-registry-actor-llm-gateway

Role: Queries registry to resolve model identifiers (canonical ID) to provider details. Checks tenant availability. Retrieves model capabilities and provider cost.

3. Domain Model

3.1 Core Entities

Provider

Represents a configured AI provider instance for a tenant.

Fields:

id: Internal unique identifier (UUID)
slug: Human-readable unique identifier (e.g., azure-corp-global, ollama-us-west). Used in canonical model IDs. Immutable after creation.
tenant_id: Owner tenant
name: Display name
gts_type: GTS type identifier for provider (e.g., gts.x.genai.model.provider.v1~msft.azure._.ai_studio.v1~)
base_url: Provider API endpoint
status: active | disabled
discovery: Discovery config (enabled, interval)
timestamps: created_at, updated_at

GTS Type Benefits:

Versioned metadata schema per provider type (settings, UI configurations)
Vendor and service encoded (distinguish deepseek as vendor vs deepseek hosted by nvidia)
Native access control (grant/revoke access to specific provider types)
Artifact lifecycle management (see all per-vendor artifacts in one place)

Slug constraints:

1-64 chars, lowercase alphanumeric + hyphen
Unique within tenant (same slug can exist in different tenants)
Immutable after creation — changing slug would invalidate all model references
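
A minimal sketch of the format check, assuming a plain regular expression is sufficient. The constraints above do not say whether a leading or trailing hyphen is allowed, so this sketch permits it. Per-tenant uniqueness and immutability need a store lookup and are not shown.

```python
import re

# 1-64 characters: lowercase letters, digits, and hyphens only.
SLUG_PATTERN = re.compile(r"[a-z0-9-]{1,64}")

def is_valid_slug(slug: str) -> bool:
    # fullmatch() rejects strings with any character outside the pattern.
    return SLUG_PATTERN.fullmatch(slug) is not None
```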

Inheritance & Shadowing:

Providers inherit down tenant hierarchy (additive)
Child tenant sees parent's providers + own
Child tenant CAN shadow inherited provider by creating provider with same slug
Shadowing provider completely overrides parent's provider for that tenant and descendants
Resolution order: tenant → parent → ... → root (first match wins)

Excluding inherited providers: Child tenant can exclude an inherited provider by shadowing it with status: disabled. This allows tenants to enforce their own policies (vendor partnership, liability cap, region restrictions, compliance isolation).

Example: Root has azure-prod (active). Tenant A shadows with azure-prod (disabled) → Azure is excluded for Tenant A and all its descendants.
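
The effective provider set for a tenant, including shadowing and exclusion, could be computed roughly as follows. The Provider dataclass here is a trimmed, hypothetical version of the entity above, and effective_providers is an illustrative name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provider:
    slug: str
    tenant_id: str
    status: str  # "active" or "disabled"

def effective_providers(chain: list[str], providers: list[Provider]) -> dict[str, Provider]:
    """chain lists tenant IDs from the requesting tenant up to root."""
    visible: dict[str, Provider] = {}
    # Apply root first, then each descendant, so nearer tenants overwrite
    # (shadow) inherited providers that share a slug.
    for tenant_id in reversed(chain):
        for p in providers:
            if p.tenant_id == tenant_id:
                visible[p.slug] = p
    # A disabled shadow excludes the slug entirely; it does not fall
    # back to the parent's provider.
    return {slug: p for slug, p in visible.items() if p.status == "active"}
```

Because the disabled shadow is applied before filtering, the exclusion also holds for descendants of the tenant that created it.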

Health: ProviderHealth stored at provider's owner tenant only. Child tenants inherit health status from parent.

Model

Represents an AI model in the catalog.

Fields:

Identification: canonical_id ({provider_slug}::{provider_model_id}), provider_id, tenant_id, provider_model_id
Display: name, description
Lifecycle: lifecycle_status (GTS type for access control)
Infrastructure (for local/managed LLMs):
- managed: boolean — whether CyberFabric can load/unload this model
- architecture: string — model architecture (e.g., qwen, llama, mistral, gpt)
- size_bytes: integer — model size in bytes (for capacity planning)
- format: string — model format (e.g., gguf, mlx, safetensors, api-only)
Capabilities (Tier 1): Boolean flags for text/image/audio/video/document input/output, tools, structured_output, streaming, embeddings, realtime_audio, batch_api
Limits (Tier 2): context_window, max_output_tokens, max_images_per_request, max_image_size_mb, max_audio_duration_sec
Provider Cost: AICredits per tier (sync/batch/cached) for input/output tokens and media — raw provider cost data, not user-facing pricing
Status: active, deprecated (soft-delete with deprecated_at timestamp)
Version: Provider's model version, stored as-is

Lifecycle Status (GTS types for native access control):

Status	GTS Type	Description
`production`	`gts.x.genai.model.lifecycle.v1~production~`	Stable, fully supported
`preview`	`gts.x.genai.model.lifecycle.v1~preview~`	Feature-complete but limited support
`experimental`	`gts.x.genai.model.lifecycle.v1~experimental~`	Early access, may change
`deprecated`	`gts.x.genai.model.lifecycle.v1~deprecated~`	Scheduled for removal
`sunset`	`gts.x.genai.model.lifecycle.v1~sunset~`	End of life, read-only

Infrastructure Fields Rationale: For local/self-hosted LLMs, these fields enable:

Capacity planning (size_bytes)
Hardware compatibility checks (format, architecture)
Dynamic model loading/unloading (managed)

ModelApproval

Tracks tenant approval status for a model. Integrates with generic Approval Service for workflow management.

Note: Model Registry does not implement approval workflow logic. It delegates to a generic Approval Service that can handle approvals for any resource type. Model Registry:

Registers model as approvable resource with Approval Service
Queries approval status from Approval Service
Reacts to approval status changes via events

Fields (stored in Approval Service, referenced by Model Registry):

resource_type: model
resource_id: model_canonical_id
tenant_id: tenant context
status: pending/approved/rejected/revoked
decided_at, decided_by: approval decision metadata
auto_approval_rule_id: reference to rule that triggered auto-approval (null for manual)

State Machine (managed by Approval Service):

stateDiagram-v2
    [*] --> pending: Model discovered
    pending --> approved: Admin approves
    pending --> approved: Auto-approval rule matches
    pending --> rejected: Admin rejects
    approved --> revoked: Admin revokes
    rejected --> approved: Admin reconsiders
    revoked --> approved: Admin reinstates
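
For reference, the transitions in the diagram can be expressed as a lookup table. This is only an illustration of the state machine; the actual enforcement lives in Approval Service, and the names TRANSITIONS and can_transition are assumptions.

```python
# Allowed transitions, copied from the state diagram above.
TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"revoked"},
    "rejected": {"approved"},
    "revoked": {"approved"},
}

def can_transition(current: str, target: str) -> bool:
    # Unknown states have no outgoing transitions.
    return target in TRANSITIONS.get(current, set())
```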

AutoApprovalRule (P2)

Defines rules for automatic model approval. Managed by Approval Service with model-specific criteria defined by Model Registry.

Note: Auto-approval rules are a feature of the generic Approval Service. Model Registry provides model-specific criteria schema; Approval Service handles rule evaluation and execution.

Fields (in Approval Service): id, resource_type (model), tenant_id (root = platform-wide), criteria, action (allow/block), priority, created_at, created_by.

Model-specific criteria schema (provided by Model Registry):

provider_gts_type: GTS type pattern | "" (required, "" = any, supports wildcards for version matching)
provider_slug: string | "" (required, "" = any)
capabilities: string[] (optional, empty = any)

Matching: ALL criteria must match (AND). Model must have ALL listed capabilities (subset matching).
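
A sketch of the matching semantics, under two assumptions that the schema above does not fix: wildcards are glob-style (*), evaluated here with Python's fnmatch, and criteria and model metadata arrive as plain dicts. Rule evaluation itself belongs to Approval Service.

```python
from fnmatch import fnmatchcase

def rule_matches(criteria: dict, model: dict) -> bool:
    """Return True only if every criterion matches (AND semantics)."""
    gts_pattern = criteria["provider_gts_type"]
    slug = criteria["provider_slug"]
    required = set(criteria.get("capabilities", []))
    # "" means "any" for the two required string criteria.
    if gts_pattern and not fnmatchcase(model["provider_gts_type"], gts_pattern):
        return False
    if slug and slug != model["provider_slug"]:
        return False
    # Subset matching: the model must have all listed capabilities.
    return required <= set(model["capabilities"])
```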

Rule evaluation (by Approval Service):

Platform (root tenant) rules set the ceiling
Tenant rules can only restrict further, not expand
block from platform = blocked for all descendants
Tenant cannot allow what platform blocked

Authorization: Read/list visible to tenant admins only.

ProviderHealth (P2)

Stores provider discovery health status — NOT routing/inference health.

Scope limitation: This is discovery-level health only (can we reach the provider's models endpoint?). It does NOT reflect:

Inference endpoint availability (OAGW responsibility)
Per-route or per-tenant-API-key availability (OAGW responsibility)
SLA metrics for actual model calls (OAGW responsibility)

Rationale: Same provider can have different availability depending on route or per-tenant API key. Routing health is OAGW's responsibility. Model Registry only tracks whether discovery can reach the provider.

Fields: provider_id, tenant_id, status (healthy/degraded/unhealthy), metrics (latency p50/p99, consecutive failures/successes), last_check, last_success, last_error, last_error_message.

Status derivation (from discovery call results):

unhealthy: 3+ consecutive discovery failures
degraded: discovery response latency > 2000ms
healthy: 2+ consecutive successes, latency OK
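
The derivation rules can be sketched as a pure function. The thresholds come from the list above. Two points are assumptions: degraded requires that the latest call succeeded, and the previous status is kept when no rule applies (for example, a single success after an outage).

```python
def derive_status(failures: int, successes: int, latency_ms: float, previous: str) -> str:
    """failures/successes are consecutive counts; latency_ms is from the latest call."""
    if failures >= 3:
        return "unhealthy"
    if successes >= 1 and latency_ms > 2000:
        return "degraded"
    if successes >= 2:
        return "healthy"
    # Assumption: not enough evidence to change state, keep the last status.
    return previous
```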

Authorization: status field visible to all authenticated users within tenant hierarchy. Error details (last_error_message, last_error) visible to tenant admins only.

Alias (P2)

Maps human-friendly names to canonical model IDs.

Fields: name (1-64 chars, alphanumeric + hyphen/underscore), tenant_id, canonical_id (must be canonical ID, not another alias), created_at, created_by.

Resolution order: Tenant alias → Parent tenant alias → ... → Root tenant alias → Canonical ID

Shadowing: Child tenant aliases can shadow parent aliases. Child tenant controls their namespace.
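
Alias resolution with fallback to the canonical ID might be sketched like this. The alias name default-chat and the in-memory PARENT and ALIASES tables are hypothetical.

```python
from typing import Optional

PARENT = {"tenant-A": "root", "root": None}
ALIASES = {
    ("root", "default-chat"): "openai::gpt-4o",
    ("tenant-A", "default-chat"): "azure-prod::gpt-4o",
}

def resolve_model_identifier(tenant_id: str, identifier: str) -> str:
    current: Optional[str] = tenant_id
    while current is not None:
        target = ALIASES.get((current, identifier))
        if target is not None:
            # Aliases always point at canonical IDs, so one hop suffices.
            return target
        current = PARENT[current]
    # No alias found in the chain: treat the input as a canonical ID.
    return identifier
```

Because an alias can never target another alias, resolution needs no cycle detection.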

4. Functional Requirements

P1 — Core (MVP)

Tenant Isolation

p1 - ID: cpt-cf-model-registry-fr-tenant-isolation

The system must enforce tenant isolation for all operations.

All API operations MUST include tenant context
Model/Approval queries MUST filter by tenant hierarchy (current tenant + ancestors)
Write operations MUST validate tenant ownership
Admin operations (approve/reject) MUST verify actor has admin role for target tenant

Authorization

p1 - ID: cpt-cf-model-registry-fr-authorization

The system must enforce role-based and GTS-based authorization.

Role-based access (operations):

Operation	Required Role
List/Get models	Any authenticated user
Request model approval	Tenant member
Approve/Reject request	Tenant admin
Manage providers	Platform admin (root tenant providers); tenant admin (tenant-specific providers)

GTS-based access (model/provider access control):

Access Type	GTS Claim Required	Example
Provider access	Provider GTS type	`gts.x.genai.model.provider.v1~msft.azure.*` grants access to all Azure models
Lifecycle access	Lifecycle GTS type	`gts.x.genai.model.lifecycle.v1~experimental~` grants access to experimental models

Benefits of GTS-based access control:

Cheap generic rules — no custom development needed
Native platform integration — use existing GTS claim infrastructure
Flexible — grant/revoke access by provider type or model category at token level

Actors: cpt-cf-model-registry-actor-tenant-admin, cpt-cf-model-registry-actor-platform-admin

Input Validation

p1 - ID: cpt-cf-model-registry-fr-input-validation

The system must validate all input data.

Field	Validation
Canonical ID	Must match pattern `{provider_slug}::{model_id}`, provider with slug must exist. Parse on first `::`.
Provider slug	1-64 chars, lowercase alphanumeric + hyphen. Unique within tenant. Immutable.
Provider name	1-32 chars, lowercase alphanumeric + hyphen
Capabilities	Must conform to GTS capability schema
Provider cost values	Non-negative decimal (AICredits)

Cache Isolation

p1 - ID: cpt-cf-model-registry-fr-cache-isolation

The system must isolate cached data by tenant.

Cache keys MUST include tenant_id as prefix.

Format: mr:{tenant_id}:{entity}:{id}

TTL strategy:

Own data (tenant created): TTL 30 min
Inherited data (from parent): TTL 5 min
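
A small sketch of the key format and TTL choice. The entity name "model" and the helper names are assumptions; only the key layout and the two TTL values come from this section.

```python
OWN_TTL_SECONDS = 30 * 60       # data the tenant created
INHERITED_TTL_SECONDS = 5 * 60  # data inherited from an ancestor

def cache_key(tenant_id: str, entity: str, entity_id: str) -> str:
    # tenant_id comes right after the "mr" namespace so entries are
    # isolated per tenant and can be invalidated by prefix.
    return f"mr:{tenant_id}:{entity}:{entity_id}"

def ttl_seconds(requesting_tenant: str, owner_tenant: str) -> int:
    if requesting_tenant == owner_tenant:
        return OWN_TTL_SECONDS
    return INHERITED_TTL_SECONDS
```

Canonical IDs contain "::", so keys built this way should be treated as opaque strings rather than parsed back by splitting on ":".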

Cache invalidation on tenant re-parenting: On tenant.reparented event, invalidate ALL cache entries for that tenant.

Cache unavailable: Fall back to direct DB queries (latency SLOs may be violated). Cache backend is pluggable (default: Redis).

Get Tenant Model

p1 - ID: cpt-cf-model-registry-fr-get-tenant-model

The system must resolve a canonical model ID for a tenant, returning model info and provider details if approved.

Resolution:

Look up model in catalog by canonical ID
Check tenant approval status (direct or inherited)
Return model info + provider details

Response structure defined in GTS contract.

Actors: cpt-cf-model-registry-actor-llm-gateway

List Tenant Models

p1 - ID: cpt-cf-model-registry-fr-list-tenant-models

The system must return all models available for a tenant.

Includes:

Models from tenant's own providers (if approved)
Models inherited from parent tenant hierarchy (if approved at any level)

Follows OData pagination standard. Supports OData $filter for filtering by capability, provider, approval_status.

Capability filtering uses subset matching: model must have AT LEAST requested capabilities.

Actors: cpt-cf-model-registry-actor-llm-gateway

Model Discovery

p1 - ID: cpt-cf-model-registry-fr-model-discovery

The system must support discovery of available models from providers via Outbound API Gateway.

Trigger mechanism:

Default: Manual action triggered by admin (via API or UI)
Optional: Can be automated via external scheduled workflow (e.g., platform scheduler, Kubernetes CronJob)

Model Registry provides discovery API endpoint; scheduling is NOT built into Model Registry.

Concurrency: Fixed concurrency limit + staggered intervals when multiple discoveries run.

Per (tenant, provider) pair where discovery is enabled:

Fetch models from provider's models endpoint (plugin responsibility)
New models → create with pending status
Existing models → update metadata (capabilities, limits, provider cost)
Missing models → soft-delete (mark as deprecated)
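
The three-way comparison for one (tenant, provider) pair reduces to set arithmetic. In this sketch models are identified by provider model ID only; the function name reconcile and the plan keys are illustrative.

```python
def reconcile(catalog_ids: set[str], discovered_ids: set[str]) -> dict[str, set[str]]:
    """Compare the current catalog with the provider's models list."""
    return {
        "create_pending": discovered_ids - catalog_ids,   # new models
        "update_metadata": discovered_ids & catalog_ids,  # existing models
        "mark_deprecated": catalog_ids - discovered_ids,  # missing models
    }
```

Once a plan has been applied, a second run against the same provider response yields no new and no missing models.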

Dependencies: OAGW (executes provider API calls), Provider API (returns models list)

Model Approval Integration

p1 - ID: cpt-cf-model-registry-fr-model-approval

The system must integrate with generic Approval Service for tenant-level model approval workflow.

Model Registry responsibilities:

Register discovered models as approvable resources with Approval Service
Query approval status from Approval Service when resolving models
React to approval status change events

Approval Service responsibilities (out of Model Registry scope):

Approval workflow engine (state machine, concurrency control)
Approval UI and notifications
Audit trail for approval decisions

Approval granularity (P1): Tenant-level — approval grants access to all users in tenant.

Actors: cpt-cf-model-registry-actor-tenant-admin

Provider Management

p1 - ID: cpt-cf-model-registry-fr-provider-management

The system must support tenant-scoped provider configuration.

Provider inheritance:

Providers inherit down tenant hierarchy (additive only)
Child tenant sees parent's providers + own providers
Child CAN shadow inherited provider by creating provider with same slug (overrides for that tenant and descendants)
Child CAN exclude inherited provider by shadowing with status: disabled (for compliance, vendor policy, region restrictions)

Provider config:

ID, slug, name, gts_type, base URL, status (active/disabled)
Discovery enabled/interval

Credentials handled by OAGW — not stored in Model Registry.

Actors: cpt-cf-model-registry-actor-platform-admin

Model Provider Cost

p1 - ID: cpt-cf-model-registry-fr-model-pricing

The system must store and provide model provider cost data in AICredits.

Important: This is raw provider cost data obtained from providers, NOT user-facing pricing. User-facing pricing (including promos, volume discounts, tiered pricing, regional pricing) is the responsibility of License Manager.

Cost structure:

Unit: AICredits (internal platform currency)
Tiers: sync, batch, cached (different rates per tier)
Media: per image input, per audio minute, per image output

Model Registry returns provider cost only. Caller (LLM Gateway) fetches tenant pricing from License Manager and computes final user-facing price.
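
To make the tier structure concrete, the sketch below shows how a caller might apply per-tier token rates. The per-token unit, the TierRate type, and all rate values are illustrative assumptions; this PRD fixes only the currency (AICredits) and the tiers. Media rates are omitted.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class TierRate:
    input_per_token: Decimal   # AICredits; the per-token unit is an assumption
    output_per_token: Decimal

def provider_cost(rates: dict[str, TierRate], tier: str,
                  input_tokens: int, output_tokens: int) -> Decimal:
    """Raw provider cost for one request; not user-facing pricing."""
    rate = rates[tier]
    return rate.input_per_token * input_tokens + rate.output_per_token * output_tokens
```

Decimal is used instead of float so that small per-token rates do not accumulate rounding error.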

Actors: cpt-cf-model-registry-actor-llm-gateway

P2 — Enhanced Features

Auto-Approval Rules

p2 - ID: cpt-cf-model-registry-fr-auto-approval

The system must integrate with Approval Service for automatic model approval based on configurable rules.

Model Registry responsibilities:

Provide model-specific criteria schema to Approval Service
Supply model metadata for rule evaluation when model is discovered

Approval Service responsibilities:

Rule storage and management
Rule evaluation and execution
Hierarchy enforcement (platform ceiling, tenant restrictions)

Rule hierarchy (enforced by Approval Service):

Platform (root tenant) rules set the ceiling (max allowed)
Tenant rules can only restrict further
Tenant cannot auto-approve what platform blocked

Model-specific rule matching criteria:

Provider GTS type, provider slug, required capabilities (all must match)
Action: allow or block
Priority ordering for conflict resolution

Auto-approved models store reference to the triggering rule (auto_approval_rule_id).

Actors: cpt-cf-model-registry-actor-tenant-admin, cpt-cf-model-registry-actor-platform-admin

Provider Discovery Health Storage

p2 - ID: cpt-cf-model-registry-fr-health-monitoring

The system must store provider discovery health status derived from discovery calls.

Scope: Discovery health only — can we reach the provider's models endpoint? This is NOT routing/inference health (which is OAGW responsibility).

Implementation: Health status is a byproduct of model discovery — no separate health probing infrastructure. When discovery runs, the response (success/failure, latency) updates health status.

Health derivation:

healthy: discovery responding normally, latency acceptable
degraded: discovery responding but latency > threshold
unhealthy: consecutive discovery failures exceed threshold

Health stored at provider owner tenant only. Child tenants inherit parent's health status.

Out of scope (OAGW responsibility):

Inference endpoint health
Per-route availability
Per-tenant-API-key availability

Dependencies: OAGW (executes provider API calls), Provider API (returns response for health derivation)

Alias Management

p2 - ID: cpt-cf-model-registry-fr-alias-management

The system must support model aliases with hierarchical scoping.

Alias scope:

Root tenant: global aliases visible to all tenants
Child tenant: can override global aliases, add tenant-specific aliases

Resolution order: tenant → parent → ... → root → canonical ID

Constraint: Alias target MUST be a canonical ID, not another alias (prevents circular references).

Actors: cpt-cf-model-registry-actor-tenant-admin, cpt-cf-model-registry-actor-platform-admin

Degraded Mode

p2 - ID: cpt-cf-model-registry-fr-degraded-mode

The system must define tiered behavior when database is unavailable.

Model capabilities and metadata: serve from stale cache (up to 30 min TTL)
Approval verification: fail request with service_unavailable error

P1 behavior: DB unavailable = all requests fail (fail-closed).

Tenant Re-parenting

p2 - ID: cpt-cf-model-registry-fr-tenant-reparenting

The system must handle tenant hierarchy changes.

When tenant moves to different parent:

Tenant Resolver owns re-parenting logic
Model Registry invalidates all cache entries for affected tenant on tenant.reparented event
Re-evaluation of approvals happens on next access

Bulk Operations

p2 - ID: cpt-cf-model-registry-fr-bulk-operations

The system must support batch approval operations: approve_models(model_ids[]), reject_models(model_ids[]).

Manual Discovery/Probe Trigger

p2 - ID: cpt-cf-model-registry-fr-manual-trigger

The system must allow platform admins to manually trigger discovery and health probes.

P3 — Fine-Grained Access Control

User Group Approval

p3 - ID: cpt-cf-model-registry-fr-user-group-approval

The system must support model approval scoped to user groups within a tenant.

Actors: cpt-cf-model-registry-actor-tenant-admin

User-Level Override

p3 - ID: cpt-cf-model-registry-fr-user-level-override

The system must support individual user restrictions/allowances for model access.

Actors: cpt-cf-model-registry-actor-tenant-admin

5. Use Cases

UC-001: Get Tenant Model

p1 - ID: cpt-cf-model-registry-usecase-get-tenant-model

Actor: cpt-cf-model-registry-actor-llm-gateway

Preconditions: Tenant context available.

Flow:

LLM Gateway sends get_tenant_model(ctx, canonical_id)
Registry looks up model in catalog
Registry checks tenant approval (direct or inherited from parent)
Registry returns model info + provider details

Postconditions: Model info returned or error.

Acceptance criteria:

Returns model_not_found (404) if model not in catalog
Returns model_not_approved (403) if not approved for tenant (or any ancestor)
Returns model_deprecated (410) if model was soft-deleted
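
The acceptance criteria map onto a lookup along these lines. This is a simplified sketch: the order of the deprecated and approval checks is an assumption, provider shadowing is ignored, and RegistryError plus the dict and set inputs are hypothetical stand-ins for the catalog and Approval Service.

```python
class RegistryError(Exception):
    def __init__(self, code: str, http_status: int):
        super().__init__(code)
        self.code = code
        self.http_status = http_status

def get_tenant_model(catalog: dict[str, dict], approved: set[tuple[str, str]],
                     tenant_chain: list[str], canonical_id: str) -> dict:
    """tenant_chain lists tenant IDs from the requesting tenant up to root."""
    model = catalog.get(canonical_id)
    if model is None:
        raise RegistryError("model_not_found", 404)
    if model["status"] == "deprecated":
        raise RegistryError("model_deprecated", 410)
    # Approved directly for the tenant or inherited from any ancestor.
    if not any((canonical_id, t) in approved for t in tenant_chain):
        raise RegistryError("model_not_approved", 403)
    return model
```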

UC-002: List Tenant Models

p1 - ID: cpt-cf-model-registry-usecase-list-tenant-models

Actor: cpt-cf-model-registry-actor-llm-gateway

Preconditions: Tenant context available.

Flow:

LLM Gateway sends list_tenant_models(ctx) with OData query params
Registry collects approved models for tenant (direct + inherited)
Registry applies OData filters
Registry returns paginated models list

Postconditions: Filtered models list returned.

Acceptance criteria:

Follows OData pagination standard
Supports $filter by capability flags, provider slug, provider GTS type, approval_status, lifecycle_status, managed, architecture, format
Returns only approved models by default
Excludes deprecated models

UC-003: Model Discovery

p1 - ID: cpt-cf-model-registry-usecase-model-discovery

Actor: cpt-cf-model-registry-actor-platform-admin (manual) or External Scheduler (automated)

Preconditions: Provider configured with discovery.enabled = true.

Trigger:

Manual: Admin calls discovery API endpoint
Automated: External scheduled workflow calls discovery API endpoint

Flow:

Discovery triggered for (tenant, provider) pair
Registry sends GET to provider's models endpoint via OAGW
Provider returns models list
Registry compares with current catalog:
- New model → register with Approval Service as pending
- Existing model → update metadata
- Missing model → mark as deprecated (soft-delete)

Postconditions: Catalog updated.

Acceptance criteria:

Discovery runs per (tenant, provider) pair
Fixed concurrency limit with staggered intervals when multiple discoveries run
Deprecated models are soft-deleted (hidden, not purged)
Discovery API is idempotent (safe to call multiple times)

UC-004: Model Approval

p1 - ID: cpt-cf-model-registry-usecase-model-approval

Actor: cpt-cf-model-registry-actor-tenant-admin

Preconditions: Model in pending status for tenant in Approval Service.

Flow:

Tenant admin reviews pending models via Approval Service (or Model Registry API proxying to Approval Service)
Admin approves or rejects via Approval Service
Approval Service updates status and emits event
Model Registry receives event and updates local cache

Postconditions: Model approval status updated in Approval Service.

Acceptance criteria:

State transitions managed by Approval Service
Approval is tenant-scoped (P1)
Approval recorded with actor and timestamp (by Approval Service)
Model Registry correctly reflects approval status from Approval Service

UC-005: Model Revocation

p1 - ID: cpt-cf-model-registry-usecase-model-revocation

Actor: cpt-cf-model-registry-actor-tenant-admin

Preconditions: Model in approved status for tenant in Approval Service.

Flow:

Tenant admin selects approved model
Admin initiates revocation via Approval Service
Approval Service updates status to revoked and emits event
Model Registry receives event and updates local cache

Postconditions: Model access revoked.

Acceptance criteria:

Revoked models return model_not_approved on access attempts
In-flight requests complete, new requests rejected
Revocation recorded with actor and timestamp (by Approval Service)
Model can be reinstated: revoked → approved

UC-006: Register Provider

p1 - ID: cpt-cf-model-registry-usecase-register-provider

Actor: cpt-cf-model-registry-actor-platform-admin

Preconditions: Provider plugin exists for the specified type.

Flow:

Admin provides provider config (slug, name, gts_type, base_url, discovery config)
Registry validates slug is unique within tenant
Registry validates GTS type is supported (plugin exists)
Registry validates config against plugin requirements
Registry creates provider record with status active

Postconditions: Provider available for model sync. If slug matches parent's provider, this provider shadows the inherited one.

Acceptance criteria:

Provider slug must be unique within tenant (can shadow parent's provider with same slug)
Slug is immutable after creation
GTS type must be valid and supported (plugin exists for this GTS type)

UC-007: Disable Provider

p1 - ID: cpt-cf-model-registry-usecase-disable-provider

Actor: cpt-cf-model-registry-actor-platform-admin

Preconditions: Provider is active.

Flow:

Admin requests provider disable
Registry marks provider status as disabled
Registry suspends discovery for this provider

Postconditions: Provider disabled, models not resolvable.

Acceptance criteria:

Disabled provider's models return provider_not_found
Discovery suspended

UC-008: Re-enable Provider

p1 - ID: cpt-cf-model-registry-usecase-reenable-provider

Actor: cpt-cf-model-registry-actor-platform-admin

Preconditions: Provider is disabled.

Flow:

Admin requests provider re-enable
Registry marks provider status as active
Discovery resumes on the next trigger (manual or externally scheduled)

Postconditions: Provider active, models resolvable.

UC-009: Get Model Provider Cost

p1 - ID: cpt-cf-model-registry-usecase-get-pricing

Actor: cpt-cf-model-registry-actor-llm-gateway

Preconditions: Model exists.

Flow:

Gateway sends get_provider_cost(model_id)
Registry retrieves provider cost for model
Registry returns provider cost by tier

Postconditions: Provider cost returned.

Acceptance criteria:

Returns provider cost in AICredits (caller computes final user-facing price via License Manager)
Tiers: sync, batch, cached
Media cost included if applicable

UC-010: Configure Auto-Approval Rule

p2 - ID: cpt-cf-model-registry-usecase-auto-approval-rule

Actor: cpt-cf-model-registry-actor-tenant-admin, cpt-cf-model-registry-actor-platform-admin

Preconditions: Actor has admin role for target tenant.

Flow:

Admin defines rule criteria (provider GTS type, provider slug, capabilities)
Admin sets action (allow or block) and priority
Model Registry forwards rule to Approval Service with model-specific criteria schema
Approval Service validates rule does not expand beyond platform ceiling
Approval Service creates rule

Postconditions: Rule active for future model discoveries.

Acceptance criteria:

Tenant rules cannot allow what platform blocked (enforced by Approval Service)
Rules evaluated in priority order (by Approval Service)
Auto-approved models reference the triggering rule
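
The interaction between priority ordering and the platform ceiling can be sketched as follows. This illustrates the intended semantics only; actual evaluation lives in Approval Service. The sketch assumes that a lower priority number is evaluated first, that the first matching rule wins, and that a platform `allow` applies when no tenant rule matches. None of these details are fixed by this PRD.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    rule_id: str
    action: str  # "allow" or "block"
    priority: int  # assumption: lower number is evaluated first
    gts_type: Optional[str] = None  # None matches any provider GTS type
    provider_slug: Optional[str] = None  # None matches any slug
    capabilities: FrozenSet[str] = frozenset()  # all must be present

    def matches(self, gts_type: str, slug: str, caps: FrozenSet[str]) -> bool:
        return (
            self.gts_type in (None, gts_type)
            and self.provider_slug in (None, slug)
            and self.capabilities <= caps
        )


def first_match(rules: Sequence[Rule], gts_type: str, slug: str,
                caps: FrozenSet[str]) -> Optional[Rule]:
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(gts_type, slug, caps):
            return rule
    return None


def evaluate(platform_rules: Sequence[Rule], tenant_rules: Sequence[Rule],
             gts_type: str, slug: str, caps: FrozenSet[str]) -> Optional[Rule]:
    """Return the rule that auto-approves the model, or None."""
    platform = first_match(platform_rules, gts_type, slug, caps)
    if platform is not None and platform.action == "block":
        return None  # platform ceiling: a tenant rule cannot override a block
    tenant = first_match(tenant_rules, gts_type, slug, caps)
    if tenant is not None:
        return tenant if tenant.action == "allow" else None
    if platform is not None and platform.action == "allow":
        return platform
    return None
```

Returning the matching rule rather than a boolean lets the caller record which rule triggered the auto-approval.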

UC-011: Get Provider Discovery Health

p2 - ID: cpt-cf-model-registry-usecase-provider-health

Actor: cpt-cf-model-registry-actor-llm-gateway

Preconditions: Provider exists and discovery is enabled (health derived from discovery).

Flow:

Gateway queries provider discovery health status
Registry returns stored health status (healthy/degraded/unhealthy)
Gateway MAY use status as one input for routing decisions (not the only signal)

Postconditions: Stored discovery health status returned.

Note: This is discovery health only. For routing decisions, Gateway should also consult OAGW for inference-level health.

Acceptance criteria:

Status field visible to all authenticated users within tenant hierarchy
Error details visible only to tenant admins
Child tenants inherit parent's provider health
Health status reflects latest discovery results
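
The visibility rule in the acceptance criteria (status for everyone, error details for tenant admins only) could be applied when building the response. A minimal sketch, assuming a hypothetical stored record with `provider`, `status`, and `error_details` keys:

```python
HEALTH_STATES = ("healthy", "degraded", "unhealthy")


def health_view(record: dict, is_tenant_admin: bool) -> dict:
    """Project a stored discovery-health record for the given caller."""
    if record["status"] not in HEALTH_STATES:
        raise ValueError(f"unknown health status: {record['status']!r}")
    view = {"provider": record["provider"], "status": record["status"]}
    if is_tenant_admin:
        # Error details are restricted to tenant admins.
        view["error_details"] = record.get("error_details")
    return view
```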

UC-012: Create Alias

p2 - ID: cpt-cf-model-registry-usecase-create-alias

Actor: cpt-cf-model-registry-actor-tenant-admin, cpt-cf-model-registry-actor-platform-admin

Preconditions: Target canonical ID exists.

Flow:

Admin provides alias name and target canonical ID
Registry validates alias name format (1-64 chars, alphanumeric + hyphen/underscore)
Registry validates target is canonical ID (not another alias)
Registry creates alias scoped to tenant

Postconditions: Alias resolvable for tenant and descendants.

Acceptance criteria:

Alias name must be unique within tenant
Target must be canonical ID, not alias (prevents cycles)
Child tenant aliases can shadow parent aliases
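
The validation steps above can be sketched as a single function. The function name and the `existing_aliases` parameter are hypothetical, and the sketch does not check that the target exists in the catalog (the UC-012 precondition).

```python
import re

# 1-64 characters: ASCII letters, digits, hyphen, underscore.
ALIAS_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_alias(name: str, target: str, existing_aliases: set) -> None:
    """Raise ValueError if the alias request violates UC-012 rules."""
    if ALIAS_NAME.fullmatch(name) is None:
        raise ValueError("alias name must be 1-64 chars of [A-Za-z0-9_-]")
    if name in existing_aliases:
        raise ValueError("alias name already exists in this tenant")
    # Canonical IDs are {provider_slug}::{provider_model_id}; split on the
    # first "::" occurrence, matching the parsing rule in the Overview.
    slug, sep, model_id = target.partition("::")
    if not (slug and sep and model_id):
        raise ValueError("target must be a canonical ID, not an alias")
```

Because a valid alias name cannot contain a colon, any target that contains `::` cannot itself be an alias, which is what rules out alias-to-alias chains and cycles.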

UC-013: Resolve Alias

p2 - ID: cpt-cf-model-registry-usecase-resolve-alias

Actor: cpt-cf-model-registry-actor-llm-gateway

Preconditions: Alias or canonical ID provided.

Flow:

Gateway sends model identifier (alias or canonical ID)
Registry checks tenant aliases → parent aliases → ... → root aliases
Registry returns resolved canonical ID

Postconditions: Canonical ID returned.

Acceptance criteria:

Resolution order: tenant → parent → ... → root → canonical ID
Non-existent alias falls through to canonical ID lookup
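
The resolution order can be sketched as a walk up the tenant tree. The in-memory `parent_of` and `aliases` mappings are stand-ins for Tenant Resolver and the alias store; both names are hypothetical.

```python
from typing import Dict, Optional


def resolve_model_id(identifier: str, tenant_id: str,
                     parent_of: Dict[str, Optional[str]],
                     aliases: Dict[str, Dict[str, str]]) -> str:
    """Resolve an alias or canonical ID to a canonical ID."""
    current: Optional[str] = tenant_id
    while current is not None:
        target = aliases.get(current, {}).get(identifier)
        if target is not None:
            return target  # nearest tenant wins, so children shadow parents
        current = parent_of.get(current)
    # No alias matched anywhere in the chain: fall through and let the
    # caller look the identifier up as a canonical ID.
    return identifier
```

For example, with `parent_of = {"tenant-a": "root", "root": None}` and a `default` alias defined at both levels, a request from `tenant-a` returns the `tenant-a` target while a request from `root` returns the root target.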

UC-014: Handle Degraded Mode

p2 - ID: cpt-cf-model-registry-usecase-degraded-mode

Actor: cpt-cf-model-registry-actor-llm-gateway

Preconditions: Database unavailable.

Flow:

Gateway requests model info
Registry detects DB unavailable
For metadata: serve from stale cache (up to TTL)
For approval check: return service_unavailable error

Postconditions: Partial response or error returned.

Acceptance criteria:

Metadata served from cache (best-effort)
Approval verification always fails when DB unavailable
Error clearly indicates degraded state
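
The two degraded-mode paths differ only in what happens after the database call fails. A minimal sketch, assuming hypothetical `DatabaseUnavailable` and `ServiceUnavailable` exceptions, a loader callable that raises when the DB is down, and a cache that stores `(value, expires_at)` pairs:

```python
from typing import Callable, Dict, Tuple


class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) DB loader when the database is down."""


class ServiceUnavailable(Exception):
    """Maps to the service_unavailable error code."""


def get_metadata(key: str, load_from_db: Callable[[str], dict],
                 cache: Dict[str, Tuple[dict, float]], now: float) -> dict:
    try:
        return load_from_db(key)
    except DatabaseUnavailable as exc:
        entry = cache.get(key)
        if entry is not None and entry[1] > now:
            # Best-effort: serve cached metadata, flagged as degraded.
            return {**entry[0], "degraded": True}
        raise ServiceUnavailable("metadata not cached; database unavailable") from exc


def check_approval(key: str, load_from_db: Callable[[str], bool]) -> bool:
    try:
        return load_from_db(key)
    except DatabaseUnavailable as exc:
        # Approval is never answered from cache: fail closed.
        raise ServiceUnavailable("approval cannot be verified") from exc
```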

UC-015: Handle Tenant Re-parenting

p2 - ID: cpt-cf-model-registry-usecase-tenant-reparenting

Actor: Internal (event handler)

Preconditions: Tenant Resolver emits tenant.reparented event.

Flow:

Registry receives tenant.reparented event
Registry invalidates ALL cache entries for affected tenant
Next access triggers fresh resolution with new hierarchy

Postconditions: Cache invalidated, approvals re-evaluated on access.

Acceptance criteria:

All cache keys with tenant prefix invalidated
No stale inherited data served after re-parenting
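
Prefix-based invalidation can be sketched against an in-memory dictionary. The key format `{tenant_id}:...` is an assumption; with a real distributed cache the same idea would need that backend's key-scan or tagging mechanism.

```python
from typing import Iterable, MutableMapping


def invalidate_tenants(cache: MutableMapping[str, object],
                       tenant_ids: Iterable[str]) -> int:
    """Delete every cache entry whose key starts with a tenant prefix."""
    # Include the delimiter so "tenant-1" does not also match "tenant-10".
    prefixes = tuple(f"{tenant_id}:" for tenant_id in tenant_ids)
    if not prefixes:
        return 0
    doomed = [key for key in cache if key.startswith(prefixes)]
    for key in doomed:
        del cache[key]
    return len(doomed)
```

The function takes a list of tenant IDs so that the event handler can decide which tenants count as affected by the `tenant.reparented` event.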

UC-016: Bulk Approve Models

p2 - ID: cpt-cf-model-registry-usecase-bulk-approve

Actor: cpt-cf-model-registry-actor-tenant-admin

Preconditions: Models in pending status for tenant in Approval Service.

Flow:

Admin provides list of model IDs to approve
Model Registry forwards bulk approval request to Approval Service
Approval Service approves all models in single transaction

Postconditions: All models approved or none (atomic).

Acceptance criteria:

Atomic operation (all succeed or all fail) — handled by Approval Service
Returns list of results per model
Maximum batch size enforced (configurable)
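
On the Model Registry side, the bulk flow reduces to validating the batch and forwarding it. A minimal sketch, assuming a hypothetical `approve_all` callable that stands in for the Approval Service client and raises if the transaction fails; the default batch size of 100 is illustrative.

```python
from typing import Callable, List, Sequence


def bulk_approve(model_ids: Sequence[str],
                 approve_all: Callable[[List[str]], None],
                 max_batch_size: int = 100) -> List[dict]:
    """Validate the batch, forward it once, and report per-model results."""
    if not model_ids:
        raise ValueError("at least one model ID is required")
    if len(model_ids) > max_batch_size:
        raise ValueError(f"batch exceeds maximum size of {max_batch_size}")
    unique = list(dict.fromkeys(model_ids))  # drop duplicates, keep order
    # Single call: atomicity is the Approval Service's responsibility.
    # If it raises, nothing was approved and the error propagates.
    approve_all(unique)
    return [{"model_id": m, "status": "approved"} for m in unique]
```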

UC-017: Trigger Discovery

p1 - ID: cpt-cf-model-registry-usecase-manual-discovery

Actor: cpt-cf-model-registry-actor-platform-admin, cpt-cf-model-registry-actor-tenant-admin

Preconditions: Provider configured with discovery enabled. Actor has admin access to provider's tenant.

Flow:

Admin (or external scheduler) calls discovery API for provider
Registry queues discovery job
Discovery executes and updates catalog

Postconditions: Provider catalog updated.

Acceptance criteria:

Returns job status (queued/running/completed)
Rate limited to prevent abuse
Tenant admin can trigger discovery for own providers; Platform admin can trigger for any provider
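
The authorization rule in the last criterion can be expressed as a small predicate. The role strings are hypothetical, and the sketch reads "own providers" as providers configured directly on the admin's tenant, which is an assumption.

```python
def can_trigger_discovery(role: str, actor_tenant: str,
                          provider_tenant: str) -> bool:
    """Decide whether the actor may trigger discovery for a provider."""
    if role == "platform_admin":
        return True  # platform admins may trigger for any provider
    if role == "tenant_admin":
        return actor_tenant == provider_tenant
    return False
```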

UC-018: Approve Model for User Group

p3 - ID: cpt-cf-model-registry-usecase-user-group-approval

Actor: cpt-cf-model-registry-actor-tenant-admin

Preconditions: Model approved at tenant level, user groups defined.

Flow:

Admin selects approved model
Admin restricts access to specific user groups
Registry creates group-scoped approval

Postconditions: Model accessible only to specified groups.

Acceptance criteria:

Group approval is restriction (not expansion) of tenant approval
Users in multiple groups get union of permissions

UC-019: Override User Access

p3 - ID: cpt-cf-model-registry-usecase-user-override

Actor: cpt-cf-model-registry-actor-tenant-admin

Preconditions: Model has tenant or group approval.

Flow:

Admin selects user and model
Admin grants or revokes access for specific user
Registry creates user-level override

Postconditions: User access modified independent of group/tenant.

Acceptance criteria:

User override takes precedence over group and tenant approvals
Can both grant (if tenant allows) and revoke access
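
The precedence described in UC-018 and UC-019 can be sketched as one decision function. Parameter names are hypothetical; `allowed_groups=None` means that no group restriction has been configured for the model.

```python
from typing import Optional, Set


def has_access(tenant_approved: bool,
               allowed_groups: Optional[Set[str]],
               user_groups: Set[str],
               user_override: Optional[bool]) -> bool:
    """Combine tenant, group, and user-level decisions for one model."""
    if not tenant_approved:
        return False  # an override can grant only if the tenant allows
    if user_override is not None:
        return user_override  # user override beats group and tenant
    if allowed_groups is None:
        return True  # tenant approval with no group restriction
    # Union semantics: membership in any allowed group is sufficient.
    return bool(allowed_groups & user_groups)
```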

6. Auditable Operations

The following operations MUST be logged for audit compliance:

Operation	Audit Fields
Model approved	model_id, tenant_id, actor_id, timestamp
Model rejected	model_id, tenant_id, actor_id, timestamp
Model revoked	model_id, tenant_id, actor_id, timestamp
Provider registered	provider_id, tenant_id, actor_id, timestamp
Provider disabled	provider_id, tenant_id, actor_id, timestamp
Provider enabled	provider_id, tenant_id, actor_id, timestamp
Alias created (P2)	alias_name, target, tenant_id, actor_id, timestamp
Alias updated (P2)	alias_name, old_target, new_target, tenant_id, actor_id, timestamp
Alias deleted (P2)	alias_name, tenant_id, actor_id, timestamp
Auto-approval rule created (P2)	rule_id, criteria, tenant_id, actor_id, timestamp
Auto-approval rule updated (P2)	rule_id, tenant_id, actor_id, timestamp
Auto-approval rule deleted (P2)	rule_id, tenant_id, actor_id, timestamp

Read operations are not audited (high volume, low value).
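
An audit entry for the table above could be built as a flat JSON record. This sketch covers only three of the listed operations; the operation keys (`model_approved` and so on) and the JSON encoding are assumptions, while the field names come from the table.

```python
import json
from datetime import datetime, timezone

# Operation-specific fields, in addition to tenant_id, actor_id, timestamp.
REQUIRED_FIELDS = {
    "model_approved": {"model_id"},
    "provider_disabled": {"provider_id"},
    "alias_updated": {"alias_name", "old_target", "new_target"},
}


def audit_record(operation: str, tenant_id: str, actor_id: str,
                 **fields: str) -> str:
    """Return a JSON audit line, rejecting records with missing fields."""
    missing = REQUIRED_FIELDS[operation] - fields.keys()
    if missing:
        raise ValueError(f"missing audit fields: {sorted(missing)}")
    record = {
        "operation": operation,
        "tenant_id": tenant_id,
        "actor_id": actor_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    return json.dumps(record, sort_keys=True)
```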

7. Non-Functional Requirements

Performance

p1 - ID: cpt-cf-model-registry-nfr-performance

Operation	P50	P99
`get_tenant_model`	2ms	10ms
`list_tenant_models`	10ms	50ms
`approve_model`	-	100ms
Discovery job (per provider)	-	30s

Caching: Distributed cache (default: Redis, pluggable) with TTL-based invalidation.

Own data: 30 min TTL
Inherited data: 5 min TTL
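
The two TTLs can be selected by comparing the requesting tenant with the tenant that owns the data. A minimal sketch; the key layout and function names are assumptions, and only the tenant-ID prefix and the TTL values come from this document.

```python
OWN_TTL_SECONDS = 30 * 60       # data owned by the requesting tenant
INHERITED_TTL_SECONDS = 5 * 60  # data inherited from an ancestor


def cache_key(tenant_id: str, kind: str, identifier: str) -> str:
    # The tenant ID always comes first so keys are tenant-scoped.
    return f"{tenant_id}:{kind}:{identifier}"


def ttl_seconds(requesting_tenant: str, owning_tenant: str) -> int:
    if requesting_tenant == owning_tenant:
        return OWN_TTL_SECONDS
    return INHERITED_TTL_SECONDS
```

The shorter TTL on inherited data bounds how long a child tenant can see a stale view after an ancestor changes a provider or approval.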

Availability

p1 - ID: cpt-cf-model-registry-nfr-availability

Target: 99.9% availability.

P1: When the database is unavailable, all requests fail (fail-closed).

P2: Tiered degraded mode. Metadata is served from cache; approval checks fail with service_unavailable.

Cache unavailable: Fall back to direct DB queries (higher latency).

Scale

p1 - ID: cpt-cf-model-registry-nfr-scale

Dimension	Target
Models per provider	100
Providers per tenant	20
Tenants	10,000
Total models (worst case)	~20,000,000 (100 × 20 × 10,000)
Read:Write ratio	1000:1

Rate Limiting

p1 - ID: cpt-cf-model-registry-nfr-rate-limiting

The system must define rate limits for admin operations; enforcement is delegated to infrastructure.

Operation	Limit
Model approval requests	100/min per tenant
Provider management	10/min (platform-wide)

All limits must be configurable.
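
Since enforcement is left to infrastructure, the following is only an illustration of what a per-key limit such as "100/min per tenant" means in practice. It is a sliding-window limiter with an injected clock; the class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # Drop timestamps that have left the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

A per-tenant limit would use the tenant ID as the key; a platform-wide limit would use one fixed key.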

8. Error Codes

Code	HTTP Status	Description
`model_not_found`	404	Model identifier does not exist in catalog
`model_not_approved`	403	Model exists but not approved for tenant
`model_deprecated`	410	Model was removed by provider (soft-deleted)
`provider_not_found`	404	Provider identifier does not exist
`provider_disabled`	404	Provider exists but is disabled
`invalid_transition`	409	Invalid approval state transition (e.g., concurrent modification)
`validation_error`	400	Input validation failed
`unauthorized`	403	Actor lacks required role for operation
`service_unavailable`	503	Database unavailable

Error responses follow RFC 9457 Problem Details standard.
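
An RFC 9457 body for these codes could be built as below. The `type`, `title`, `status`, `detail`, and `instance` members are defined by the RFC; the `code` extension member and the `https://example.com/problems/` type URI are placeholders, not part of this PRD. Only a subset of the table is included.

```python
from typing import Optional

# code -> (HTTP status, short title); subset of the Error Codes table.
ERRORS = {
    "model_not_found": (404, "Model not found"),
    "model_not_approved": (403, "Model not approved for tenant"),
    "validation_error": (400, "Input validation failed"),
    "service_unavailable": (503, "Database unavailable"),
}


def problem(code: str, detail: Optional[str] = None,
            instance: Optional[str] = None) -> dict:
    """Build a Problem Details object (media type application/problem+json)."""
    status, title = ERRORS[code]
    body = {
        "type": f"https://example.com/problems/{code}",  # placeholder URI
        "title": title,
        "status": status,
        "code": code,  # extension member, assumed for machine-readable codes
    }
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    return body
```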

9. Security Considerations

Threat	Mitigation
Tenant data leakage	Tenant ID prefix in all cache keys; query filters enforce tenant scope
Unauthorized approval	Role-based authorization checks on all admin operations
Cache poisoning	TTL-based expiry; no user-controlled cache keys
Provider credential exposure	Credentials handled by OAGW, not stored in Model Registry
Privilege escalation via hierarchy	Child tenants can only restrict, not expand parent permissions
Stale approval served	Approval status always verified from DB (P1 fail-closed)

10. Dependencies

Module	Role
Outbound API Gateway	Execute provider API calls (discovery)
Tenant Resolver	Resolve tenant hierarchy (parent, children)
Approval Service	Generic approval workflow engine for model approvals
GTS	API contract types

11. Consumers

Module	Usage
LLM Gateway	Model resolution, availability checks, provider cost
Chat Engine	Model selection for conversations
Tenant Admin UI	Approval management, provider configuration

12. Public Library Interfaces

To be defined in DESIGN.md.

Key interfaces:

ModelRegistryClient — SDK for LLM Gateway integration
AdminClient — SDK for Tenant Admin UI

13. Acceptance Criteria

Category	Criteria	Priority
Functional	All P1 Use Cases pass acceptance tests	P1
Performance	`get_tenant_model` < 10ms P99	P1
Performance	`list_tenant_models` < 50ms P99	P1
Availability	99.9% uptime	P1
Security	Tenant isolation enforced for all operations	P1
Security	Authorization checks pass for all protected endpoints	P1
Integration	LLM Gateway can resolve models and check availability	P1
Integration	Tenant Admin UI can manage approvals	P1

14. Open Questions

#	Question	Status	Decision
1	Database-level locking vs application-level for approval concurrency	Deferred	ADR to be created
2	Specific QPS targets per endpoint	Deferred	DESIGN.md
3	Provider plugin retry policies	Deferred	DESIGN.md

15. Migration & Rollback

Initial deployment: No migration required (greenfield).

Schema changes:

Forward-compatible changes only
Rollback via previous deployment + compatible schema

Data migration: To be defined per release in DESIGN.md.

Cache invalidation on deployment: Clear all cache keys on major version deployment.

16. Traceability

Artifact	Link
LLM Gateway PRD	`modules/llm_gateway/docs/PRD.md`
ADR: Stateless Gateway	`modules/llm_gateway/docs/ADR/0001-fdd-llmgw-adr-stateless.md`
ADR: Pass-through Content	`modules/llm_gateway/docs/ADR/0002-fdd-llmgw-adr-pass-through.md`
ADR: Circuit Breaking	`modules/llm_gateway/docs/ADR/0004-fdd-llmgw-adr-circuit-breaking.md`
OData Pagination Standard	`docs/modkit_unified_system/07_odata_pagination_select_filter.md`
Error Handling Standard	`docs/modkit_unified_system/05_errors_rfc9457.md`
GTS Contracts	`gts/` (to be defined)
