Add Kubernetes Support for LLM Katan #278

@Xunzhuo

Description

Is your feature request related to a problem? Please describe.

Currently, LLM Katan (the lightweight LLM server for testing) only supports Docker deployment and direct installation via PyPI. While the main Semantic Router project has comprehensive Kubernetes support with manifests in deploy/kubernetes/, LLM Katan lacks dedicated Kubernetes deployment configurations. This creates a gap for users who want to:

  1. Deploy LLM Katan in Kubernetes environments for testing and development workflows
  2. Scale LLM Katan instances horizontally for multi-tenant testing scenarios
  3. Integrate with existing Kubernetes-based CI/CD pipelines that require containerized LLM endpoints
  4. Leverage Kubernetes features like ConfigMaps, Secrets, and persistent volumes for model storage
  5. Use Kubernetes networking for service discovery and load balancing across multiple LLM Katan instances

The current Docker-only approach limits adoption in cloud-native environments where Kubernetes is the standard orchestration platform.

Describe the solution you'd like

Add comprehensive Kubernetes support for LLM Katan with the following components:

1. Kubernetes Manifests

  • Deployment: Configure LLM Katan pods with proper resource limits and health checks
  • Service: Expose LLM Katan API endpoints (HTTP port 8000) within the cluster
  • ConfigMap: Store LLM Katan configuration (model settings, served model names, etc.); sketched after this list
  • PersistentVolumeClaim: Optional storage for model caching and persistence; also sketched below
  • Namespace: Dedicated namespace for LLM Katan deployments
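
A minimal sketch of the ConfigMap and PersistentVolumeClaim proposed above, assuming the dedicated llm-katan namespace; the key names and cache size are illustrative, not an existing LLM Katan configuration format:

# Hypothetical ConfigMap holding model settings (key names are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-katan-config
  namespace: llm-katan
data:
  model: "Qwen/Qwen3-0.6B"
  served-model-name: "gpt-3.5-turbo"
---
# Optional PVC for the Hugging Face model cache (size is a placeholder)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-katan-model-cache
  namespace: llm-katan
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi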

2. Helm Chart (Optional)

  • Parameterized deployment with values.yaml for easy customization (a sketch follows this list)
  • Support for multiple model configurations
  • Resource scaling based on workload requirements
  • Integration with existing monitoring solutions
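
A hypothetical values.yaml sketch for the optional chart; every key below is a proposal, not an existing chart schema:

# Proposed values.yaml (all keys illustrative)
replicaCount: 2
image:
  repository: llm-katan
  tag: latest
models:
  - servedName: "gpt-3.5-turbo"
    backing: "Qwen/Qwen3-0.6B"
  - servedName: "claude-3-haiku"
    backing: "Qwen/Qwen3-0.6B"
resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 1000m
persistence:
  enabled: false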

3. Multi-Instance Support

  • Deploy multiple LLM Katan instances with different model configurations
  • Service discovery for different served model names (e.g., "gpt-3.5-turbo", "claude-3-haiku"); a per-model Service sketch follows this list
  • Load balancing across instances for high availability
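
One way to realize this discovery, as a sketch: one Service per served model name, selecting pods by a served-model label (the label key/value scheme is an assumption). Test clients could then target, e.g., llm-katan-gpt35.llm-katan.svc.cluster.local:8000.

# Hypothetical per-model Service; labels match the Deployment example below
apiVersion: v1
kind: Service
metadata:
  name: llm-katan-gpt35
  namespace: llm-katan
spec:
  selector:
    app: llm-katan
    served-model: "gpt-3.5-turbo"
  ports:
  - port: 8000
    targetPort: 8000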

4. Documentation

  • Kubernetes deployment guide in e2e-tests/llm-katan/docs/kubernetes.md
  • Examples for common use cases (CI/CD testing, multi-model setups)
  • Integration patterns with the main Semantic Router

5. Example Configurations

# Example: Multi-instance deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-katan-gpt35
spec:
  replicas: 2
  # selector/labels are required for apps/v1 Deployments and let the
  # per-model Service sketched above route to this instance
  selector:
    matchLabels:
      app: llm-katan
      served-model: "gpt-3.5-turbo"
  template:
    metadata:
      labels:
        app: llm-katan
        served-model: "gpt-3.5-turbo"
    spec:
      containers:
      - name: llm-katan
        image: llm-katan:latest
        args: ["--model", "Qwen/Qwen3-0.6B", "--served-model-name", "gpt-3.5-turbo"]
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"

6. Integration with Existing Infrastructure

  • Leverage patterns from deploy/kubernetes/ in the main project
  • Compatibility with the existing Docker Compose setup
  • Support for the same configuration options as the CLI

Additional context

Current State:

  • LLM Katan has Docker support (e2e-tests/llm-katan/Dockerfile)
  • Docker Compose integration exists in the main project
  • CLI supports all necessary configuration options
  • FastAPI-based architecture is ready for cloud-native deployment

Benefits:

  • Testing at Scale: Enable large-scale testing scenarios with multiple model endpoints
  • Cloud-Native Integration: Seamless deployment in modern Kubernetes environments
  • Resource Management: Better resource allocation and monitoring through Kubernetes
  • Service Mesh Compatibility: Integration with Istio, Linkerd, and other service mesh solutions
  • GitOps Workflows: Support for ArgoCD, Flux, and other GitOps tools

Implementation Considerations:

  • Maintain compatibility with existing Docker and PyPI deployment methods
  • Follow Kubernetes best practices for security (non-root containers, resource limits); see the sketch after this list
  • Support both CPU and GPU workloads (though LLM Katan is designed for tiny models)
  • Provide examples for both development (kind/minikube) and production environments
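
As a sketch of the non-root guidance above (the user ID is illustrative and assumes the image supports running unprivileged):

# Hypothetical pod applying non-root, least-privilege settings
apiVersion: v1
kind: Pod
metadata:
  name: llm-katan-hardened-example
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: llm-katan
    image: llm-katan:latest
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    resources:
      limits:
        memory: "2Gi"
        cpu: "1000m"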

Related Work:

  • Main Semantic Router already has comprehensive Kubernetes support
  • Docker Compose setup provides a good reference for service configuration
  • Existing Dockerfile can be reused with minimal modifications

/area environment user-experience
/milestone v0.1
/priority P1
