
Commit 164c391

yossiovadia and claude authored
LLM-Katan Terminal animation demo in the readme files (#240)
This squashed commit combines the following changes:

* feat: add interactive terminal demo for multi-instance testing

  - Created animated terminal demo showcasing multi-instance capabilities
  - Added terminal-demo.html with realistic typing animations using TypeIt.js
  - Enhanced README with live demo link and improved use case documentation
  - Added embeddable demo widget (demo-embed.html) for external sites
  - Updated multi-instance examples to show mocking popular AI providers
  - Improved positioning documentation with strengths vs competitors
  - Highlighted key advantage: no GPU required, runs on laptops/Macs

* chore: add .gitignore to exclude build artifacts and demo recordings

  - Added .gitignore to exclude .cast files from asciinema recordings
  - Excluded common build artifacts and IDE files
  - Prevents accidental commits of temporary demo files

* docs: enhance demo accessibility with GitHub Pages link and preview

  - Added GitHub Pages link for live interactive demo
  - Added collapsible preview section showing terminal output
  - Included fallback instructions for local demo viewing
  - Added guide for creating demo GIF alternatives

* fix: update demo links to point to main project repository

  - Changed GitHub Pages links from personal repo to vllm-project repository
  - Ensures demo will work once PR is merged to main
  - Provides correct canonical URL for PyPI and documentation

* docs: add demo testing guide for PR reviewers

  - Created instructions for reviewers to test the interactive demo
  - Provided multiple options: local checkout, raw file viewing, static preview
  - Explains why live links won't work until PR is merged
  - Helps reviewers experience the full animation during review process

* chore: remove demo testing guide

  - Removed DEMO_TESTING.md to keep the PR focused on the core demo functionality

* fix: improve terminal demo layout and fix markdown lint issues

  Terminal Demo:

  - Reduced terminal heights from 300px to 220px with max-height 250px
  - Added overflow-y for better space utilization
  - Prevents bottom terminal from requiring scroll

  Markdown Lint:

  - Fixed line length issues (MD013) by breaking long lines
  - Converted bold text to proper headings (MD036)
  - Added blank lines around headings and lists (MD022, MD032)
  - Added markdownlint disable comments for required HTML elements

* fix: improve terminal demo sizing and timing

  - Restored bottom terminal (terminal-full) to proper size (300px min-height)
  - Increased Terminal 3 delay from 8.5s to 10s for better timing
  - Ensures Terminal 3 starts only after both servers complete their setup
  - Top terminals remain compact at 220-250px for better layout

* fix: resolve markdown lint issues in demo documentation

  - Added missing blank lines around fenced code blocks
  - Added trailing newlines to all markdown files
  - Added blank lines around lists
  - Ensures compliance with project markdown linting rules

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent 858dd50 commit 164c391

File tree

7 files changed: +446 -17 lines changed

e2e-tests/llm-katan/.gitignore

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+# Build artifacts
+dist/
+build/
+*.egg-info/
+
+# Python cache
+__pycache__/
+*.pyc
+*.pyo
+
+# Demo recordings
+*.cast
+
+# IDE files
+.vscode/
+.idea/
+
+# OS files
+.DS_Store
+Thumbs.db
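A quick way to confirm these patterns behave as intended (a sketch, not part of the diff; assumes a local checkout with git available):

```bash
# Create a throwaway recording file and ask git which rule ignores it
cd e2e-tests/llm-katan
touch demo.cast
git check-ignore -v demo.cast   # prints the .gitignore line that matched
rm demo.cast
```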

e2e-tests/llm-katan/README.md

Lines changed: 86 additions & 16 deletions

@@ -1,6 +1,10 @@
 # LLM Katan - Lightweight LLM Server for Testing
 
-A lightweight LLM serving package using FastAPI and HuggingFace transformers, designed for testing and development with real tiny models.
+A lightweight LLM serving package using FastAPI and HuggingFace transformers,
+designed for testing and development with real tiny models.
+
+> **🎬 [See Live Demo](https://vllm-project.github.io/semantic-router/e2e-tests/llm-katan/terminal-demo.html)**
+> Interactive terminal showing multi-instance setup in action!
 
 ## Features

@@ -24,32 +28,34 @@ pip install llm-katan
 
 #### HuggingFace Token (Required)
 
-LLM Katan uses HuggingFace transformers to download models. You'll need a HuggingFace token for:
+LLM Katan uses HuggingFace transformers to download models.
+You'll need a HuggingFace token for:
 
 - Private models
 - Avoiding rate limits
 - Reliable model downloads
 
-**Option 1: Environment Variable**
+#### Option 1: Environment Variable
 
 ```bash
 export HUGGINGFACE_HUB_TOKEN="your_token_here"
 ```
 
-**Option 2: Login via CLI**
+#### Option 2: Login via CLI
 
 ```bash
 huggingface-cli login
 ```
 
-**Option 3: Token file in home directory**
+#### Option 3: Token file in home directory
 
 ```bash
 # Create ~/.cache/huggingface/token file with your token
 echo "your_token_here" > ~/.cache/huggingface/token
 ```
 
-**Get your token:** Visit [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
+**Get your token:**
+Visit [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
 
 ### Basic Usage
 
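Whichever of the three options is used, it can help to verify the token is actually picked up before the first model download. A minimal check (not part of the diff; `huggingface-cli` ships with the `huggingface_hub` package):

```bash
# Prints the account the current token authenticates as,
# or an error if no valid token is found
huggingface-cli whoami
```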
@@ -66,14 +72,59 @@ llm-katan --model Qwen/Qwen3-0.6B --port 8000 --backend vllm
 
 ### Multi-Instance Testing
 
+**🎬 [Live Demo](https://vllm-project.github.io/semantic-router/e2e-tests/llm-katan/terminal-demo.html)**
+See this in action with animated terminals!
+
+> *Note: If GitHub Pages isn't enabled, you can also
+> [download and open the demo locally](./terminal-demo.html)*
+
+<!-- markdownlint-disable MD033 -->
+<details>
+<summary>📺 Preview (click to expand)</summary>
+<!-- markdownlint-enable MD033 -->
+
 ```bash
-# Terminal 1: Qwen endpoint
-llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "Qwen/Qwen2-0.5B-Instruct"
+# Terminal 1: Installing and starting GPT-3.5-Turbo mock
+$ pip install llm-katan
+Successfully installed llm-katan-0.1.8
 
-# Terminal 2: Same model, different name
-llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+$ llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo"
+🚀 Starting LLM Katan server with model: Qwen/Qwen3-0.6B
+📛 Served model name: gpt-3.5-turbo
+✅ Server running on http://0.0.0.0:8000
+
+# Terminal 2: Starting Claude-3-Haiku mock
+$ llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku"
+🚀 Starting LLM Katan server with model: Qwen/Qwen3-0.6B
+📛 Served model name: claude-3-haiku
+✅ Server running on http://0.0.0.0:8001
+
+# Terminal 3: Testing both endpoints
+$ curl localhost:8000/v1/models | jq '.data[0].id'
+"gpt-3.5-turbo"
+
+$ curl localhost:8001/v1/models | jq '.data[0].id'
+"claude-3-haiku"
+
+# Same tiny model, different API names! 🎯
 ```
 
+</details>
+
+```bash
+# Terminal 1: Mock GPT-3.5-Turbo
+llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo"
+
+# Terminal 2: Mock Claude-3-Haiku
+llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku"
+
+# Terminal 3: Test both endpoints
+curl http://localhost:8000/v1/models  # Returns "gpt-3.5-turbo"
+curl http://localhost:8001/v1/models  # Returns "claude-3-haiku"
+```
+
+**Perfect for testing multi-provider scenarios with one tiny model!**
+
 ## API Endpoints
 
 - `GET /health` - Health check
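The hunk above only queries `/v1/models`. To exercise actual inference against the two mocks, a request like the following should work, assuming llm-katan exposes the standard OpenAI-style `/v1/chat/completions` route (an assumption; the diff itself only shows the models and health endpoints):

```bash
# Ask the "gpt-3.5-turbo" mock on port 8000 for a completion;
# swap in port 8001 and "claude-3-haiku" for the second instance
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello"}]}' \
  | jq '.choices[0].message.content'
```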
@@ -116,10 +167,27 @@ curl http://127.0.0.1:8000/health
 
 ## Use Cases
 
-- **Testing**: Lightweight alternative to full LLM deployments
-- **Development**: Fast iteration with real model behavior
-- **CI/CD**: Automated testing with actual inference
-- **Prototyping**: Quick setup for AI application development
+### Strengths
+
+- **Fastest time-to-test**: 30 seconds from install to running
+- **Minimal resource footprint**: Designed for tiny models and efficient testing
+- **No GPU required**: Runs on laptops, Macs, and any CPU-only environment
+- **CI/CD integration friendly**: Lightweight and automation-ready
+- **Multiple instances**: Run same model with different names on different ports
+
+### Ideal For
+
+- **Automated testing pipelines**: Quick LLM endpoint setup for test suites
+- **Development environment mocking**: Real inference without production overhead
+- **Quick prototyping**: Fast iteration with actual model behavior
+- **Educational/learning scenarios**: Easy setup for AI development learning
+
+### Not Ideal For
+
+- **Production workloads**: Use Ollama or vLLM for production deployments
+- **Large model serving**: Designed for tiny models (< 1B parameters)
+- **Complex multi-agent workflows**: Use Semantic Kernel or similar frameworks
+- **High-performance inference**: Use vLLM or specialized serving solutions
 
 ## Configuration
 
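For the CI/CD use case called out above, the `GET /health` endpoint from the API section makes a simple readiness gate possible. A sketch (the `pytest tests/` line is a placeholder for whatever test command a project actually uses):

```bash
# Start the server in the background, wait until it reports healthy,
# run the test suite, then shut the server down
llm-katan --model Qwen/Qwen3-0.6B --port 8000 &
SERVER_PID=$!

until curl -sf http://127.0.0.1:8000/health > /dev/null; do
  sleep 1
done

pytest tests/        # placeholder: substitute your real test command
kill "$SERVER_PID"
```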
@@ -133,7 +201,8 @@ Required:
 -m, --model TEXT Model name to load (e.g., 'Qwen/Qwen3-0.6B') [required]
 
 Optional:
--n, --name, --served-model-name TEXT Model name to serve via API (defaults to model name)
+-n, --name, --served-model-name TEXT
+                 Model name to serve via API (defaults to model name)
 -p, --port INTEGER Port to serve on (default: 8000)
 -h, --host TEXT Host to bind to (default: 0.0.0.0)
 -b, --backend [transformers|vllm] Backend to use (default: transformers)

@@ -159,7 +228,8 @@ llm-katan --model Qwen/Qwen3-0.6B --host 127.0.0.1 --port 9000
 
 # Multiple servers with different settings
 llm-katan --model Qwen/Qwen3-0.6B --port 8000 --max-tokens 512 --temperature 0.1
-llm-katan --model Qwen/Qwen3-0.6B --port 8001 --name "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --max-tokens 256 --temperature 0.9
+llm-katan --model Qwen/Qwen3-0.6B --port 8001 \
+  --name "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --max-tokens 256 --temperature 0.9
 ```
 
 ### Environment Variables
Lines changed: 38 additions & 0 deletions

@@ -0,0 +1,38 @@
+# Creating Demo GIF
+
+## Method 1: Using Browser + Screen Recorder
+
+1. Open `terminal-demo.html` in a browser
+2. Use a tool like LICEcap, GIMP, or ffmpeg to record:
+
+```bash
+# Using ffmpeg (if installed)
+ffmpeg -f avfoundation -i "1" -t 30 -r 10 demo.gif
+
+# Using LICEcap (GUI tool)
+# Download from: https://www.cockos.com/licecap/
+```
+
+## Method 2: Using Puppeteer (Automated)
+
+```javascript
+const puppeteer = require('puppeteer');
+
+(async () => {
+  const browser = await puppeteer.launch();
+  const page = await browser.newPage();
+  await page.goto('file://' + __dirname + '/terminal-demo.html');
+
+  // Wait for animation to complete
+  await page.waitForTimeout(20000);
+
+  // Take screenshot or record
+  await page.screenshot({path: 'demo.png'});
+  await browser.close();
+})();
+```
+
+## Method 3: Embed as Raw HTML (Limited)
+
+GitHub README supports some HTML, but JavaScript is stripped.
+The TypeIt.js animation won't work, but we can show a static version.
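Since the commit's new `.gitignore` already anticipates asciinema `.cast` recordings, a fourth, terminal-native option is worth noting (a sketch; assumes `asciinema` and the `agg` converter are installed separately):

```bash
# Record the terminal session, then convert the recording to a GIF
asciinema rec demo.cast
agg demo.cast demo.gif
```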
e2e-tests/llm-katan/demo-embed.html

Lines changed: 61 additions & 0 deletions

@@ -0,0 +1,61 @@
+<!-- Embeddable Terminal Demo Widget -->
+<div id="llm-katan-demo" style="
+  background: #1e1e1e;
+  color: #d4d4d4;
+  font-family: 'Consolas', 'Monaco', 'Courier New', monospace;
+  padding: 20px;
+  border-radius: 8px;
+  border: 1px solid #333;
+  max-width: 800px;
+  margin: 20px auto;
+  font-size: 13px;
+  line-height: 1.4;
+">
+  <div style="color: #569cd6; font-weight: bold; margin-bottom: 15px; text-align: center;">
+    🚀 LLM Katan Multi-Instance Demo
+  </div>
+  <div id="demo-content"></div>
+</div>
+
+<script src="https://unpkg.com/[email protected]/dist/index.umd.js"></script>
+<script>
+new TypeIt("#demo-content", {
+  speed: 40,
+  waitUntilVisible: true
+})
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">pip install llm-katan</span>')
+  .break()
+  .type('Successfully installed llm-katan-0.1.8')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;"># Start mock GPT-3.5-Turbo on port 8000</span>')
+  .break()
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo"</span>')
+  .break()
+  .type('<span style="color: #4fc1e9;">✅ Server running on http://0.0.0.0:8000</span>')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;"># Start mock Claude-3-Haiku on port 8001</span>')
+  .break()
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku"</span>')
+  .break()
+  .type('<span style="color: #4fc1e9;">✅ Server running on http://0.0.0.0:8001</span>')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">curl localhost:8000/v1/models | jq \'.data[0].id\'</span>')
+  .break()
+  .type('"gpt-3.5-turbo"')
+  .break()
+  .break()
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">curl localhost:8001/v1/models | jq \'.data[0].id\'</span>')
+  .break()
+  .type('"claude-3-haiku"')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4fc1e9;"># Same tiny model, different API names! 🎯</span>')
+  .go();
+</script>
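To preview the widget locally, including the CDN-loaded TypeIt script, serving the directory over a minimal HTTP server is the most reliable route (a sketch using Python's standard-library server):

```bash
# Serve the demo directory locally, then browse to
# http://localhost:8080/demo-embed.html
cd e2e-tests/llm-katan
python3 -m http.server 8080
```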

e2e-tests/llm-katan/demo-script.md

Lines changed: 51 additions & 0 deletions

@@ -0,0 +1,51 @@
+# Multi-Instance Demo Script
+
+## Terminal Commands to Record
+
+### Terminal 1: Start first instance (gpt-3.5-turbo)
+
+```bash
+# Clear screen
+clear
+
+# Install (simulate - already installed)
+echo "$ pip install llm-katan"
+echo "Requirement already satisfied: llm-katan"
+
+# Start first server
+echo "$ llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name 'gpt-3.5-turbo'"
+llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo" &
+sleep 3
+```
+
+### Terminal 2: Start second instance (claude-3-haiku)
+
+```bash
+clear
+echo "$ llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name 'claude-3-haiku'"
+llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku" &
+sleep 3
+```
+
+### Terminal 3: Test both endpoints
+
+```bash
+clear
+echo "$ curl http://localhost:8000/v1/models | jq '.data[0].id'"
+curl -s http://localhost:8000/v1/models | jq '.data[0].id'
+
+echo ""
+echo "$ curl http://localhost:8001/v1/models | jq '.data[0].id'"
+curl -s http://localhost:8001/v1/models | jq '.data[0].id'
+
+echo ""
+echo "# Same tiny model, different API names for testing!"
+```
+
+## Key Points to Highlight
+
+- One tiny model (Qwen3-0.6B)
+- Two different API endpoints
+- Different model names served
+- Perfect for testing multi-provider scenarios
+- Minimal resource usage
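Terminals 1 and 2 background their servers with `&` and never stop them; after recording, a cleanup step keeps stray servers from holding ports 8000 and 8001. A sketch (assumes no unrelated llm-katan processes are running):

```bash
# Stop every backgrounded llm-katan instance started during the recording
pkill -f llm-katan
```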

e2e-tests/llm-katan/pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "llm-katan"
-version = "0.1.7"
+version = "0.1.8"
 description = "LLM Katan - Lightweight LLM Server for Testing - Real tiny models with FastAPI and HuggingFace"
 readme = "README.md"
 authors = [
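Once the bumped version is published, an installed copy can be checked against the new number (a sketch; works with any pip-installed copy):

```bash
# Expect: Version: 0.1.8
pip show llm-katan | grep '^Version'
```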
