
Commit 164c391

yossiovadia and claude authored
LLM-Katan Terminal animation demo in the readme files (#240)
This squashed commit combines the following changes:

* feat: add interactive terminal demo for multi-instance testing

  - Created animated terminal demo showcasing multi-instance capabilities
  - Added terminal-demo.html with realistic typing animations using TypeIt.js
  - Enhanced README with live demo link and improved use case documentation
  - Added embeddable demo widget (demo-embed.html) for external sites
  - Updated multi-instance examples to show mocking popular AI providers
  - Improved positioning documentation with strengths vs competitors
  - Highlighted key advantage: no GPU required, runs on laptops/Macs

* chore: add .gitignore to exclude build artifacts and demo recordings

  - Added .gitignore to exclude .cast files from asciinema recordings
  - Excluded common build artifacts and IDE files
  - Prevents accidental commits of temporary demo files

* docs: enhance demo accessibility with GitHub Pages link and preview

  - Added GitHub Pages link for live interactive demo
  - Added collapsible preview section showing terminal output
  - Included fallback instructions for local demo viewing
  - Added guide for creating demo GIF alternatives

* fix: update demo links to point to main project repository

  - Changed GitHub Pages links from personal repo to vllm-project repository
  - Ensures demo will work once PR is merged to main
  - Provides correct canonical URL for PyPI and documentation

* docs: add demo testing guide for PR reviewers

  - Created instructions for reviewers to test the interactive demo
  - Provided multiple options: local checkout, raw file viewing, static preview
  - Explains why live links won't work until PR is merged
  - Helps reviewers experience the full animation during review process

* chore: remove demo testing guide

  - Removed DEMO_TESTING.md to keep the PR focused on the core demo functionality

* fix: improve terminal demo layout and fix markdown lint issues

  Terminal Demo:

  - Reduced terminal heights from 300px to 220px with max-height 250px
  - Added overflow-y for better space utilization
  - Prevents bottom terminal from requiring scroll

  Markdown Lint:

  - Fixed line length issues (MD013) by breaking long lines
  - Converted bold text to proper headings (MD036)
  - Added blank lines around headings and lists (MD022, MD032)
  - Added markdownlint disable comments for required HTML elements

* fix: improve terminal demo sizing and timing

  - Restored bottom terminal (terminal-full) to proper size (300px min-height)
  - Increased Terminal 3 delay from 8.5s to 10s for better timing
  - Ensures Terminal 3 starts only after both servers complete their setup
  - Top terminals remain compact at 220-250px for better layout

* fix: resolve markdown lint issues in demo documentation

  - Added missing blank lines around fenced code blocks
  - Added trailing newlines to all markdown files
  - Added blank lines around lists
  - Ensures compliance with project markdown linting rules

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent 858dd50 commit 164c391

File tree

7 files changed: +446 -17 lines changed

e2e-tests/llm-katan/.gitignore

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+# Build artifacts
+dist/
+build/
+*.egg-info/
+
+# Python cache
+__pycache__/
+*.pyc
+*.pyo
+
+# Demo recordings
+*.cast
+
+# IDE files
+.vscode/
+.idea/
+
+# OS files
+.DS_Store
+Thumbs.db
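A quick way to confirm these patterns behave as intended (a sketch, not part of the diff; assumes a local checkout with git available):

```bash
# Create a throwaway recording file and ask git which rule ignores it
cd e2e-tests/llm-katan
touch demo.cast
git check-ignore -v demo.cast   # prints the .gitignore line that matched
rm demo.cast
```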

e2e-tests/llm-katan/README.md

Lines changed: 86 additions & 16 deletions

@@ -1,6 +1,10 @@
 # LLM Katan - Lightweight LLM Server for Testing
 
-A lightweight LLM serving package using FastAPI and HuggingFace transformers, designed for testing and development with real tiny models.
+A lightweight LLM serving package using FastAPI and HuggingFace transformers,
+designed for testing and development with real tiny models.
+
+> **🎬 [See Live Demo](https://vllm-project.github.io/semantic-router/e2e-tests/llm-katan/terminal-demo.html)**
+> Interactive terminal showing multi-instance setup in action!
 
 ## Features

@@ -24,32 +28,34 @@ pip install llm-katan
 
 #### HuggingFace Token (Required)
 
-LLM Katan uses HuggingFace transformers to download models. You'll need a HuggingFace token for:
+LLM Katan uses HuggingFace transformers to download models.
+You'll need a HuggingFace token for:
 
 - Private models
 - Avoiding rate limits
 - Reliable model downloads
 
-**Option 1: Environment Variable**
+#### Option 1: Environment Variable
 
 ```bash
 export HUGGINGFACE_HUB_TOKEN="your_token_here"
 ```
 
-**Option 2: Login via CLI**
+#### Option 2: Login via CLI
 
 ```bash
 huggingface-cli login
 ```
 
-**Option 3: Token file in home directory**
+#### Option 3: Token file in home directory
 
 ```bash
 # Create ~/.cache/huggingface/token file with your token
 echo "your_token_here" > ~/.cache/huggingface/token
 ```
 
-**Get your token:** Visit [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
+**Get your token:**
+Visit [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
 
 ### Basic Usage
 
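Whichever of the three options is used, it can help to verify the token is actually picked up before the first model download. A minimal check (not part of the diff; `huggingface-cli` ships with the `huggingface_hub` package):

```bash
# Prints the account the current token authenticates as,
# or an error if no valid token is found
huggingface-cli whoami
```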
@@ -66,14 +72,59 @@ llm-katan --model Qwen/Qwen3-0.6B --port 8000 --backend vllm
 
 ### Multi-Instance Testing
 
+**🎬 [Live Demo](https://vllm-project.github.io/semantic-router/e2e-tests/llm-katan/terminal-demo.html)**
+See this in action with animated terminals!
+
+> *Note: If GitHub Pages isn't enabled, you can also
+> [download and open the demo locally](./terminal-demo.html)*
+
+<!-- markdownlint-disable MD033 -->
+<details>
+<summary>📺 Preview (click to expand)</summary>
+<!-- markdownlint-enable MD033 -->
+
 ```bash
-# Terminal 1: Qwen endpoint
-llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "Qwen/Qwen2-0.5B-Instruct"
+# Terminal 1: Installing and starting GPT-3.5-Turbo mock
+$ pip install llm-katan
+Successfully installed llm-katan-0.1.8
 
-# Terminal 2: Same model, different name
-llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+$ llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo"
+🚀 Starting LLM Katan server with model: Qwen/Qwen3-0.6B
+📛 Served model name: gpt-3.5-turbo
+✅ Server running on http://0.0.0.0:8000
+
+# Terminal 2: Starting Claude-3-Haiku mock
+$ llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku"
+🚀 Starting LLM Katan server with model: Qwen/Qwen3-0.6B
+📛 Served model name: claude-3-haiku
+✅ Server running on http://0.0.0.0:8001
+
+# Terminal 3: Testing both endpoints
+$ curl localhost:8000/v1/models | jq '.data[0].id'
+"gpt-3.5-turbo"
+
+$ curl localhost:8001/v1/models | jq '.data[0].id'
+"claude-3-haiku"
+
+# Same tiny model, different API names! 🎯
 ```
 
+</details>
+
+```bash
+# Terminal 1: Mock GPT-3.5-Turbo
+llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo"
+
+# Terminal 2: Mock Claude-3-Haiku
+llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku"
+
+# Terminal 3: Test both endpoints
+curl http://localhost:8000/v1/models  # Returns "gpt-3.5-turbo"
+curl http://localhost:8001/v1/models  # Returns "claude-3-haiku"
+```
+
+**Perfect for testing multi-provider scenarios with one tiny model!**
+
 ## API Endpoints
 
 - `GET /health` - Health check
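The hunk above only queries `/v1/models`. To exercise actual inference against the two mocks, a request like the following should work, assuming llm-katan exposes the standard OpenAI-style `/v1/chat/completions` route (an assumption; the diff itself only shows the models and health endpoints):

```bash
# Ask the "gpt-3.5-turbo" mock on port 8000 for a completion;
# swap in port 8001 and "claude-3-haiku" for the second instance
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello"}]}' \
  | jq '.choices[0].message.content'
```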
@@ -116,10 +167,27 @@ curl http://127.0.0.1:8000/health
 
 ## Use Cases
 
-- **Testing**: Lightweight alternative to full LLM deployments
-- **Development**: Fast iteration with real model behavior
-- **CI/CD**: Automated testing with actual inference
-- **Prototyping**: Quick setup for AI application development
+### Strengths
+
+- **Fastest time-to-test**: 30 seconds from install to running
+- **Minimal resource footprint**: Designed for tiny models and efficient testing
+- **No GPU required**: Runs on laptops, Macs, and any CPU-only environment
+- **CI/CD integration friendly**: Lightweight and automation-ready
+- **Multiple instances**: Run same model with different names on different ports
+
+### Ideal For
+
+- **Automated testing pipelines**: Quick LLM endpoint setup for test suites
+- **Development environment mocking**: Real inference without production overhead
+- **Quick prototyping**: Fast iteration with actual model behavior
+- **Educational/learning scenarios**: Easy setup for AI development learning
+
+### Not Ideal For
+
+- **Production workloads**: Use Ollama or vLLM for production deployments
+- **Large model serving**: Designed for tiny models (< 1B parameters)
+- **Complex multi-agent workflows**: Use Semantic Kernel or similar frameworks
+- **High-performance inference**: Use vLLM or specialized serving solutions
 
 ## Configuration
 
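For the CI/CD use case called out above, the `GET /health` endpoint from the API section makes a simple readiness gate possible. A sketch (the `pytest tests/` line is a placeholder for whatever test command a project actually uses):

```bash
# Start the server in the background, wait until it reports healthy,
# run the test suite, then shut the server down
llm-katan --model Qwen/Qwen3-0.6B --port 8000 &
SERVER_PID=$!

until curl -sf http://127.0.0.1:8000/health > /dev/null; do
  sleep 1
done

pytest tests/        # placeholder: substitute your real test command
kill "$SERVER_PID"
```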
@@ -133,7 +201,8 @@ Required:
 -m, --model TEXT Model name to load (e.g., 'Qwen/Qwen3-0.6B') [required]
 
 Optional:
--n, --name, --served-model-name TEXT Model name to serve via API (defaults to model name)
+-n, --name, --served-model-name TEXT
+                 Model name to serve via API (defaults to model name)
 -p, --port INTEGER Port to serve on (default: 8000)
 -h, --host TEXT Host to bind to (default: 0.0.0.0)
 -b, --backend [transformers|vllm] Backend to use (default: transformers)

@@ -159,7 +228,8 @@ llm-katan --model Qwen/Qwen3-0.6B --host 127.0.0.1 --port 9000
 
 # Multiple servers with different settings
 llm-katan --model Qwen/Qwen3-0.6B --port 8000 --max-tokens 512 --temperature 0.1
-llm-katan --model Qwen/Qwen3-0.6B --port 8001 --name "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --max-tokens 256 --temperature 0.9
+llm-katan --model Qwen/Qwen3-0.6B --port 8001 \
+  --name "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --max-tokens 256 --temperature 0.9
 ```
 
 ### Environment Variables
Lines changed: 38 additions & 0 deletions

@@ -0,0 +1,38 @@
+# Creating Demo GIF
+
+## Method 1: Using Browser + Screen Recorder
+
+1. Open `terminal-demo.html` in a browser
+2. Use a tool like LICEcap, GIMP, or ffmpeg to record:
+
+```bash
+# Using ffmpeg (if installed)
+ffmpeg -f avfoundation -i "1" -t 30 -r 10 demo.gif
+
+# Using LICEcap (GUI tool)
+# Download from: https://www.cockos.com/licecap/
+```
+
+## Method 2: Using Puppeteer (Automated)
+
+```javascript
+const puppeteer = require('puppeteer');
+
+(async () => {
+  const browser = await puppeteer.launch();
+  const page = await browser.newPage();
+  await page.goto('file://' + __dirname + '/terminal-demo.html');
+
+  // Wait for animation to complete
+  await page.waitForTimeout(20000);
+
+  // Take screenshot or record
+  await page.screenshot({path: 'demo.png'});
+  await browser.close();
+})();
+```
+
+## Method 3: Embed as Raw HTML (Limited)
+
+GitHub README supports some HTML, but JavaScript is stripped.
+The TypeIt.js animation won't work, but we can show a static version.
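Since the commit's new `.gitignore` already anticipates asciinema `.cast` recordings, a fourth, terminal-native option is worth noting (a sketch; assumes `asciinema` and the `agg` converter are installed separately):

```bash
# Record the terminal session, then convert the recording to a GIF
asciinema rec demo.cast
agg demo.cast demo.gif
```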
e2e-tests/llm-katan/demo-embed.html

Lines changed: 61 additions & 0 deletions

@@ -0,0 +1,61 @@
+<!-- Embeddable Terminal Demo Widget -->
+<div id="llm-katan-demo" style="
+  background: #1e1e1e;
+  color: #d4d4d4;
+  font-family: 'Consolas', 'Monaco', 'Courier New', monospace;
+  padding: 20px;
+  border-radius: 8px;
+  border: 1px solid #333;
+  max-width: 800px;
+  margin: 20px auto;
+  font-size: 13px;
+  line-height: 1.4;
+">
+  <div style="color: #569cd6; font-weight: bold; margin-bottom: 15px; text-align: center;">
+    🚀 LLM Katan Multi-Instance Demo
+  </div>
+  <div id="demo-content"></div>
+</div>
+
+<script src="https://unpkg.com/[email protected]/dist/index.umd.js"></script>
+<script>
+new TypeIt("#demo-content", {
+  speed: 40,
+  waitUntilVisible: true
+})
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">pip install llm-katan</span>')
+  .break()
+  .type('Successfully installed llm-katan-0.1.8')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;"># Start mock GPT-3.5-Turbo on port 8000</span>')
+  .break()
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo"</span>')
+  .break()
+  .type('<span style="color: #4fc1e9;">✅ Server running on http://0.0.0.0:8000</span>')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;"># Start mock Claude-3-Haiku on port 8001</span>')
+  .break()
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku"</span>')
+  .break()
+  .type('<span style="color: #4fc1e9;">✅ Server running on http://0.0.0.0:8001</span>')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">curl localhost:8000/v1/models | jq \'.data[0].id\'</span>')
+  .break()
+  .type('"gpt-3.5-turbo"')
+  .break()
+  .break()
+  .type('<span style="color: #4ec9b0;">$</span> <span style="color: #ce9178;">curl localhost:8001/v1/models | jq \'.data[0].id\'</span>')
+  .break()
+  .type('"claude-3-haiku"')
+  .break()
+  .break()
+  .pause(800)
+  .type('<span style="color: #4fc1e9;"># Same tiny model, different API names! 🎯</span>')
+  .go();
+</script>
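To preview the widget locally, including the CDN-loaded TypeIt script, serving the directory over a minimal HTTP server is the most reliable route (a sketch using Python's standard-library server):

```bash
# Serve the demo directory locally, then browse to
# http://localhost:8080/demo-embed.html
cd e2e-tests/llm-katan
python3 -m http.server 8080
```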

e2e-tests/llm-katan/demo-script.md

Lines changed: 51 additions & 0 deletions

@@ -0,0 +1,51 @@
+# Multi-Instance Demo Script
+
+## Terminal Commands to Record
+
+### Terminal 1: Start first instance (gpt-3.5-turbo)
+
+```bash
+# Clear screen
+clear
+
+# Install (simulate - already installed)
+echo "$ pip install llm-katan"
+echo "Requirement already satisfied: llm-katan"
+
+# Start first server
+echo "$ llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name 'gpt-3.5-turbo'"
+llm-katan --model Qwen/Qwen3-0.6B --port 8000 --served-model-name "gpt-3.5-turbo" &
+sleep 3
+```
+
+### Terminal 2: Start second instance (claude-3-haiku)
+
+```bash
+clear
+echo "$ llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name 'claude-3-haiku'"
+llm-katan --model Qwen/Qwen3-0.6B --port 8001 --served-model-name "claude-3-haiku" &
+sleep 3
+```
+
+### Terminal 3: Test both endpoints
+
+```bash
+clear
+echo "$ curl http://localhost:8000/v1/models | jq '.data[0].id'"
+curl -s http://localhost:8000/v1/models | jq '.data[0].id'
+
+echo ""
+echo "$ curl http://localhost:8001/v1/models | jq '.data[0].id'"
+curl -s http://localhost:8001/v1/models | jq '.data[0].id'
+
+echo ""
+echo "# Same tiny model, different API names for testing!"
+```
+
+## Key Points to Highlight
+
+- One tiny model (Qwen3-0.6B)
+- Two different API endpoints
+- Different model names served
+- Perfect for testing multi-provider scenarios
+- Minimal resource usage
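Terminals 1 and 2 background their servers with `&` and never stop them; after recording, a cleanup step keeps stray servers from holding ports 8000 and 8001. A sketch (assumes no unrelated llm-katan processes are running):

```bash
# Stop every backgrounded llm-katan instance started during the recording
pkill -f llm-katan
```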

e2e-tests/llm-katan/pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "llm-katan"
-version = "0.1.7"
+version = "0.1.8"
 description = "LLM Katan - Lightweight LLM Server for Testing - Real tiny models with FastAPI and HuggingFace"
 readme = "README.md"
 authors = [
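Once the bumped version is published, an installed copy can be checked against the new number (a sketch; works with any pip-installed copy):

```bash
# Expect: Version: 0.1.8
pip show llm-katan | grep '^Version'
```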
