
Commit d2c9d2e

feat(gpt-bot): Replace with SmolLM2 family and fix model switching
- Replace Qwen/Llama models with SmolLM2 family (135M, 360M, 1.7B)
- Fix model switching bug by calling dispose() before loading new model
- Add RAM detection (navigator.deviceMemory) to auto-select best model
- Dynamically populate model selector dropdown
- Update architecture specs and documentation for SmolLM2
1 parent c8055c2 · commit d2c9d2e

4 files changed (+191, -161 lines)


demos/chatbot-evolution/README.md

Lines changed: 22 additions & 20 deletions
@@ -120,28 +120,28 @@ An interactive journey through 60 years of conversational AI development, from E
 - Follow-up questions to test context
 - Compare with rule-based responses above

-### 5. GPT & Transformers (2020s)
-**Innovation:** Large Language Models with transformers
+### 5. SmolLM2 & Transformers (2020s)
+**Creator:** HuggingFace (2024)

-**Approach:** Self-attention, massive scale, pre-training
+**Approach:** Decoder-only transformer, optimized for browser/edge deployment

 **How it works:**
-- Transformer architecture (self-attention)
-- Pre-trained on vast text corpora
-- Fine-tuned for instruction following
-- Few-shot learning capabilities
+- Auto-selects model size based on your device RAM
+- Available in 135M, 360M, and 1.7B parameter variants
+- Pre-trained on FineWeb-Edu, code, and synthetic data
+- Instruction-tuned for helpful conversations
+- Runs entirely in your browser via WebGPU or WASM

 **Famous for:**
-- ChatGPT, GPT-4, Claude, Gemini
-- Unprecedented fluency and coherence
-- Reasoning and problem-solving
-- Multi-turn conversations
+- Best-in-class quality for browser-sized models
+- Native Transformers.js support (ONNX bundled)
+- Open-source (Apache 2.0)

 **Capabilities:**
-- Context understanding
-- Knowledge integration
-- Creative generation
-- Task completion
+- Multi-turn conversation with context preservation
+- General knowledge and reasoning
+- Code generation and explanation
+- Helpful, instruction-following responses

 ## Features

@@ -237,7 +237,7 @@ An interactive journey through 60 years of conversational AI development, from E
 - Initial model loading time in browser

 **What Makes This Demo Special:**
-This demo runs **real neural models** (BlenderBot 90M, Qwen2.5 0.5B) directly in your browser using Transformers.js. You'll experience genuine neural behavior - not simulations - allowing you to see the clear evolution from rule-based to learned conversation.
+This demo runs **real neural models** (BlenderBot 90M, SmolLM2 135M-1.7B) directly in your browser using Transformers.js. The SmolLM2 model auto-selects based on your device RAM. You'll experience genuine neural behavior - not simulations - allowing you to see the clear evolution from rule-based to learned conversation.

 ## Historical Timeline

@@ -263,7 +263,7 @@ This demo runs **real neural models** (BlenderBot 90M, Qwen2.5 0.5B) directly in
 - **PARRY**: State machine with emotional variables
 - **A.L.I.C.E.**: ~40,000 AIML patterns
 - **BlenderBot Small**: 90 million parameters (real neural model)
-- **Qwen2.5 0.5B**: 500 million parameters (real neural model)
+- **SmolLM2**: 135M-1.7B parameters (real neural model, auto-selected)
 - **GPT-3**: 175 billion parameters (comparison reference)

 ### Architectural Progression
@@ -340,11 +340,13 @@ AIML-inspired pattern matching with improved context handling over ELIZA.
 - Loads and runs entirely in your browser (may take 30-60 seconds initially)
 - Fallback to DialoGPT-small if BlenderBot fails to load

-### GPT / Qwen2.5
-Uses Transformers.js with Qwen2.5-0.5B-Instruct (500M parameters) for actual neural text generation in the browser. Falls back to Llama-3.2-1B-Instruct if Qwen fails to load. Demonstrates modern transformer capabilities:
+### SmolLM2
+Uses Transformers.js with HuggingFace's SmolLM2 family (135M, 360M, 1.7B parameters) for actual neural text generation in the browser. Key features:
+- **Auto-selects model** based on device RAM (navigator.deviceMemory API)
+- **Proper model switching** with dispose() to prevent memory leaks
 - Instruction-tuned for natural conversations
 - Conversation history preserved across turns
-- Runs entirely in-browser via WASM or WebGPU
+- Runs entirely in-browser via WebGPU (preferred) or WASM fallback
 - May take 30-60 seconds to download on first load

 ## Extensions
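The README hunk above names the two mechanisms this commit adds: RAM-based model selection via navigator.deviceMemory and a dispose() call before each model swap. A minimal sketch of that logic with Transformers.js follows; the `MODELS` table, `pickModel`, and `loadModel` names are illustrative, not the demo's actual code:

```js
import { pipeline } from "@huggingface/transformers";

// Hypothetical model table mirroring the "Min RAM" column in the specs
// table later in this commit.
const MODELS = [
  { id: "HuggingFaceTB/SmolLM2-135M-Instruct", minRamGB: 2 },
  { id: "HuggingFaceTB/SmolLM2-360M-Instruct", minRamGB: 4 },
  { id: "HuggingFaceTB/SmolLM2-1.7B-Instruct", minRamGB: 8 },
];

function pickModel() {
  // navigator.deviceMemory is Chromium-only and reports at most 8 (GB);
  // assume a mid-range device where the API is unavailable.
  const ramGB = navigator.deviceMemory ?? 4;
  const eligible = MODELS.filter((m) => ramGB >= m.minRamGB);
  return (eligible.at(-1) ?? MODELS[0]).id;
}

let generator = null;

async function loadModel(modelId) {
  // Release the previous model first; without this, the old weights stay
  // resident and switching models repeatedly leaks memory.
  if (generator) {
    await generator.dispose();
    generator = null;
  }
  generator = await pipeline("text-generation", modelId, {
    dtype: "q4", // 4-bit quantized weights, as in the README
    device: navigator.gpu ? "webgpu" : "wasm", // prefer WebGPU, fall back to WASM
  });
  return generator;
}
```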

demos/chatbot-evolution/index.html

Lines changed: 27 additions & 30 deletions
@@ -41,7 +41,7 @@ <h1 class="hero-title">Chatbot Evolution Timeline</h1>
 <span class="era-label">2020<br>BlenderBot</span>
 </div>
 <div class="era era-2020s" data-era="2020s">
-<span class="era-label">2024<br>Qwen2.5</span>
+<span class="era-label">2024<br>SmolLM2</span>
 </div>
 </div>
 </section>
@@ -455,20 +455,20 @@ <h2>2020s: GPT & Transformers</h2>

 <div class="chatbot-info">
 <div class="info-card">
-<h3>About Modern LLMs</h3>
-<p><strong>Innovation:</strong> Instruction-tuned transformers</p>
+<h3>About SmolLM2</h3>
+<p><strong>Creator:</strong> HuggingFace (2024)</p>
 <p><strong>Method:</strong> Decoder-only transformer with chat templates</p>
-<p><strong>Models:</strong> Qwen2.5 0.5B, SmolLM 360M</p>
-<p><strong>Context:</strong> Multi-turn conversation support</p>
+<p><strong>Models:</strong> 135M, 360M, 1.7B parameters</p>
+<p><strong>Innovation:</strong> Optimized for browser/edge deployment</p>
 </div>

 <div class="info-card">
 <h3>How It Works</h3>
 <ul>
-<li>Pre-trained on vast text corpora</li>
-<li>Fine-tuned for instruction following</li>
-<li>System prompts guide behavior</li>
-<li>Runs entirely in your browser (WASM/WebGPU)</li>
+<li>Auto-selects model based on your device RAM</li>
+<li>Pre-trained on web text, code, and reasoning data</li>
+<li>Instruction-tuned for helpful conversations</li>
+<li>Runs entirely in your browser (WebGPU/WASM)</li>
 </ul>
 </div>
 </div>
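The info card above mentions chat templates and multi-turn support. Here is a sketch of how a Transformers.js text-generation pipeline consumes a message list (the pipeline applies the model's chat template internally); `generator` is the pipeline instance from the earlier sketch, and the conversation content is illustrative:

```js
// Conversation history as role-tagged messages; the chat template turns
// this into the model's prompt format.
const history = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain self-attention in one sentence." },
];

const output = await generator(history, {
  max_new_tokens: 128,
  do_sample: false,
});

// The result extends the message list; keep the assistant turn so the
// next user message sees the full conversation.
history.push(output[0].generated_text.at(-1));
```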
@@ -490,24 +490,21 @@ <h3>How It Works</h3>
 <div class="chat-messages" id="gpt-messages"></div>
 <div class="chat-input-area">
 <div class="model-selector-row">
-<select id="gpt-model-selector" class="model-selector">
-<option value="0">Qwen 2.5 0.5B (Alibaba)</option>
-<option value="1">Llama 3.2 1B (Meta)</option>
-</select>
+<select id="gpt-model-selector" class="model-selector"></select>
 <button class="btn-clear-chat" id="gpt-clear-btn" title="Clear chat history">Clear</button>
 </div>
 <input type="text" class="chat-input" id="gpt-input" placeholder="Talk to the model...">
 <button class="chat-send" id="gpt-send-btn" onclick="sendMessage('gpt')">Send</button>
 </div>
-<p class="demo-note">Select a model above. Conversation history is preserved. First message loads the model (~30s).</p>
+<p class="demo-note">Model auto-selected based on your device RAM. First message loads the model (~30-60s).</p>
 </div>
 </div>

 <!-- Architecture Tab -->
 <div class="chatbot-tab-content" id="gpt-architecture-tab">
 <div class="architecture-content">
 <div class="architecture-diagram">
-<h4>Decoder-Only Transformer (Qwen2.5 / Llama 3.2)</h4>
+<h4>Decoder-Only Transformer (SmolLM2)</h4>
 <div class="arch-flow">
 <div class="arch-block input-block">
 <div class="block-label">Input Prompt</div>
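The `<select id="gpt-model-selector">` in the hunk above now ships empty because the commit populates it at runtime. A minimal sketch of that population step, reusing the illustrative `MODELS` and `pickModel` from the first sketch:

```js
const selector = document.getElementById("gpt-model-selector");
const autoId = pickModel();

for (const { id } of MODELS) {
  const option = document.createElement("option");
  option.value = id;
  // e.g. "SmolLM2-360M-Instruct (auto)" for the RAM-selected default
  option.textContent = id.split("/")[1] + (id === autoId ? " (auto)" : "");
  option.selected = id === autoId;
  selector.appendChild(option);
}
```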
@@ -523,9 +520,9 @@ <h4>Decoder-Only Transformer (Qwen2.5 / Llama 3.2)</h4>
 <div class="block-label">Transformer Decoder Stack</div>
 <div class="block-content">
 <div class="sub-block">Grouped-Query Attention</div>
-<div class="sub-block">SiLU FFN</div>
+<div class="sub-block">SwiGLU FFN</div>
 <div class="sub-block">RMSNorm</div>
-<div class="block-note">x24 layers (Qwen) / x16 layers (Llama)</div>
+<div class="block-note">x30 (135M) / x32 (360M) / x24 (1.7B) layers</div>
 </div>
 </div>
 <div class="arch-arrow">&#8595;</div>
@@ -559,21 +556,21 @@ <h5>Quantization (q4)</h5>
 </div>

 <div class="model-specs">
-<h4>Available Models</h4>
+<h4>SmolLM2 Model Family</h4>
 <table class="specs-table">
-<tr><th>Spec</th><th>Qwen 2.5 0.5B</th><th>Llama 3.2 1B</th></tr>
-<tr><td>Parameters</td><td>500 Million</td><td>1 Billion</td></tr>
-<tr><td>Architecture</td><td>Decoder-Only</td><td>Decoder-Only</td></tr>
-<tr><td>Layers</td><td>24</td><td>16</td></tr>
-<tr><td>Hidden Size</td><td>896</td><td>2048</td></tr>
-<tr><td>Context Length</td><td>32,768 tokens</td><td>131,072 tokens</td></tr>
-<tr><td>Organization</td><td>Alibaba</td><td>Meta</td></tr>
-<tr><td>Year</td><td>2024</td><td>2024</td></tr>
+<tr><th>Spec</th><th>135M</th><th>360M</th><th>1.7B</th></tr>
+<tr><td>Parameters</td><td>135 Million</td><td>360 Million</td><td>1.7 Billion</td></tr>
+<tr><td>Layers</td><td>30</td><td>32</td><td>24</td></tr>
+<tr><td>Hidden Size</td><td>576</td><td>960</td><td>2048</td></tr>
+<tr><td>Download (q4)</td><td>~85 MB</td><td>~210 MB</td><td>~980 MB</td></tr>
+<tr><td>Min RAM</td><td>2 GB</td><td>4 GB</td><td>8 GB</td></tr>
+<tr><td>Context Length</td><td colspan="3">8,192 tokens</td></tr>
 </table>
 <div class="model-note">
 <strong>Model Links:</strong>
-<a href="https://huggingface.co/onnx-community/Qwen2.5-0.5B-Instruct" target="_blank">Qwen2.5-0.5B-Instruct</a> |
-<a href="https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct" target="_blank">Llama-3.2-1B-Instruct</a>
+<a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct" target="_blank">135M</a> |
+<a href="https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct" target="_blank">360M</a> |
+<a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct" target="_blank">1.7B</a>
 </div>
 </div>
 </div>
@@ -642,8 +639,8 @@ <h3>Evolution Stats</h3>
 <span class="stat-value">90M params</span>
 </div>
 <div class="stat-item">
-<span class="stat-label">Qwen 2.5 (2024)</span>
-<span class="stat-value">500M params</span>
+<span class="stat-label">SmolLM2 (2024)</span>
+<span class="stat-value">135M-1.7B params</span>
 </div>
 <div class="stat-item">
 <span class="stat-label">Claude 4 Opus (2025)</span>
