@@ -41,7 +41,7 @@ <h1 class="hero-title">Chatbot Evolution Timeline</h1>
       <span class="era-label">2020<br>BlenderBot</span>
     </div>
     <div class="era era-2020s" data-era="2020s">
-      <span class="era-label">2023<br>LaMini-T5</span>
+      <span class="era-label">2025<br>Qwen3</span>
     </div>
   </div>
 </section>
@@ -455,20 +455,20 @@ <h2>2020s: GPT & Transformers</h2>

 <div class="chatbot-info">
   <div class="info-card">
-    <h3>About LaMini-T5</h3>
-    <p><strong>Innovation:</strong> Knowledge distillation from larger LLMs</p>
-    <p><strong>Method:</strong> T5 encoder-decoder, instruction tuning</p>
-    <p><strong>Model:</strong> LaMini-Flan-T5-248M (248M parameters)</p>
-    <p><strong>Inspiration:</strong> GPT-4, Claude, Gemini use similar techniques at scale</p>
+    <h3>About Qwen3</h3>
+    <p><strong>Innovation:</strong> Hybrid thinking/non-thinking reasoning</p>
+    <p><strong>Method:</strong> Decoder-only transformer, GQA attention</p>
+    <p><strong>Model:</strong> Qwen3-1.7B (1.7B parameters)</p>
+    <p><strong>Benchmarks:</strong> MMLU 71.2, HumanEval 65.8%</p>
   </div>

   <div class="info-card">
     <h3>How It Works</h3>
     <ul>
-      <li>T5 encoder-decoder architecture</li>
-      <li>Distilled from larger instruction-tuned models</li>
-      <li>Optimized for Q&A and chat</li>
-      <li>Runs entirely in browser</li>
+      <li>Decoder-only transformer (GPT-style)</li>
+      <li>Grouped-Query Attention for efficiency</li>
+      <li>100+ language support</li>
+      <li>WebGPU accelerated in browser</li>
     </ul>
   </div>
 </div>
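The "WebGPU accelerated in browser" bullet above compresses the whole client-side loading story. As a rough sketch of how a page like this can lazy-load the model on the user's first message, assuming Transformers.js v3 is available; the model id, dtype, and helper names are assumptions, not code from this commit:

// Minimal sketch: lazy-load a text-generation pipeline on first use,
// preferring WebGPU and falling back to WASM (assumed model id and options).
import { pipeline } from '@huggingface/transformers';

let generator = null;

async function loadGenerator(onProgress) {
  // Try WebGPU first, then fall back to CPU/WASM on older devices.
  for (const device of ['webgpu', 'wasm']) {
    try {
      return await pipeline('text-generation', 'onnx-community/Qwen3-1.7B-ONNX', {
        device,
        dtype: 'q4f16',               // 4-bit weights, fp16 activations (assumed)
        progress_callback: onProgress, // drives the #gpt-progress status text
      });
    } catch (err) {
      console.warn(`Failed to initialize on ${device}:`, err);
    }
  }
  throw new Error('No supported backend found');
}

async function sendToQwen(userText) {
  // Load only when the first message is sent, matching the demo-note behaviour.
  generator ??= await loadGenerator((p) => console.log(p.status, p.progress ?? ''));
  const messages = [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: userText },
  ];
  const out = await generator(messages, { max_new_tokens: 256, do_sample: false });
  // With chat-style input, generated_text holds the full conversation;
  // the last entry is the assistant reply.
  return out[0].generated_text.at(-1).content;
}

Deferring loadGenerator() until the first send keeps the initial page light, which is why the demo note in the next hunk warns that loading happens on the first message.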
@@ -484,50 +484,41 @@ <h3>How It Works</h3>
   <div class="chat-interface">
     <div class="model-loading-status hidden" id="gpt-loading-status">
       <div class="loading-spinner"></div>
-      <div class="loading-text">Loading LaMini-Flan-T5...</div>
+      <div class="loading-text">Loading Qwen3-1.7B...</div>
       <div class="loading-progress" id="gpt-progress">Initializing...</div>
     </div>
     <div class="chat-messages" id="gpt-messages"></div>
     <div class="chat-input-area">
-      <input type="text" class="chat-input" id="gpt-input" placeholder="Talk to LaMini-T5...">
+      <input type="text" class="chat-input" id="gpt-input" placeholder="Talk to Qwen3...">
       <button class="chat-send" id="gpt-send-btn" onclick="sendMessage('gpt')">Send</button>
     </div>
-    <p class="demo-note">Using LaMini-Flan-T5-248M - an instruction-tuned model. Loads on first message (~30s).</p>
+    <p class="demo-note">Using Qwen3-1.7B (MMLU 71.2). WebGPU accelerated. Loads on first message.</p>
   </div>
 </div>

 <!-- Architecture Tab -->
 <div class="chatbot-tab-content" id="gpt-architecture-tab">
   <div class="architecture-content">
     <div class="architecture-diagram">
-      <h4>Encoder-Decoder Transformer (T5)</h4>
+      <h4>Decoder-Only Transformer (Qwen3)</h4>
       <div class="arch-flow">
         <div class="arch-block input-block">
           <div class="block-label">Input Prompt</div>
           <div class="block-content">"What is AI?"</div>
         </div>
         <div class="arch-arrow">↓</div>
         <div class="arch-block">
-          <div class="block-label">Tokenizer + Embeddings</div>
-          <div class="block-content">Token IDs + Position Bias</div>
+          <div class="block-label">Tokenizer + RoPE Embeddings</div>
+          <div class="block-content">Token IDs + Rotary Position</div>
         </div>
         <div class="arch-arrow">↓</div>
-        <div class="arch-block encoder-block">
-          <div class="block-label">T5 Encoder</div>
+        <div class="arch-block decoder-only-block">
+          <div class="block-label">Qwen3 Decoder Stack</div>
           <div class="block-content">
-            <div class="sub-block">Self-Attention</div>
-            <div class="sub-block">Feed Forward</div>
-            <div class="block-note">x12 layers</div>
-          </div>
-        </div>
-        <div class="arch-arrow">↓</div>
-        <div class="arch-block decoder-block">
-          <div class="block-label">T5 Decoder</div>
-          <div class="block-content">
-            <div class="sub-block">Masked Self-Attention</div>
-            <div class="sub-block">Cross-Attention</div>
-            <div class="sub-block">Feed Forward</div>
-            <div class="block-note">x12 layers</div>
+            <div class="sub-block">Grouped-Query Attention</div>
+            <div class="sub-block">SwiGLU FFN</div>
+            <div class="sub-block">RMSNorm</div>
+            <div class="block-note">x28 layers</div>
           </div>
         </div>
         <div class="arch-arrow">↓</div>
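The "Grouped-Query Attention" sub-block in the decoder stack above is the main memory lever. A minimal sketch of the head grouping and the KV-cache saving it buys, using the 16 Q / 4 KV split and 28 layers quoted in the spec table below (head dimension derived as 2048 / 16 = 128; fp16 storage assumed):

// Sketch: grouped-query attention head mapping and KV-cache size,
// using the head counts quoted in the spec table (illustrative only).
const numQueryHeads = 16;
const numKvHeads = 4;        // each KV head is shared by a group of query heads
const headDim = 128;         // 2048 hidden size / 16 heads (derived, assumed)
const numLayers = 28;
const bytesPerValue = 2;     // fp16

// Which KV head does query head q read from?
const kvHeadFor = (q) => Math.floor(q / (numQueryHeads / numKvHeads));
console.log([...Array(numQueryHeads).keys()].map(kvHeadFor));
// -> [0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3]

// KV-cache bytes per token: 2 (K and V) * heads * headDim * layers * bytes
const kvBytesPerToken = (heads) => 2 * heads * headDim * numLayers * bytesPerValue;
console.log('MHA cache per token:', kvBytesPerToken(numQueryHeads), 'bytes');
console.log('GQA cache per token:', kvBytesPerToken(numKvHeads), 'bytes');
// GQA stores 4x fewer K/V vectors per token, which is what makes
// long chat histories feasible in browser memory.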
@@ -542,37 +533,40 @@ <h4>Encoder-Decoder Transformer (T5)</h4>
       <h4>Key Concepts</h4>
       <div class="concept-grid">
         <div class="concept-card">
-          <h5>Encoder-Decoder</h5>
-          <p>T5 uses both encoder (understands input) and decoder (generates output). More flexible than decoder-only for certain tasks.</p>
+          <h5>Decoder-Only</h5>
+          <p>Like GPT, Qwen3 uses only decoder layers. Each token attends to all previous tokens (causal attention).</p>
         </div>
         <div class="concept-card">
-          <h5>Knowledge Distillation</h5>
-          <p>LaMini models are trained to mimic larger LLMs, compressing their knowledge into a smaller, faster model.</p>
+          <h5>Grouped-Query Attention</h5>
+          <p>GQA reduces memory usage by sharing key-value heads across query heads, enabling larger context windows.</p>
         </div>
         <div class="concept-card">
-          <h5>Instruction Tuning</h5>
-          <p>Fine-tuned on instruction-following datasets to understand and respond to user queries naturally.</p>
+          <h5>Hybrid Reasoning</h5>
+          <p>Qwen3 can switch between a "thinking" mode for complex problems and a non-thinking mode for fast responses.</p>
         </div>
         <div class="concept-card">
-          <h5>Text-to-Text</h5>
-          <p>T5 treats all NLP tasks as text-to-text: input text goes in, output text comes out. Simple but powerful.</p>
+          <h5>RoPE Positions</h5>
+          <p>Rotary Position Embeddings encode position through rotation, enabling better length generalization.</p>
         </div>
       </div>
     </div>

     <div class="model-specs">
-      <h4>LaMini-Flan-T5-248M Specifications</h4>
+      <h4>Qwen3-1.7B Specifications</h4>
       <table class="specs-table">
-        <tr><td>Parameters</td><td>248 Million</td></tr>
-        <tr><td>Architecture</td><td>Encoder-Decoder Transformer (T5)</td></tr>
-        <tr><td>Layers</td><td>12 encoder + 12 decoder</td></tr>
-        <tr><td>Hidden Size</td><td>768</td></tr>
-        <tr><td>Attention Heads</td><td>12</td></tr>
-        <tr><td>Training</td><td>Distilled from larger LLMs on 2.58M instruction samples</td></tr>
-        <tr><td>Year</td><td>2023 (MBZUAI)</td></tr>
+        <tr><td>Parameters</td><td>1.7 Billion</td></tr>
+        <tr><td>Architecture</td><td>Decoder-Only Transformer</td></tr>
+        <tr><td>Layers</td><td>28</td></tr>
+        <tr><td>Hidden Size</td><td>2048</td></tr>
+        <tr><td>Attention Heads</td><td>16 Q / 4 KV (GQA)</td></tr>
+        <tr><td>Context Length</td><td>65,536 tokens</td></tr>
+        <tr><td>MMLU</td><td>71.2</td></tr>
+        <tr><td>HumanEval</td><td>65.8%</td></tr>
+        <tr><td>Training</td><td>36T tokens (web, code, math, multilingual)</td></tr>
+        <tr><td>Year</td><td>2025 (Alibaba Qwen)</td></tr>
       </table>
       <div class="model-note">
-        <strong>Browser Optimized:</strong> Runs entirely in your browser using Transformers.js. Falls back to DistilGPT2 if primary model fails.
+        <strong>WebGPU Accelerated:</strong> Runs in browser with GPU acceleration. Falls back to Qwen3-0.6B on older devices.
       </div>
     </div>
   </div>
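The "RoPE Positions" concept card above compresses a fair bit of math. A minimal sketch of rotary position embeddings applied to one head-sized vector: consecutive dimension pairs are rotated by a position-dependent angle, so attention scores end up depending only on relative offsets. The base of 10000 and the toy vector are standard illustrative choices, not values taken from this commit:

// Sketch: apply rotary position embeddings (RoPE) to a query/key vector.
// Pairs (x[2i], x[2i+1]) are rotated by pos * theta_i, theta_i = base^(-2i/d).
function applyRope(vec, pos, base = 10000) {
  const d = vec.length;                 // head dimension, assumed even
  const out = new Float32Array(d);
  for (let i = 0; i < d / 2; i++) {
    const theta = Math.pow(base, (-2 * i) / d);
    const angle = pos * theta;
    const x0 = vec[2 * i];
    const x1 = vec[2 * i + 1];
    out[2 * i]     = x0 * Math.cos(angle) - x1 * Math.sin(angle);
    out[2 * i + 1] = x0 * Math.sin(angle) + x1 * Math.cos(angle);
  }
  return out;
}

// Rotation preserves vector length, and q·k then depends only on the relative
// offset between positions, which is why RoPE generalizes to longer sequences.
const q = Float32Array.from({ length: 8 }, () => Math.random());
console.log(applyRope(q, 0));   // position 0: unchanged (angle = 0)
console.log(applyRope(q, 42));  // rotated copy for position 42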
@@ -641,8 +635,8 @@ <h3>Evolution Stats</h3>
     <span class="stat-value">90M params</span>
   </div>
   <div class="stat-item">
-    <span class="stat-label">LaMini-T5 (2023)</span>
-    <span class="stat-value">248M params</span>
+    <span class="stat-label">Qwen3 (2025)</span>
+    <span class="stat-value">1.7B params</span>
   </div>
   <div class="stat-item">
     <span class="stat-label">Claude 4 Opus (2025)</span>