ztzhu1/LLM-papers-from-scratch

LLM Papers from Scratch

Introduction

This repository reimplements classic, cutting-edge, and otherwise interesting Large Language Models (LLMs) and multimodal models from scratch, with hands-on tutorials and unit tests for each model. The goal is to help beginners understand the key concepts of these models while learning by doing.

Models

  • ViT (Vision Transformer) - Applies the Transformer architecture to image patches for vision tasks.
  • CLIP (Contrastive Language-Image Pre-training) - Learns joint image-text representations using contrastive pretraining.
  • Stable Diffusion - Generates high-quality images from text prompts using Latent Diffusion Models (LDMs).
  • Qwen3MoE - A Mixture-of-Experts model that routes each input token to a small subset of expert networks for faster, more efficient inference.
  • Z-Image - Generates images with the Scalable Single-Stream Diffusion Transformer (S3-DiT), which processes text and image tokens together in one transformer, avoiding cross-attention.
  • Wan - An open video generation model built on diffusion transformers with a spatio-temporal VAE.
  • DeepSeek-OCR - A vision‑based OCR model that compresses high‑resolution pages into compact vision tokens and decodes them to recover text with high precision, enabling efficient long‑context document understanding.
  • ...
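As a taste of the from-scratch approach, here is a minimal sketch of the patch-splitting step ViT performs before linearly embedding image patches. This is plain NumPy for illustration only; the `patchify` function name and the shapes are assumptions, not code from this repository:

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    the token sequence that ViT feeds through a linear projection.
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    return (
        image
        # (h_blocks, patch, w_blocks, patch, C)
        .reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
        # group the two block axes together: (h_blocks, w_blocks, patch, patch, C)
        .transpose(0, 2, 1, 3, 4)
        # flatten each patch into one vector
        .reshape(-1, patch_size * patch_size * C)
    )

# Example: a 32x32 RGB image with 8x8 patches -> 16 patches of dim 192
image = np.random.rand(32, 32, 3)
patches = patchify(image, 8)
print(patches.shape)  # (16, 192)
```

In the full model, each of these patch vectors is projected to the transformer's hidden dimension and combined with positional embeddings, exactly as a text token embedding would be.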

Tutorials and tests

Step-by-step tutorials and unit tests for each model are coming soon.

About

Walk through classic LLM papers step by step: learn the core model details, implement key components by yourself, and validate your work with unit tests.
