This repository reimplements classic, cutting-edge, and otherwise interesting Large Language Models (LLMs) and multimodal models from scratch, with hands-on tutorials and unit tests for each model. The goal is to help beginners understand the key concepts behind these models while learning by doing.
- ViT (Vision Transformer) - Applies the Transformer architecture to image patches for vision tasks.
- CLIP (Contrastive Language-Image Pre-training) - Learns joint image-text representations using contrastive pretraining.
- Stable Diffusion - Generates high-quality images from text prompts using Latent Diffusion Models (LDMs).
- Qwen3MoE - Routes each input token to a small subset of expert networks (Mixture-of-Experts) for faster, more efficient inference.
- Z-Image - Generates images using the Scalable Single-Stream Diffusion Transformer (S3-DiT), which processes text and image tokens together in one transformer, avoiding cross-attention.
- Wan - An open video generation model built on diffusion transformers with a spatio-temporal VAE.
- DeepSeek-OCR - A vision-based OCR model that compresses high-resolution pages into compact vision tokens and decodes them to recover text with high precision, enabling efficient long-context document understanding.
- ...
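To give a flavor of the kind of mechanism these reimplementations cover, here is a minimal sketch of the top-k expert routing idea behind MoE models like Qwen3MoE. This is an illustrative toy in NumPy, not code from this repository: the function and parameter names (`moe_forward`, `gate_w`, `expert_ws`, `k`) are made up for the example, and real implementations add load balancing, batching, and shared experts.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Route one token to its top-k experts (toy sketch, names are illustrative).

    x: (d,) token embedding
    gate_w: (d, n_experts) router weights
    expert_ws: list of n_experts weight matrices, each (d, d)
    """
    logits = x @ gate_w                        # router score for each expert
    topk = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                       # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other experts are skipped,
    # which is where the inference savings come from.
    return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws, k=2)
print(y.shape)
```

Only `k` of the `n_experts` expert matrices are ever multiplied per token, so compute grows with `k` rather than with the total expert count.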
Step-by-step tutorials and unit tests for each model are coming soon.