📚 From Paper to Code: Implementations of Key Computer Vision Papers from Scratch

Welcome to my computer vision model implementation series!
This repository serves as a central hub for my series of from-scratch implementations of landmark computer vision models.

🚀 Introduction

This repository briefly introduces my computer vision paper implementation series, in which I faithfully reimplement influential papers to gain a hands-on, in-depth understanding of the techniques driving modern visual AI systems.

🎯 Motivation

The purpose of this project is twofold:

Develop Practical Proficiency: To gain end-to-end experience in building deep learning pipelines — from data preparation to training, inference, and deployment — in the domain of computer vision.
Build Foundational Understanding: To deeply understand the core architectural paradigms of modern vision models, such as Convolutional Neural Networks (CNNs) and Transformers.

🔁 Workflow

Each implementation follows a consistent and rigorous pipeline:

1. Paper Reading & Understanding: Analyze the original research paper to understand the theoretical backbone of the model.
2. Data Preparation: Prepare the dataset according to the specific requirements of the model.
3. Model Building: Implement the architecture from scratch using PyTorch.
4. Training & Experimentation: Train the model, tune hyperparameters, and analyze performance metrics.
5. Inference & Deployment: Evaluate the model with test samples and explore basic deployment options.

🧠 Implemented Models

📌 YOLOv1 – You Only Look Once (v1)

🔗 GitHub Repository | Medium Article

YOLOv1 is a real-time object detection model that reframes detection as a single regression problem: one forward pass over the full image directly predicts bounding boxes and class probabilities. This single-stage design departed from two-stage, region-proposal detectors such as R-CNN.
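To make the "single regression" idea concrete, here is a minimal sketch of how one grid-cell prediction could be decoded into pixel coordinates. It is not the code from the linked repository. The function names `output_depth` and `decode_cell_box` are hypothetical, and the defaults (a 7x7 grid, 2 boxes per cell, 20 classes, 448x448 input) are the settings reported in the YOLOv1 paper.

```python
def output_depth(num_boxes=2, num_classes=20):
    """Values predicted per grid cell: (x, y, w, h, confidence) per box, plus class scores."""
    return num_boxes * 5 + num_classes


def decode_cell_box(row, col, x, y, w, h, grid_size=7, img_w=448, img_h=448):
    """Convert one YOLOv1-style box prediction into pixel corner coordinates.

    x, y are the box-center offsets inside the grid cell, in [0, 1].
    w, h are the box width and height relative to the whole image, in [0, 1].
    """
    # The center is the cell origin plus the offset, scaled to pixels.
    center_x = (col + x) / grid_size * img_w
    center_y = (row + y) / grid_size * img_h
    box_w = w * img_w
    box_h = h * img_h
    return (
        center_x - box_w / 2,  # left
        center_y - box_h / 2,  # top
        center_x + box_w / 2,  # right
        center_y + box_h / 2,  # bottom
    )


# A box centered in the middle cell that covers half the image in each dimension.
print(output_depth())
print(decode_cell_box(row=3, col=3, x=0.5, y=0.5, w=0.5, h=0.5))
```

With these defaults, the network output for a single image has shape 7 x 7 x 30, since each cell predicts 2 * 5 + 20 values.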

📌 ViT – Vision Transformer

🔗 GitHub Repository | Medium Article

ViT replaces convolutional layers with pure Transformer blocks. It splits the image into fixed-size patches, flattens and linearly projects each one, and feeds the resulting token sequence into a standard Transformer encoder. When pre-trained on sufficiently large datasets, it achieved state-of-the-art image classification results.
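The patch-splitting step can be sketched in plain Python as follows. This is an illustration only, not the repository's PyTorch code: `patchify` and `num_patches` are hypothetical names, the image is assumed to be a nested list indexed as `image[row][col][channel]`, and the learned linear projection and position embeddings that follow in ViT are left out.

```python
def num_patches(height, width, patch_size):
    """Number of non-overlapping patches for an image of the given size."""
    return (height // patch_size) * (width // patch_size)


def patchify(image, patch_size):
    """Split an H x W x C image into flattened, non-overlapping square patches.

    Returns a list of patches in row-major order; each patch is a flat list
    of patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])  # append all channels of this pixel
            patches.append(flat)
    return patches


# Toy 4 x 4 RGB image whose pixels store their own (row, col) position.
toy_image = [[[r, c, 0] for c in range(4)] for r in range(4)]
toy_patches = patchify(toy_image, patch_size=2)
print(len(toy_patches), len(toy_patches[0]))
```

For a 224x224 RGB input with 16x16 patches, the same arithmetic gives 196 patches of 768 values each, which is the sequence the Transformer encoder receives after projection.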

📌 DETR – DEtection TRansformer

🔗 GitHub Repository | Medium Article

DETR is an object detection model that treats detection as a direct set prediction problem, combining a Transformer encoder-decoder with a bipartite matching loss. It removes the need for hand-crafted components such as anchor boxes and non-maximum suppression (NMS), which simplifies the detection pipeline.
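The bipartite matching idea can be illustrated with a small sketch that assigns each ground-truth object to exactly one prediction so that the total cost is minimal. Note the assumptions: this version uses a brute-force search over assignments, which is only practical for tiny examples, whereas DETR uses the Hungarian algorithm for the same optimization. The function name `match_predictions` and the cost values are hypothetical, and the cost matrix is assumed to be non-empty with at least as many predictions as targets.

```python
from itertools import permutations


def match_predictions(cost):
    """Find the minimum-cost one-to-one assignment of targets to predictions.

    cost[i][j] is the cost of matching prediction i to ground-truth object j.
    Returns (pairs, total_cost), where pairs is a list of (prediction, target).
    Brute force: tries every assignment, so use only on small inputs.
    """
    num_preds, num_targets = len(cost), len(cost[0])
    best_total, best_assignment = float("inf"), None

    # Each candidate picks a distinct prediction for every target.
    for pred_ids in permutations(range(num_preds), num_targets):
        total = sum(cost[pred][target] for target, pred in enumerate(pred_ids))
        if total < best_total:
            best_total, best_assignment = total, pred_ids

    pairs = [(pred, target) for target, pred in enumerate(best_assignment)]
    return pairs, best_total


# Three predictions, two ground-truth objects (illustrative costs).
example_cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
pairs, total = match_predictions(example_cost)
print(pairs, round(total, 3))
```

In this example, prediction 2 is left unmatched; in DETR, unmatched predictions are trained to predict the "no object" class, which is what lets the model avoid duplicate detections without NMS.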

🤝 Contribution

This is a solo learning project, but feel free to:

Star ⭐ the repos
Open issues for suggestions
Fork and experiment

bskkimm/Paper-Implementation-Series
