bskkimm/Paper-Implementation-Series
📚 From Paper to Code: Implementations of Key Computer Vision Papers from Scratch

Welcome to my computer vision model implementation series!
This repository serves as the central hub for my implementations of landmark computer vision papers.


🚀 Introduction

This repository briefly introduces my computer vision paper implementation series, in which I faithfully reimplement influential papers to gain hands-on, in-depth understanding of the techniques driving modern visual AI systems.


🎯 Motivation

The purpose of this project is twofold:

  1. Develop Practical Proficiency: To gain end-to-end experience in building deep learning pipelines for computer vision, from data preparation to training, inference, and deployment.
  2. Build Foundational Understanding: To deeply understand the core architectural paradigms of modern vision models, such as Convolutional Neural Networks (CNNs) and Transformers.

πŸ” Workflow

Each implementation follows a consistent and rigorous pipeline:


  1. Paper Reading & Understanding
    Analyze the original research paper to understand the theoretical backbone of the model.

  2. Data Preparation
    Prepare the dataset according to the specific requirements of the model.

  3. Model Building
    Implement the architecture from scratch using PyTorch.

  4. Training & Experimentation
    Train the model, tune hyperparameters, and analyze performance metrics.

  5. Inference & Deployment
    Evaluate the model with test samples and explore basic deployment options.
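
The pipeline above can be sketched end to end. The sketch below is purely illustrative (the actual implementations in this series use PyTorch and real datasets): a toy one-parameter model for y = 2x, trained with plain gradient descent, then evaluated on a held-out input.

```python
# Illustrative sketch of steps 2-5; all names and data here are toy
# stand-ins, not the series' actual code.

# 2. Data preparation: a tiny dataset for the target function y = 2x.
data = [(x, 2.0 * x) for x in range(1, 6)]

# 3. Model building: a single learnable weight w, predicting y = w * x.
w = 0.0

# 4. Training & experimentation: gradient descent on mean squared error.
lr = 0.01
for _ in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

# 5. Inference: apply the trained model to an unseen input.
print(round(w, 2))       # learned weight, close to 2.0
print(round(w * 10, 1))  # prediction for x = 10, close to 20.0
```

The same shape (prepare data, define a model, loop over a loss gradient, then evaluate) carries over directly to the PyTorch training loops used in each implementation.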


🧠 Implemented Models

📌 YOLOv1 – You Only Look Once

🔗 GitHub Repository | Medium Article

YOLOv1 is a real-time object detection model that reframes detection as a single regression problem, directly predicting class probabilities and bounding boxes from full images in one evaluation. It introduced a new paradigm compared to two-stage detectors like R-CNN.
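
A minimal sketch of the "single regression" encoding, assuming the paper's S = 7 grid: each ground-truth box is assigned to the grid cell containing its center, and the cell predicts the box as an offset within itself (the full output tensor is S x S x (B*5 + C), i.e. 7 x 7 x 30 with B = 2 boxes and C = 20 classes). The function below is illustrative, not the repository's code.

```python
S = 7  # grid size from the YOLOv1 paper

def encode_box(x_center, y_center):
    """Map a box center (normalized to [0, 1)) to its grid cell
    and its cell-relative offset, as in the YOLOv1 target encoding."""
    col = int(x_center * S)      # grid column containing the center
    row = int(y_center * S)      # grid row containing the center
    x_rel = x_center * S - col   # offset within the cell, in [0, 1)
    y_rel = y_center * S - row
    return (row, col), (x_rel, y_rel)

cell, offset = encode_box(0.5, 0.5)
print(cell)    # (3, 3)
print(offset)  # (0.5, 0.5)
```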


📌 ViT – Vision Transformer

🔗 GitHub Repository | Medium Article

ViT replaces convolutional layers with pure transformer blocks. It splits the image into patches, flattens them, and feeds them into a standard transformer architecture, achieving state-of-the-art performance with sufficient data.
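
The patch-splitting step can be sketched framework-free; for scale, ViT-Base splits a 224x224x3 image into 16x16 patches, yielding 196 tokens of dimension 16*16*3 = 768. The helper below is illustrative only, shown on a toy 4x4 image.

```python
def patchify(image, patch_size):
    """Split an image (nested list [H][W][C]) into non-overlapping
    patch_size x patch_size patches, each flattened into one vector,
    as in ViT's input tokenization."""
    h, w = len(image), len(image[0])
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            flat = []
            for di in range(patch_size):
                for dj in range(patch_size):
                    flat.extend(image[i + di][j + dj])  # append all channels
            patches.append(flat)
    return patches

# Toy 4x4 image with 3 channels, split into 2x2 patches:
img = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]
patches = patchify(img, 2)
print(len(patches), len(patches[0]))  # 4 12  (4 patches of length 2*2*3)
```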


📌 DETR – DEtection TRansformer

🔗 GitHub Repository | Medium Article

DETR is a novel object detection model that unifies detection with transformers and bipartite matching. It eliminates many hand-crafted components, such as anchor boxes and non-maximum suppression (NMS), making the detection pipeline elegantly simple.
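
The bipartite-matching objective can be shown on a tiny example. DETR solves it with the Hungarian algorithm (via scipy's linear-sum-assignment solver in the reference code); the brute-force search below is an illustrative stand-in that optimizes the same objective on a small cost matrix.

```python
from itertools import permutations

def match(cost):
    """Find the one-to-one assignment of predictions to ground-truth
    objects minimizing total cost. Brute force over permutations;
    fine for tiny n, whereas DETR uses the Hungarian algorithm."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))  # cost of this pairing
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_perm, best_cost

# cost[i][j]: cost of matching prediction i to ground-truth object j
cost = [[1.0, 9.0],
        [8.0, 2.0]]
print(match(cost))  # ((0, 1), 3.0): prediction 0 -> object 0, 1 -> 1
```

Once each prediction is matched (or left unmatched, i.e. assigned "no object"), the loss is computed per pair, which is what removes the need for NMS.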


🤝 Contribution

This is a solo learning project, but feel free to:

  • Star ⭐ the repos
  • Open issues for suggestions
  • Fork and experiment
