Video Super Resolution

Classical upsampling vs. SRCNN deep learning for frame-by-frame video super resolution.

Motivation

Video footage is often captured or stored at low resolution, and naive upsampling (bicubic, Lanczos) produces blurry results that lack fine detail. SRCNN learns a patch-level mapping from low- to high-resolution images with a lightweight 3-layer CNN, recovering detail that classical interpolation cannot reconstruct.

Overview

This project builds a complete video super resolution pipeline covering:

Data pipeline — LR/HR pair generation, patch extraction, train/val/test splits
Classical baselines — nearest neighbour, bicubic, Lanczos
SRCNN — Super Resolution CNN implemented in PyTorch (Dong et al., 2014)
Quantitative evaluation — PSNR & SSIM comparison across all methods
Video processing — frame-by-frame SR inference, HR video reconstruction
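The data pipeline step listed above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function names `make_lr_hr_pair` and `extract_patches` are hypothetical, the ×3 scale, 33×33 patch size, and stride 14 come from the Training Details section, and the degradation model (antialiased bicubic downsampling through `torch.nn.functional.interpolate`) is an assumption.

```python
import torch
import torch.nn.functional as F


def make_lr_hr_pair(hr, scale=3):
    """Return (bicubic-upsampled LR, cropped HR) for an (N, C, H, W) tensor in [0, 1]."""
    h, w = hr.shape[-2:]
    # Crop so that height and width are exact multiples of the scale factor.
    h, w = h - h % scale, w - w % scale
    hr = hr[..., :h, :w]
    # Assumed degradation: antialiased bicubic downsampling.
    lr = F.interpolate(hr, size=(h // scale, w // scale), mode="bicubic",
                       align_corners=False, antialias=True)
    # Upsample back to HR size so network input and target share one shape.
    lr_up = F.interpolate(lr, size=(h, w), mode="bicubic", align_corners=False)
    return lr_up.clamp(0.0, 1.0), hr


def extract_patches(x, patch=33, stride=14):
    """Slice an (N, C, H, W) tensor into overlapping (M, C, patch, patch) patches."""
    c = x.shape[1]
    tiles = x.unfold(2, patch, stride).unfold(3, patch, stride)  # (N, C, nH, nW, p, p)
    return tiles.permute(0, 2, 3, 1, 4, 5).reshape(-1, c, patch, patch)


frame = torch.rand(1, 1, 100, 133)           # stand-in for one luminance frame
lr_up, hr = make_lr_hr_pair(frame, scale=3)  # both cropped to 99 x 132
lr_patches = extract_patches(lr_up)
hr_patches = extract_patches(hr)
print(lr_patches.shape, hr_patches.shape)
```

For a 99×132 frame this yields 5 × 8 = 40 patch pairs, since each axis contributes floor((size − 33) / 14) + 1 positions.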

Architecture — SRCNN

LR (bicubic upsampled)
    → Conv(9×9, 64) + ReLU    # patch extraction
    → Conv(1×1, 32) + ReLU    # non-linear mapping
    → Conv(5×5, 1)             # reconstruction
    → SR output

Only ~8K parameters for the single-channel (Y) model (a 3-channel RGB variant would have ~20K), so it is lightweight and fast to train and deploy.
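The diagram above translates into a small PyTorch module. The sketch below is one plausible reading of it, not necessarily the notebook's class: the name `SRCNN`, the `channels` argument, and the use of padding to keep the output the same size as the input are assumptions. It also counts parameters: with a single-channel (Y) input and output the three layers hold 8,129 weights and biases, while a 3-channel variant would hold 20,099.

```python
import torch
from torch import nn


class SRCNN(nn.Module):
    """9-1-5 SRCNN operating on a bicubic-upsampled image."""

    def __init__(self, channels=1):
        super().__init__()
        # Padding (an assumption here) keeps the output the same size as the input.
        self.conv1 = nn.Conv2d(channels, 64, kernel_size=9, padding=4)  # patch extraction
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1)                   # non-linear mapping
        self.conv3 = nn.Conv2d(32, channels, kernel_size=5, padding=2)  # reconstruction
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.conv3(x)


def count_parameters(model):
    """Total number of weights and biases in the model."""
    return sum(p.numel() for p in model.parameters())


model = SRCNN(channels=1)
out = model(torch.rand(1, 1, 33, 33))
print(out.shape)                            # torch.Size([1, 1, 33, 33])
print(count_parameters(model))              # 5248 + 2080 + 801 = 8129
print(count_parameters(SRCNN(channels=3)))  # 15616 + 2080 + 2403 = 20099
```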

Training Details

Data — 180 frames extracted from clip.mp4 (forest nature, 80/10/10 train/val/test split)
Optimizer — Adam with layer-wise LR (conv1/conv2: 1e-4, conv3: 1e-5)
Epochs — 30 · Batch size — 64 · Patch size — 33×33 · Stride — 14
Scale factor — ×3 · Loss — MSE on HR patches
Device — GPU (Colab T4)
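The layer-wise learning rates listed above map onto Adam's per-parameter-group options. The following is a minimal sketch under stated assumptions: the `nn.Sequential` stand-in network, the `train_step` helper, and the random batch are illustrative, and only the learning rates, batch size, patch size, and MSE loss are taken from the list above.

```python
import torch
from torch import nn

# Stand-in 9-1-5 network; indices 0, 2, 4 are conv1, conv2, conv3.
model = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=5, padding=2),
)

# One parameter group per layer, each with its own learning rate.
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-4},
    {"params": model[2].parameters(), "lr": 1e-4},
    {"params": model[4].parameters(), "lr": 1e-5},
])
criterion = nn.MSELoss()


def train_step(lr_up_patches, hr_patches):
    """Run one optimization step and return the batch MSE as a float."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(lr_up_patches), hr_patches)
    loss.backward()
    optimizer.step()
    return loss.item()


# Random tensors stand in for one batch of 64 luminance patches of 33 x 33.
batch_in = torch.rand(64, 1, 33, 33)
batch_hr = torch.rand(64, 1, 33, 33)
print(train_step(batch_in, batch_hr))
```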

Results

Method	PSNR (dB) ↑	SSIM ↑
Nearest	~24	~0.70
Bicubic	~27	~0.80
Lanczos	~27	~0.81
SRCNN	~29+	~0.85+

Results are for ×3 upscaling and vary with the scale factor and the number of training epochs.

SRCNN achieves a ~2 dB PSNR gain over the best classical interpolation (Lanczos), with a ~0.04 SSIM improvement. Even with so few parameters, the network recovers fine detail that interpolation cannot, because interpolation applies a fixed kernel and has no learned model of what high-frequency detail should look like.
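To put the PSNR figures in context: PSNR is 10 · log10(MAX² / MSE), so a gain of g dB corresponds to multiplying the MSE by 10^(−g/10). A ~2 dB gain therefore means roughly 37% lower mean squared error. The sketch below shows this arithmetic with NumPy. The helper names `psnr` and `mse_ratio_for_gain` are illustrative, and images are assumed to be scaled to [0, 1].

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline for a given PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)


reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)      # constant error of 0.1, so MSE = 0.01
print(psnr(reference, estimate))     # approximately 20 dB
print(mse_ratio_for_gain(2.0))       # approximately 0.631
```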

TODO: Current output can appear soft/blurry due to limited training. To improve sharpness: set NUM_EPOCHS = 100 and SCALE_FACTOR = 2 in the notebook params cell, then retrain.

Design Decisions

Patch-based training — extracts overlapping 33×33 patches with stride 14 for more training samples and fixed input size; mirrors the original SRCNN paper
Bicubic pre-upsampling — LR frames are bicubic-upsampled to HR size before SRCNN input, so input and output share the same resolution and the network only has to sharpen the interpolated frame rather than learn the upscaling itself
Layer-wise learning rates — the final reconstruction layer uses a 10× lower LR than the earlier layers, following the SRCNN paper, which reports that a smaller rate in the last layer is important for convergence
PSNR + SSIM — PSNR measures pixel-level fidelity; SSIM compares local luminance, contrast, and structure, which tracks perceived quality more closely. Both are needed to evaluate super resolution quality
Classical baselines first — establishes an interpolation ceiling before training the CNN, making the improvement quantifiable
YCbCr color space — SRCNN is trained and applied on the Y (luminance) channel only; Cb/Cr (color) channels are bicubic upsampled and merged back, producing full color output while keeping the model simple
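The YCbCr handling in the last item can be illustrated with plain NumPy. This is a sketch under assumptions, not the notebook's code: it uses full-range BT.601 coefficients on float RGB in [0, 1], the names `rgb_to_ycbcr`, `ycbcr_to_rgb`, and `super_resolve_color` are hypothetical, and `enhance_y` stands in for a trained SRCNN forward pass. It takes an already bicubic-upsampled RGB frame. Because bicubic interpolation and this color conversion are both linear, that should give the same chroma as upsampling Cb/Cr separately, up to clipping and rounding.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 conversion for float RGB in [0, 1]; chroma is centered on 0.5."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772 + 0.5
    cr = (r - y) / 1.402 + 0.5
    return np.stack([y, cb, cr], axis=-1)


def ycbcr_to_rgb(ycbcr):
    """Inverse of rgb_to_ycbcr."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]
    r = y + 1.402 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return np.stack([r, g, b], axis=-1)


def super_resolve_color(rgb_upsampled, enhance_y):
    """Enhance only the luminance of a bicubic-upsampled RGB frame; keep its chroma."""
    ycbcr = rgb_to_ycbcr(rgb_upsampled)
    ycbcr[..., 0] = enhance_y(ycbcr[..., 0])  # e.g. a trained SRCNN forward pass
    return np.clip(ycbcr_to_rgb(ycbcr), 0.0, 1.0)


frame = np.random.default_rng(0).random((6, 8, 3))   # stand-in upsampled frame
same = super_resolve_color(frame, lambda y: y)       # identity "model"
print(np.allclose(same, frame))
```

With the identity function in place of the model, the frame round-trips unchanged, which is a quick check that the conversion pair is consistent before plugging in the network.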

Video Outputs

HR (ground truth) · Nearest · Bicubic · Lanczos · SRCNN — color frames, PSNR measured on Y channel

Full videos (play in browser): synthetic_lr.mp4 · sr_output.mp4

Requirements

pip install torch torchvision numpy scipy matplotlib scikit-image opencv-python Pillow

Usage

jupyter notebook video_super_resolution.ipynb

Or open in Colab via the badge above — no local setup needed. Enable GPU: Runtime → Change runtime type → T4 GPU

Using Your Own Video

Replace clip.mp4 with your own video in the Colab session before running Section 6.

Files

File	Description
`video_super_resolution.ipynb`	Main notebook
`clip.mp4`	Source video used for training (forest nature, 6s)
`srcnn_best.pth`	Best model weights by val PSNR (generated after training)
`srcnn_weights.pth`	Final model weights (generated after training)
`synthetic_lr.mp4`	Low-resolution input video
`sr_output.mp4`	SRCNN super resolved output video
`README.md`	This file
`References/1501.00092v3.pdf`	SRCNN paper (Dong et al., 2014)

References

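Dong, C., Loy, C. C., He, K., & Tang, X. (2014). Learning a Deep Convolutional Network for Image Super-Resolution. ECCV 2014. Extended version: Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092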

Acknowledgements

Sample video: FOREST 4K — American Nature Relaxation Film by Nature Relaxation Films, used for research and educational purposes.
