Disclaimer
This repository was not created using “vibe coding.”
nanoclip is a minimalist, nano-scale implementation of the Contrastive Language-Image Pre-training (CLIP) model, built using PyTorch. This project aims to provide a clear and concise example of how to implement a dual-encoder model that learns to embed images and text into a shared, multimodal latent space.
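Conceptually, the dual encoder projects each modality into the same embedding space and L2-normalises the result so cosine similarity is a plain dot product. A minimal sketch of that structure is below; the class and argument names (NanoCLIP, vision_dim, embed_dim, etc.) are illustrative and do not necessarily match the names used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NanoCLIP(nn.Module):
    """Illustrative dual-encoder wrapper (names are placeholders, not the repo's)."""

    def __init__(self, vision_encoder: nn.Module, text_encoder: nn.Module,
                 vision_dim: int, text_dim: int, embed_dim: int = 256):
        super().__init__()
        self.vision_encoder = vision_encoder          # e.g. a small ViT
        self.text_encoder = text_encoder              # e.g. a compact Transformer
        # Linear projections map each modality into the shared latent space.
        self.image_proj = nn.Linear(vision_dim, embed_dim, bias=False)
        self.text_proj = nn.Linear(text_dim, embed_dim, bias=False)
        # Learnable temperature, stored on a log scale as in the CLIP paper.
        self.logit_scale = nn.Parameter(torch.log(torch.tensor(1 / 0.07)))

    def forward(self, images: torch.Tensor, tokens: torch.Tensor):
        img_feat = self.image_proj(self.vision_encoder(images))
        txt_feat = self.text_proj(self.text_encoder(tokens))
        # L2-normalise so that similarity between embeddings is cosine similarity.
        img_feat = F.normalize(img_feat, dim=-1)
        txt_feat = F.normalize(txt_feat, dim=-1)
        return img_feat, txt_feat, self.logit_scale.exp()
```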
- Vision Encoder: A lightweight Vision Transformer (ViT) for processing image inputs.
- Text Encoder: A compact Transformer model for encoding text inputs.
- Contrastive Learning: Implements the core CLIP training objective to align image and text embeddings (a sketch of this objective follows the list below).
- Configurable: Parameters for both vision and text models are easily adjustable via config.py.
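
The CLIP training objective is a symmetric cross-entropy over the batch similarity matrix: each image should match its paired caption and vice versa. The sketch below shows one way to write it, assuming the normalised embeddings and temperature produced by a dual encoder like the one above; the function name and signature are illustrative, not taken from this repository.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_feat: torch.Tensor, txt_feat: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_feat, txt_feat: (batch, embed_dim), already L2-normalised.
    """
    # Pairwise cosine similarities scaled by the learnable temperature.
    logits_per_image = logit_scale * img_feat @ txt_feat.t()   # (batch, batch)
    logits_per_text = logits_per_image.t()
    # The matching pair for sample i sits on the diagonal of the logit matrix.
    targets = torch.arange(img_feat.size(0), device=img_feat.device)
    loss_i = F.cross_entropy(logits_per_image, targets)
    loss_t = F.cross_entropy(logits_per_text, targets)
    return (loss_i + loss_t) / 2
```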
Currently the project is in development; only one toy training run has been done on a remote 1x RTX 3600 on CIFAR10. More to come.


