📸 PicMatch: Your Visual Search Companion 🔍

title: PicMatch: Your Visual Search Companion
emoji: 📷🔍
colorFrom: blue
colorTo: green
sdk: gradio
python_version: 3.9
sdk_version: 4.39.0
suggested_hardware: t4-small
suggested_storage: medium
app_file: app.py
fullWidth: true
header: mini
short_description: Search images using text or other images as queries.
models:
- wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M
- Salesforce/blip-image-captioning-base

PicMatch lets you effortlessly search through your image archive using either a text description or another image as your query. Find those needle-in-a-haystack photos in a flash! ✨

Try PicMatch image search with 25,000 Unsplash images on this 🤗 Space

🚀 Getting Started:

Prerequisites: Ensure you have Python 3.9 or higher installed on your system. 🐍
Create a Virtual Environment:
```
python -m venv env
```
Activate the Environment:
```
source ./env/bin/activate
```
Install Dependencies:
```
python -m pip install -r requirements.txt
```
Start the App (with Sample Data):
```
python app.py
```
Open Your Browser: Head to localhost:7860 to access the PicMatch interface. 🌐

📂 Data: Organize Your Visual Treasures

Make sure you have the following folders in your project's root directory:

```
data
├── images
└── features
```
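
If you would rather create this layout from Python than by hand, a minimal sketch could look like the following. The helper name `ensure_data_dirs` and its `root` parameter are illustrative assumptions, not part of the PicMatch code base.

```python
from pathlib import Path


def ensure_data_dirs(root="."):
    """Create data/images and data/features under `root` if they are missing.

    Hypothetical helper for illustration; PicMatch itself may set these up differently.
    """
    created = []
    for name in ("images", "features"):
        folder = Path(root) / "data" / name
        # parents=True builds data/ as well; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `ensure_data_dirs()` from the project's root directory creates both folders and leaves existing ones untouched.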

🛠️ Image Pipeline: Download & Process with Speed ⚡

The engine/download_data.py Python script streamlines downloading and processing images from a list of URLs. It's designed for performance and reliability:

Async Operations: Uses asyncio for concurrent image downloading and processing. ⏩
Rate Limiting: Uses a RateLimiter to stay within API usage limits and avoid being blocked. 🚦
Parallel Resizing: Employs a ProcessPoolExecutor for fast image resizing. ⚙️
State Management: Saves progress in a JSON file so you can resume later. 💾
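
To make the rate-limiting idea concrete, here is a minimal sliding-window limiter built on asyncio. It is a sketch under assumptions: the README only says the script uses a RateLimiter, so the constructor arguments (`max_calls`, `period`), the `acquire` method, and the `fake_download` stand-in are all hypothetical.

```python
import asyncio
import time


class RateLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window.

    Illustrative only; the RateLimiter in engine/download_data.py may differ.
    """

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self._stamps = []  # monotonic timestamps of recent acquisitions

    async def acquire(self):
        while True:
            now = time.monotonic()
            # Forget acquisitions that have left the sliding window.
            self._stamps = [t for t in self._stamps if now - t < self.period]
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest acquisition leaves the window, then re-check.
            await asyncio.sleep(self.period - (now - self._stamps[0]))


async def fake_download(limiter, url):
    """Stand-in for an HTTP request; it only waits for permission."""
    await limiter.acquire()
    return url


async def demo():
    limiter = RateLimiter(max_calls=2, period=0.2)
    start = time.monotonic()
    await asyncio.gather(*(fake_download(limiter, f"url-{i}") for i in range(5)))
    return time.monotonic() - start
```

Running `asyncio.run(demo())` issues five requests at two per 0.2 seconds, so the last one is held back until roughly 0.4 seconds have passed.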

🗝️ Key Components:

ImagePipeline Class: Manages the entire pipeline, its state, and rate limiting. 🎛️
Functions: Handle URL feeding (url_feeder), downloading (image_downloader), and processing (image_processor). 📥
ImageSaver Class: Defines how images are processed and saved. 🖼️
resize_image Function: Ensures image resizing maintains the correct aspect ratio. 📏
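
The arithmetic behind aspect-ratio-preserving resizing can be sketched without any imaging library. The function name `fit_within` and the `max_side` parameter are assumptions for illustration; the actual resize_image function may use different arguments and presumably hands the resulting size to an image library.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height) so the longer side is at most `max_side`.

    Both sides are scaled by the same factor, which keeps the aspect ratio.
    Images that already fit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    scale = min(1.0, max_side / max(width, height))
    # Guard against a side rounding down to zero for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 photo with `max_side=512` becomes 512x384, which is still 4:3.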

🏃 How it Works:

Start: Configure the pipeline with your URL list, download limits, and rate settings.
Feed URLs: Asynchronously read URLs from your file.
Download: Download images concurrently while respecting rate limits.
Process: Save the original images and resize them in parallel.
Save State: Regularly save progress to avoid starting over if interrupted.
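
The steps above can be sketched as a small queue-based asyncio pipeline. The stage names url_feeder, image_downloader, and image_processor come from the script, but their signatures here, the queues, the sentinel value, and `resize_stub` are assumptions made for illustration. The download is faked, and the default executor is a thread pool rather than the ProcessPoolExecutor the script uses.

```python
import asyncio

SENTINEL = None  # marks the end of a queue's stream


async def url_feeder(urls, url_queue, n_downloaders):
    """Feed URLs into the queue, then one stop marker per downloader."""
    for url in urls:
        await url_queue.put(url)
    for _ in range(n_downloaders):
        await url_queue.put(SENTINEL)


async def image_downloader(url_queue, image_queue):
    """Pull URLs and push (url, payload) pairs; the download itself is faked."""
    while True:
        url = await url_queue.get()
        if url is SENTINEL:
            break
        await asyncio.sleep(0)  # stand-in for the real HTTP request
        await image_queue.put((url, f"bytes-of-{url}".encode()))


def resize_stub(payload):
    """Stand-in for CPU-bound resizing; returns the payload size."""
    return len(payload)


async def image_processor(image_queue, results, executor=None):
    """Run the CPU-bound step off the event loop (None means a thread pool)."""
    loop = asyncio.get_running_loop()
    while True:
        item = await image_queue.get()
        if item is SENTINEL:
            break
        url, payload = item
        results[url] = await loop.run_in_executor(executor, resize_stub, payload)


async def run_pipeline(urls, n_downloaders=3):
    url_queue = asyncio.Queue(maxsize=10)    # bounded queues give backpressure
    image_queue = asyncio.Queue(maxsize=10)
    results = {}
    processor = asyncio.create_task(image_processor(image_queue, results))
    downloaders = [
        asyncio.create_task(image_downloader(url_queue, image_queue))
        for _ in range(n_downloaders)
    ]
    await url_feeder(urls, url_queue, n_downloaders)
    await asyncio.gather(*downloaders)
    await image_queue.put(SENTINEL)  # all downloads done, stop the processor
    await processor
    return results
```

`asyncio.run(run_pipeline(urls))` returns a dictionary with one entry per URL. Passing a `concurrent.futures.ProcessPoolExecutor` to `image_processor` would move the resizing step onto separate CPU cores.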

To get the sample data, run the command:
```
cd engine && python download_data.py
```

✨ Feature Creation: Making Your Images Searchable ✨

This step prepares your images for searching. We generate two types of embeddings:

Visual Embeddings (CLIP): Capture the visual content of your images. 👁️‍🗨️
Textual Embeddings: Create embeddings from image captions for text-based search. 💬

To generate these features, run the command:
```
cd engine && python generate_features.py
```

This process uses these awesome models from Hugging Face:

TinyCLIP: wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M
BLIP Image Captioning: Salesforce/blip-image-captioning-base
SentenceTransformer: all-MiniLM-L6-v2

⚡ Asynchronous Feature Extraction: Supercharge Your Process ⚡

This script extracts image features (both visual and textual) efficiently:

Asynchronous: Loads images, extracts features, and saves them concurrently. ⚡
Dual Embeddings: Creates both CLIP (visual) and caption (textual) embeddings. 🖼️📝
Checkpoints: Keeps track of progress and allows resuming from interruptions. 🔄
Parallel: Uses multiple CPU cores for feature extraction. ⚙️
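
Checkpointing of this kind can be illustrated with a small JSON file that records which items are finished. This is a sketch, not the script's actual mechanism: the function names, the checkpoint format, and the `every` parameter are assumptions, and `extract` stands in for the real feature extraction call.

```python
import json
from pathlib import Path


def load_done(checkpoint_path):
    """Return the set of item names recorded as finished (empty if no file)."""
    path = Path(checkpoint_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_done(checkpoint_path, done):
    """Write the checkpoint via a temporary file so a crash cannot truncate it."""
    path = Path(checkpoint_path)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    tmp.replace(path)


def process_with_checkpoint(items, extract, checkpoint_path, every=100):
    """Call `extract` on each unfinished item, saving progress every `every` items."""
    done = load_done(checkpoint_path)
    processed = []
    for item in items:
        if item in done:
            continue  # finished in an earlier run
        extract(item)
        done.add(item)
        processed.append(item)
        if len(processed) % every == 0:
            save_done(checkpoint_path, done)
    save_done(checkpoint_path, done)  # final save covers the remainder
    return processed
```

On a second run with the same checkpoint file, only the items that were not finished before are passed to `extract`.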

📊 Vector Database Module: Milvus for Fast Search 🚤

This module connects to the Milvus vector database to store and search your image embeddings:

Milvus: A high-performance database built for handling vector data. 📊
Easy Interface: Provides a simple way to manage embeddings and perform searches. 🔍
Single Server: Ensures only one Milvus server is running for efficiency.
Indexing: Automatically creates an index to speed up your searches. 🚀
Similarity Search: Find the most similar images using cosine similarity. 💯
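
To show what a cosine similarity search computes, here is a brute-force version in plain Python. It is not the Milvus API and not how this module is implemented: Milvus answers the same kind of query through an index instead of scanning every vector. The function names and the toy two-dimensional embeddings are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=3):
    """Return the k (name, score) pairs most similar to `query`, best first."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy data: real CLIP or caption embeddings have hundreds of dimensions.
toy_embeddings = {
    "cat.jpg": [1.0, 0.0],
    "dog.jpg": [0.9, 0.1],
    "car.jpg": [0.0, 1.0],
}
```

With this toy data, `top_k([1.0, 0.0], toy_embeddings, k=2)` ranks `cat.jpg` first and `dog.jpg` second, because their vectors point in nearly the same direction as the query.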

📚 References: The Brains Behind PicMatch 🧠

PicMatch leverages these incredible open-source projects:

TinyCLIP: The visual powerhouse for understanding your images.
- 👉 https://huggingface.co/wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M
Image Captioning: The wordsmith that describes your photos in detail.
- 👉 https://huggingface.co/Salesforce/blip-image-captioning-base
Sentence Transformers: Turns captions into embeddings for text-based search.
- 👉 https://sbert.net
Unsplash: The sample images come from Unsplash's open dataset.
- 👉 https://github.com/unsplash/datasets

Let's give credit where credit is due! 🙌 These projects make PicMatch smarter and more capable.
