Visual Search System is a production-ready machine learning application that enables reverse image search capabilities. Upload an image, and the system retrieves visually similar images from a dataset using deep learning embeddings and efficient similarity search.
Built on the Fashion-MNIST dataset (and easily adaptable to any image collection), this project demonstrates:
- Deep metric learning with triplet loss
- GPU-accelerated similarity search using FAISS
- Scalable REST API with FastAPI
- Complete MLOps pipeline (MLflow, DVC, Kedro)
- Modern software engineering best practices
| Feature | Description |
|---|---|
| Visual Search | Find similar images using CNN embeddings |
| Fast Inference | Sub-second search with GPU-accelerated FAISS |
| Metric Learning | Fine-tuned ResNet50 with triplet loss |
| Comprehensive Metrics | Precision@K, Recall@K, mAP evaluation |
| MLOps Ready | MLflow tracking, DVC versioning, Kedro pipelines |
| REST API | Production-ready FastAPI service |
| Docker Compatible | Easy containerization for cloud deployment |
| Modular Design | Cookiecutter Data Science structure |
```mermaid
graph LR
    A[User Image] --> B[FastAPI Server]
    B --> C[VisualSearchService]
    C --> D[EmbeddingExtractor<br/>ResNet50 + Projection]
    D --> E[128-dim Vector]
    E --> F[FAISS Index<br/>60K Embeddings]
    F --> G[Top-K Results]
    G --> H[JSON Response]
```
- EmbeddingExtractor: ResNet50 backbone with custom projection head (see the sketch after this list)
- TripletLossModel: Hard negative mining for robust embeddings
- FeatureStore: FAISS-based similarity search with metadata
- FastAPI: Asynchronous request handling
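For orientation, here is a minimal sketch of an embedding extractor with this shape; the projection-head layer sizes are illustrative assumptions, not the project's exact architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

class EmbeddingExtractor(nn.Module):
    """ResNet50 backbone + projection head (illustrative sketch)."""

    def __init__(self, embedding_dim: int = 128, freeze_backbone: bool = False):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        in_features = backbone.fc.in_features  # 2048 for ResNet50
        backbone.fc = nn.Identity()            # strip the classification head
        if freeze_backbone:
            for p in backbone.parameters():
                p.requires_grad = False
        self.backbone = backbone
        self.projection = nn.Sequential(       # head sizes are assumptions
            nn.Linear(in_features, 512),
            nn.ReLU(),
            nn.Linear(512, embedding_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.projection(self.backbone(x))
        # L2-normalize so FAISS L2 distances behave consistently
        return nn.functional.normalize(emb, p=2, dim=1)
```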
Get the system running in 5 minutes:
```bash
# 1. Clone repository
git clone https://github.com/HGSChandeepa/visual-search-ml
cd visual-search-ml

# 2. Create environment
make create_environment
conda activate visual_search_env

# 3. Install dependencies
make install

# 4. Run complete pipeline
make all

# 5. Start API server
make api
```

The API will be available at http://localhost:8000.
- Python 3.8+
- CUDA 11.8+ (recommended for GPU acceleration)
- Conda package manager
Option 1: Automated (Recommended)
```bash
# Creates environment, installs dependencies, runs pipeline
make create_environment
conda activate visual_search_env
make install
make all
```

Option 2: Manual Installation
```bash
# Create conda environment
conda env create -f conda_env.yml
conda activate visual_search_env

# Install Python packages
pip install -r requirements.txt

# Run pipeline steps manually
python src/data/make_dataset.py
python src/features/build_features.py
python src/models/train_model.py
python src/evaluation/evaluate.py
```

Option 3: Docker
```bash
# Build image
docker build -t visual-search .

# Run container
docker run -p 8000:8000 visual-search
```

All settings are centralized in `config.yaml`:
```yaml
# Model Settings
model:
  backbone: "resnet50"    # Options: resnet50, efficientnet_b0, clip
  embedding_dim: 128      # Dimension of embedding vectors
  freeze_backbone: false  # Fine-tune full network or just head

# Training Hyperparameters
training:
  batch_size: 256
  epochs: 10
  learning_rate: 0.001
  margin: 0.5             # Triplet loss margin

# API Settings
api:
  host: "0.0.0.0"
  port: 8000
  top_k: 10               # Default number of results
```
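The `margin` value above parameterizes the triplet loss. As a sketch of how batch-hard mining with that margin works (illustrative; the project's exact mining strategy may differ):

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.5) -> torch.Tensor:
    """For each anchor, take the farthest same-label example (hardest
    positive) and closest other-label example (hardest negative) in the
    batch. Assumes each anchor has at least one positive and negative."""
    dist = torch.cdist(embeddings, embeddings, p=2)    # (B, B) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-label mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    hardest_pos = dist.masked_fill(~same | eye, float("-inf")).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

# Toy usage: 4 embeddings, 2 classes
emb = F.normalize(torch.randn(4, 128), dim=1)
loss = batch_hard_triplet_loss(emb, torch.tensor([0, 0, 1, 1]), margin=0.5)
```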
Override settings via environment variables or command line:

```bash
# Environment variable
export BATCH_SIZE=512

# Command line (via Hydra)
python src/models/train_model.py training.batch_size=512 model.embedding_dim=256
```

```bash
# Execute all steps: data → features → train → evaluate
make all

# Individual stages
make data      # Download and prepare dataset
make features  # Extract embeddings and build FAISS index
make train     # Fine-tune model with triplet loss
make evaluate  # Compute metrics and generate visualizations
```
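Under the hood, the features stage builds a FAISS index over the extracted embeddings. A minimal sketch of that pattern, assuming a flat L2 index over 128-dim vectors (the project's FeatureStore wraps something similar with metadata):

```python
import numpy as np
import faiss

embedding_dim = 128
# Stand-in for the real 60K Fashion-MNIST embeddings
embeddings = np.random.rand(60_000, embedding_dim).astype("float32")

# Exact (brute-force) L2 index; larger datasets would use an IVF variant
index = faiss.IndexFlatL2(embedding_dim)
index.add(embeddings)

# Search: top-10 nearest neighbors for one query vector
query = np.random.rand(1, embedding_dim).astype("float32")
distances, indices = index.search(query, 10)
print(indices[0], distances[0])
```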
```bash
# Development mode with auto-reload
make api

# Or manual command
uvicorn src.deployment.api:app --reload --host 0.0.0.0 --port 8000

# Production mode (multiple workers)
uvicorn src.deployment.api:app --workers 4 --host 0.0.0.0 --port 8000
```
Python Client:

```python
import requests

url = "http://localhost:8000/search"

with open("query_image.jpg", "rb") as f:
    response = requests.post(url + "?top_k=5", files={"file": f})

results = response.json()
for item in results["results"]:
    print(f"Index: {item['index']}, Distance: {item['distance']:.3f}")
```

cURL:
```bash
curl -X POST "http://localhost:8000/search?top_k=5" \
  -F "file=@query_image.jpg"
```

JavaScript:
```javascript
const formData = new FormData();
formData.append("file", fileInput.files[0]);

fetch("http://localhost:8000/search?top_k=5", {
  method: "POST",
  body: formData
})
  .then(response => response.json())
  .then(data => console.log(data.results));
```

GET /health: Health check endpoint.

```bash
curl http://localhost:8000/health
```

Response:

```json
{"status": "healthy", "service": "visual_search"}
```

POST /search: Search for visually similar images.
- Content-Type: `multipart/form-data`
- Parameters: `top_k` (optional, default=10)
- Body: image file
Example Request:
```bash
curl -X POST "http://localhost:8000/search?top_k=5" \
  -F "file=@shirt.jpg"
```

Response:

```json
{
  "status": "success",
  "query_filename": "shirt.jpg",
  "results": [
    {
      "index": 12345,
      "distance": 0.892,
      "label": 0
    },
    {
      "index": 67890,
      "distance": 0.845,
      "label": 0
    }
  ]
}
```

Error Responses:
- `400 Bad Request`: invalid file type
- `500 Internal Server Error`: model or search error
- `503 Service Unavailable`: service not initialized
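For context, the validation behind these status codes could be wired roughly like this in FastAPI (an illustrative sketch; `run_search` is a hypothetical stand-in for the real service call):

```python
from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

def run_search(image_bytes, top_k):
    """Hypothetical stand-in for embedding extraction + FAISS lookup."""
    return []

@app.post("/search")
async def search(file: UploadFile = File(...), top_k: int = 10):
    # 400 Bad Request: reject non-image uploads
    if not (file.content_type or "").startswith("image/"):
        raise HTTPException(status_code=400, detail="Invalid file type")
    image_bytes = await file.read()
    try:
        results = run_search(image_bytes, top_k)
    except Exception as exc:
        # 500 Internal Server Error: model or search failure
        raise HTTPException(status_code=500, detail=str(exc))
    return {"status": "success", "query_filename": file.filename, "results": results}
```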
```bash
# View experiments
mlflow ui

# Runs are automatically logged during training
# Track metrics, parameters, and model artifacts
```

Logged artifacts:
- Training/validation loss curves
- Model checkpoint (best_model.pt)
- Hyperparameters and config
- Evaluation metrics (precision@k, mAP)
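For reference, logging of this kind typically looks like the following in MLflow (a sketch; the run name, values, and paths are illustrative):

```python
import mlflow

history = [(0.91, 0.88), (0.54, 0.60), (0.38, 0.51)]  # stand-in (train, val) losses

with mlflow.start_run(run_name="triplet_resnet50"):
    mlflow.log_params({"backbone": "resnet50", "embedding_dim": 128, "margin": 0.5})
    for epoch, (train_loss, val_loss) in enumerate(history):
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
    mlflow.log_artifact("models/best_model.pt")  # checkpoint must exist on disk
```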
```bash
# Track data and model files
dvc add data/raw/
dvc add models/faiss_index

# Push to remote storage (S3, GCS, Azure)
dvc remote add -d storage s3://my-bucket/visual-search
dvc push

# Reproduce pipeline
dvc repro
```

```bash
# Run complete Kedro pipeline
kedro run

# Visualize pipeline
kedro viz
```

```bash
# Run all tests
make test

# Or with pytest directly
pytest tests/ -v

# Test coverage
pytest tests/ --cov=src --cov-report=html
```

Test categories:
- `tests/test_features.py`: embedding extraction, FAISS search
- `tests/test_models.py`: model loading, inference
- `tests/test_api.py`: API endpoints (when running)
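As a flavor of what these tests check, a minimal FAISS-focused test might look like this (illustrative, not the repository's exact test code):

```python
import numpy as np
import faiss

def test_faiss_search_returns_k_results():
    # Build a small index and verify search returns exactly k neighbors
    index = faiss.IndexFlatL2(128)
    index.add(np.random.rand(100, 128).astype("float32"))
    distances, indices = index.search(np.random.rand(1, 128).astype("float32"), 5)
    assert indices.shape == (1, 5)
    assert distances.shape == (1, 5)
```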
| Metric | Value | Notes |
|---|---|---|
| Precision@1 | 85% | Top result accuracy |
| Precision@5 | 72% | 3-4 relevant in top-5 |
| Search Latency | ~100ms | GPU (RTX 3080) |
| Search Latency | ~500ms | CPU (16 cores) |
| Throughput | 100+ QPS | GPU, 4 workers |
| Memory Usage | ~120MB | For 60K embeddings |
| Model Size | ~100MB | ResNet50 + projection |
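For clarity on the Precision@K figures above, the per-query computation is straightforward (a sketch of the metric, not the project's exact evaluation code):

```python
import numpy as np

def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved items that share the query's label."""
    return float(np.mean(np.asarray(retrieved_labels)[:k] == query_label))

# Example: 4 of the top-5 results match the query's class -> 0.8
print(precision_at_k([0, 0, 3, 0, 0], query_label=0, k=5))
```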
We welcome contributions! Please see our Contributing Guide for details.
```bash
# 1. Fork repository

# 2. Create feature branch
git checkout -b feature/amazing-feature

# 3. Make changes and test
make test

# 4. Commit changes
git commit -m "Add amazing feature"

# 5. Push to fork and create PR
git push origin feature/amazing-feature
```

- Support for billion-scale datasets (distributed FAISS)
- Text-to-image search using CLIP embeddings
- Real-time index updates via message queue
- A/B testing framework for models
- Kubernetes deployment manifests
- Frontend demo application
- Support for video search
- Documentation: Full Workflow Guide
- Demo Notebook: `notebooks/demo.ipynb` (interactive examples)
- API Docs: run the server and visit http://localhost:8000/docs
- Fashion-MNIST Dataset: https://github.com/zalandoresearch/fashion-mnist
This project is licensed under the MIT License - see the LICENSE file for details.
- Facebook AI Research for FAISS
- PyTorch team for the excellent deep learning framework
- FastAPI team for the modern web framework
- DrivenData for Cookiecutter Data Science structure
⭐ Star this repo if you found it useful!