This project implements a multimodal search engine for images and text, developed as part of the Information Retrieval course at Innopolis University. The system allows users to search for images using text queries through different search methodologies.
The project implements three main search approaches:
- **K-gram Index with TF-IDF**: A text-based search engine that breaks text down into k-grams and uses TF-IDF scoring to match queries to captions.
- **Dense Vector Search**: Uses neural embeddings to encode both text and images into the same vector space, allowing for semantic search.
- **Image Segmentation Pipeline**: Segments images and generates descriptions for specific parts of images, enabling more precise, localized search.
The project is organized into several key components:
**K-gram Index with TF-IDF**
- Located in the `kgram_index/` directory
- Implements a k-gram based index with TF-IDF scoring (a minimal sketch follows this list)
- Supports flexible `k` values and wildcard searches
- Main files:
  - `build_index.py`: Core implementation of the k-gram index
  - `test_index.ipynb`: Notebook for testing the index
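The core idea can be illustrated with a small self-contained sketch (not the actual `build_index.py` code; the class and function names are illustrative): captions are decomposed into overlapping character k-grams, weighted with TF-IDF, and ranked against the query by cosine similarity.

```python
import math
from collections import Counter

def kgrams(text: str, k: int = 3) -> list[str]:
    """Split text into overlapping character k-grams (lowercased, whitespace collapsed)."""
    text = " ".join(text.lower().split())
    return [text[i:i + k] for i in range(len(text) - k + 1)]

class KGramTfIdfIndex:
    """Toy k-gram + TF-IDF index over captions, ranked by cosine similarity."""

    def __init__(self, captions: list[str], k: int = 3):
        self.k = k
        self.captions = captions
        self.doc_tf = [Counter(kgrams(c, k)) for c in captions]
        df = Counter(g for tf in self.doc_tf for g in tf)   # document frequency per k-gram
        n = len(captions)
        self.idf = {g: math.log(n / (1 + d)) + 1 for g, d in df.items()}

    def _vector(self, tf: Counter) -> dict[str, float]:
        return {g: c * self.idf.get(g, 0.0) for g, c in tf.items()}

    def search(self, query: str, top_k: int = 5) -> list[tuple[int, float]]:
        qv = self._vector(Counter(kgrams(query, self.k)))
        qnorm = math.sqrt(sum(w * w for w in qv.values())) or 1.0
        results = []
        for i, tf in enumerate(self.doc_tf):
            dv = self._vector(tf)
            dot = sum(w * dv.get(g, 0.0) for g, w in qv.items())
            dnorm = math.sqrt(sum(w * w for w in dv.values())) or 1.0
            results.append((i, dot / (qnorm * dnorm)))
        return sorted(results, key=lambda r: -r[1])[:top_k]

index = KGramTfIdfIndex(["a dog running on the beach", "a red car parked outside"], k=3)
print(index.search("dog on a beach"))
```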
**Dense Vector Search**
- Located in two directories:
  - `dense_index/`: Initial implementation
  - `dense_index_v2/`: Improved version with optimizations
- Uses embeddings from neural models (JINA-CLIP and ColQwen) to encode queries and images
- Supports different index types (the FAISS variant is sketched after this list):
  - FAISS index for fast vector search
  - Ball Tree index for nearest-neighbor search
- Main files:
  - `demo.py`: Interactive demo application
  - `faiss_index.py`: FAISS index implementation
  - `ball_tree.py`: Ball Tree index implementation
  - `colqwen_emb.py` / `siglig_embeddings.py`: Embedding generation
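The FAISS side boils down to the pattern below. This is a minimal sketch, assuming image and query embeddings have already been produced by one of the embedding scripts and share the same vector space; the dimensionality and the random data are placeholders, not taken from the actual `faiss_index.py`.

```python
import numpy as np
import faiss

# Placeholder embeddings; in the project these would come from colqwen_emb.py or
# siglig_embeddings.py, with text queries encoded into the same vector space.
dim = 512                                                        # model-dependent dimensionality
image_embeddings = np.random.rand(1000, dim).astype("float32")   # fake data for illustration
faiss.normalize_L2(image_embeddings)        # normalize so inner product == cosine similarity

index = faiss.IndexFlatIP(dim)              # exact inner-product index
index.add(image_embeddings)

def search(query_embedding: np.ndarray, top_k: int = 5):
    """Return (scores, image indices) of the top_k nearest images."""
    q = query_embedding.astype("float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, top_k)
    return scores[0], ids[0]

scores, ids = search(np.random.rand(dim))
print(list(zip(ids.tolist(), scores.tolist())))
```

The Ball Tree index (as in `ball_tree.py`) follows the same add-then-query pattern for exact nearest-neighbor search.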
**Image Segmentation Pipeline**
- Located in the `segmentation_pipeline/` directory
- Segments images and generates descriptions for specific regions (the data flow is sketched after this list)
- Creates a search index for these localized descriptions
- Main files:
  - `demo.py`: Interactive demo application
  - `mask_images.py`: Image segmentation implementation
  - `generate_descriptions.py`: Description generation for segments
  - `embed_data.py`: Embedding generation for segments
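The data flow through the pipeline can be summarized roughly as follows; the `Segment` record and the `segment_fn`/`describe_fn`/`embed_fn` callables are hypothetical stand-ins for what `mask_images.py`, `generate_descriptions.py`, and `embed_data.py` actually implement.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One record of the localized-search index: an image region plus its description."""
    image_id: str
    mask_id: int
    description: str = ""
    embedding: list[float] | None = None

def build_segment_index(image_paths, segment_fn, describe_fn, embed_fn):
    """Segment each image, caption each region, embed the caption, and collect records."""
    records = []
    for path in image_paths:
        for mask_id, region in enumerate(segment_fn(path)):   # image -> list of region crops/masks
            seg = Segment(image_id=path, mask_id=mask_id)
            seg.description = describe_fn(region)             # region -> text description
            seg.embedding = embed_fn(seg.description)         # description -> dense vector
            records.append(seg)
    return records  # records can then be loaded into a FAISS or Ball Tree index
```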
**Demo Interfaces**
- Located in the `demo/` directory
- Streamlit-based web interfaces for the search engines (a minimal page is sketched after this list)
- Allows interactive querying and result visualization
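A minimal Streamlit page along these lines might look as follows; the `search` stub stands in for whichever backend the real `demo.py` files wire up.

```python
import streamlit as st

def search(query: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Stub: replace with a call into the k-gram, FAISS, or Ball Tree index.
    Should return (image_path, score) pairs, best match first."""
    return []  # placeholder; the real demos return paths into the DCI image set

st.title("Multimodal image search")
query = st.text_input("Describe the image you are looking for")
if query:
    for image_path, score in search(query):
        st.image(image_path, caption=f"score: {score:.3f}")
```

Launch such a page with `streamlit run <file>.py`.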
The project uses the DCI (Densely Captioned Images) dataset from Meta, which provides images with detailed captions. The dataset usage and processing code follows Meta's implementation.
- Python 3.10+
- Required packages (see `REPRODUCE.md` for detailed setup)
Follow the instructions in `REPRODUCE.md` to set up the environment and download the dataset.
Important remark: prompt refinement requires Ollama with the `gemma3:4b` model installed and running (`ollama pull gemma3:4b`, then `ollama serve`); a sketch of the refinement call follows the dependency list below.

Key dependencies:

- FAISS: For efficient similarity search
- PyTorch: For neural network models
- Streamlit: For interactive demo interfaces
- Transformer models: JINA-CLIP and ColQwen for text/image embeddings
- Ollama: For AI-assisted query refinement
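Query refinement talks to the local Ollama server over its HTTP API; the sketch below shows the general shape of such a call, with an example prompt that is not the project's actual one.

```python
import requests

def refine_query(query: str, model: str = "gemma3:4b") -> str:
    """Ask a local Ollama server to rewrite a terse query into a richer, caption-like description."""
    prompt = (  # example prompt only; the project's actual refinement prompt may differ
        "Rewrite the following image search query as a short, detailed description "
        f"of the image the user wants to find: {query}"
    )
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["response"].strip()

print(refine_query("dog on beach"))
```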
This project explores different approaches to multimodal search and compares their effectiveness:
- Traditional text search with k-grams and TF-IDF
- Neural embedding-based search with different models
- Segmentation-based search for more localized results
The implementation demonstrates how these approaches can be combined to create a comprehensive search engine for images and text.