This repository contains the code and resources for the project, "Towards Automated Identity Matching from Facial Images via Multimodal Large Language Models."
This project introduces a novel framework that goes beyond traditional facial recognition. Rather than merely matching faces, it ascertains a person's real-world identity by combining computer vision techniques with the reasoning capabilities of multimodal large language models (MLLMs).
The proliferation of digital imagery has created significant challenges for automated identity verification. Traditional systems excel at matching images but lack the contextual understanding needed to determine who a person actually is. This project proposes a pipeline that uses a facial image to perform a reverse image search, gathers a corpus of web pages where the image appears, and then leverages a multimodal LLM (Gemma) to analyze the textual and semantic content of those pages to confidently determine the individual's name, role, and background.
Key features:
- Image Preprocessing: Cleans and prepares input images using Gaussian and median filters for denoising.
- Background Segmentation: Employs a pre-trained U-Net to isolate the human subject from the background, reducing noise and focusing the analysis.
- High-Fidelity Face Extraction: Uses a Multi-task Cascaded Convolutional Network (MTCNN) for robust, multi-face detection and alignment.
- AI-Powered Super-Resolution: Enhances the resolution of extracted faces using a Super-Resolution Generative Adversarial Network (SRGAN) to ensure a high-quality query image.
- Web-Scale Information Retrieval: Submits the enhanced facial image to a reverse image search engine to gather relevant URLs.
- LLM-Powered Analysis: Leverages the Gemma multimodal model to analyze full-page screenshots of retrieved URLs, understanding text, images, and layout contextually.
- Structured Data Output: Synthesizes the findings into a structured JSON object containing the person's name, affiliations, a background summary, and the source URLs for transparency (see the example below).
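
For illustration, the output might take the following shape. All field names and values here are placeholders, since the exact schema depends on the prompt given to the LLM:

```json
{
  "name": "Jane Doe",
  "roles": ["Professor of Computer Science"],
  "affiliations": ["Example University"],
  "background": "Researcher working on computer vision and multimodal learning.",
  "sources": [
    "https://example.edu/people/jane-doe",
    "https://example.org/news/vision-award"
  ]
}
```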
The system operates as a multi-stage pipeline, systematically processing an input image to produce a confident identity claim. Code sketches illustrating each stage follow the list below.
- Stage 1: Preprocessing: The input image is enhanced and denoised. The background is removed using a U-Net to isolate the subject.
- Stage 2: Face Extraction: An MTCNN detects and crops faces from the image. The resulting low-resolution crop is then upscaled using an SRGAN to improve clarity.
- Stage 3: Information Retrieval: The high-resolution face is used as a query in a reverse image search to collect a list of web pages where the image appears.
- Stage 4: LLM Analysis: The system captures screenshots of the top search results and feeds them to the Gemma LLM, which analyzes the content—including headlines, captions, and text—to deduce the person's identity.
- Output: The final result is a structured JSON object containing the identified name, roles, a brief background, and source URLs.
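
The sketches below are minimal, illustrative implementations of each stage, not the project's actual code; helper names such as `unet_predict` stand in for components the README does not specify. Stage 1 can be approximated with OpenCV filters plus a segmentation mask:

```python
import cv2
import numpy as np

def preprocess(image_path: str, unet_predict) -> np.ndarray:
    """Stage 1 sketch: denoise the image, then zero out the background.

    `unet_predict` is a hypothetical callable wrapping the pre-trained
    U-Net; it is assumed to map an H x W x 3 image to an H x W
    foreground-probability mask with values in [0, 1].
    """
    img = cv2.imread(image_path)              # BGR, uint8
    img = cv2.GaussianBlur(img, (5, 5), 0)    # suppress Gaussian noise
    img = cv2.medianBlur(img, 5)              # suppress salt-and-pepper noise
    mask = unet_predict(img)                  # per-pixel foreground probability
    img[mask < 0.5] = 0                       # black out background pixels
    return img
```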
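Stage 2 could use the MTCNN implementation from `facenet-pytorch`. The SRGAN upscaling step is represented by a hypothetical `sr_upscale` callable, since the README does not name a specific super-resolution library:

```python
from facenet_pytorch import MTCNN   # pip install facenet-pytorch
from PIL import Image

mtcnn = MTCNN(keep_all=True)        # keep_all=True returns every detected face

def extract_faces(image: Image.Image, sr_upscale, min_conf: float = 0.95):
    """Stage 2 sketch: detect, crop, and super-resolve each face.

    `sr_upscale` is a hypothetical function wrapping the SRGAN generator;
    it takes a PIL crop and returns an upscaled PIL image. The 0.95
    confidence threshold is an assumption, not a value from the project.
    """
    boxes, probs = mtcnn.detect(image)  # boxes: N x 4 array of (x1, y1, x2, y2)
    if boxes is None:
        return []
    return [
        sr_upscale(image.crop(tuple(box)))
        for box, prob in zip(boxes, probs)
        if prob >= min_conf
    ]
```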
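Stage 3 depends on an external service. There is no official Google reverse-image-search API, so the endpoint and response format below are purely illustrative of what a third-party provider might expose:

```python
import requests

# Hypothetical endpoint and response schema; substitute a real provider.
SEARCH_ENDPOINT = "https://reverse-search.example.com/api/search"

def reverse_image_search(image_path: str, api_key: str, top_k: int = 5) -> list[str]:
    """Stage 3 sketch: return URLs of pages where the face image appears."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            SEARCH_ENDPOINT,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"image": f},
            timeout=30,
        )
    resp.raise_for_status()
    return [hit["url"] for hit in resp.json()["results"][:top_k]]
```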
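Stage 4 can be sketched with Playwright for full-page screenshots and a multimodal Gemma checkpoint served through Hugging Face `transformers`. This assumes a recent `transformers` release with Gemma 3 support; the model ID and prompt are illustrative, not the project's actual configuration:

```python
import torch
from playwright.sync_api import sync_playwright
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

MODEL_ID = "google/gemma-3-4b-it"   # assumed multimodal Gemma checkpoint

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Gemma3ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
)

def capture_screenshot(url: str, out_path: str) -> str:
    """Render a search-result page and save a full-page screenshot."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=out_path, full_page=True)
        browser.close()
    return out_path

def identify_person(screenshot_path: str) -> str:
    """Ask Gemma who the page is about, requesting JSON-formatted output."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": screenshot_path},
            {"type": "text", "text": (
                "Who is the person this page is about? Respond as JSON "
                "with keys: name, roles, background."
            )},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    )
    output = model.generate(**inputs, max_new_tokens=256)
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```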
The project builds on the following technologies:
- AI / ML
- Gemma: For multimodal analysis of web content.
- U-Net: For background-foreground segmentation.
- MTCNN (Multi-task Cascaded Convolutional Network): For high-accuracy face detection.
- SRGAN (Super-Resolution Generative Adversarial Network): For enhancing the quality of cropped facial images.
- Computer Vision
- OpenCV (implied): Used for classic image-processing filters such as Gaussian blur and median filtering.
- Core Stack
- Python: The primary programming language for the pipeline.
- Reverse Image Search API (e.g., Google Images): For web-scale information retrieval.