FAISS - Embeddings Analysis

This project implements semantic search for film analysis using OpenAI embeddings and a FAISS vector store.

Overview

The system runs semantic searches over a film database, using embeddings generated from each film's academic value description. It uses OpenAI's text-embedding-3-large model to generate embeddings and FAISS for efficient similarity search.

Prerequisites

Python 3.8+
OpenAI API key

Installation

Clone the repository:

git clone [your-repository-url]
cd [repository-name]

Install required packages:

pip install -r requirements.txt

Set up your OpenAI API key:

export OPENAI_API_KEY='your-api-key-here'
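A script can fail early with a clear message when the key is missing, rather than erroring on the first API call. The helper below is a minimal sketch; the name require_api_key is hypothetical and is not part of this repository.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise if it is missing or blank.

    Hypothetical helper for illustration; not taken from main.py.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running main.py")
    return key
```

Calling require_api_key() at the top of a script surfaces a missing key before any data is loaded.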

Project Structure

.
├── data/
│   └── fp_enrichment_14films_09-2024_with_academic_categories_enriched.csv
├── notebooks/
│   └── film_embeddings.ipynb
├── main.py
├── stoper.py
├── requirements.txt
└── README.md

Features

Load and process film data from CSV
Use pre-computed embeddings for efficient search
Perform semantic similarity search using FAISS
Time each stage with the custom Stoper class
Explore results interactively in a Jupyter notebook

Usage

Command Line Interface

Run the main script:

python main.py

Jupyter Notebook

Start Jupyter notebook:

jupyter notebook

Navigate to notebooks/film_embeddings.ipynb for interactive analysis.

Data Format

The CSV file should contain the following columns:

film_id: Unique identifier for each film
film_name: Name of the film
academic_value: Text description of the film's academic value
academic_value_embedding: Pre-computed embedding vector for the academic value text
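The sketch below shows one way to read these columns into the parallel lists a vector store needs. It assumes each academic_value_embedding cell holds a JSON array of floats; the actual serialization in the data file may differ. The function names and sample rows are made up for illustration, and the standard csv module is used only to keep the example self-contained.

```python
import csv
import io
import json
from typing import Dict, List


def parse_embedding(cell: str) -> List[float]:
    """Decode one academic_value_embedding cell (assumed to be a JSON array)."""
    return [float(x) for x in json.loads(cell)]


def load_films(handle) -> List[Dict]:
    """Read film rows from an open CSV file and decode the embedding column."""
    rows = []
    for row in csv.DictReader(handle):
        row["academic_value_embedding"] = parse_embedding(row["academic_value_embedding"])
        rows.append(row)
    return rows


# Tiny in-memory sample with made-up values and 3-dimensional vectors.
sample = io.StringIO(
    'film_id,film_name,academic_value,academic_value_embedding\n'
    '1,Example Film A,"Useful for teaching editing.","[0.1, 0.2, 0.3]"\n'
    '2,Example Film B,"Useful for teaching sound design.","[0.0, 0.5, 0.5]"\n'
)
films = load_films(sample)

# Parallel lists: one text, one vector and one metadata dict per film.
texts = [f["academic_value"] for f in films]
text_embeddings = [f["academic_value_embedding"] for f in films]
metadata = [{"film_id": f["film_id"], "film_name": f["film_name"]} for f in films]
```

Note that csv.DictReader returns every value as a string, so film_id stays a string unless it is converted explicitly.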

Performance Monitoring

The project includes a custom Stoper class for performance monitoring:

Tracks execution time between different stages
Provides detailed timing information
Helps identify potential bottlenecks
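The implementation of Stoper is not reproduced in this README. As a rough illustration of the idea, the sketch below times the gaps between named stages with time.perf_counter. The class name StageTimer and its methods are hypothetical and may not match the interface in stoper.py.

```python
import time
from typing import List, Tuple


class StageTimer:
    """Hypothetical stage timer for illustration; not the project's Stoper class."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        self.laps: List[Tuple[str, float]] = []

    def lap(self, label: str) -> float:
        """Record the time elapsed since the previous lap (or since construction)."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.laps.append((label, elapsed))
        return elapsed

    def slowest(self) -> Tuple[str, float]:
        """Return the (label, seconds) pair of the longest recorded stage."""
        return max(self.laps, key=lambda lap: lap[1])

    def report(self) -> str:
        """Format one 'label: seconds' line per recorded stage."""
        return "\n".join(f"{label}: {seconds:.4f}s" for label, seconds in self.laps)
```

A typical pattern would be to call lap("load CSV"), lap("build index") and lap("search") after each stage, then print report() and check slowest() to see where the time goes.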

Example Usage

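from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Initialize the embedding model (reads OPENAI_API_KEY from the environment)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# texts, text_embeddings and metadata are parallel lists built from the CSV:
# the academic_value strings, their pre-computed vectors, and one dict per film
vector_store = FAISS.from_embeddings(
    text_embeddings=list(zip(texts, text_embeddings)),
    metadatas=metadata,
    embedding=embeddings,
)

# Embed the query with the same model, then retrieve the 8 nearest films
query_embedding = embeddings.embed_query("your search query here")
results = vector_store.similarity_search_by_vector(query_embedding, k=8)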
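Conceptually, a search by vector ranks the stored embeddings by their distance to the query and returns the k closest. The pure-Python sketch below illustrates that idea with an exhaustive scan, assuming Euclidean (L2) distance as the metric; the metric in practice depends on how the index is built. It is not how FAISS is implemented, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 8,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:k]
```

The scan costs one distance computation per stored vector for every query, which is the work a vector index is meant to make fast.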

Dependencies

pandas
langchain
langchain-openai
faiss-cpu
jupyter
notebook
openai

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
