This project implements semantic search over a film database using OpenAI embeddings and a FAISS vector store.
Queries are matched against embeddings generated from each film's academic value description. The embeddings come from OpenAI's text-embedding-3-large model, and FAISS performs the similarity search.
- Python 3.8+
- OpenAI API key
- Clone the repository:
    git clone [your-repository-url]
    cd [repository-name]
- Install required packages:
    pip install -r requirements.txt
- Set up your OpenAI API key:
    export OPENAI_API_KEY='your-api-key-here'
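A missing key usually surfaces only when the first embedding request fails, so it can help to check for it at startup. The following is a minimal sketch; `require_api_key` is a hypothetical helper and is not part of this repository.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the given mapping, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running main.py")
    return key
```

Calling `require_api_key()` at the top of a script would fail fast with a readable message instead of an authentication error later on.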
├── data/
│ └── fp_enrichment_14films_09-2024_with_academic_categories_enriched.csv
├── notebooks/
│ └── film_embeddings.ipynb
├── main.py
├── stoper.py
├── requirements.txt
└── README.md
- Load and process film data from CSV
- Utilize pre-computed embeddings for efficient search
- Perform semantic similarity search using FAISS
- Performance timing using custom Stoper class
- Jupyter notebook for interactive analysis
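The similarity search listed above can be pictured as a nearest-neighbour lookup over embedding vectors. The sketch below is a brute-force version in plain Python, assuming Euclidean distance as the metric. It only illustrates the idea and is not how FAISS is implemented; `top_k_nearest` and the toy vectors are made up for the example.

```python
import math


def top_k_nearest(query, vectors, k=8):
    """Return (index, distance) pairs for the k vectors closest to `query`."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional vectors standing in for real embeddings
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = top_k_nearest([0.9, 0.1], toy_vectors, k=2)
nearest_indices = [index for index, _ in nearest]
```

An exhaustive scan like this compares the query against every stored vector, which is the kind of work a dedicated vector index is meant to handle efficiently.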
Run the main script:
    python main.py
Start Jupyter notebook:
    jupyter notebook
Navigate to notebooks/film_embeddings.ipynb for interactive analysis.
The CSV file should contain the following columns:
- film_id: Unique identifier for each film
- film_name: Name of the film
- academic_value: Text description of the film's academic value
- academic_value_embedding: Pre-computed embeddings for the academic value text
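Reading a file with these columns might look like the sketch below. It uses only the standard library so it runs without extra packages, and it assumes each academic_value_embedding cell holds a JSON-style list such as "[0.1, 0.2]"; the actual storage format in the project's CSV may differ. `load_films` and the sample row are hypothetical.

```python
import csv
import io
import json

REQUIRED_COLUMNS = (
    "film_id",
    "film_name",
    "academic_value",
    "academic_value_embedding",
)


def load_films(handle):
    """Read film rows and decode each embedding cell into a list of floats."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    films = []
    for row in reader:
        # Assumption: the cell is a JSON-style list, e.g. "[0.1, 0.2, 0.3]"
        raw = json.loads(row["academic_value_embedding"])
        row["academic_value_embedding"] = [float(x) for x in raw]
        films.append(row)
    return films


# Illustrative in-memory CSV; real data lives under data/
sample = io.StringIO(
    "film_id,film_name,academic_value,academic_value_embedding\n"
    '1,Example Film,"Illustrative description","[0.1, 0.2, 0.3]"\n'
)
films = load_films(sample)
```

Validating the header first means a renamed or missing column fails with an explicit message before any row is parsed.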
The project includes a custom Stoper class for performance monitoring:
- Tracks execution time between different stages
- Provides detailed timing information
- Helps identify potential bottlenecks
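The README does not show the Stoper interface, so the sketch below is a hypothetical stand-in rather than the project's class. It illustrates the general idea of timing the gap between stages and picking out the slowest one; the names `StageTimer`, `mark`, and `slowest` are assumptions.

```python
import time


class StageTimer:
    """Hypothetical stage timer for illustration; not the project's Stoper."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        self.stages = []  # (label, seconds) pairs in recording order

    def mark(self, label):
        """Record the time elapsed since the previous mark (or since creation)."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.stages.append((label, elapsed))
        return elapsed

    def slowest(self):
        """Return the (label, seconds) pair with the largest elapsed time."""
        return max(self.stages, key=lambda stage: stage[1])


timer = StageTimer()
sum(range(100_000))  # placeholder workload
timer.mark("warm-up")
```

Accepting the clock as a parameter keeps the timer easy to test with a fake clock, while `time.perf_counter` remains the default for real measurements.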
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# texts, text_embeddings, metadata, and query_embedding are assumed to be
# prepared beforehand (for example, from the CSV columns described above)

# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Create vector store from the pre-computed embeddings
vector_store = FAISS.from_embeddings(
    text_embeddings=list(zip(texts, text_embeddings)),
    metadatas=metadata,
    embedding=embeddings,
)

# Perform similarity search for the 8 closest matches
results = vector_store.similarity_search_by_vector(query_embedding, k=8)

pandas
langchain
langchain-openai
faiss-cpu
jupyter
notebook
openai
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request