This project implements a Retrieval-Augmented Generation (RAG) application that combines a Knowledge Graph with vector search to generate accurate, contextually appropriate responses. It builds both the Knowledge Graph and the vector search database from Wikipedia articles, retrieves relevant information from them, and generates responses with an LLM.
Data Collection:
- Process 100,000 Wikipedia articles.
- Use these articles to build a Knowledge Graph and vector database.
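Before articles can be embedded or mined for entities, they are usually split into smaller passages. The sketch below shows one way to do that with overlapping word windows. The function name `chunk_words` and the default sizes are illustrative assumptions; the project itself may chunk by tokens or sentences instead.

```python
from typing import Iterator


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> Iterator[str]:
    """Yield overlapping word-window chunks of `text`.

    Hypothetical helper: the window size and overlap are assumptions,
    not values taken from the project.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break


if __name__ == "__main__":
    article = "one two three four five six seven eight nine ten"
    for chunk in chunk_words(article, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated text in the index.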
Knowledge Graph Integration:
- Enhance retrieval with structured data from a Knowledge Graph stored in Neo4j.
- Use graph queries to extract meaningful relationships and concepts.
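As a rough illustration of the graph lookup, the sketch below fetches the direct neighbors of a named entity with the official `neo4j` Python driver. The graph schema (an `Entity` label with a `name` property) and the helper names are assumptions made for this example; the project's actual schema may differ.

```python
from typing import Any, Dict, List, Tuple

# Assumed schema: nodes labeled `Entity` with a `name` property.
NEIGHBOR_QUERY = """
MATCH (e:Entity {name: $name})-[r]-(n:Entity)
RETURN e.name AS source, type(r) AS relation, n.name AS target
LIMIT $limit
"""


def build_neighbor_query(entity: str, limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and its parameters for a neighbor lookup."""
    return NEIGHBOR_QUERY, {"name": entity, "limit": limit}


def fetch_neighbors(uri: str, user: str, password: str,
                    entity: str, limit: int = 25) -> List[Dict[str, Any]]:
    """Run the lookup against a Neo4j instance (hypothetical helper)."""
    # Imported lazily so the query builder works without the driver installed.
    from neo4j import GraphDatabase

    query, params = build_neighbor_query(entity, limit)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # Consume the result inside the session before it closes.
            return [record.data() for record in session.run(query, params)]
```

Passing the entity name as a query parameter rather than formatting it into the Cypher string avoids injection problems and lets Neo4j reuse the query plan.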
Vector Search Implementation:
- Generate embeddings for text data using pre-trained models and index them in Pinecone.
- Use vector search libraries for efficient retrieval.
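To make the retrieval step concrete, here is a small in-memory stand-in for vector search that ranks stored embeddings by cosine similarity. It is not Pinecone's API; the `top_k` helper and the toy two-dimensional vectors are assumptions used only to show the ranking logic a vector index performs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to `query`, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], toy_index, k=2))
```

A brute-force scan like this is linear in the number of vectors; a managed index such as Pinecone exists precisely to avoid that cost at the scale of 100,000 articles.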
RAG Pipeline:
- Combine vector search and Knowledge Graphs for enhanced retrieval.
- Use an LLM to generate contextually accurate responses.
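One simple way to combine the two retrieval sources is to place the retrieved passages and the graph facts side by side in the prompt sent to the LLM. The sketch below assumes passages arrive as plain strings and graph results as `(source, relation, target)` triples; `build_prompt` and the prompt wording are hypothetical, not the project's actual template.

```python
from typing import Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def build_prompt(question: str, passages: Sequence[str],
                 triples: Iterable[Triple], max_passages: int = 3) -> str:
    """Merge vector-search passages and graph facts into one LLM prompt."""
    passage_lines = [
        f"[{i}] {text}" for i, text in enumerate(passages[:max_passages], start=1)
    ]
    fact_lines = [f"({s}) -[{r}]-> ({t})" for s, r, t in triples]
    return "\n".join([
        "Answer the question using only the context below.",
        "",
        "Passages:",
        *(passage_lines or ["(none)"]),
        "",
        "Graph facts:",
        *(fact_lines or ["(none)"]),
        "",
        f"Question: {question}",
        "Answer:",
    ])


if __name__ == "__main__":
    print(build_prompt(
        "Who wrote Hamlet?",
        ["Hamlet is a tragedy by William Shakespeare."],
        [("William Shakespeare", "WROTE", "Hamlet")],
    ))
```

Keeping the two sources in separate labeled sections makes it easier to inspect which one contributed to an answer when evaluating the pipeline.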
Evaluation:
- Assess response quality based on accuracy, relevance, and contextual fit.
- Optimize retrieval and generation processes.
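The project does not state which metrics it uses, so the following is only one possible proxy for accuracy: a token-overlap F1 score between a generated answer and a reference answer, similar to what extractive QA benchmarks report. The function name `token_f1` and the whitespace tokenization are assumptions.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two answers, using lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("Paris France", "Paris"))
```

Overlap scores like this ignore word order and paraphrase, so they are best paired with human or LLM-based judgments of relevance and contextual fit.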
Ensure you have the following libraries installed before running the project:
Data Processing: pandas, pyarrow, datasets, spacy
Embedding Generation: transformers, torch, tiktoken
Vector Search: pinecone-client
Knowledge Graph Integration: neo4j, tqdm
Utilities: langchain, re (re is part of the Python standard library and needs no installation)
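A quick way to confirm the environment is ready is to check which of these libraries can be imported. The sketch below uses only the standard library; the mapping from pip package names to import names is an assumption based on how these packages are commonly imported, and `missing_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping: pip package name -> importable module name.
REQUIRED: Dict[str, str] = {
    "pandas": "pandas",
    "pyarrow": "pyarrow",
    "datasets": "datasets",
    "spacy": "spacy",
    "transformers": "transformers",
    "torch": "torch",
    "tiktoken": "tiktoken",
    "pinecone-client": "pinecone",
    "neo4j": "neo4j",
    "tqdm": "tqdm",
    "langchain": "langchain",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        package for package, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("All libraries found." if not missing else f"Missing: {', '.join(missing)}")
```

Checking with `find_spec` avoids actually importing heavy libraries such as `torch` just to see whether they are installed.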
- Clone the repository:
  git clone https://github.com/your-repo-name.git
  cd your-repo-name
- Paola Munoz
- Hilary Wang
- Rohan Vuppala