Project completed by:
- Ansh Harjai (ah7163)
- Ritvik Vasantha Kumar (rv2459)
This repository contains all the necessary files to run the Retrieval-Augmented Generation (RAG) project.
In this project, we fine-tuned the Llama-3-8b-instruct model on Google Colab using a custom dataset stored in data.json. The fine-tuned model was saved to Hugging Face for potential later use (fine-tuned model: https://huggingface.co/Data-harjai/ai_project_fine_tuned_llama). However, due to performance constraints when running the fine-tuned model locally, we implemented the RAG pipeline using the Llama-3-70b-chat-hf model hosted on Together.ai.
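For reference, a call to the hosted model through Together's Python SDK looks roughly like the following. This is a minimal sketch rather than the project's exact code; the prompt is illustrative, and `TOGETHER_API_KEY` is assumed to be set in the environment.

```python
from together import Together  # pip install together

# Minimal sketch of querying the hosted model; assumes TOGETHER_API_KEY is
# set in the environment. The prompt below is illustrative only.
client = Together()
response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant for ROS 2 questions."},
        {"role": "user", "content": "What is a ROS 2 node?"},
    ],
)
print(response.choices[0].message.content)
```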
A video demonstration of the Gradio app is included in this repository, showcasing interactions with the Large Language Model (LLM): messages and contextual information are sent, and the model responds accordingly.
- The source data is extracted from the ROS 2 Documentation GitHub repository.
- The repository's text files are cleaned and stored in MongoDB.
- Code files are also stored in MongoDB without any preprocessing; all other non-text files in the repository are ignored.
- Each file is saved in MongoDB along with metadata such as (see the sketch after this list):
  - file_name
  - url
  - repo_name
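As an illustration, the load step via pymongo might look like the sketch below; the database and collection names, and the field values, are assumptions rather than the project's exact choices.

```python
from pymongo import MongoClient  # pip install pymongo

# Minimal sketch of the load step; database/collection names are assumptions.
mongo_client = MongoClient("mongodb://localhost:27017")
documents = mongo_client["rag_project"]["documents"]

documents.insert_one({
    "file_name": "Installation.rst",  # illustrative metadata values
    "url": "https://github.com/ros2/ros2_documentation/blob/rolling/source/Installation.rst",
    "repo_name": "ros2_documentation",
    "content": "ROS 2 installation instructions ...",  # cleaned file text
})
```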
- The extracted data is divided into smaller chunks.
- Each chunk is converted into a 300-dimensional embedding vector.
- These embeddings, along with their associated payloads, are stored in the Qdrant Vector Database (see the sketch below).
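The featurization steps above could be sketched as follows. The README does not name the embedding model, so spaCy's en_core_web_md (whose document vectors happen to be 300-dimensional) stands in for it here; the chunking scheme and the `ros2_docs` collection name are likewise assumptions.

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_md
from qdrant_client import QdrantClient  # pip install qdrant-client
from qdrant_client.models import Distance, PointStruct, VectorParams

# Stand-in embedder: en_core_web_md yields 300-dimensional document vectors.
nlp = spacy.load("en_core_web_md")

def embed(text: str) -> list[float]:
    return nlp(text).vector.tolist()

def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size character chunking, for illustration only.
    return [text[i:i + size] for i in range(0, len(text), size)]

qdrant = QdrantClient("http://localhost:6333")
qdrant.recreate_collection(
    collection_name="ros2_docs",  # assumed collection name
    vectors_config=VectorParams(size=300, distance=Distance.COSINE),
)

document_text = "ROS 2 is a set of software libraries and tools ..."  # e.g. one file from MongoDB
points = [
    PointStruct(id=i, vector=embed(c), payload={"text": c})
    for i, c in enumerate(chunk(document_text))
]
qdrant.upsert(collection_name="ros2_docs", points=points)
```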
- When a user submits a query, the following steps are performed (see the sketch after this list):
  - Retrieve the 5 most similar embeddings from Qdrant based on the query.
  - Create a prompt consisting of:
    - A system message
    - The user query
    - The retrieved context from Qdrant
  - The constructed prompt is sent to the LLM.
  - The LLM processes the prompt and generates a response.
  - The response is displayed to the user via the Gradio interface.
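Putting the inference steps together, retrieval and prompt construction might look like the sketch below. It reuses the assumed `embed()` helper, `qdrant` client, and `ros2_docs` collection from the featurization sketch and the Together `client` from the model sketch above; the system message is illustrative.

```python
# Reuses qdrant, embed(), and the Together client from the sketches above.
def answer(query: str) -> str:
    # Retrieve the 5 most similar chunks from Qdrant.
    hits = qdrant.search(
        collection_name="ros2_docs",
        query_vector=embed(query),
        limit=5,
    )
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # System message + user query + retrieved context, sent to the LLM.
    response = client.chat.completions.create(
        model="meta-llama/Llama-3-70b-chat-hf",
        messages=[
            {"role": "system", "content": "Answer using only the provided ROS 2 context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```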
- `data.json`: Contains the custom dataset used for fine-tuning the model.
- `fine_tuning_notebook.ipynb`: The Python notebook used for fine-tuning the Llama-3-8b-instruct model.
- `Embedding_demo.ipynb`: The Python notebook used to demonstrate the embedding process.
- `Extraction_demo.ipynb`: The Python notebook used to demonstrate the data extraction process.
- `llm_connection.ipynb`: The Python notebook used to demonstrate retrieval and response from the LLM.
- Gradio App Video: A video showcasing the working of the Gradio interface for user interactions with the RAG system.
- Screenshots: A folder containing screenshots of the work done to complete this project.
- Docker Files: Docker files to run the project.
- Clone the repository.
- Build the app image: `docker build -t ai_demo_run_1 .`
- Run the Docker Compose file, or set up the containers manually as follows:
- Create a Docker network: `docker network create my_network`
- Run the MongoDB container on this network: `docker run -d --name mongodb --network my_network -p 27017:27017 mongo:5.0`
- Run the Qdrant container on this network: `docker run -d --name qdrant --network my_network -p 6333:6333 qdrant/qdrant:v1.3.0`
- Run the app container with your ClearML credentials: `docker run -p 8501:8501 -e CLEARML_API_ACCESS_KEY=YOUR_API_KEY -e CLEARML_API_SECRET_KEY=YOUR_API_SECRET -e CLEARML_API_HOST=YOUR_SERVER_URL ai_demo_run_1`
- Open the Gradio app in your browser and run the initialize database command to start the ETL pipeline and initialize the Qdrant vector database.
- Finally, ask the LLM your questions.
This repository covers the full RAG workflow, demonstrating the use of modern tools to efficiently generate context-aware responses with LLMs.