CMPS 6730 RAG Based Question and Answering system

Goals:

Our goal in this project was to create a RAG-based question answering system and to experiment with various chunking methods in order to get the best results.

Methods Used:

Our goal is to develop a chatbot that leverages the Retrieval-Augmented Generation (RAG) architecture to provide accurate and contextually relevant responses with minimal retraining. The system comprises two main modules: a retrieval module that fetches relevant information from the knowledge base, and a large language model that generates the final response based on the user query and the retrieved context (see Figure 1 for a high-level diagram).

The system first takes a user query and uses the retrieval module to identify the top five most relevant documents from the knowledge base. These retrieved documents, along with the original user query, are then fed into the large language model to generate the response.

Retrieval Module:

The retrieval module or component of our chatbot operates as follows:

• Knowledge Base: Our knowledge base is the MS MARCO[2] dataset, an open-source, human-generated machine reading comprehension dataset curated for question answering.
• Embedding Generation: To represent both user queries and documents within the MS MARCO dataset as dense vectors, we use the Sentence Transformer all-MiniLM-L6-v2[5] model. This model is known for generating effective sentence embeddings.
• Indexing and Similarity Search: For efficient storage and retrieval of these vector embeddings, we use FAISS. Specifically, we use the IndexFlatL2[3] index, which performs a flat (brute-force) k-nearest neighbors search based on the Euclidean distance (L2 norm) between the query vector and the document vectors.
• Retrieval Process: When a user inputs a query (Xq), it is first encoded into an embedding Q(Xq) using the Sentence Transformer model. We then calculate the Euclidean distance between this query embedding and all document embeddings within our FAISS index. The top five documents with the smallest Euclidean distances (i.e., the most similar) are retrieved as context.
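The retrieval step described above can be sketched without any external library. The example below is a minimal pure-Python stand-in for the exact (brute-force) search that a flat L2 index performs; it is not the notebook's FAISS code. The function names `squared_l2` and `top_k_l2` and the toy 2-D vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_l2(
    query: Sequence[float],
    doc_vectors: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (doc_index, distance) pairs for the k nearest documents, nearest first."""
    scored = [
        (i, math.sqrt(squared_l2(query, vec))) for i, vec in enumerate(doc_vectors)
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]


# Toy 2-D "embeddings" (hypothetical); real sentence embeddings have far more dimensions.
doc_vectors = [[0.0, 0.0], [1.0, 2.0], [0.9, 0.1], [5.0, 5.0]]
query_vector = [1.0, 0.0]
nearest = top_k_l2(query_vector, doc_vectors, k=2)
nearest_ids = [doc_id for doc_id, _ in nearest]
```

Every document is compared against the query, so the result is exact but the cost grows linearly with the size of the knowledge base. Ranking by squared distance would give the same order as ranking by distance, since the square root is monotonic.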

Large Language Model:

The generative component of our chatbot is powered by the microsoft/phi-3-mini-instruct[1] model. This instruction-tuned LLM is designed for high-quality reasoning and was trained on a publicly available dataset. The retrieved top five documents, along with the original user query, are provided as context to this parameterized LLM (Pθ) to generate a relevant and informative response. The LLM processes both the non-parametric memory (retrieved documents) and its internal parametric knowledge to produce the final output.
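As an illustration of how the retrieved documents and the user query might be combined before being passed to the LLM, here is a minimal sketch. The prompt template and the function name `build_prompt` are assumptions made for this example; the notebook's actual prompt format may differ.

```python
from typing import List


def build_prompt(query: str, retrieved_docs: List[str]) -> str:
    """Combine retrieved passages and the user query into a single prompt string."""
    # Number each passage so the context block stays readable (hypothetical format).
    context = "\n\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query.strip()}\n"
        "Answer:"
    )


# Example passages are made up for illustration.
prompt = build_prompt(
    "What is FAISS used for?",
    ["FAISS is a library for similarity search.", "MS MARCO is a QA dataset."],
)
```

The resulting string would then be sent to the generative model; the retrieved passages play the role of non-parametric memory, while the model's weights supply the parametric knowledge.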

In this project we experimented with two different chunking strategies. Chunking is the process of dividing a paragraph into smaller pieces, which are then converted into embeddings and stored in our vector database. We implemented the following chunking strategies:

Recursive Based Chunking
Token Based Chunking
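The two strategies can be sketched roughly as follows. This is a simplified illustration, not the notebook's implementation: whitespace-separated words stand in for real tokenizer tokens, the recursive splitter measures size in characters, and the separator list is an assumption. The function names `token_chunks` and `recursive_chunks` are hypothetical.

```python
from typing import List, Sequence


def token_chunks(text: str, chunk_size: int = 200, overlap: int = 0) -> List[str]:
    """Fixed-size windows over tokens; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = text.split()  # assumption: whitespace tokens instead of a real tokenizer
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def recursive_chunks(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split on the coarsest separator present; recurse into pieces that are still too long."""
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    for idx, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for piece in (p for p in text.split(sep) if p.strip()):
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= chunk_size:
                current = candidate  # keep merging small pieces into one chunk
                continue
            if current:
                chunks.append(current)
                current = ""
            if len(piece) <= chunk_size:
                current = piece
            else:
                # Piece is still too long: retry with the finer separators only.
                chunks.extend(recursive_chunks(piece, chunk_size, separators[idx + 1:]))
        if current:
            chunks.append(current)
        return chunks
    # No separator applies: fall back to a hard character split.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


words = " ".join(f"w{i}" for i in range(10))
plain = token_chunks(words, chunk_size=4, overlap=0)
overlapped = token_chunks(words, chunk_size=4, overlap=2)
paragraphs = "First sentence here. Second sentence here.\n\nAnother paragraph that is short."
pieces = recursive_chunks(paragraphs, chunk_size=50)
```

Token-based chunking cuts at fixed positions regardless of content, whereas the recursive approach prefers natural boundaries (paragraphs, then lines, then sentences, then words) and only falls back to finer splits when a piece is still too large. Overlap is omitted from the recursive sketch for brevity.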

Experimentation:

To find the best-performing configuration, we experimented with both of these methods. The following experiments were conducted on 50 randomized queries:

Token Based Chunking with chunk size = 200 and no overlap
Recursive Based Chunking with chunk size = 200 and no overlap
Recursive Based Chunking with chunk size = 200 and 20% overlap
Recursive Based Chunking with chunk size = 200 and 50% overlap
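To make the overlap settings concrete, the sketch below encodes the four configurations and derives the absolute overlap and the stride between consecutive chunks. It assumes the overlap percentage is taken relative to the chunk size, which the text does not state explicitly; the `ChunkingConfig` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingConfig:
    strategy: str            # "token" or "recursive"
    chunk_size: int          # units as used by the chunker
    overlap_fraction: float  # assumption: fraction of chunk_size shared by neighbours

    @property
    def overlap(self) -> int:
        """Absolute overlap between two consecutive chunks."""
        return int(round(self.chunk_size * self.overlap_fraction))

    @property
    def stride(self) -> int:
        """How far the window advances from one chunk to the next."""
        return self.chunk_size - self.overlap


EXPERIMENTS = [
    ChunkingConfig("token", 200, 0.0),
    ChunkingConfig("recursive", 200, 0.0),
    ChunkingConfig("recursive", 200, 0.2),
    ChunkingConfig("recursive", 200, 0.5),
]

overlaps = [cfg.overlap for cfg in EXPERIMENTS]
strides = [cfg.stride for cfg in EXPERIMENTS]
```

Under this assumption, a 50% overlap halves the stride, so roughly twice as many chunks are stored in the index as in the no-overlap case.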

Conclusions:

Based on our observations, the choice of document chunking strategy affects the performance of the RAG-based question answering system, as measured by the ROUGE-L score. Although the average scores of the configurations are relatively close, adding overlap to the recursive chunking method is a promising avenue for improvement. Specifically, the recursive strategy with a 50% overlap yielded the highest mean and median ROUGE-L scores across our evaluation set. This suggests that providing more continuous contextual information to the language model can lead to better alignment with the reference answers.
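For readers unfamiliar with the metric, the following is a minimal sketch of ROUGE-L computed as an F1 score over the longest common subsequence (LCS) of tokens. It assumes lowercased whitespace tokenization and equal weighting of precision and recall; the evaluation in the notebook may rely on a library with different tokenization or weighting. The example answer strings are made up.

```python
from statistics import mean, median
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between a reference answer and a generated answer."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate covered by the LCS
    recall = lcs / len(ref)      # share of the reference covered by the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical (reference, generated) pairs, for illustration only.
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("the cat sat on the mat", "the cat on mat"),
    ("the cat sat", "a dog ran"),
]
scores: List[float] = [rouge_l_f1(ref, gen) for ref, gen in pairs]
summary = {"mean": mean(scores), "median": median(scores)}
```

Reporting both the mean and the median, as done in the conclusions above, helps distinguish a broad improvement from one driven by a few outlier queries.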

Notes:

The code does not currently include a Hugging Face API key. You can generate this key from your personal Hugging Face account. Please add your API key in the designated cell within the notebook.
During module installation, you may be prompted to restart the session. If this happens, restart the session as instructed, but do not re-run the same command or any commands that precede it.

