Natural language processing course: Conversational Agent with Retrieval-Augmented Generation for research assistance
This project implements retrieval-augmented generation (RAG) to improve a chatbot's responses to queries about research papers, with a focus on re-identification. For more details, see the report in report.pdf.
- Marko Medved
- Matej Vrečar
- Sebastijan Trojer
- create a Python environment (for example with conda):
conda create --name myenv python=3.10
- activate the environment
conda activate myenv
- install dependencies:
pip install -r requirements.txt
- then you can run the scripts and notebooks in the code folder
- The main script is optirag_improved.py, which lets you use the RAG-improved chatbot interactively in the terminal
- If you want to use the local CVF database, you first need to run the web scraper: create_cvf_database.py
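To illustrate the idea behind querying a local paper database, here is a minimal, standard-library-only sketch of the retrieval step in RAG. It is not the code from optirag_improved.py: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the bag-of-words cosine similarity, and the three placeholder papers are all assumptions made for illustration, and the actual scripts may rank papers and build prompts differently.

```python
import math
import re
from collections import Counter

# Placeholder records standing in for a local paper database (made-up data).
PAPERS = [
    {"title": "Example paper A",
     "abstract": "Transformer features for person re-identification across cameras"},
    {"title": "Example paper B",
     "abstract": "Image segmentation with convolutional networks"},
    {"title": "Example paper C",
     "abstract": "Vehicle re-identification using metric learning"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, papers, k=2):
    """Return up to k papers whose abstracts overlap with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p["abstract"]))), p) for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(query, retrieved):
    """Prepend the retrieved abstracts to the question as context for the chatbot."""
    context = "\n".join(f"- {p['title']}: {p['abstract']}" for p in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "person re-identification"
    print(build_prompt(question, retrieve(question, PAPERS)))
```

The prompt produced by `build_prompt` would then be passed to the language model in place of the bare question. A real system would typically replace the word-count vectors with learned embeddings.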
- code/
  - create_cvf_database.py - create a local database by scraping CVF Open Access
  - create_db_arxiv.py - create a local database of papers scraped from arXiv (not used anymore since we can directly scrape when the user has a query)
  - evaluation.ipynb - code for calculating and plotting evaluation metrics
  - finding_similar_papers_directly_with_arxiv_module.ipynb - experimental notebook
  - finding_similar_papers_using_local_db.ipynb - experimental notebook
  - optirag.py - baseline implementation script
  - optirag_improved.py - final implementation script
  - query_test.py - short script to test a query
  - utils.py - utility functions for easier reuse
- results/
  - papers_to_check.txt - a list of paper titles and queries we checked
  - preliminary_tests.docx - opinion-based test for the baseline model
  - queries - all queries that were used to test paper retrieval
  - results.xlsx - results obtained from retrieval testing
- report.pdf - the project report