University of Ljubljana — 2024/2025
This project aims to integrate structured knowledge into Large Language Models (LLMs) using a Retrieval-Augmented Generation (RAG) approach. We construct a knowledge graph dataset of world leaders and facts related to them. Subgraphs are extracted from this graph and stored in a vector database. Given a user question, the facts most semantically similar to it are retrieved and used to inform the model’s response. We also compare different LLM frameworks and analyze how prompt formulation influences the generated answers.
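The retrieval step described above can be illustrated with a minimal, self-contained sketch. It is not the project's implementation: the triples are hypothetical placeholders rather than entries from the dataset, the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, and a plain Python list stands in for the vector database. All function names (`verbalize`, `embed`, `cosine`, `retrieve`, `build_prompt`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder triples; NOT taken from the project's dataset.
TRIPLES = [
    ("Leader A", "holds office", "President of Country X"),
    ("Leader A", "born in", "City Y"),
    ("Leader B", "holds office", "Prime Minister of Country Z"),
]


def verbalize(triple):
    """Turn a (subject, relation, object) triple into a plain-text fact."""
    return " ".join(triple)


def embed(text):
    """Toy bag-of-words vector (assumption: a real pipeline would use
    a sentence-embedding model here)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[token] * v[token] for token in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(question, triples, k=2):
    """Return the k facts most similar to the question."""
    query = embed(question)
    facts = [verbalize(t) for t in triples]
    ranked = sorted(facts, key=lambda f: cosine(query, embed(f)), reverse=True)
    return ranked[:k]


def build_prompt(question, facts):
    """Place the retrieved facts in front of the question for the LLM."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    question = "Where was Leader A born?"
    print(build_prompt(question, retrieve(question, TRIPLES)))
```

In a fuller pipeline, the list scan in `retrieve` would be replaced by a nearest-neighbour query against the vector database, and the prompt built by `build_prompt` would be passed to the chosen LLM.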
src/ – Source code for graph processing and the RAG pipeline
data/ – Dataset of world leader QA pairs and knowledge graphs
report/ – Report for the project
Clone the repository and install the dependencies:
git clone https://github.com/UL-FRI-NLP-Course/ul-fri-nlp-course-project-2024-2025-triolingual
cd ul-fri-nlp-course-project-2024-2025-triolingual
pip install -r requirements.txt