Natural language processing course: Integrating Structured Knowledge into Large Language Models

University of Ljubljana — 2024/2025

This project aims to integrate structured knowledge into Large Language Models (LLMs) using a Retrieval-Augmented Generation (RAG) approach. We construct a knowledge graph dataset of world leaders and facts related to them, extract subgraphs from it, and store them in a vector database. Given a user question, the most semantically similar facts are retrieved and used to inform the model’s response. We also analyze how the choice of LLM framework and the formulation of the prompt influence the generated answers.
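The retrieval step described above can be illustrated with a minimal sketch. It is not the project's implementation: the example facts, the function names (`embed`, `retrieve`, `build_prompt`), and the prompt wording are all hypothetical. A toy word-count embedding and an in-memory list stand in for the real sentence encoder and vector database, which this README does not specify.

```python
import math
import re
from collections import Counter

# Hypothetical example facts; the project's real dataset lives in data/.
FACTS = [
    "Angela Merkel was Chancellor of Germany from 2005 to 2021.",
    "Emmanuel Macron became President of France in 2017.",
    "Jacinda Ardern was Prime Minister of New Zealand from 2017 to 2023.",
]


def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, facts, k=2):
    """Return the k facts most similar to the question."""
    query = embed(question)
    ranked = sorted(facts, key=lambda fact: cosine(query, embed(fact)), reverse=True)
    return ranked[:k]


def build_prompt(question, facts):
    """Place the retrieved facts in front of the question (hypothetical wording)."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer the question using only the facts below.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "Who is the President of France?"
    print(build_prompt(question, retrieve(question, FACTS, k=1)))
```

Swapping `embed` for a learned sentence encoder and the list scan for a vector-database query would leave the rest of the flow unchanged, and `build_prompt` is the natural place to vary prompt formulations when comparing answers.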

Repository structure

  • src/ – Source code for graph processing and RAG pipeline
  • data/ – Dataset of world leader QA pairs and knowledge graphs
  • report/ – Report for the project

Getting Started

Clone the repository and install the dependencies:

git clone https://github.com/UL-FRI-NLP-Course/ul-fri-nlp-course-project-2024-2025-triolingual
cd ul-fri-nlp-course-project-2024-2025-triolingual
pip install -r requirements.txt
