# Natural Language Processing Course 2024/2025: Conversational Agent with Retrieval-Augmented Generation
The project combines natural language processing (NLP), web scraping, and retrieval-augmented generation (RAG) techniques to provide high-quality answers about news in real time.
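As a rough illustration of the retrieval step such a system relies on, here is a minimal, self-contained sketch: score a tiny corpus of news snippets against a question and prepend the best match as context for the LLM. The snippets and the bag-of-words cosine scoring are illustrative only, not the project's actual pipeline.

```python
# Minimal sketch of the "retrieval" in retrieval-augmented generation:
# pick the most similar document and use it as context for the prompt.
from collections import Counter
from math import sqrt

def bow(text: str) -> Counter:
    """Lowercased bag-of-words representation of a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, corpus: list[str]) -> str:
    """Return the corpus document most similar to the question."""
    q = bow(question)
    return max(corpus, key=lambda doc: cosine(q, bow(doc)))

# Illustrative mini-corpus; in the project this would come from scraped news.
corpus = [
    "The city council approved a new tram line yesterday.",
    "Local researchers released an open Slovene language model.",
]
question = "What did researchers release for the Slovene language?"
context = retrieve(question, corpus)
prompt = f"Context: {context}\nQuestion: {question}"  # then passed to the LLM
```

A real pipeline would use embeddings and a vector store instead of bag-of-words, but the retrieve-then-prompt shape is the same.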
You can set up the required environment using either pip (with `requirements.txt`) or conda (with `environment.yml`). Choose one method.

**Using pip:**

- (Optional but recommended) Create and activate a virtual environment:

  ```bash
  python -m venv venv
  # On Windows: venv\Scripts\activate
  # On macOS/Linux: source venv/bin/activate
  ```

- Install the requirements:

  ```bash
  pip install -r requirements.txt
  ```
**Using conda:** Ensure you have Conda or Miniconda installed.

- Create the environment from the `environment.yml` file:

  ```bash
  conda env create -f environment.yml
  ```

  (This might take a few minutes.)

- Activate the newly created environment (the environment name is usually defined inside the `.yml` file; check it if unsure):

  ```bash
  conda activate <your_environment_name>
  ```

You should now have the necessary dependencies installed to run the project.
After setting up your environment using either pip or conda and installing the requirements, you need to download the Slovene language model for spaCy. Run the following command in your terminal (ensure your virtual environment is activated if you are using one):

```bash
python -m spacy download sl_core_news_sm
```

This model is necessary for natural language processing tasks in Slovene within the project.
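If you want to check the model from Python before running the project, something like the following sketch works. The sample sentence is illustrative; the check relies on the fact that spaCy models are installed as ordinary Python packages, so their presence can be detected without importing spaCy itself.

```python
# Sanity-check (a sketch) that the Slovene spaCy model is available.
import importlib.util

def model_installed(name: str = "sl_core_news_sm") -> bool:
    """spaCy models install as regular Python packages, so find_spec
    can detect them without importing spaCy."""
    return importlib.util.find_spec(name) is not None

if model_installed():
    import spacy
    nlp = spacy.load("sl_core_news_sm")
    doc = nlp("Danes je lepo vreme v Ljubljani.")  # illustrative sentence
    print([token.lemma_ for token in doc])
else:
    print("Model missing. Run: python -m spacy download sl_core_news_sm")
```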
- Environment Variables: Create a `.env` file inside the `src/` directory (`src/.env`). This file is used to store sensitive information and configuration settings.
- Required Variables: Add the following variables to your `src/.env` file:

  ```env
  # Necessary for connecting to the PostgreSQL database (adjust if needed - see src/docker-compose.yml)
  DATABASE_URL=postgresql://test:test@localhost:5432/test

  # Optional: Add your OpenAI API key to use the OpenAI LLM provider.
  # If omitted, and GEMINI_API_KEY is also omitted, the application will use the local (mocked) provider.
  # OPENAI_API_KEY=sk-...

  # Optional: Add your Gemini API key to use the Gemini LLM provider.
  # If omitted, and OPENAI_API_KEY is also omitted, the application will use the local (mocked) provider.
  # GEMINI_API_KEY=AIza...
  ```
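The provider fallback described in those comments could be implemented roughly as follows. This is a sketch, not the project's actual code: the function name and provider labels are illustrative, and it assumes OpenAI takes precedence when both keys are set.

```python
# Sketch of LLM provider selection: use whichever API key is present,
# and fall back to the local (mocked) provider when neither is set.
import os

def choose_provider(env: dict[str, str]) -> str:
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    return "local"  # mocked provider, no API key required

print(choose_provider(os.environ))
```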
- Navigate to the Source Directory: Open your terminal and change to the `src/` directory:

  ```bash
  cd src
  ```

- Start Services with Docker Compose: Ensure Docker is running. Then, start the database service defined in `docker-compose.yml`:

  ```bash
  docker compose up -d  # the -d flag runs it in detached mode (background)
  ```

  Wait a few moments for the database container to initialize.
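If the application ever tries to connect before the container is ready, a small stdlib-only wait loop like this sketch can help. The URL matches the example `DATABASE_URL`; the retry parameters are arbitrary.

```python
# Sketch: poll the Postgres host:port from a postgresql:// URL until it
# accepts a TCP connection, so the app doesn't race the container startup.
import socket
import time
from urllib.parse import urlparse

def wait_for_db(url: str, timeout: float = 30.0) -> bool:
    """Return True once host:port accepts a connection, False on timeout."""
    parsed = urlparse(url)
    host, port = parsed.hostname, parsed.port or 5432
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)  # container not accepting connections yet
    return False

if wait_for_db("postgresql://test:test@localhost:5432/test", timeout=3):
    print("database reachable")
else:
    print("database not reachable yet")
```

Note this only checks that the port is open, not that Postgres has finished its own initialization; a real health check would issue a query.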
- Run the Main Script: Execute the main Python script to start the chatbot:

  ```bash
  python main.py
  ```

  Optional flags:

  - `--use-chat-history`: Enable conversation history.

  Note: For best results, capitalize all names and write grammatically correct questions.
- Stopping Services: When you are finished, stop the Docker Compose services:

  ```bash
  docker compose down
  ```
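For reference, the `--use-chat-history` flag documented above could be parsed roughly like this. This is a sketch; the actual argument handling in `main.py` may differ.

```python
# Sketch of parsing the chatbot's optional command-line flag.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RAG news chatbot")
    parser.add_argument(
        "--use-chat-history",
        action="store_true",
        help="Enable conversation history across turns.",
    )
    return parser

args = build_parser().parse_args([])  # default: history disabled
print(args.use_chat_history)
```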
| Blaž Špacapan | Matevž Jecl | Tilen Ožbot |
|---|---|---|