Skip to content

Data πŸ€– - A chatbot to support Data Science students, utilising RAG to provide information from a knowledge base including degree program details and module specifics, while following ethical guidelines.

Notifications You must be signed in to change notification settings

NLP-Challenges/Study-Bot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

94 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Chatbot Data πŸ€–

This repository hosts the implementation of our Chatbot Data. It is based on a Gradio Dashboard.

The Documentation the best entry point to fully understanding the project. You can read it here.

Repository Structure

The repository is structured as follows:

└── πŸ“Study-Bot
    └── .env.template [ℹ️ Template for the .env file]
    └── README.md
    └── requirements.txt
    └── πŸ“assets [ℹ️ Assets used in README.md and docs]
    └── πŸ“docs [ℹ️ Contains our documentation markdowns]
    └── πŸ“src
        └── .DS_Store
        └── README.md
        └── πŸ“assets [ℹ️ Assets used in the Gradio Dashboard]
            └── ...
            └── πŸ“chroma [ℹ️ ChromaDB Vector Store with all Chunks]
            └── embedder.pkl [ℹ️ Embedder used for retrieval querying]
        └── dashboard.py [ℹ️ Gradio Dashboard Script]
        └── πŸ“flagged [ℹ️ Results flagged in Gradio]
        └── tools.py

Setup

Prerequisites

  1. Clone the Repository: Clone this repository to your local machine.
  2. Python Environment: Create a new virtual Python environment to ensure isolated package management (we used Python 3.11.6).
  3. Installation: Navigate to the repository's root directory and install the required packages:
    pip install -r requirements.txt

Configuring Environment Variables

  1. Environment File: Copy .env.template to a new file named .env.
  2. API Keys: Add your API keys and tokens in the .env file:
    HF_ACCESS_TOKEN=your-token-here
    OPENAI_API_KEY=your-key-here
    
    Replace your-token-here and your-key-here with your actual Hugging Face and OpenAI API keys.

Running the Chatbot Data πŸ€–

Make sure to add the current embedder as embedder.pkl and the chroma vector db folder chroma produced by the DVC pipeline in the [Question Answering] (https://github.com/NLP-Challenges/llm-qa-path) repository to src/assets/ before running the dashboard.

Then, to run the dashboard, execute the following command:

python src/dashboard.py

This will load all required models onto the GPU and start the Gradio dashboard. You can now access the dashboard at http://localhost:7860/, or the public link provided in the terminal.

Using Data

You can chat with Data via the Gradio Dashboard. In this section, we'll go over how you use it to chat with Data, and to debug and understand Data's behavior (be it for educational purposes or for helping improve it). Data's main language is German, so you should use German πŸ‡©πŸ‡ͺ to talk to it for best results.

How it works in the background: Data first utilizes the classifier to determine the type of the user input. If the input is a question, it is passed to the Question Answering module, which retrieves relevant contex and generates the answer. If the input is a concern, the Concern Path is triggered, which generates an appropriate messasge to continue the conversation. If the input is harm, the chatbot responds with a predefined message that denies response to the user.

Tab 1: Chat

In the chat tab, you can talk to Data πŸ€– and ask it questions about the Data Science study programme, concerns you have, or just have a chat with it.

Dashboard

Automatic Path Selection

If Data πŸ€– responds going the wrong path, e.g. classifies your input as harm instead of concern (yes, it can happen 🦧), you can follow the instructions below to overwrite the Automatic Path Selection and re-generate the response.

Dashboard

Tab 2: Classification

The classification tab allows you to classify a prompt into one of the three categories: question, concern, or harm. This is helpful for understanding how `Data πŸ€–`` classifies your input, which controls the path it takes. It can also be helpful for debugging and understanding why data might be lead down the wrong path.

Dashboard

Tab 3: Retrieval

The retrieval tab allows you query Data's πŸ€– knowledge base with a question and get the most similar document chunks from the vector store. This is helpful for understanding what Data is basing its answers on. For example, you can copy-paste a question from the chat tab and see what Data was basing its answer on.

Dashboard

About

Data πŸ€– - A chatbot to support Data Science students, utilising RAG to provide information from a knowledge base including degree program details and module specifics, while following ethical guidelines.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages