This project implements a Retrieval-Augmented Generation (RAG) chatbot using LangChain and Streamlit. It allows users to upload files (e.g., PDF, TXT, CSV, DOCX) and query their content, or chat generally with an AI assistant without uploading files. The chatbot supports conversational history, robust document loading, and error handling for issues like missing Poppler dependencies for PDFs.
- General Chat Mode: Chat with an OpenAI `gpt-4o-mini` model immediately after entering an API key, without needing to upload files.
- RAG Chat Mode: Upload files, index them, and query their content using a retrieval-augmented approach.
- Supported File Formats: PDF, TXT, CSV, DOCX, Markdown, HTML, JSON, with fallbacks for unsupported formats.
- Robust Error Handling: Handles Poppler-related issues for PDFs and provides clear error messages.
- Debugging Tools: Displays document/chunk counts and retrieved context for troubleshooting.
- Conversational Memory: Maintains chat history across general and RAG modes for seamless interactions.
- Browser-Based UI: Built with Streamlit for an intuitive, web-based interface.
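The retrieve-then-generate flow behind RAG Chat Mode can be sketched in a few lines. This stdlib-only toy uses word overlap in place of real embedding similarity; the app itself relies on LangChain, FAISS, and OpenAI embeddings, so treat this purely as an illustration of the control flow:

```python
import re

# Toy retrieve-then-generate pipeline: score chunks by word overlap with the
# query (a stand-in for embedding similarity), keep the top-k, and assemble a
# grounded prompt. Illustrative only; not the app's actual implementation.
def words(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, chunks, k=2):
    return sorted(chunks, key=lambda c: len(words(query) & words(c)), reverse=True)[:k]

def build_prompt(query, chunks, k=2):
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG combines retrieval and generation for grounded answers.",
    "Streamlit builds browser-based UIs in Python.",
    "FAISS performs fast vector similarity search.",
]
prompt = build_prompt("What is RAG?", docs, k=1)
```

In the real app the scoring step is replaced by a FAISS similarity search over OpenAI embeddings, and the prompt is sent to `gpt-4o-mini`.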
- Python: 3.8 or higher.
- OpenAI API Key: Required for LLM and embeddings. Sign up at OpenAI and obtain an API key.
- Poppler (for PDFs): Optional but recommended for PDF processing.
- Windows: Download from Poppler for Windows, extract, and add `bin` to PATH (e.g., `C:\poppler\bin`).
- macOS: `brew install poppler`
- Linux: `sudo apt update && sudo apt install poppler-utils`
- Verify: `pdftotext -v`
- System Requirements: At least 4GB RAM for small datasets; more for large files or complex queries.
- Clone or download this repository.
- Install Python dependencies:
  `pip install streamlit langchain langchain-openai langchain-community langchain-text-splitters langchainhub faiss-cpu openai pypdf docx2txt unstructured`
  `pip install "unstructured[pdf]"`
- (Optional) Install Poppler for PDF support (see Prerequisites).
- Save the script as `app.py`.
- Run the Streamlit app: `streamlit run app.py`
- Open the provided URL in your browser (e.g., `http://localhost:8501`).
- Enter your OpenAI API key in the sidebar.
- General Chat:
- Start chatting immediately by typing questions in the chat input (e.g., "What is RAG?").
- The AI responds using `gpt-4o-mini` with conversational history.
- RAG Chat:
- Upload files (e.g., TXT, DOCX, PDF) via the file uploader.
- Click "Index Uploaded Files" to process and index the documents.
- Ask questions about the document content (e.g., "Summarize the document").
- Check the "Retrieved Context" expander to debug retrieved chunks.
- Reset: Click "Reset Chat History" to clear the conversation and indexed data.
- General Chat:
- Input: "What is AI?"
- Output: "AI is the simulation of human intelligence in machines, enabling tasks like learning and problem-solving."
- RAG Chat:
- Upload a `test.txt` containing: "This document discusses Retrieval-Augmented Generation (RAG)..."
- Index the file.
- Input: "What is RAG?"
- Output: "RAG is Retrieval-Augmented Generation, a method combining retrieval and generation for better AI responses."
- "I don't know" Responses:
- Ensure documents are indexed (check document/chunk counts in the UI).
- Verify query matches document content (use specific phrases or keywords).
- Check "Retrieved Context" expander to see if relevant chunks were retrieved.
- Try larger chunk sizes or more retrieved documents (edit `chunk_size` or `k` in the script).
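To see why `chunk_size` matters, here is a simplified sliding-window splitter. The app presumably uses a LangChain text splitter; this stand-in just shows how size and overlap control chunk count and boundary context:

```python
# Simplified character-level splitter with overlap. A larger chunk_size keeps
# more surrounding context in each chunk; the overlap prevents a fact from
# being cut in half at a chunk boundary. Illustrative, not the app's splitter.
def split_text(text, chunk_size=100, chunk_overlap=20):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "".join(str(i % 10) for i in range(250))
chunks = split_text(doc, chunk_size=100, chunk_overlap=20)
# Chunks start at 0, 80, 160, 240; adjacent chunks share 20 characters.
```

If answers come back as "I don't know", a larger `chunk_size` (fewer, bigger chunks) or a larger `k` (more chunks passed to the model) both increase the chance the relevant passage reaches the prompt.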
- Poppler Errors:
- If you see "Unable to get page count" or similar, install Poppler and ensure it’s in PATH.
- Alternatively, use non-PDF files (e.g., TXT, DOCX) or enable `UnstructuredFileLoader`'s `fast` strategy.
- Verify Poppler is installed: `pdftotext -v`
- Document Loading Errors:
- Check the error message in the UI.
- Ensure files are not empty or corrupted.
- Try a simple TXT file to test the pipeline.
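A minimal sketch of the defensive loading these checks imply: surface empty files with a clear error and fall back to a second encoding instead of crashing on stray bytes. This is illustrative only; the app's actual loaders come from LangChain:

```python
from pathlib import Path

# Read a text file defensively: reject empty files with an explicit message
# and fall back from UTF-8 to latin-1 so odd bytes don't abort indexing.
def load_text(path):
    data = Path(path).read_bytes()
    if not data:
        raise ValueError(f"{path} is empty")
    for encoding in ("utf-8", "latin-1"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace")  # last resort
```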
- API Key Issues:
- Ensure the OpenAI API key is valid and has sufficient quota.
- Check for typos in the sidebar input.
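A cheap client-side sanity check can catch obvious typos before any request is made. The `sk-` prefix is a common convention for OpenAI keys, not a guarantee, and the only authoritative check is a real API call:

```python
# Heuristic pre-flight check for an OpenAI API key. Catches pasted whitespace,
# truncation, and obviously wrong strings; it cannot verify validity or quota.
def looks_like_openai_key(key: str) -> bool:
    key = key.strip()
    return key.startswith("sk-") and len(key) > 20 and " " not in key
```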
- `app.py`: Main script containing the Streamlit app, document loading, RAG pipeline, and chat logic.
- Dependencies: managed via `pip` (see Installation).
- Local LLMs: Integrate Hugging Face models for offline use.
- Multimodal Support: Add image or audio processing for advanced file types.
- Advanced UI: Enhance Streamlit with real-time file previews or query suggestions.
- Vector Store Options: Support Chroma or Pinecone for persistent storage.
This project is licensed under the MIT License.