gstarnet/chatbot

🧠 Project: Custom Knowledge Q&A Chatbot

πŸ’» Project Overview

This project builds a custom Q&A chatbot using OpenAI, LangChain, and Chroma.
The OpenAI API generates answers to questions, LangChain handles prompt construction and retrieval, and ChromaDB serves as the vector database for finding relevant content chunks.


πŸ› οΈ Requirements: Installation & Setup

Python 3.12.9

brew install pyenv
pyenv install 3.12.9
pyenv local 3.12.9

Python Packages

Installed via requirements.txt:

  • LangChain: Framework to interface with LLMs and orchestrate prompt chaining.
  • Chroma: Lightweight vector database for fast retrieval.
  • OpenAI: Language model and embedding API.
  • python-dotenv: Loads environment variables.
  • Streamlit: Interactive UI framework.
  • Others: tiktoken, colorama, requests, dateutil.

🌐 Virtual Environment Setup

MacOS/Linux:

python3 -m venv env
source env/bin/activate

Windows:

python -m venv env
env\Scripts\activate

πŸ“¦ Installation

pip install -r requirements.txt

πŸ”‘ API Key

Get your API key from the OpenAI platform.

Set it via environment variable:

export OPENAI_API_KEY='sk-...'

Or store in a .env file:

OPENAI_API_KEY=sk-...

Or duplicate template:

cp .env.example .env
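As a rough illustration of what python-dotenv does under the hood, here is a stdlib-only sketch that parses KEY=VALUE pairs from a .env file into the environment. The quote stripping and comment handling are simplified; the real library handles more edge cases.

```python
# Stdlib-only sketch of loading a .env file (what load_dotenv() roughly does).
import os

def load_env_file(path=".env"):
    loaded = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blanks, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                loaded[key.strip()] = value.strip().strip("'\"")
    except FileNotFoundError:
        pass  # no .env present; fall back to the existing environment
    os.environ.update(loaded)
    return loaded

secrets = load_env_file()  # returns {} when no .env file exists
```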

▢️ Run the Application

CLI Mode

python main.py

Web UI (Streamlit)

streamlit run app.py

Alternative (minimalist UI):

streamlit run app-nb.py

Then open http://localhost:8501


βš™οΈ Technology Stack

| Component | Purpose |
| --- | --- |
| LangChain | Manages prompt templates, chaining, and LLM interactions. |
| OpenAI API | Provides natural language understanding and embedding generation. |
| ChromaDB | Stores document embeddings for similarity search. |
| Streamlit | Builds a user-friendly, interactive web interface. |
| Docker | Containerizes the environment for consistency and ease of deployment. |
| Docker Compose | Orchestrates the CLI and UI services with shared config. |
| dotenv | Loads and manages API keys securely in local development. |

🧱 Architecture Summary

  1. Document Ingestion: raw text (faq_real_estate.txt) is loaded and split into 100-character chunks using CharacterTextSplitter.
  2. Embedding & Vector Storage: chunks are embedded with OpenAIEmbeddings and stored in a ChromaDB vector store.
  3. Query Flow: user questions are embedded and compared to stored chunks for similarity; the top matches are passed as context.
  4. Prompt Assembly & LLM Output: LangChain builds a system + human prompt from the retrieved context and sends it to OpenAI's chat model.
  5. Response Output: the chatbot returns a refined, context-aware response through the CLI or Streamlit UI.
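The query flow (steps 3 and 4) can be sketched with plain Python and toy embeddings. The vectors, texts, and k value below are illustrative; in the real app, embeddings come from OpenAIEmbeddings and the similarity search is performed by ChromaDB.

```python
# Toy sketch of retrieval: rank chunks by cosine similarity to the query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    # chunks: list of (text, embedding) pairs; returns best-matching texts.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Closing costs are typically 2-5% of the price.", [0.9, 0.1, 0.0]),
    ("Open houses run on weekends.",                   [0.1, 0.9, 0.0]),
    ("Escrow holds funds until closing.",              [0.7, 0.2, 0.1]),
]
context = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

The selected texts would then be injected into the prompt as context for the chat model.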

πŸ“ Source Structure

.
β”œβ”€β”€ app.py              # Streamlit app (model selector)
β”œβ”€β”€ app-nb.py           # Streamlit app (simplified)
β”œβ”€β”€ main.py             # CLI chatbot + core logic
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ docs/
β”‚   └── faq_real_estate.txt
β”œβ”€β”€ requirements.txt
└── .env.example

🧠 Core Code Snippets

Document Loading

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

raw_documents = TextLoader("./docs/faq_real_estate.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)  # overlap must be smaller than chunk_size
documents = text_splitter.split_documents(raw_documents)
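For intuition, fixed-size splitting similar in spirit to CharacterTextSplitter with chunk_size=100 can be sketched in plain Python; note the real splitter also respects separators and supports overlap.

```python
# Simplified fixed-size chunker: slice text into chunks of at most 100 chars.
def split_text(text, chunk_size=100):
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = split_text("a" * 250, chunk_size=100)  # yields chunks of 100, 100, 50 chars
```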

Embedding & Chroma Vector Store

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_function = OpenAIEmbeddings()
db = Chroma.from_documents(documents, embedding_function)
retriever = db.as_retriever()

Prompt & Chain with LangChain

from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

template = (
    "You are a knowledgeable assistant. Use the following info:\n{context}"
)
chat_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(template),
    HumanMessagePromptTemplate.from_template("{question}"),
])
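Under the hood, the assembled prompt is simply the two templates with the retrieved context and the user question substituted in. A plain-Python sketch, with a made-up context string for illustration:

```python
# Sketch of prompt assembly: fill the system/human templates with values.
system_template = (
    "You are a knowledgeable assistant. Use the following info:\n{context}"
)

def build_messages(context, question):
    return [
        ("system", system_template.format(context=context)),
        ("human", question),
    ]

messages = build_messages(
    context="Closing costs are typically 2-5% of the purchase price.",
    question="What are the closing costs?",
)
```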

Chain Execution

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | chat_prompt
    | ChatOpenAI(...)
    | StrOutputParser()
)
response = chain.invoke("What are the closing costs?")

🐳 Docker Setup

Build Image

docker build -t custom-chatbot-cli .

Run CLI in Container

docker run -it --rm --env-file .env custom-chatbot-cli

🧩 Docker Compose (Preferred)

docker-compose up --build

Rebuild after making changes:

docker-compose up --build --force-recreate

🧼 Dockerignore Example

Make builds faster by ignoring:

env/
.idea/
__pycache__/

βœ… Use Cases

  • Real Estate Agents – e.g., Sunrise Realty FAQ bot
  • Internal Knowledgebase – HR, IT support, SOPs
  • Legal/Compliance Q&A – Clause-specific search
  • Education – Course notes and FAQ retrieval

πŸ’‘ Tips for Customization

  • βœ… Swap out faq_real_estate.txt with any domain-specific .txt content in docs/.
  • βœ… Update prompt template in main.py to reflect your brand tone.
  • βœ… Modify vector store to use alternatives like FAISS or Weaviate for scale.
  • βœ… Replace OpenAIEmbeddings with Hugging Face or Cohere embeddings.
  • βœ… Store chat history with SQLite or connect Streamlit to Supabase for persistence.
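For the chat-history tip, a minimal SQLite sketch might look like the following; the chat_history table name and schema are illustrative, not part of this project.

```python
# Hypothetical chat-history persistence using Python's built-in sqlite3.
import sqlite3

def init_history(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT, content TEXT)"
    )

def save_message(conn, role, content):
    conn.execute(
        "INSERT INTO chat_history (role, content) VALUES (?, ?)",
        (role, content),
    )

def load_messages(conn):
    return conn.execute(
        "SELECT role, content FROM chat_history ORDER BY id"
    ).fetchall()

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
init_history(conn)
save_message(conn, "human", "What are the closing costs?")
save_message(conn, "ai", "Typically 2-5% of the purchase price.")
```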
