# How to Build a Custom Q&A Chatbot Using OpenAI, LangChain, and Chroma
The OpenAI API generates answers to questions, LangChain handles prompt construction and retrieval, and ChromaDB serves as a vector database to search relevant content chunks.
```
brew install pyenv
pyenv install 3.12.9
pyenv local 3.12.9
```

Installed via `requirements.txt`:
- LangChain: Framework to interface with LLMs and orchestrate prompt chaining.
- Chroma: Lightweight vector database for fast retrieval.
- OpenAI: Language model and embedding API.
- python-dotenv: Loads environment variables.
- Streamlit: Interactive UI framework.
- Others: tiktoken, colorama, requests, dateutil.
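A `requirements.txt` matching the list above might look like the following (unpinned here; pin versions for reproducible builds, and note that the `dateutil` library is published on PyPI as `python-dateutil`):

```
langchain
chromadb
openai
python-dotenv
streamlit
tiktoken
colorama
requests
python-dateutil
```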
macOS/Linux:

```
python3 -m venv env
source env/bin/activate
```

Windows:

```
python -m venv env
env\Scripts\activate
```

Install dependencies:

```
pip install -r requirements.txt
```

Get your API key from OpenAI.
Set it via environment variable:

```
export OPENAI_API_KEY='sk-...'
```

Or store it in a `.env` file:

```
OPENAI_API_KEY=sk-...
```

Or duplicate the template:

```
cp .env.example .env
```

Run the CLI chatbot:

```
python main.py
```

Run the Streamlit UI:

```
streamlit run app.py
```

Alternative (minimalist UI):

```
streamlit run app-nb.py
```

Then open http://localhost:8501.
| Component | Purpose |
|---|---|
| LangChain | Manages prompt templates, chaining, and LLM interactions. |
| OpenAI API | Provides natural language understanding and embedding generation. |
| ChromaDB | Stores document embeddings for similarity search. |
| Streamlit | Builds a user-friendly, interactive web interface. |
| Docker | Containerizes the app for environment consistency and ease of deployment. |
| Docker Compose | Orchestrates CLI and UI services simultaneously with shared config. |
| dotenv | Loads and manages API keys securely in local development. |
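The repo ships its own `docker-compose.yml`; a sketch of what a two-service setup like the one described above could look like (service names, commands, and ports here are assumptions, not the repo's actual file):

```yaml
services:
  cli:
    build: .
    env_file: .env
    command: python main.py
    stdin_open: true
    tty: true
  ui:
    build: .
    env_file: .env
    command: streamlit run app.py
    ports:
      - "8501:8501"
```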
1. **Document Ingestion**: Raw text (`faq_real_estate.txt`) is loaded and split into 100-character chunks using `CharacterTextSplitter`.
2. **Embedding & Vector Storage**: Chunks are embedded using `OpenAIEmbeddings` and stored in a ChromaDB vector store.
3. **Query Flow**: User questions are embedded, compared to stored chunks for similarity, and the top matches are passed as context.
4. **Prompt Assembly & LLM Output**: LangChain constructs a system + human prompt using the retrieved context and sends it to OpenAI's chat model.
5. **Response Output**: The chatbot returns a refined, context-aware response through the CLI or Streamlit UI.
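The ingestion-to-retrieval steps above can be sketched end to end with toy components. Here a naive fixed-size splitter and a letter-frequency "embedding" stand in for `CharacterTextSplitter` and `OpenAIEmbeddings` (both far more sophisticated in practice), and the FAQ text is made up, purely for illustration:

```python
import math

def split_chunks(text, chunk_size=100):
    """Naive fixed-size splitting (the real CharacterTextSplitter is separator-aware)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text):
    """Toy stand-in embedding: letter-frequency vector over a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Made-up FAQ content, split and "embedded" into an in-memory index.
faq = ("Closing costs typically run two to five percent of the purchase price. "
       "Earnest money is a deposit showing the buyer is serious.")
chunks = split_chunks(faq, chunk_size=80)
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query flow: embed the question, rank chunks by similarity,
# and the best match becomes the prompt context.
query = "What are the closing costs?"
qvec = embed(query)
best = max(index, key=lambda pair: cosine(qvec, pair[1]))[0]
print(best)
```

In the real pipeline, Chroma performs this similarity search over OpenAI embedding vectors instead of letter counts.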
```
.
├── app.py               # Streamlit app (model selector)
├── app-nb.py            # Streamlit app (simplified)
├── main.py              # CLI chatbot + core logic
├── Dockerfile
├── docker-compose.yml
├── docs/
│   └── faq_real_estate.txt
├── requirements.txt
└── .env.example
```
Load and split the documents:

```python
raw_documents = TextLoader("./docs/faq_real_estate.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=100)
documents = text_splitter.split_documents(raw_documents)
```

Embed the chunks and store them in Chroma:

```python
embedding_function = OpenAIEmbeddings()
db = Chroma.from_documents(documents, embedding_function)
retriever = db.as_retriever()
```

Build the prompt:

```python
template = (
    "You are a knowledgeable assistant. Use the following info:\n{context}"
)
chat_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(template),
    HumanMessagePromptTemplate.from_template("{question}")
])
```

Compose and invoke the chain:

```python
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | chat_prompt
    | ChatOpenAI(...)
    | StrOutputParser()
)
response = chain.invoke("What are the closing costs?")
```

Build and run with Docker:

```
docker build -t custom-chatbot-cli .
docker run -it --rm --env-file .env custom-chatbot-cli
docker-compose up --build
```

Rebuild with changes:
```
docker-compose up --build --force-recreate
```

Make builds faster by ignoring these paths in a `.dockerignore`:

```
env/
.idea/
__pycache__/
```
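A note on the chain construction shown earlier: LangChain's `|` operator (LCEL) simply feeds each stage's output into the next. A toy pure-Python analogue of that piping idea, where the `fake_*` names are hypothetical stand-ins for the retriever, prompt template, chat model, and output parser:

```python
from functools import reduce

def pipe(*stages):
    """Compose stages left to right: pipe(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), stages, x)

# Hypothetical stand-ins for the real chain components.
fake_retriever = lambda q: {"context": "Closing costs are 2-5%.", "question": q}
fake_prompt = lambda d: f"Use: {d['context']}\nQ: {d['question']}"
fake_llm = lambda prompt: f"ANSWER({prompt.splitlines()[-1][3:]})"
fake_parser = lambda s: s.strip()

chain = pipe(fake_retriever, fake_prompt, fake_llm, fake_parser)
print(chain("What are the closing costs?"))
```

The real LCEL runtime adds streaming, batching, and async support on top of this basic composition idea.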
- Real Estate Agents: e.g., a Sunrise Realty FAQ bot
- Internal Knowledgebase: HR, IT support, SOPs
- Legal/Compliance Q&A: clause-specific search
- Education: course notes and FAQ retrieval
- ✅ Swap out `faq_real_estate.txt` for any domain-specific `.txt` content in `docs/`.
- ✅ Update the prompt template in `main.py` to reflect your brand tone.
- ✅ Modify the vector store to use alternatives like FAISS or Weaviate for scale.
- ✅ Replace `OpenAIEmbeddings` with Hugging Face or Cohere embeddings.
- ✅ Store chat history with SQLite or connect Streamlit to Supabase for persistence.