PuppetGPT is a Retrieval-Augmented Generation (RAG) powered document assistant that allows users to upload a PDF and ask questions about its contents.
Instead of relying only on the language model's internal knowledge, PuppetGPT retrieves relevant sections from the document using semantic search and feeds them to the LLM to generate accurate, grounded responses.
Built with LangChain, Groq LLaMA models, Chroma vector database, and Streamlit, this project demonstrates how modern AI applications combine retrieval systems with large language models to build reliable document assistants.
You can try PuppetGPT directly in your browser:
https://puppetgpt.streamlit.app/
Upload a PDF and start asking questions about the document.
Most LLMs generate responses freely based on their training data, which can sometimes result in hallucinations or incorrect answers.
PuppetGPT takes a different approach.
Instead of letting the model respond freely, the system guides the LLM using retrieved document context. The retrieved chunks act like strings controlling the model's responses, ensuring answers remain grounded in the document.
In simple terms:
```
Document Context (strings)
        ↓
Retriever pulls relevant chunks
        ↓
LLM generates grounded response
        ↓
Accurate Answer
```
Just like a puppet moves according to the strings controlling it, the language model generates answers based on the document context provided.
This design significantly improves accuracy, reliability, and transparency.
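The grounding step described above can be sketched in plain Python: retrieved chunks are placed into the prompt ahead of the question so the model answers from them. This is an illustrative sketch only (the real app builds its prompts via LangChain), and `build_grounded_prompt` is a hypothetical name, not the project's actual function.

```python
# Illustrative sketch of prompt grounding (not the app's exact code):
# retrieved chunks are stuffed into the prompt so the LLM answers from them.
def build_grounded_prompt(chunks: list[str], question: str) -> str:
    context = "\n\n".join(chunks)
    return (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    ["PuppetGPT is a RAG-based document assistant."],
    "What is PuppetGPT?",
)
```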
- **Upload Any PDF** - Upload any document and instantly start querying it.
- **Retrieval-Augmented Generation (RAG)** - Responses are generated using retrieved document context.
- **Fast LLM Responses** - Powered by Groq's ultra-fast LLaMA models.
- **Semantic Document Search** - Embeddings + vector similarity search retrieve relevant document chunks.
- **Source Transparency** - Shows which document sections were used to generate the answer.
- **Interactive Streamlit Interface** - Clean and simple UI for chatting with documents.
```
User Uploads PDF
        ↓
Document Loader
        ↓
Text Chunking
        ↓
Embedding Generation
        ↓
Chroma Vector Database
        ↓
Semantic Retrieval (Top-K)
        ↓
Prompt Construction
        ↓
Groq LLaMA Model
        ↓
Answer + Sources
```
This architecture improves accuracy, contextual relevance, and trustworthiness compared to traditional LLM responses.
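To make the chunking and retrieval steps concrete, here is a toy version of the chunk → score → top-k pipeline. Word overlap stands in for embedding similarity; this is a sketch, not the project's implementation, which uses HuggingFace sentence-transformer embeddings stored in Chroma.

```python
# Toy sketch of the chunk -> score -> top-k retrieval step.
# Word overlap stands in for real embedding similarity.
def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_k(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

doc = ("PuppetGPT grounds answers in the uploaded PDF. "
       "Chunks are embedded and stored in a vector database. "
       "The retriever returns the most relevant chunks for each question.")
chunks = chunk_text(doc, size=8)
hits = top_k(chunks, "which chunks does the retriever return", k=1)
```

A real embedding model would catch paraphrases that word overlap misses, but the top-k mechanics are the same.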
| Component | Technology |
|---|---|
| Frontend | Streamlit |
| Framework | LangChain |
| LLM | Groq LLaMA |
| Embeddings | HuggingFace Sentence Transformers |
| Vector Database | ChromaDB |
| Language | Python |
Clone the repository:

```bash
git clone https://github.com/aawhan0/PuppetGPT.git
cd PuppetGPT
```

Create a virtual environment:

```bash
python -m venv venv
```

Activate the environment.

Windows:

```bash
venv\Scripts\activate
```

Mac/Linux:

```bash
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Create a .env file in the project root and add your Groq API key:

```
GROQ_API_KEY=your_api_key_here
```

You can obtain a key from the Groq console.

Run the app:

```bash
streamlit run app.py
```

Then open:

```
http://localhost:8501
```

Upload a PDF and start chatting with your document.
```
PuppetGPT
│
├── app.py             # Streamlit interface
├── ingest.py          # Document ingestion pipeline
├── rag_pipeline.py    # Retrieval + LLM logic
├── requirements.txt
├── README.md
│
├── uploaded_docs/     # Uploaded PDFs
└── vectorstore/       # Chroma vector database
```
This project demonstrates important AI engineering concepts:
- Retrieval-Augmented Generation (RAG)
- Semantic search using embeddings
- Vector databases
- Prompt grounding
- LLM integration with external knowledge
- Document-based AI assistants
Use cases:

- Chat with research papers
- Extract insights from reports
- Query technical documentation
- Summarize books or PDFs
- Build internal knowledge assistants
While building PuppetGPT, several practical engineering issues arose related to environment setup, dependency management, RAG architecture, and LLM behavior. The key challenges and solutions are summarized below.
**Issue**

Streamlit Cloud defaulted to Python 3.14, which caused compatibility issues with AI libraries such as pydantic and LangChain.

**Fix**

The deployment environment was changed to Python 3.11, which is currently the most stable version for LangChain-based applications.
**Issue**

LangChain recently split into multiple packages, which caused import errors in the original implementation.

**Fix**

Imports were updated to the new modular packages:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
```

**Issue**

Version conflicts between LangChain-related packages caused installation failures.

**Fix**

Rebuilt the Python virtual environment and reinstalled dependencies to ensure clean resolution.
**Issue**

The original model llama3-8b-8192 was deprecated.

**Fix**

Updated to:

```python
model_name="llama-3.1-8b-instant"
```

**Issue**

LangChain memory conflicted with RetrievalQA because the chain returns multiple outputs.

**Fix**

Chat history was managed using Streamlit session state:

```python
st.session_state.chat_history
```

**Issue**

Rebuilding the Chroma vector database on every query caused runtime errors.

**Fix**

Vectorstore creation was cached so embeddings are generated once per document upload.
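The caching idea can be illustrated with `functools.lru_cache` standing in for Streamlit's `st.cache_resource`; `build_vectorstore` is a hypothetical name for illustration, not the project's actual function.

```python
# Sketch of the caching fix. lru_cache stands in for st.cache_resource;
# build_vectorstore is a hypothetical name, not the app's real function.
from functools import lru_cache

builds = {"count": 0}  # track how many times the expensive build runs

@lru_cache(maxsize=None)
def build_vectorstore(doc_path: str) -> str:
    # In the real app this step would embed chunks and write them to Chroma.
    builds["count"] += 1
    return f"vectorstore:{doc_path}"

build_vectorstore("report.pdf")
build_vectorstore("report.pdf")  # cache hit: embeddings are not regenerated
```

The same pattern means a query only triggers retrieval, never a rebuild, until a new document is uploaded.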
**Issue**

Some logic was placed after a return statement, making it unreachable.

**Fix**

Refactored the architecture into two functions:

```python
get_vectorstore()
get_qa_chain()
```

**Issue**

The model occasionally generated answers not present in the document.

**Fix**

Added a strict prompt rule requiring the model to respond:

"I cannot find this information in the document."

when the answer is not in the retrieved context.
**Issue**

The model sometimes produced compressed bullet lists.

**Fix**

Added a formatting step before displaying answers:

```python
answer = answer.replace("• ", "\n• ").strip()
```

- Multi-document retrieval
- Hybrid search (BM25 + embeddings)
- Conversation memory
- Streaming responses
- Evaluation metrics dashboard
This project is licensed under the MIT License.
- LangChain
- Groq
- HuggingFace
- Chroma
- Streamlit
⭐ If you found this project useful, consider giving it a star!