A Retrieval-Augmented Generation (RAG) system for querying the Constitution of India using LangChain, Pinecone, HuggingFace embeddings, and GitHub Marketplace models.
- 🔍 Semantic search through Constitution text
- 🤖 AI-powered answer generation using GitHub Marketplace models
- 📊 Source attribution with relevance scores
- ⚡ Fast vector-based retrieval using Pinecone
- 🎯 Context-aware responses
- 🔧 HuggingFace embeddings (no OpenAI API required)
- 📄 Support for both PDF and TXT file formats
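To illustrate what "semantic search" with "relevance scores" means in practice, here is a minimal pure-Python sketch of ranking text chunks by cosine similarity to a query vector. It is an illustration only: in this project the ranking is performed by Pinecone, the similarity metric is assumed to be cosine, and the names `cosine_similarity` and `top_k` are hypothetical rather than taken from the codebase.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The score returned with each chunk is the kind of value the app can show next to a source as its relevance.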
- Python 3.8 or higher
- Pinecone account and API key
- GitHub token with access to marketplace models
- Clone the repository:
git clone <your-repo-url>
cd COI
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
# Edit .env with your API keys
- Add the Constitution document:
  - Place the Constitution of India file (PDF or TXT format) in the documents/ folder
  - Supported filenames:
    - documents/COI.pdf
    - documents/constitution_of_india.pdf
    - documents/constitution_of_india.txt
- Run setup validation:
python setup.py
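As a rough sketch of how the supported filenames could be resolved, the function below checks the documents/ folder for each listed name in order and returns the first match. The function name `find_constitution_file` and the lookup order are assumptions for illustration; the project's own setup.py and constitution_processor.py may do this differently.

```python
from pathlib import Path
from typing import Optional

# Filenames listed in this README as supported.
SUPPORTED_FILENAMES = (
    "COI.pdf",
    "constitution_of_india.pdf",
    "constitution_of_india.txt",
)


def find_constitution_file(documents_dir: str = "documents") -> Optional[Path]:
    """Return the first supported file found in documents_dir, or None."""
    base = Path(documents_dir)
    for name in SUPPORTED_FILENAMES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_constitution_file()
    print(found if found else "No supported Constitution file found in documents/")
```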
streamlit run src/app.py
- The app will automatically process the Constitution document on first run
- Enter your question in the text input
- Click "Get Answer" to receive AI-generated responses with sources
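For readers curious how an answer "with sources" can be produced, the sketch below assembles a prompt from retrieved excerpts and their relevance scores before it is sent to the language model. The function name `build_prompt` and the prompt wording are hypothetical; the actual prompt used in rag_pipeline.py is not shown in this README.

```python
from typing import List, Tuple


def build_prompt(question: str, retrieved: List[Tuple[str, float]]) -> str:
    """Combine the question and numbered excerpts into a single prompt string."""
    context_lines = [
        f"[{i}] (score {score:.2f}) {text}"
        for i, (text, score) in enumerate(retrieved, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered excerpts below, "
        "and cite the excerpt numbers you relied on.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the excerpts lets the model refer to them by index, which makes it straightforward to map a cited number back to its source chunk.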
COI/
├── src/
│   ├── app.py                          # Main Streamlit application
│   ├── config/
│   │   └── settings.py                 # Configuration settings
│   ├── data/
│   │   └── constitution_processor.py   # Document processing
│   ├── embeddings/
│   │   └── embedding_service.py        # HuggingFace embedding generation
│   ├── vector_store/
│   │   └── pinecone_client.py          # Pinecone operations
│   ├── llm/
│   │   └── github_model_client.py      # GitHub model client
│   ├── retrieval/
│   │   └── rag_pipeline.py             # RAG pipeline
│   └── utils/
│       └── helpers.py                  # Utility functions
├── documents/
│   └── COI.pdf                         # Constitution document (PDF or TXT supported)
├── requirements.txt
├── .env.example
├── setup.py
└── README.md
Edit src/config/settings.py to modify:
- Chunk size and overlap for document processing
- Number of retrieved documents
- HuggingFace model parameters
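To show how chunk size and overlap interact, here is a minimal character-based sliding-window splitter. It is a sketch under assumptions: the parameter names `chunk_size` and `chunk_overlap` and their default values are illustrative, and the project itself may use a LangChain text splitter with different behavior.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

A larger overlap reduces the chance that a clause is cut in half at a chunk boundary, at the cost of storing and embedding more chunks.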
Required environment variables in .env:
- PINECONE_API_KEY: Your Pinecone API key
- PINECONE_ENVIRONMENT: Your Pinecone environment (default: gcp-starter)
- GITHUB_TOKEN: Your GitHub token for marketplace models
- GITHUB_MODEL_ENDPOINT: Your GitHub marketplace model endpoint
- HUGGINGFACE_MODEL_NAME: HuggingFace model name (default: sentence-transformers/all-MiniLM-L6-v2)
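A quick way to check these variables before launching the app is sketched below. It assumes that the variables without a stated default must be set and that the two with defaults may be omitted; this split, along with the name `load_settings`, is an assumption for illustration and not necessarily how settings.py behaves.

```python
import os
from typing import Dict, Mapping, Optional

# Variables with no default listed in this README (assumed mandatory).
REQUIRED_VARS = ("PINECONE_API_KEY", "GITHUB_TOKEN", "GITHUB_MODEL_ENDPOINT")

# Defaults as stated in this README.
DEFAULTS = {
    "PINECONE_ENVIRONMENT": "gcp-starter",
    "HUGGINGFACE_MODEL_NAME": "sentence-transformers/all-MiniLM-L6-v2",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read settings from env (os.environ by default), failing on missing required values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings
```

Note that this reads the process environment only; values in a .env file need to be loaded into the environment first.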
- Import errors: Make sure all dependencies are installed with pip install -r requirements.txt
- Environment variables: Ensure all required variables are set in .env
- Constitution file: Place the Constitution file (PDF or TXT) in the documents/ folder with a supported filename
- Pinecone connection: Verify your Pinecone API key and environment
MIT License