A sophisticated LLM-powered e-commerce product assistant built with LangChain and LangGraph that implements a complete RAG (Retrieval-Augmented Generation) pipeline for intelligent product recommendations and customer support.
This project combines modern AI techniques with robust data engineering to create an enterprise-grade e-commerce assistant that can:
- Scrape product data from e-commerce platforms (Flipkart)
- Store and retrieve product information using vector databases
- Provide intelligent responses through agentic RAG workflows
- Evaluate response quality using RAGAS metrics
- LLM Framework: LangChain 0.3.27 + LangGraph 0.6.7
- Vector Database: AstraDB (DataStax)
- Embeddings: Google Generative AI (text-embedding-004)
- LLM Providers: Google Gemini 2.0 Flash + Groq DeepSeek
- Web Scraping: Selenium + BeautifulSoup + Undetected ChromeDriver
- Web Framework: FastAPI + Streamlit
- Evaluation: RAGAS metrics
- **Web Scraping** (`data_scrapper.py`):
  - Scrapes Flipkart product data using Selenium
  - Extracts: product titles, prices, ratings, reviews
  - Handles anti-bot detection with undetected-chromedriver
  - Saves data to CSV format
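The final save-to-CSV step can be sketched with the standard library. The row shape and column names below (`product_id`, `title`, `rating`, `total_reviews`, `price`) are assumptions for illustration, not the scraper's actual schema:

```python
import csv

# Hypothetical rows shaped like the fields the scraper extracts
# (column names here are assumed, not the project's real schema)
products = [
    {"product_id": "MOBX123", "title": "Example Phone 5G",
     "rating": "4.4", "total_reviews": "1,284", "price": "13,999"},
]

def save_to_csv(rows, path):
    """Persist scraped product rows as CSV, one row per product."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

save_to_csv(products, "products.csv")
```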
- **Data Ingestion** (`data_ingestion.py`):
  - Transforms CSV data into LangChain Documents
  - Creates embeddings using Google's text-embedding-004
  - Stores documents in the AstraDB vector store
  - Includes metadata: product_id, title, rating, price
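The row-to-Document transformation can be sketched as follows. The `Document` class here is a plain stand-in for `langchain_core.documents.Document`, and the input column names are assumptions; the real pipeline would pass real Documents to the AstraDB vector store:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Illustrative stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def rows_to_documents(rows):
    """Map each scraped CSV row to a Document, keeping the product
    fields as metadata (column names here are assumptions)."""
    return [
        Document(
            page_content=f"{row['title']}: {row.get('top_reviews', '')}",
            metadata={"product_id": row["product_id"], "title": row["title"],
                      "rating": row["rating"], "price": row["price"]},
        )
        for row in rows
    ]

docs = rows_to_documents([{
    "product_id": "P1", "title": "Example Phone 5G", "rating": "4.4",
    "price": "13,999", "top_reviews": "Great battery life",
}])
```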
- **Advanced Retrieval** (`retrieval.py`):
  - Uses MMR (Maximal Marginal Relevance) retrieval
  - Implements contextual compression with LLM filtering
  - Configurable top-k results with score thresholds
  - Integrates RAGAS evaluation metrics
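LangChain's retrievers apply MMR internally (via `search_type="mmr"`); the core idea can be shown with a toy reimplementation that greedily trades query relevance against redundancy with already-selected documents:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: score = λ·sim(query, doc) − (1−λ)·max sim(doc, selected)."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j])
                              for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

picks = mmr_select([1.0, 0.0],
                   [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]],
                   k=2, lambda_mult=0.3)
# → [0, 2]: the near-duplicate doc 1 is skipped in favour of the diverse doc 2
```

With plain similarity search, documents 0 and 1 (near-duplicates) would both be returned; MMR diversifies the result set instead.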
- **LangGraph-based RAG** (`agentic_rag_workflow.py`):
  - State Management: Tracks conversation state
  - Intelligent Routing: Determines when to use retrieval vs direct response
  - Document Grading: Evaluates relevance of retrieved documents
  - Query Rewriting: Improves unclear queries
  - Response Generation: Uses specialized prompts for product recommendations
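The retrieve → grade → generate-or-rewrite loop can be sketched in plain Python (the real implementation wires these steps as LangGraph nodes and edges, with LLM-backed graders and rewriters; the lambdas below are toy stand-ins):

```python
def run_workflow(query, retrieve, grade, generate, rewrite, max_rewrites=2):
    """Retriever → Grader → Generator/Rewriter loop.

    Each argument is a pluggable step; if no retrieved document passes
    grading, the query is rewritten and the loop retries."""
    for _ in range(max_rewrites + 1):
        docs = retrieve(query)
        relevant = [d for d in docs if grade(query, d)]
        if relevant:
            return generate(query, relevant)
        query = rewrite(query)  # unclear query: refine and try again
    return "Sorry, I couldn't find matching products."

answer = run_workflow(
    "cheap phone",
    retrieve=lambda q: ["budget phone under 15k", "gaming laptop"],
    grade=lambda q, d: "phone" in d,
    generate=lambda q, docs: f"Top pick based on {len(docs)} matching doc(s)",
    rewrite=lambda q: q + " mobile",
)
# → "Top pick based on 1 matching doc(s)"
```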
- **FastAPI Application** (`main.py`):
  - RESTful endpoints for chat functionality
  - CORS middleware for cross-origin requests
  - Template rendering for the web interface
- Automated Flipkart product data extraction
- Handles dynamic content and anti-bot measures
- Configurable scraping parameters
- Streamlit UI for easy scraping operations
- Multi-step Agentic Workflow:
- Assistant → Retriever → Grader → Generator/Rewriter
- Context-aware decision making
- Automatic query refinement
- Document relevance scoring
- AstraDB Vector Store: Scalable, managed vector database
- Embedding Management: Google's latest embedding models
- Metadata Preservation: Rich product information storage
- Similarity Search: Semantic product matching
- RAGAS Metrics: Context precision and response relevancy
- Automated Evaluation: Built-in quality scoring
- Performance Monitoring: Real-time evaluation metrics
- Google Gemini 2.0 Flash: Primary LLM for responses
- Groq DeepSeek: Alternative high-performance option
- Configurable Switching: Environment-based provider selection
- API Key Management: Secure credential handling
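Environment-based provider switching can be sketched as a small factory (the `LLM_PROVIDER` variable name is an assumption; the real code would instantiate `ChatGoogleGenerativeAI` or `ChatGroq` from the selected config):

```python
import os

# Provider configs mirroring config/config.yaml
PROVIDERS = {
    "google": {"model_name": "gemini-2.0-flash"},
    "groq": {"model_name": "deepseek-r1-distill-llama-70b"},
}

def load_llm_config(default="google"):
    """Select the LLM provider from the (assumed) LLM_PROVIDER env var,
    falling back to the default provider when it is unset."""
    provider = os.environ.get("LLM_PROVIDER", default).lower()
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {provider}")
    return {"provider": provider, **PROVIDERS[provider]}
```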
- Product Recommendation Engine: Intelligent product suggestions based on user queries
- Customer Support Bot: Automated responses to product-related questions
- Price Comparison: Real-time price and review analysis
- Product Research: Comprehensive product information retrieval
- E-commerce Analytics: Product performance insights
- Python 3.10+
- Chrome browser (for web scraping)
- AstraDB account (for vector storage)
- Google API key (for embeddings)
- Groq API key (for LLM)
- **Clone the Repository**

```bash
git clone https://github.com/anil-reddaboina/ecomm-product-assistant.git
cd ecomm-product-assistant
```

- **Create Virtual Environment**

```bash
uv venv --python 3.10
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- **Install Dependencies**

```bash
uv pip install -r requirements.txt
```

- **Environment Configuration** — create a `.env` file with the following variables:

```
GOOGLE_API_KEY=your_google_api_key
GROQ_API_KEY=your_groq_api_key
ASTRA_DB_API_ENDPOINT=your_astradb_endpoint
ASTRA_DB_APPLICATION_TOKEN=your_astradb_token
ASTRA_DB_KEYSPACE=your_keyspace
```
- **Web Scraping Interface**

```bash
streamlit run scrapper_ui.py
```

- **Chat API Server**

```bash
uvicorn product_assistant.router.main:app --reload
```

- **Data Ingestion Pipeline**

```bash
python -m product_assistant.etl.data_ingestion
```
The project uses YAML-based configuration (`config/config.yaml`):
```yaml
astra_db:
  collection_name: "ecommercedata"
embedding_model:
  provider: "google"
  model_name: "models/text-embedding-004"
retriever:
  top_k: 10
llm:
  groq:
    provider: "groq"
    model_name: "deepseek-r1-distill-llama-70b"
    temperature: 0
    max_output_tokens: 2048
  google:
    provider: "google"
    model_name: "gemini-2.0-flash"
    temperature: 0
    max_output_tokens: 2048
```

```
ecomm-product-assistant/
├── product_assistant/     # Main package
│   ├── etl/               # Data extraction and ingestion
│   ├── retriever/         # Vector retrieval system
│   ├── workflow/          # LangGraph agentic workflows
│   ├── router/            # FastAPI web interface
│   ├── prompt_library/    # LLM prompt templates
│   ├── evaluation/        # RAGAS evaluation metrics
│   ├── utils/             # Utility functions
│   ├── logger/            # Logging configuration
│   └── exception/         # Custom exceptions
├── config/                # Configuration files
├── data/                  # Scraped data storage
├── templates/             # HTML templates
├── static/                # Static web assets
├── logs/                  # Application logs
├── infra/                 # Infrastructure configs
├── k8/                    # Kubernetes manifests
└── notebook/              # Jupyter notebooks
```
The project includes comprehensive evaluation using RAGAS metrics:
```python
# Example evaluation usage
from product_assistant.evaluation.ragas_eval import evaluate_context_precision, evaluate_response_relevancy

context_score = evaluate_context_precision(query, response, retrieved_contexts)
relevancy_score = evaluate_response_relevancy(query, response, retrieved_contexts)
```

```bash
docker build -t ecomm-assistant .
docker run -p 8000:8000 ecomm-assistant
```

```bash
kubectl apply -f k8/deployment.yaml
kubectl apply -f k8/service.yaml
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is distributed under a proprietary license — see the LICENSE file for details.
Anil Reddaboina
- GitHub: @anil-reddaboina
- LangChain team for the amazing framework
- DataStax for AstraDB vector database
- Google AI for embedding models
- Groq for high-performance LLM inference
Built with ❤️ using modern AI technologies