A sophisticated LLM-powered e-commerce product assistant built with LangChain and LangGraph that implements a complete RAG (Retrieval-Augmented Generation) pipeline for intelligent product recommendations and customer support.
This project combines modern AI techniques with robust data engineering to create an enterprise-grade e-commerce assistant that can:
- Scrape product data from e-commerce platforms (Flipkart)
- Store and retrieve product information using vector databases
- Provide intelligent responses through agentic RAG workflows
- Evaluate response quality using RAGAS metrics
- LLM Framework: LangChain 0.3.27 + LangGraph 0.6.7
- Vector Database: AstraDB (DataStax)
- Embeddings: Google Generative AI (text-embedding-004)
- LLM Providers: Google Gemini 2.0 Flash + Groq DeepSeek
- Web Scraping: Selenium + BeautifulSoup + Undetected ChromeDriver
- Web Framework: FastAPI + Streamlit
- Evaluation: RAGAS metrics
- **Web Scraping** (`data_scrapper.py`):
  - Scrapes Flipkart product data using Selenium
  - Extracts: product titles, prices, ratings, reviews
  - Handles anti-bot detection with undetected-chromedriver
  - Saves data to CSV format
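The final save-to-CSV step can be sketched with the standard library. The row shape and column names below (`product_id`, `title`, `rating`, `total_reviews`, `price`) are assumptions for illustration, not the scraper's actual schema:

```python
import csv

# Hypothetical rows shaped like the fields the scraper extracts
# (column names here are assumed, not the project's real schema)
products = [
    {"product_id": "MOBX123", "title": "Example Phone 5G",
     "rating": "4.4", "total_reviews": "1,284", "price": "13,999"},
]

def save_to_csv(rows, path):
    """Persist scraped product rows as CSV, one row per product."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

save_to_csv(products, "products.csv")
```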
- **Data Ingestion** (`data_ingestion.py`):
  - Transforms CSV data into LangChain Documents
  - Creates embeddings using Google's text-embedding-004
  - Stores documents in the AstraDB vector store
  - Includes metadata: product_id, title, rating, price
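The row-to-Document transformation can be sketched as follows. The `Document` class here is a plain stand-in for `langchain_core.documents.Document`, and the input column names are assumptions; the real pipeline would pass real Documents to the AstraDB vector store:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Illustrative stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def rows_to_documents(rows):
    """Map each scraped CSV row to a Document, keeping the product
    fields as metadata (column names here are assumptions)."""
    return [
        Document(
            page_content=f"{row['title']}: {row.get('top_reviews', '')}",
            metadata={"product_id": row["product_id"], "title": row["title"],
                      "rating": row["rating"], "price": row["price"]},
        )
        for row in rows
    ]

docs = rows_to_documents([{
    "product_id": "P1", "title": "Example Phone 5G", "rating": "4.4",
    "price": "13,999", "top_reviews": "Great battery life",
}])
```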
- **Advanced Retrieval** (`retrieval.py`):
  - Uses MMR (Maximal Marginal Relevance) retrieval
  - Implements contextual compression with LLM filtering
  - Configurable top-k results with score thresholds
  - Integrates RAGAS evaluation metrics
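LangChain's retrievers apply MMR internally (via `search_type="mmr"`); the core idea can be shown with a toy reimplementation that greedily trades query relevance against redundancy with already-selected documents:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: score = λ·sim(query, doc) − (1−λ)·max sim(doc, selected)."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j])
                              for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

picks = mmr_select([1.0, 0.0],
                   [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]],
                   k=2, lambda_mult=0.3)
# → [0, 2]: the near-duplicate doc 1 is skipped in favour of the diverse doc 2
```

With plain similarity search, documents 0 and 1 (near-duplicates) would both be returned; MMR diversifies the result set instead.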
- **LangGraph-based RAG** (`agentic_rag_workflow.py`):
  - State Management: Tracks conversation state
  - Intelligent Routing: Determines when to use retrieval vs direct response
  - Document Grading: Evaluates relevance of retrieved documents
  - Query Rewriting: Improves unclear queries
  - Response Generation: Uses specialized prompts for product recommendations
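The retrieve → grade → generate-or-rewrite loop can be sketched in plain Python (the real implementation wires these steps as LangGraph nodes and edges, with LLM-backed graders and rewriters; the lambdas below are toy stand-ins):

```python
def run_workflow(query, retrieve, grade, generate, rewrite, max_rewrites=2):
    """Retriever → Grader → Generator/Rewriter loop.

    Each argument is a pluggable step; if no retrieved document passes
    grading, the query is rewritten and the loop retries."""
    for _ in range(max_rewrites + 1):
        docs = retrieve(query)
        relevant = [d for d in docs if grade(query, d)]
        if relevant:
            return generate(query, relevant)
        query = rewrite(query)  # unclear query: refine and try again
    return "Sorry, I couldn't find matching products."

answer = run_workflow(
    "cheap phone",
    retrieve=lambda q: ["budget phone under 15k", "gaming laptop"],
    grade=lambda q, d: "phone" in d,
    generate=lambda q, docs: f"Top pick based on {len(docs)} matching doc(s)",
    rewrite=lambda q: q + " mobile",
)
# → "Top pick based on 1 matching doc(s)"
```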
- **FastAPI Application** (`main.py`):
  - RESTful endpoints for chat functionality
  - CORS middleware for cross-origin requests
  - Template rendering for the web interface
- Automated Flipkart product data extraction
- Handles dynamic content and anti-bot measures
- Configurable scraping parameters
- Streamlit UI for easy scraping operations
- Multi-step Agentic Workflow:
- Assistant → Retriever → Grader → Generator/Rewriter
- Context-aware decision making
- Automatic query refinement
- Document relevance scoring
- AstraDB Vector Store: Scalable, managed vector database
- Embedding Management: Google's latest embedding models
- Metadata Preservation: Rich product information storage
- Similarity Search: Semantic product matching
- RAGAS Metrics: Context precision and response relevancy
- Automated Evaluation: Built-in quality scoring
- Performance Monitoring: Real-time evaluation metrics
- Google Gemini 2.0 Flash: Primary LLM for responses
- Groq DeepSeek: Alternative high-performance option
- Configurable Switching: Environment-based provider selection
- API Key Management: Secure credential handling
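Environment-based provider switching can be sketched as a small factory (the `LLM_PROVIDER` variable name is an assumption; the real code would instantiate `ChatGoogleGenerativeAI` or `ChatGroq` from the selected config):

```python
import os

# Provider configs mirroring config/config.yaml
PROVIDERS = {
    "google": {"model_name": "gemini-2.0-flash"},
    "groq": {"model_name": "deepseek-r1-distill-llama-70b"},
}

def load_llm_config(default="google"):
    """Select the LLM provider from the (assumed) LLM_PROVIDER env var,
    falling back to the default provider when it is unset."""
    provider = os.environ.get("LLM_PROVIDER", default).lower()
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {provider}")
    return {"provider": provider, **PROVIDERS[provider]}
```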
- Product Recommendation Engine: Intelligent product suggestions based on user queries
- Customer Support Bot: Automated responses to product-related questions
- Price Comparison: Real-time price and review analysis
- Product Research: Comprehensive product information retrieval
- E-commerce Analytics: Product performance insights
- Python 3.10+
- Chrome browser (for web scraping)
- AstraDB account (for vector storage)
- Google API key (for embeddings)
- Groq API key (for LLM)
- **Clone the Repository**

```bash
git clone https://github.com/anil-reddaboina/ecomm-product-assistant.git
cd ecomm-product-assistant
```

- **Create Virtual Environment**

```bash
uv venv --python 3.10
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- **Install Dependencies**

```bash
uv pip install -r requirements.txt
```

- **Environment Configuration** — create a `.env` file with the following variables:

```
GOOGLE_API_KEY=your_google_api_key
GROQ_API_KEY=your_groq_api_key
ASTRA_DB_API_ENDPOINT=your_astradb_endpoint
ASTRA_DB_APPLICATION_TOKEN=your_astradb_token
ASTRA_DB_KEYSPACE=your_keyspace
```
- **Web Scraping Interface**

```bash
streamlit run scrapper_ui.py
```

- **Chat API Server**

```bash
uvicorn product_assistant.router.main:app --reload
```

- **Data Ingestion Pipeline**

```bash
python -m product_assistant.etl.data_ingestion
```
The project uses YAML-based configuration (`config/config.yaml`):
```yaml
astra_db:
  collection_name: "ecommercedata"
embedding_model:
  provider: "google"
  model_name: "models/text-embedding-004"
retriever:
  top_k: 10
llm:
  groq:
    provider: "groq"
    model_name: "deepseek-r1-distill-llama-70b"
    temperature: 0
    max_output_tokens: 2048
  google:
    provider: "google"
    model_name: "gemini-2.0-flash"
    temperature: 0
    max_output_tokens: 2048
```

```
ecomm-product-assistant/
├── product_assistant/     # Main package
│   ├── etl/               # Data extraction and ingestion
│   ├── retriever/         # Vector retrieval system
│   ├── workflow/          # LangGraph agentic workflows
│   ├── router/            # FastAPI web interface
│   ├── prompt_library/    # LLM prompt templates
│   ├── evaluation/        # RAGAS evaluation metrics
│   ├── utils/             # Utility functions
│   ├── logger/            # Logging configuration
│   └── exception/         # Custom exceptions
├── config/                # Configuration files
├── data/                  # Scraped data storage
├── templates/             # HTML templates
├── static/                # Static web assets
├── logs/                  # Application logs
├── infra/                 # Infrastructure configs
├── k8/                    # Kubernetes manifests
└── notebook/              # Jupyter notebooks
```
The project includes comprehensive evaluation using RAGAS metrics:
```python
# Example evaluation usage
from product_assistant.evaluation.ragas_eval import evaluate_context_precision, evaluate_response_relevancy

context_score = evaluate_context_precision(query, response, retrieved_contexts)
relevancy_score = evaluate_response_relevancy(query, response, retrieved_contexts)
```

```bash
docker build -t ecomm-assistant .
docker run -p 8000:8000 ecomm-assistant
```

```bash
kubectl apply -f k8/deployment.yaml
kubectl apply -f k8/service.yaml
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is distributed under a proprietary license — see the LICENSE file for details.
Anil Reddaboina
- GitHub: @anil-reddaboina
- LangChain team for the amazing framework
- DataStax for AstraDB vector database
- Google AI for embedding models
- Groq for high-performance LLM inference
Built with ❤️ using modern AI technologies