🛒 E-commerce Product Assistant

An LLM-powered e-commerce product assistant built with LangChain and LangGraph, implementing a complete Retrieval-Augmented Generation (RAG) pipeline for intelligent product recommendations and customer support.

πŸ—οΈ Architecture Overview

This project combines modern AI techniques with robust data engineering to create an enterprise-grade e-commerce assistant that can:

  • Scrape product data from e-commerce platforms (Flipkart)
  • Store and retrieve product information using vector databases
  • Provide intelligent responses through agentic RAG workflows
  • Evaluate response quality using RAGAS metrics

Core Technologies Stack

  • LLM Framework: LangChain 0.3.27 + LangGraph 0.6.7
  • Vector Database: AstraDB (DataStax)
  • Embeddings: Google Generative AI (text-embedding-004)
  • LLM Providers: Google Gemini 2.0 Flash + Groq DeepSeek
  • Web Scraping: Selenium + BeautifulSoup + Undetected ChromeDriver
  • Web Framework: FastAPI + Streamlit
  • Evaluation: RAGAS metrics

🔄 Data Flow Architecture

1. Data Ingestion Pipeline (ETL/)

  • Web Scraping (data_scrapper.py):

    • Scrapes Flipkart product data using Selenium
    • Extracts: product titles, prices, ratings, reviews
    • Handles anti-bot detection with undetected-chromedriver
    • Saves data to CSV format
  • Data Ingestion (data_ingestion.py):

    • Transforms CSV data into LangChain Documents
    • Creates embeddings using Google's text-embedding-004
    • Stores documents in AstraDB vector store
    • Includes metadata: product_id, title, rating, price
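The CSV-to-document transform can be sketched in isolation. This is a simplified stand-in, not the repo's data_ingestion.py: the real pipeline builds LangChain Documents and embeds them into AstraDB, and the exact CSV column names (including a review column) are assumptions here.

```python
import csv
import io

def rows_to_documents(csv_text):
    """Turn scraped CSV rows into (page_content, metadata) pairs.

    In the real pipeline each pair would become a LangChain Document
    and be embedded into AstraDB; here we only model the transform.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The searchable text combines title, rating, and review;
        # structured fields are kept as metadata for filtering.
        content = f"{row['title']} (rating {row['rating']}): {row['review']}"
        metadata = {
            "product_id": row["product_id"],
            "title": row["title"],
            "rating": float(row["rating"]),
            "price": row["price"],
        }
        docs.append((content, metadata))
    return docs
```

Keeping product_id, rating, and price in metadata rather than the page content lets the retriever filter or re-rank on them without polluting the embedding text.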

2. Retrieval System (retriever/)

  • Advanced Retrieval (retrieval.py):
    • Uses MMR (Maximal Marginal Relevance) retrieval
    • Implements contextual compression with LLM filtering
    • Configurable top-k results with score thresholds
    • Integrates RAGAS evaluation metrics
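The MMR idea is easy to show with a toy, stdlib-only implementation. In practice the vector store's retriever handles this (e.g. LangChain's `search_type="mmr"`); the code below only illustrates the greedy relevance-vs-redundancy trade-off.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: each step picks the candidate maximizing
    lambda * relevance - (1 - lambda) * (max similarity to picks so far)."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            rel = cosine(query_vec, doc_vecs[i])
            red = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                      default=0.0)
            return lambda_mult * rel - (1 - lambda_mult) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low lambda_mult the second pick skips a near-duplicate of the first result in favor of a more diverse one; with lambda_mult=1.0 the selection degenerates to plain similarity ranking.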

3. Agentic Workflow (workflow/)

  • LangGraph-based RAG (agentic_rag_workflow.py):
    • State Management: Tracks conversation state
    • Intelligent Routing: Determines when to use retrieval vs direct response
    • Document Grading: Evaluates relevance of retrieved documents
    • Query Rewriting: Improves unclear queries
    • Response Generation: Uses specialized prompts for product recommendations
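The grade-then-route decision at the heart of the graph can be modeled as a pure function. This is a simplified sketch of a LangGraph conditional edge; the node names and the retry cap are assumptions, not the repo's actual code.

```python
def route_after_grading(graded_docs, rewrite_attempts, max_rewrites=2):
    """Conditional-edge logic after the document grader node.

    graded_docs: list of (doc, is_relevant) pairs from the grader.
    Returns the name of the next node: generate an answer if any
    document is relevant, otherwise rewrite the query, up to a cap.
    """
    if any(relevant for _, relevant in graded_docs):
        return "generate"
    if rewrite_attempts < max_rewrites:
        return "rewrite_query"
    # Give up on rewriting and answer with whatever context we have.
    return "generate"
```

Capping rewrites prevents the rewrite-retrieve-grade loop from cycling forever on queries the corpus simply cannot answer.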

4. API Layer (router/)

  • FastAPI Application (main.py):
    • RESTful endpoints for chat functionality
    • CORS middleware for cross-origin requests
    • Template rendering for web interface

🎯 Key Features

Intelligent Product Scraping

  • Automated Flipkart product data extraction
  • Handles dynamic content and anti-bot measures
  • Configurable scraping parameters
  • Streamlit UI for easy scraping operations
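Anti-bot evasion is mostly undetected-chromedriver's job, but intermittent throttling still calls for retries. A generic backoff wrapper (hypothetical; not a function in this repo) looks like:

```python
import time

def fetch_with_retry(fetch, url, retries=3, base_delay=1.0, backoff=1.5):
    """Call fetch(url), retrying with exponential backoff on failure.

    Useful when the target site intermittently throttles requests;
    the final failure is re-raised to the caller.
    """
    delay = base_delay
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
            delay *= backoff
```

Passing the fetch callable in makes the wrapper trivial to unit-test with a fake, and keeps it independent of whether the page is loaded with Selenium or plain HTTP.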

Advanced RAG Pipeline

  • Multi-step Agentic Workflow:
    • Assistant → Retriever → Grader → Generator/Rewriter
    • Context-aware decision making
    • Automatic query refinement
    • Document relevance scoring

Vector Database Integration

  • AstraDB Vector Store: Scalable, managed vector database
  • Embedding Management: Google's latest embedding models
  • Metadata Preservation: Rich product information storage
  • Similarity Search: Semantic product matching

Evaluation & Quality Assurance

  • RAGAS Metrics: Context precision and response relevancy
  • Automated Evaluation: Built-in quality scoring
  • Performance Monitoring: Real-time evaluation metrics

Multi-Provider LLM Support

  • Google Gemini 2.0 Flash: Primary LLM for responses
  • Groq DeepSeek: Alternative high-performance option
  • Configurable Switching: Environment-based provider selection
  • API Key Management: Secure credential handling
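Environment-based switching can be modeled as a small lookup keyed by an environment variable. The `LLM_PROVIDER` variable name is an assumption for illustration; the model settings mirror config/config.yaml.

```python
import os

MODEL_CONFIG = {
    # Mirrors the llm section of config/config.yaml.
    "google": {"model_name": "gemini-2.0-flash", "temperature": 0},
    "groq": {"model_name": "deepseek-r1-distill-llama-70b", "temperature": 0},
}

def select_llm_config(env=None):
    """Pick the LLM provider from the environment, defaulting to google.

    A missing API key for the chosen provider is a configuration error,
    surfaced early instead of failing on the first request.
    """
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "google").lower()
    if provider not in MODEL_CONFIG:
        raise ValueError(f"Unknown LLM provider: {provider!r}")
    key_var = "GOOGLE_API_KEY" if provider == "google" else "GROQ_API_KEY"
    if key_var not in env:
        raise ValueError(f"{key_var} must be set for provider {provider!r}")
    return provider, MODEL_CONFIG[provider]
```

Failing fast on a missing key keeps credential problems out of the request path, where they would otherwise surface as opaque provider errors.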

📊 Use Cases

  1. Product Recommendation Engine: Intelligent product suggestions based on user queries
  2. Customer Support Bot: Automated responses to product-related questions
  3. Price Comparison: Real-time price and review analysis
  4. Product Research: Comprehensive product information retrieval
  5. E-commerce Analytics: Product performance insights

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Chrome browser (for web scraping)
  • AstraDB account (for vector storage)
  • Google API key (for embeddings)
  • Groq API key (for LLM)

Environment Setup

  1. Clone the Repository

    git clone https://github.com/anil-reddaboina/ecomm-product-assistant.git
    cd ecomm-product-assistant
  2. Create Virtual Environment

    uv venv --python 3.10
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install Dependencies

    uv pip install -r requirements.txt
  4. Environment Configuration Create a .env file with the following variables:

    GOOGLE_API_KEY=your_google_api_key
    GROQ_API_KEY=your_groq_api_key
    ASTRA_DB_API_ENDPOINT=your_astradb_endpoint
    ASTRA_DB_APPLICATION_TOKEN=your_astradb_token
    ASTRA_DB_KEYSPACE=your_keyspace

Running the Application

  1. Web Scraping Interface

    streamlit run scrapper_ui.py
  2. Chat API Server

    uvicorn product_assistant.router.main:app --reload
  3. Data Ingestion Pipeline

    python -m product_assistant.etl.data_ingestion

🔧 Configuration

The project uses YAML-based configuration (config/config.yaml):

astra_db:
  collection_name: "ecommercedata"

embedding_model:
  provider: "google"
  model_name: "models/text-embedding-004"

retriever:
  top_k: 10

llm:
  groq:
    provider: "groq"
    model_name: "deepseek-r1-distill-llama-70b"
    temperature: 0
    max_output_tokens: 2048
  google:
    provider: "google"
    model_name: "gemini-2.0-flash"
    temperature: 0
    max_output_tokens: 2048
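Assuming the YAML above is loaded into a dict (e.g. with `yaml.safe_load`), nested values can be read safely with a small helper like this (hypothetical; the repo may expose its config differently):

```python
def get_config(config, path, default=None):
    """Read a nested key like 'llm.google.model_name' from the parsed
    YAML dict, returning default when any path segment is missing."""
    node = config
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

Dotted-path access with a default avoids scattering KeyError handling across every module that reads configuration.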

πŸ“ Project Structure

ecomm-product-assistant/
├── product_assistant/          # Main package
│   ├── etl/                   # Data extraction and ingestion
│   ├── retriever/             # Vector retrieval system
│   ├── workflow/              # LangGraph agentic workflows
│   ├── router/                # FastAPI web interface
│   ├── prompt_library/        # LLM prompt templates
│   ├── evaluation/            # RAGAS evaluation metrics
│   ├── utils/                 # Utility functions
│   ├── logger/                # Logging configuration
│   └── exception/             # Custom exceptions
├── config/                    # Configuration files
├── data/                      # Scraped data storage
├── templates/                 # HTML templates
├── static/                    # Static web assets
├── logs/                      # Application logs
├── infra/                     # Infrastructure configs
├── k8/                        # Kubernetes manifests
└── notebook/                  # Jupyter notebooks

🧪 Testing & Evaluation

The project includes comprehensive evaluation using RAGAS metrics:

# Example evaluation usage
from product_assistant.evaluation.ragas_eval import (
    evaluate_context_precision,
    evaluate_response_relevancy,
)

context_score = evaluate_context_precision(query, response, retrieved_contexts)
relevancy_score = evaluate_response_relevancy(query, response, retrieved_contexts)

🚀 Deployment

Docker Deployment

docker build -t ecomm-assistant .
docker run -p 8000:8000 ecomm-assistant

Kubernetes Deployment

kubectl apply -f k8/deployment.yaml
kubectl apply -f k8/service.yaml

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is distributed under a proprietary license; see the LICENSE file for details.

👨‍💻 Author

Anil Reddaboina

πŸ™ Acknowledgments

  • LangChain team for the amazing framework
  • DataStax for AstraDB vector database
  • Google AI for embedding models
  • Groq for high-performance LLM inference

Built with ❀️ using modern AI technologies
